Natural language processing

Some interests

These are some of the topics that interest me in particular:
  1. Language interfaces: the conversion of natural language input into a semantic representation and back again. This involves several issues: parsing text, developing a comprehensive yet manageable semantic representation, designing the best forms of interaction with the user, and selecting supporting resources.
  2. Automatic sense disambiguation: this is closely related to 1). What interests me here is how network properties based on probabilistic and semantic features can help in deducing sense. I also believe it is possible to build very robust systems that use clues from multilingual resources.
  3. Parsing using hybrid methods
  4. Mining from unstructured data: one of the biggest challenges here is developing robust, highly scalable methods and tools. One subtopic is dealing with completely unstructured data. Another is the intelligent use of metadata attached to text, as in Wikipedia. Algorithms that exploit the network structure of hybrid data can uncover a large amount of implicit information.

You can find a list of some publications of mine here

I describe very briefly the concept of dynamic lexical relations here

General NLP publications I have found useful

Foundations of Statistical Natural Language Processing, by Manning & Schütze (1999). ISBN: 978-0262133609. A very general, older but still good reference on statistical approaches in NLP.
CJKV Information Processing: Chinese, Japanese, Korean & Vietnamese Computing, by Lunde (2009). ISBN: 978-0596514471. A general introduction to processing CJK languages (plus Vietnamese).
Statistical Machine Translation, by Philipp Koehn (2010). ISBN: 978-0521874151. An excellent introduction to statistical approaches for MT.
Taming Text: How to Find, Organize, and Manipulate It, by Ingersoll et al. (2013). ISBN: 978-1933988382. General guidance on working with features of Solr and similar tools; less coverage of entity extraction.
Coli MIT: the MIT CL journal (periodical).

Some general online resources for NLP

Unicode: all about those Unicode characters
IBM's ICU library: the general library for low-level Unicode and globalization processing
UIMA: IBM's framework for annotating unstructured content
SENSEVAL: the standard for sense disambiguation


Andrés Domínguez Burgos, 2017 ©