Natural Language Processing (NLP) is a field at the intersection of computer science, artificial intelligence, and linguistics. The main goal of NLP is to "understand" natural language in order to perform useful tasks, such as question answering.
Some examples of NLP applications:
- Spell checking, keyword search, finding synonyms
- Extracting information from websites such as time, product price, dates, location, people or company names
- Classifying texts
- Text summarisation
- Finding similar texts
- Sentiment analysis
- Machine translation
- Search
- Spoken dialog systems
- Complex query answering
- Speech recognition
Text can be analyzed at different levels: phonemes, morphemes, words, sub-sentences, sentences, paragraphs, and whole documents.
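As a minimal sketch of working at two of these levels (sentences and words), here is a naive segmentation using only Python's standard library; real tokenizers handle abbreviations, punctuation, and contractions far more carefully.

```python
import re

text = "NLP is hard. Ambiguity is one reason. Vagueness is another."

# Naive sentence segmentation: split after sentence-final punctuation.
sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]

# Naive word tokenization: runs of letters within each sentence.
words = [re.findall(r"[A-Za-z]+", s) for s in sentences]

print(sentences)  # three sentences
print(words[0])   # ['NLP', 'is', 'hard']
```

Lower levels (phonemes, morphemes) and higher ones (paragraphs, documents) need dedicated models rather than regular expressions.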
From a linguistic point of view, analysis can be done at these levels:
- Syntax (what is grammatical)
- Semantics (what it means)
- Pragmatics (what it does)
Many clever algorithms have been developed for various tasks:
- Hidden Markov Models (for speech recognition)
- Conditional Random Fields (for part of speech tagging)
- Latent Dirichlet Allocation (for topic modeling)
NLP is hard, first of all because of:
- ambiguity - more than one possible (precise) interpretation (e.g. "Foreigners are hunting dogs")
- vagueness - the text does not specify full information
- uncertainty - due to imperfect statistical models
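The ambiguity in the example sentence can be made explicit by writing out its two parse structures; the tree labels below are conventional phrase-structure tags, and the nested-tuple encoding is just an illustration, not any parser's actual output format.

```python
sentence = "Foreigners are hunting dogs"

# Reading 1: "hunting" is a verb - foreigners hunt dogs.
reading_verb = ("S", ("NP", "Foreigners"),
                     ("VP", ("AUX", "are"), ("V", "hunting"), ("NP", "dogs")))

# Reading 2: "hunting dogs" is a noun phrase - foreigners ARE hunting dogs.
reading_noun = ("S", ("NP", "Foreigners"),
                     ("VP", ("V", "are"),
                            ("NP", ("ADJ", "hunting"), ("N", "dogs"))))

# Same surface string, two distinct structures:
print(reading_verb != reading_noun)  # True
```

A parser has to choose between such structures, and nothing in the string itself forces one reading over the other; that choice requires context or world knowledge.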
In the mid-2010s, neural networks became successful in NLP. Why did that happen?
I'll describe the main ideas of deep learning techniques for NLP in the next post :)