Friday, May 13, 2016

Distributed Word Representation

Today we will talk about the main "building block" in deep learning application for NLP - vectors.
Every part - phoneme, word,  sub-sentence, sentence, even the whole document could be represented as a vector. I found it really cool.
How to get this representation?
The most straightforward way is to build a word-documents matrix. This matrix will be sparse, so the next step should be a dimensionality reduction (e.g SVD).
The main problem here is an expensive computation (computation cost scales quadratically for n x m matrix O(m x n x n) when (n < m))
Another approach is to learn vector representation directly from the data. This algorithm (named word2vec) was suggested in 2013 by Mikolov. Actually, word2vec is a two algorithms: CBOW(continuous bag of words) and Skip-Gram. In CBOW you are predicting the word, based on words before and  after. In Skip-Gram the task is opposite - context prediction based on words.

With this approach, you can very quickly learn words representation(e.g words representation for all words in English wiki (~80 GB unzipped texts) could be learnt in ~ 10 hours with office laptop).
You could directly measure  the similarity between  result vectors (and get a similarity between words context  e.g. 'stock market' = 'thermometer',  with similarity equal to 0.72). Also, you could use the vectors as building blocks for more complex neural nets.
This approach unlocks really cool new operations, like adding or subtraction word representations which look like  adding or subtraction context of words.

Or even cooler:
Iraq - Violence = Jordan
President - Power = Prime Minister
Guys from Instagram applied this technique for obtaining meanings of emoji.

Interested in this topic? You can read more here:
Mikolov original paper:
Instagram Engineering Blog:
Cool examples (I used them above)

Friday, May 6, 2016

Deep Learning for Visual Question Answering

Today I found a great article about some specific type of question answering. A picture worth a thousand words:

[picture from the original article]

 It is not a big surprise - technically it as all about LSTM. Enjoy reading: