Every unit of text (a phoneme, a word, a sub-sentence, a sentence, even a whole document) can be represented as a vector. I find that really cool.
How do you get this representation?
The most straightforward way is to build a word-document matrix. This matrix will be sparse, so the next step should be dimensionality reduction (e.g. SVD).
The main problem here is the expensive computation: for an n x m matrix with n < m, SVD costs O(m x n x n), so the cost grows quadratically with the smaller dimension.
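To make the matrix approach concrete, here is a minimal sketch using NumPy. The toy corpus, the choice of raw counts as matrix entries, and k = 2 are all my assumptions for illustration; a real setup would use a much larger corpus and probably some weighting such as TF-IDF.

```python
import numpy as np

# Toy corpus, made up for illustration: each string is one document.
docs = [
    "stock market prices fall",
    "stock market prices rise",
    "thermometer readings rise",
    "thermometer readings fall",
]

# Vocabulary: one matrix row per distinct word.
vocab = sorted({w for d in docs for w in d.split()})
row = {w: i for i, w in enumerate(vocab)}

# Word-document count matrix: counts[i, j] = occurrences of word i in document j.
counts = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        counts[row[w], j] += 1

# Truncated SVD: keep only the k largest singular values (k = 2 is arbitrary here).
k = 2
U, S, Vt = np.linalg.svd(counts, full_matrices=False)
word_vectors = U[:, :k] * S[:k]  # one dense k-dimensional vector per word
```

Words that occur in exactly the same documents (here "stock" and "market") end up with identical vectors, which is the intuition behind using the reduced rows as word representations.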
Another approach is to learn the vector representation directly from the data. This algorithm, named word2vec, was proposed in 2013 by Mikolov et al. Actually, word2vec is two algorithms: CBOW (continuous bag of words) and Skip-Gram. In CBOW you predict a word from the words before and after it. In Skip-Gram the task is the opposite: you predict the context words from the word.
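The difference between the two tasks is easiest to see in the training examples they produce. The sketch below only generates those examples and does not train anything; the function names, the example sentence and the window size are my own choices, not part of word2vec itself.

```python
def cbow_examples(tokens, window=2):
    """Yield (context_words, target_word): predict the word from its neighbours."""
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        yield left + right, target


def skipgram_examples(tokens, window=2):
    """Yield (input_word, context_word): predict each neighbour from the word."""
    for context, target in cbow_examples(tokens, window):
        for c in context:
            yield target, c


tokens = "the quick brown fox jumps".split()
cbow = list(cbow_examples(tokens, window=1))
skipgram = list(skipgram_examples(tokens, window=1))

# With window=1, the word "brown" gives:
#   CBOW:      (["quick", "fox"], "brown")
#   Skip-Gram: ("brown", "quick") and ("brown", "fox")
```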
With this approach you can learn word representations very quickly: for example, representations for all words in the English wiki (~80 GB of unzipped text) can be learnt in ~10 hours on an office laptop.
You can directly measure the similarity between the resulting vectors and so get the similarity between word contexts (e.g. 'stock market' ≈ 'thermometer', with a similarity of 0.72). You can also use the vectors as building blocks for more complex neural nets.
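The post does not say which similarity measure produced the 0.72 figure; cosine similarity is the usual choice for word vectors, so the sketch below assumes it. The three vectors are invented purely to show the behaviour and are not learned embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 = same direction, 0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Invented vectors, for illustration only.
u = [1.0, 2.0, 0.0]
v = [2.0, 4.0, 0.0]  # same direction as u, different length
w = [0.0, 0.0, 3.0]  # orthogonal to u

sim_uv = cosine_similarity(u, v)  # close to 1: length does not matter
sim_uw = cosine_similarity(u, w)  # 0: nothing in common
```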
This approach unlocks really cool new operations, like adding or subtracting word representations, which looks like adding or subtracting the contexts of the words:
Iraq - Violence = Jordan
President - Power = Prime Minister
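A rough sketch of how such an equation is evaluated: subtract the vectors, then look for the remaining word whose vector is closest to the result. The 2-d vectors below are hand-made so that the arithmetic is visible (axis 0 loosely means "Middle East country", axis 1 "violence"); they are not real embeddings, and the helper names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))


def nearest(query, vectors, exclude=()):
    """Return the word whose vector is most similar to query, skipping excluded words."""
    candidates = {w: v for w, v in vectors.items() if w not in exclude}
    return max(candidates, key=lambda w: cosine(query, candidates[w]))


# Hand-made toy vectors, NOT learned from data.
vectors = {
    "Iraq": [1.0, 1.0],
    "Violence": [0.0, 1.0],
    "Jordan": [1.0, 0.1],
    "Thermometer": [-1.0, 0.2],
}

# Iraq - Violence, computed component by component.
query = [a - b for a, b in zip(vectors["Iraq"], vectors["Violence"])]

# The input words themselves are excluded from the search.
result = nearest(query, vectors, exclude=("Iraq", "Violence"))
```

With these toy vectors the nearest remaining word is "Jordan"; with real embeddings the answer depends entirely on the training corpus.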
The guys from Instagram applied this technique to obtain the meanings of emoji.
Interested in this topic? You can read more here:
Mikolov's original paper:
http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
Instagram Engineering Blog:
http://instagram-engineering.tumblr.com/post/117889701472/emojineering-part-1-machine-learning-for-emoji
Cool examples (I used them above):
http://byterot.blogspot.co.uk/2015/06/five-crazy-abstractions-my-deep-learning-word2doc-model-just-did-NLP-gensim.html