Friday, April 17, 2015

Word similarity

Finding words (sentences, documents) with the same meaning is a common problem in NLP (Natural Language Processing). Deep learning methods that learn vector representations of words have brought noticeable progress in this area.
For example, the word2vec approach learns word vectors from a text corpus that capture relationships such as "man is to king as woman is to ?", where "?" turns out to be "queen". It's amazing stuff. In addition, you can learn representations not only for single words but also for phrases (sequences of words).
Here are some examples from a model trained on the Google News corpus:
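In embedding approaches such as word2vec, each word is mapped to a dense vector, and the similarity of two words is commonly measured as the cosine of the angle between their vectors. Below is a minimal sketch of that idea. The three-dimensional `TOY_VECTORS` values are purely hypothetical and made up for illustration. Real models learn a few hundred dimensions from a corpus.

```python
import math

# Hypothetical toy embeddings, invented for illustration only
# (not taken from any trained model).
TOY_VECTORS = {
    "car":        [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "banana":     [0.0, 0.2, 0.95],
}

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(word, vectors):
    """Return the other word whose vector has the highest cosine similarity to `word`."""
    target = vectors[word]
    candidates = (w for w in vectors if w != word)
    return max(candidates, key=lambda w: cosine_similarity(target, vectors[w]))

print(most_similar("car", TOY_VECTORS))  # automobile
```

With these toy numbers, "car" and "automobile" point in almost the same direction (cosine of about 0.996), while "banana" is nearly orthogonal to both.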
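The "man is to king as woman is to ?" question is usually answered with vector arithmetic. You compute vec("king") - vec("man") + vec("woman") and take the vocabulary word closest to the result by cosine similarity, excluding the three query words. The sketch below assumes hypothetical two-dimensional toy vectors chosen by hand so that one axis loosely stands for "royalty" and the other for "gender". A trained model would learn such directions, if at all, only approximately.

```python
import math

# Hypothetical 2-D toy embeddings: axis 0 ~ "royalty", axis 1 ~ "gender".
# Values are invented for illustration, not learned from data.
TOY_VECTORS = {
    "man":    [0.1, 0.9],
    "woman":  [0.1, -0.9],
    "king":   [0.9, 0.9],
    "queen":  [0.9, -0.9],
    "throne": [0.9, 0.0],
    "apple":  [-0.8, 0.1],
}

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?' using vec(b) - vec(a) + vec(c)."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # Skip the query words themselves, otherwise they often win trivially.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine_similarity(target, vectors[w]))

print(analogy("man", "king", "woman", TOY_VECTORS))  # queen
```

Here king - man + woman gives [0.9, -0.9], which matches "queen" exactly. "Throne" comes second (cosine of about 0.71) and "apple" points the opposite way.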

Paper with description:
"Distributed Representations of Words and Phrases and their Compositionality" (Mikolov et al., 2013)

Open-source implementation:
https://code.google.com/p/word2vec/
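Besides word vectors, the paper describes how phrases such as "New York" are found so they can be trained as single tokens. Adjacent words are scored with score(w_i, w_j) = (count(w_i w_j) - δ) / (count(w_i) × count(w_j)), where δ is a discounting coefficient that keeps very rare pairs from becoming phrases. Pairs scoring above a threshold are then merged. The following is a minimal single-pass sketch of that scoring rule. The toy corpus, δ = 1, the threshold of 0.1, and the function names `score_bigrams` and `join_phrases` are all hypothetical choices for illustration. The paper itself runs several passes with decreasing thresholds to build longer phrases.

```python
from collections import Counter

def score_bigrams(tokens, delta):
    """Score adjacent pairs with (count(wi wj) - delta) / (count(wi) * count(wj))."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {
        pair: (n - delta) / (unigrams[pair[0]] * unigrams[pair[1]])
        for pair, n in bigrams.items()
    }

def join_phrases(tokens, delta, threshold):
    """One pass: merge adjacent pairs scoring above `threshold` into 'wi_wj' tokens."""
    scores = score_bigrams(tokens, delta)
    out, i = [], 0
    while i < len(tokens):
        pair = (tokens[i], tokens[i + 1]) if i + 1 < len(tokens) else None
        if pair is not None and scores.get(pair, 0.0) > threshold:
            out.append(pair[0] + "_" + pair[1])
            i += 2  # both words consumed by the phrase
        else:
            out.append(tokens[i])
            i += 1
    return out

# Hypothetical toy corpus.
corpus = "new york is big . i love new york . the new car is fast .".split()
print(join_phrases(corpus, delta=1, threshold=0.1))
# ['new_york', 'is', 'big', '.', 'i', 'love', 'new_york', '.', 'the', 'new', 'car', 'is', 'fast', '.']
```

With δ = 1, every pair seen only once scores 0. "new york" (seen twice) scores (2 - 1) / (3 × 2) ≈ 0.17 and is merged, while "new car" is not.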

Tuesday, April 7, 2015

Thursday, April 2, 2015

Spark for Data Science

In June, you can learn on edX "how to apply data science techniques using parallel programming in Apache Spark to explore big (and small) data". The course is "Introduction to Big Data with Apache Spark" from Berkeley.