Thursday, October 29, 2015

Search query understanding

A nice presentation about search queries, and an amazing article about search experience optimization.

Wednesday, September 30, 2015

Memory Networks is on GitHub now

Great news! Facebook has made the Memory Networks project public. Memory Networks is a research project that implements a kind of human-like long-term memory for neural networks.
[Embedded video: a talk about Memory Networks]
Here is the link to the GitHub project: https://github.com/facebook/MemNN
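
For intuition, here is a minimal numpy sketch of the addressing step at the heart of memory networks: a query vector attends over memory slots via softmax and returns a weighted readout. This is my own illustration of the idea, not code from the repository.

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def memory_readout(query, memory_keys, memory_values):
    """Score each memory slot against the query, normalize with
    softmax, and return the weighted sum of the slot values."""
    scores = memory_keys @ query          # one score per slot
    weights = softmax(scores)             # soft attention over slots
    return weights @ memory_values        # readout vector

# Toy example: 4 memory slots, 3-dimensional embeddings (all random).
rng = np.random.default_rng(0)
keys = rng.normal(size=(4, 3))
values = rng.normal(size=(4, 3))
query = rng.normal(size=3)
print(memory_readout(query, keys, values))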

Thursday, July 16, 2015

ICML 2015 Word Cloud

A nice visualisation from Andrew Collier: a word cloud of the 300 most frequent words from accepted ICML 2015 papers.
http://www.exegetic.biz/blog/wp-content/uploads/2015/07/word-cloud.png
The methodology behind the word cloud: http://www.exegetic.biz/blog/2015/07/constructing-word-cloud-for-icml-2015/
The list of accepted papers can be found here: http://icml.cc/2015/?page_id=825
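
If you want to build something similar yourself, here is a comparable sketch in Python using the wordcloud package (the linked post does it differently; paper_titles.txt is a hypothetical file with one paper title per line):

from wordcloud import WordCloud

# Hypothetical input: one ICML paper title per line.
with open("paper_titles.txt") as f:
    text = f.read()

# Keep the 300 most frequent words, mirroring the visualisation above.
wc = WordCloud(max_words=300, width=800, height=600).generate(text)
wc.to_file("icml2015_word_cloud.png")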

Tuesday, July 14, 2015

Recommendation papers from ArXiv

Sometimes you come across an idea and wonder why you didn't implement it yourself earlier.
The idea: arXiv is a repository of over 1 million preprints in physics, mathematics, and computer science, so it is possible to train a recommender on the papers and streamline the search process.
The full description is here: https://blog.lateral.io/2015/07/harvesting-research-arxiv/
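
As a rough illustration of the idea (my own sketch, not Lateral's actual system), here is a minimal content-based recommender over paper abstracts using scikit-learn's TF-IDF vectors and cosine similarity; the toy corpus is made up:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy corpus of abstracts; a real system would index arXiv dumps.
abstracts = [
    "We study convolutional neural networks for image classification.",
    "A new bound for stochastic gradient descent convergence is derived.",
    "Recurrent neural networks are applied to language modelling.",
]

vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(abstracts)

def recommend(query, top_n=2):
    """Return indices of the abstracts most similar to the query text."""
    q = vectorizer.transform([query])
    scores = cosine_similarity(q, matrix).ravel()
    return scores.argsort()[::-1][:top_n]

print(recommend("neural networks for text"))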

And a motivating image:
[source: http://physicsbuzz.physicscentral.com/2012/08/risks-and-rewards-of-arxiv-reporting.html]

Monday, May 11, 2015

Emoji natural language processing

On the Instagram Engineering blog you can read about NLP techniques for discovering the "context" of emoji. They use word2vec to map each emoji into a metric space and t-SNE as a visualisation tool.
Full texts of the articles:
  • Emojineering Part 1: Machine Learning for Emoji Trends
  • Emojineering Part 2: Implementing Hashtag Emoji
  • Emoji Wiki
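
To get a feel for the pipeline, here is a sketch of the same two steps under my own assumptions (not Instagram's code): train word2vec on tokenised posts in which emoji appear as ordinary tokens, then project the emoji vectors to 2-D with t-SNE.

import numpy as np
from gensim.models import Word2Vec
from sklearn.manifold import TSNE

# Hypothetical tokenised posts where emoji appear as ordinary tokens.
posts = [
    ["great", "day", "at", "the", "beach", "😎"],
    ["so", "funny", "😂", "😂"],
    ["love", "this", "song", "❤"],
] * 50  # repeat the toy data so word2vec has enough samples to train on

# vector_size is the gensim >= 4 parameter name (older versions used `size`).
model = Word2Vec(posts, vector_size=50, min_count=1, window=3)

emoji = ["😎", "😂", "❤"]
vectors = np.array([model.wv[e] for e in emoji])

# Project emoji embeddings to 2-D (perplexity must be < number of samples).
coords = TSNE(n_components=2, perplexity=2, init="random").fit_transform(vectors)
for e, (x, y) in zip(emoji, coords):
    print(e, x, y)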

Friday, April 17, 2015

Words similarity

Finding words (or sentences, or documents) with the same meaning is a general problem in NLP (Natural Language Processing), and deep learning is helping to advance this area.
For example, the word2vec approach lets you derive from a text corpus analogies such as "man is to king as woman is to ?", where "?" resolves to "queen". It's amazing stuff. In addition, you can model not just word-to-word similarity but also similarity between a word and a sequence of words.
Here are some examples from a model trained on the Google News corpus:
[table of analogy examples from the original post]
The paper with the full description: "Distributed Representations of Words and Phrases and their Compositionality".
Open-source implementation: https://code.google.com/p/word2vec/
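
If you want to reproduce the analogy yourself, here is a minimal sketch using gensim's re-implementation of word2vec (my choice of library; the link above is the original C tool) together with the pre-trained Google News vectors:

from gensim.models import KeyedVectors

# GoogleNews-vectors-negative300.bin is the pre-trained model distributed
# with the original word2vec project (a large download, ~1.5 GB).
kv = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

# The "man is to king as woman is to ?" analogy from the post:
print(kv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# expected to return [('queen', <cosine similarity score>)]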

Thursday, April 2, 2015

Spark for Data Science

In June you can learn on edX "how to apply data science techniques using parallel programming in Apache Spark to explore big (and small) data". The course is "Introduction to Big Data with Apache Spark" from Berkeley.
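
For a taste of what this kind of parallel programming looks like, here is a minimal PySpark word count (my own sketch, not course material; input.txt is a hypothetical text file):

from pyspark import SparkContext

sc = SparkContext("local", "wordcount")

# Split lines into words, emit (word, 1) pairs, and sum counts per word.
counts = (sc.textFile("input.txt")
            .flatMap(lambda line: line.split())
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b))

# Print the ten most frequent words.
for word, n in counts.takeOrdered(10, key=lambda kv: -kv[1]):
    print(word, n)

sc.stop()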

Tuesday, January 20, 2015

Process Mining: Data Science in Action by Coursera


This is my short review of the Coursera Process Mining course (by Wil van der Aalst from Eindhoven University of Technology).
The name of the course sounds very intriguing, but the main task is quite simple: building a behaviour model from an event log, while balancing overfitting against underfitting. That means the model should explain the majority of recorded cases and still be general enough to explain new ones (see the toy sketch after the list below).
The main tools recommended in the course are Disco and ProM. They let you build models in different notations (e.g. BPMN) and create visualisations.
The two main aspects of process mining are organisational and social:
Organisational tasks:
  • discover typical workflow actions (for customers, employees, etc.)
  • analyse the time spent on each task
  • mine for "bottlenecks"
Social tasks:
  • discover user groups and their relations within a process
  • analyse the time spent by each worker, customer, etc.
In addition, within both aspects you can recommend next steps or forecast the completion time of future tasks.
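
As a toy illustration of the kind of model discovery involved (a plain-Python sketch, not what Disco or ProM do internally), here is how one can extract a directly-follows graph from an event log:

from collections import Counter, defaultdict

# Hypothetical event log: (case_id, activity) pairs, already sorted by time.
events = [
    (1, "register"), (1, "check"), (1, "approve"),
    (2, "register"), (2, "check"), (2, "reject"),
    (3, "register"), (3, "approve"),
]

# Group activities by case, preserving their order.
traces = defaultdict(list)
for case_id, activity in events:
    traces[case_id].append(activity)

# Count how often activity a is directly followed by activity b.
follows = Counter()
for trace in traces.values():
    for a, b in zip(trace, trace[1:]):
        follows[(a, b)] += 1

for (a, b), n in follows.most_common():
    print(f"{a} -> {b}: {n}")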

The lecturer's SlideShare: http://www.slideshare.net/wvdaalst
Next session: April-May 2015


Friday, January 9, 2015

Dive into Deep Learning

If you are interested in deep learning, try the UFLDL (Unsupervised Feature Learning and Deep Learning) tutorial from Stanford University.
If the topic is completely new to you, it is better to start with https://www.coursera.org/course/ml. After the current session, the course will switch to a self-study format.