Data Scientist's Diary

Sunday, December 10, 2017

NIPS 2017

NIPS (Conference on Neural Information Processing Systems) is a top conference in machine learning. I visited NIPS 2016 in Barcelona and it was a really awesome experience. This year it was held in the USA (December 4-9).
Demand for machine learning conferences is growing dramatically. Here is a funny picture from
Alex Lebrun (@lxbrun), titled "NIPS conference registrations, 2002 through 2017":

This year's conference had 6000 attendees. Visitors had to wait in the registration queue for approx. 2 hours. But maybe, thanks to that, we are able to watch videos from NIPS on the official NIPS FB page: https://www.facebook.com/nipsfoundation/.

I was really impressed by Pieter Abbeel's keynote:

You can find more videos here: https://www.facebook.com/nipsfoundation/ and the NIPS papers here: https://papers.nips.cc/
Also, here is a GitHub repo with a lot of materials: https://github.com/hindupuravinash/nips2017/

Wednesday, November 29, 2017

Model results explanation

Explaining the predictions of your model is really crucial. Previously, there was a trade-off between accuracy and interpretability: the most accurate models were usually the hardest to explain.

But now you can use LIME (Local Interpretable Model-agnostic Explanations), an explanation technique proposed by Ribeiro et al. in 2016. This technique learns an interpretable model locally around the prediction. More details are in the original paper: https://arxiv.org/abs/1602.04938.
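
To make the idea of "an interpretable model around the prediction" more concrete, here is a minimal sketch of a local surrogate in plain Python. It is not the API of the lime package and it simplifies the method from the paper (no interpretable binary features, no sparsity): it just perturbs one instance, weights the perturbed samples by proximity, and fits a weighted linear model. The function black_box and all parameter values are assumptions made up for illustration.

```python
import math
import random


def black_box(x):
    """Stand-in for an opaque model (hypothetical): nonlinear in two features."""
    return math.sin(x[0]) + x[1] ** 2


def solve(a, b):
    """Solve the linear system a * w = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - tail) / m[i][i]
    return w


def explain_locally(model, x0, n_samples=2000, scale=0.1, kernel_width=0.25, seed=0):
    """Fit a proximity-weighted linear surrogate of `model` around `x0`.

    Returns (intercept, slopes); each slope says how strongly the prediction
    reacts to the corresponding feature in the neighbourhood of x0.
    """
    rng = random.Random(seed)
    d = len(x0)
    # Accumulate the weighted normal equations (X^T W X) w = X^T W y.
    xtwx = [[0.0] * (d + 1) for _ in range(d + 1)]
    xtwy = [0.0] * (d + 1)
    for _ in range(n_samples):
        z = [xi + rng.gauss(0, scale) for xi in x0]           # perturbed sample
        dist2 = sum((zi - xi) ** 2 for zi, xi in zip(z, x0))
        weight = math.exp(-dist2 / kernel_width ** 2)         # closer = heavier
        row = [1.0] + [zi - xi for zi, xi in zip(z, x0)]      # bias + offsets
        y = model(z)
        for i in range(d + 1):
            xtwy[i] += weight * row[i] * y
            for j in range(d + 1):
                xtwx[i][j] += weight * row[i] * row[j]
    coefs = solve(xtwx, xtwy)
    return coefs[0], coefs[1:]


intercept, slopes = explain_locally(black_box, [0.0, 1.0])
print(intercept, slopes)
```

Around the point (0, 1) the slopes should come out close to the local gradient of the toy function, which is the "explanation" for that single prediction.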

Sunday, August 28, 2016

According to a Class Central investigation, the Johns Hopkins Data Science specialisation (which covers the basic parts of the data science workflow, with applications in R) generated ~3.5M dollars (between April 2014 and February 2015).

Also, very interesting numbers:

• 1.76 million course sign-ups
• 71,589 Signature Track verified certificates were awarded
• 917 students completed all 9 courses and signed up for the first capstone course
• 478 students successfully completed the first capstone course

Tuesday, June 28, 2016

R interface for Apache Spark

A great tool that lets you combine the advantages of Apache Spark with R functionality (with RStudio).

http://spark.rstudio.com/index.html

Friday, May 13, 2016

Distributed Word Representation

Today we will talk about the main "building block" of deep learning applications for NLP - vectors.
Every unit - a phoneme, word, sub-sentence, sentence, even a whole document - can be represented as a vector. I find it really cool.
How do we get this representation?
The most straightforward way is to build a word-document matrix. This matrix will be sparse, so the next step should be dimensionality reduction (e.g. SVD).
The main problem here is the expensive computation: for an n x m matrix, the cost of SVD scales quadratically, O(m x n^2), when n < m.
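
As a small illustration of the first step, here is a sketch that builds a word-document count matrix in plain Python and measures how sparse it is. The three documents are made up for the example; for the reduction step you would hand the matrix to a library routine such as numpy.linalg.svd, which is left out here to keep the sketch dependency-free.

```python
from collections import Counter

# Toy corpus (made up for illustration).
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "stocks fell as markets closed",
]


def word_document_matrix(documents):
    """Return (vocab, matrix) where matrix[i][j] counts word i in document j."""
    tokenized = [doc.split() for doc in documents]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    counts = [Counter(tokens) for tokens in tokenized]
    matrix = [[c[word] for c in counts] for word in vocab]
    return vocab, matrix


vocab, matrix = word_document_matrix(docs)

# Share of zero cells: even on this tiny corpus most of the matrix is empty.
zeros = sum(cell == 0 for row in matrix for cell in row)
sparsity = zeros / (len(vocab) * len(docs))

for word, row in zip(vocab, matrix):
    print(f"{word:>8} {row}")
print(f"sparsity: {sparsity:.2f}")
```

Each row is a (very crude) word vector: words that occur in the same documents get similar rows.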
Another approach is to learn the vector representation directly from the data. This method (named word2vec) was proposed in 2013 by Mikolov et al. Actually, word2vec is two algorithms: CBOW (continuous bag of words) and Skip-Gram. In CBOW you predict a word based on the words before and after it. In Skip-Gram the task is the opposite - predicting the context words based on a given word.
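
The difference between the two tasks is easiest to see in the training examples they produce. The sketch below only generates those examples from a token list; it does not train anything, and the helper names and the window size are my own choices for illustration.

```python
def context_windows(tokens, window=2):
    """Yield (target, context) where context holds up to `window` words per side."""
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        yield target, left + right


def cbow_examples(tokens, window=2):
    """CBOW: the input is the whole context, the label is the word in the middle."""
    return [(context, target) for target, context in context_windows(tokens, window)]


def skipgram_examples(tokens, window=2):
    """Skip-Gram: the input is one word, the label is one of its context words."""
    return [
        (target, context_word)
        for target, context in context_windows(tokens, window)
        for context_word in context
    ]


tokens = "the quick brown fox jumps".split()

for context, target in cbow_examples(tokens, window=1):
    print("CBOW     ", context, "->", target)

for target, context_word in skipgram_examples(tokens, window=1):
    print("Skip-Gram", target, "->", context_word)
```

CBOW gets one example per position, while Skip-Gram gets one example per (word, neighbour) pair, so the same text yields more Skip-Gram examples.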

With this approach, you can learn word representations very quickly (e.g. representations for all words in the English wiki, ~80 GB of unzipped text, could be learnt in ~10 hours on an office laptop).
You can directly measure the similarity between the resulting vectors (and get a similarity between word contexts, e.g. 'stock market' ≈ 'thermometer', with similarity equal to 0.72). Also, you can use the vectors as building blocks for more complex neural nets.
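
The usual way to measure similarity between word vectors is cosine similarity. Here is a minimal sketch; the three-dimensional vectors are invented for illustration (real word2vec vectors typically have hundreds of dimensions and are learned, not written by hand).

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v: 1.0 means the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hand-made toy vectors (assumption: not taken from any trained model).
vectors = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

print("cat vs dog:", round(cosine_similarity(vectors["cat"], vectors["dog"]), 2))
print("cat vs car:", round(cosine_similarity(vectors["cat"], vectors["car"]), 2))
```

Words that appear in similar contexts end up pointing in similar directions, so "cat" scores much higher against "dog" than against "car" in this toy setup.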
This approach unlocks really cool new operations, like adding or subtracting word representations, which looks like adding or subtracting the contexts of words.

Or even cooler:
Iraq - Violence = Jordan
President - Power = Prime Minister
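
Here is a minimal sketch of how such arithmetic is usually evaluated: compute the combined vector, then look for the nearest remaining word by cosine similarity. It uses the classic king - man + woman ≈ queen illustration from the word2vec literature rather than the examples above, and the two-dimensional vectors are hand-made for the demo, not learned.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest(vec, vectors, exclude=()):
    """Word whose vector is most similar to `vec`, ignoring the words in `exclude`."""
    candidates = (word for word in vectors if word not in exclude)
    return max(candidates, key=lambda word: cosine(vec, vectors[word]))


# Hand-made toy vectors (assumption); the dimensions roughly mean (royalty, maleness).
toy = {
    "king": [0.9, 0.9],
    "queen": [0.9, 0.1],
    "man": [0.1, 0.9],
    "woman": [0.1, 0.1],
    "apple": [0.05, 0.5],
}

# king - man + woman, computed component by component.
combined = [k - m + w for k, m, w in zip(toy["king"], toy["man"], toy["woman"])]

# The input words are excluded, as is common when evaluating analogies.
answer = nearest(combined, toy, exclude={"king", "man", "woman"})
print("king - man + woman ≈", answer)
```

Excluding the input words matters: without it, the nearest neighbour of the combined vector is often just one of the words you started with.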
The guys at Instagram applied this technique to learn the meanings of emoji.
Example:

Interested in this topic? You can read more here:
Mikolov's original paper:
http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
Instagram Engineering Blog:
http://instagram-engineering.tumblr.com/post/117889701472/emojineering-part-1-machine-learning-for-emoji
Cool examples (I used them above): http://byterot.blogspot.co.uk/2015/06/five-crazy-abstractions-my-deep-learning-word2doc-model-just-did-NLP-gensim.html

Friday, May 6, 2016

Deep Learning for Visual Question Answering

Today I found a great article about a specific type of question answering. A picture is worth a thousand words:

[picture from the original article]

It is not a big surprise - technically it is all about LSTM. Enjoy reading: http://avisingh599.github.io/deeplearning/visual-qa/

Tuesday, April 19, 2016

Natural Language Processing. Brief intro

For the last year, I have been working with Natural Language Processing (mostly with Deep Learning), and I've decided to write a series of blog posts describing the trendiest ideas in the field. So, let's start from the very beginning.

Natural Language Processing is a field at the intersection of computer science, artificial intelligence and linguistics. The main goal of NLP is to "understand" natural language in order to perform some useful tasks, like question answering.

Some examples of NLP applications:

• Spell checking, keyword search, finding synonyms
• Extracting information from websites, such as times, product prices, dates, locations, people or company names
• Text classification
• Text summarisation
• Finding similar texts
• Sentiment analysis
• Machine translation
• Search
• Spoken dialog systems
• Complex query answering
• Speech recognition

Text can be analyzed at different levels: phonemes, morphemes, words, sub-sentences, sentences, paragraphs and whole documents.

From a linguistic point of view, analysis can be done at these levels:

• Syntax (what is grammatical)
• Semantics (what does it mean)
• Pragmatics (what does it do)

There are a lot of smart algorithms that were developed for various tasks:

• Hidden Markov Models (for speech recognition)
• Conditional Random Fields (for part-of-speech tagging)
• Latent Dirichlet Allocation (for topic modeling)

NLP is hard. First of all, because of:

• ambiguity - more than one possible (precise) interpretation (e.g. "Foreigners are hunting dogs")
• vagueness - the text does not specify the full information
• uncertainty - due to imperfect statistical models

In the mid-2010s, neural nets became successful in NLP. Why did it happen?

I'll describe the main ideas of deep learning techniques for NLP in the next post :)