Friday, December 26, 2014

Visualizing Walking Using Smartphone Accelerometers

Really cool visualization based on UCI Machine Learning Repository datasets:

Friday, December 12, 2014

Algorithm choosing

We have a lot of algorithms to choose from.
This great chart, which is useful beyond scikit-learn, provides a simple decision process:

Image source

Friday, November 28, 2014

Helping Santa's Helpers Kaggle Competition

You can help the elves in Santa's Workshop pack toys in the most efficient way and win $20,000.
Only 40 days left!
More about the competition:

Friday, November 21, 2014

What is Spark?

Here you can read a somewhat dated but great article from IBM Developers.

Wednesday, November 12, 2014

Plotting multiple graphs on one page

R + ggplot2 is my favourite toolset for building plots.
Today I needed to put a few graphs on one page.
I found a solution.

You can easily build plots like this:

Wednesday, November 5, 2014

Coursera: Mining Massive Datasets

An extremely useful course for data scientists: Mining Massive Datasets by Jure Leskovec, Anand Rajaraman and Jeff Ullman.

Course Syllabus

Week 1:
Link Analysis -- PageRank

Week 2:
Locality-Sensitive Hashing -- Basics + Applications
Distance Measures
Nearest Neighbors
Frequent Itemsets

Week 3:
Data Stream Mining
Analysis of Large Graphs

Week 4:
Recommender Systems
Dimensionality Reduction

Week 5:
Computational Advertising

Week 6:
Support-Vector Machines
Decision Trees
MapReduce Algorithms

Week 7:
More About Link Analysis -- Topic-specific PageRank, Link Spam
More About Locality-Sensitive Hashing

In addition, you can buy the Mining Massive Datasets book or download it for free from the Mining Massive Datasets web page.

Friday, October 17, 2014

Updating R on Ubuntu

[Tested on Ubuntu 14.04, upgrading from R 3.0.2 to R 3.1.1]

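 # Add the PPA that provides newer R builds for Ubuntu
 sudo add-apt-repository ppa:marutter/rrutter
 # Refresh the package lists and upgrade installed packages
 sudo apt-get update
 sudo apt-get upgrade
 # Install (or upgrade to) the current R base and development packages
 sudo apt-get install r-base r-base-dev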
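
After the upgrade it is worth checking which R version you actually ended up with. Here is a minimal sketch, not part of the tested steps above. It assumes GNU `sort -V` is available (it should be on Ubuntu), `version_ge` is a hypothetical helper name of my own, and 3.1.1 is simply the version mentioned in this post:

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 is greater than or equal to version $2.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Version to compare against (taken from this post, adjust as needed).
wanted="3.1.1"

if command -v R >/dev/null 2>&1; then
  # The first line of `R --version` looks like: "R version 3.1.1 (2014-07-10) ..."
  installed="$(R --version | head -n 1 | awk '{print $3}')"
  if version_ge "$installed" "$wanted"; then
    echo "R $installed is at least $wanted"
  else
    echo "R $installed is older than $wanted"
  fi
else
  echo "R is not installed"
fi
```

If it still reports the old version, the PPA was probably not picked up, so rerun `sudo apt-get update` and check its output.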

Thursday, September 25, 2014

Wednesday, September 24, 2014

Monday, September 22, 2014

"Hello world" post

Hello, my dear diary.
Here I am going to describe my daily routine as a data scientist.