With a deluge of machine learning resources both online and offline, a newcomer to the field can easily be overwhelmed and get stuck out of indecision. Some people are good at spotting what to read and follow and what to skip. This post is for ML enthusiasts who have always wanted to get their hands wet with ML but have not found a good way to understand and use it.
[Hilary Mason's video on ML for Hackers] gives a great introductory feel for the ML area in 30 minutes.
People who think a rigorous background in stochastics, optimization, and linear algebra is absolutely necessary before starting are not always right. The most important thing is to get started; the mathematical fundamentals can be learnt on the fly, though some prior knowledge certainly helps. Nobody learns to swim without getting into the water, no matter how much they have read about swimming, and the same holds here. Still, one should be cautious in one's approach. I have seen many people run away from ML for reasons like "it's just statistics" or "too much maths". Some do learn the material but do not know where to use it. Either way, their enthusiasm is killed.
Therefore, a good balance between theory and practice is necessary. One should try to apply what one learns, and once people start applying ML, the “WOW” moments never end.
So, where does one start?
I would recommend starting with the advanced track of Andrew Ng’s online ML course on Coursera. It is fairly broad and thorough, with a good balance between learning and application. It not only strengthens the basics but also makes you program and apply them.
The Stanford CS229 course by Andrew Ng offers more depth and is much better for understanding the internals of ML.
Along with Andrew Ng’s course, one also needs to work a bit on algebra and probability to take a bigger leap.
Another great set of video lectures is by Prof. Yaser S. Abu-Mostafa of Caltech. The course is titled Learning from Data. I personally consider it superior to Andrew Ng’s course, both for its content and for the professor’s approach to ML.
Mathematicalmonk’s channel on YouTube is another comprehensive resource on machine learning. Together with its probability primer lectures, it is very helpful for covering a broad range of topics with good mathematical fundamentals.
Other than these video resources, there are quite a few good introductory books on ML:
- One of my favorites is the [PRML book by Bishop].
- [Tom Mitchell's book] is another widely accepted one.
- More mathematical, but a nice read, is [Pattern Classification by Duda and Hart].
Now that one has gathered good fundamentals in ML and is aware of the terminology and jargon, one can explore various areas based on one's own interest.
But at this point one needs to decide whether to merely use existing ML algorithms and tools or to code the algorithms oneself. Neither is inferior to the other. But people who decide to implement new or existing ML algorithms need to be aware of the internals behind the curtain. This is where Andrew Ng’s course is badly lacking: it takes more of a tool gatherer’s approach, which is good for ML enthusiasts in many ways but not right for everyone.
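To give a feel for what coding an algorithm oneself involves, here is a minimal sketch in plain Python of linear regression fitted by batch gradient descent. It is not taken from any of the courses or books mentioned here. The function name `fit_linear_regression`, the learning rate, the number of epochs, and the toy data are all assumptions made up for illustration.

```python
def fit_linear_regression(xs, ys, learning_rate=0.01, epochs=5000):
    """Fit y = w * x + b by batch gradient descent on the mean squared error.

    Hypothetical helper for illustration; hyperparameters are arbitrary choices.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Residuals of the current model on every training point.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (made up for this example).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_linear_regression(xs, ys)
print(w, b)  # should come out close to 2 and 1
```

A tool gatherer would get the same fit from a single library call. Writing the loop by hand is what exposes internals such as the loss function, its gradient, and the effect of the learning rate.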
People in the ML tool gatherer category also need to move on to large-scale machine learning because of its relevance in the current era. For this, Programming Collective Intelligence by Toby Segaran is a great resource. A good tool to start experimenting with large-scale data is Mahout (mahout.apache.org).
Machine Learning for Hackers is another great practical book.
I ♥ data ::
- To carry out experiments oneself, one can find a plethora of data repositories listed here: http://www.quora.com/Data/Where-can-I-find-large-datasets-for-modeling-confidence-during-the-financial-crisis-which-is-open-to-the-public
- To assess oneself and have fun, one could also start competing on Kaggle (www.kaggle.com).
Machine learning is a kind of decision making, and hence a more thorough knowledge of related fields like optimization and game theory needs to be developed. A strong mathematical background in algebra and stochastics also needs to be acquired, along with a deep exploration of statistical learning theory. A background in information theory is also helpful.
A list of literature surveys, reviews, and tutorials on machine learning and related topics like computational biology, NLP, etc. has been compiled, along with links to the papers, at www.mlsurveys.com. Deep learning, SVM, SGD, Bayesian statistics, recommender engines, and MapReduce for ML are some of the key hot topics in ML at present.