Journal Club: Week of 12/4/2015

A Few Useful Things to Know about Machine Learning
Pedro Domingos
Department of Computer Science and Engineering, University of Washington

This was an excellent read. I highly suggest this paper to novices and experts alike. Pedro goes through all the mysticism and what he calls “folk knowledge” in this paper: knowledge that would otherwise take years of machine learning practice to uncover. Pedro breaks machine learning down into simple concepts and shows the reader how to deal with them. Do not be mistaken, this is not a tutorial. You will not learn any new algorithms or applications. You will only learn how to better use the ones you know. That being said, I believe it is best to go into this paper with a little background so you are not lost by what Pedro is explaining.
Pedro explores major pitfalls that trip up people who are first learning machine learning as well as seasoned pros. I particularly liked his section on overfitting and the section on how to approach problems. “Start simple first” is a common piece of advice, but Pedro backs it up with examples and graphs showing how different methods perform. His advice on more data vs. a cleverer model is invaluable. I highly suggest reading this paper; it is a quick and powerful read.
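To make the overfitting point a bit more concrete, here is a toy sketch (not taken from the paper). It assumes a made-up 1-D dataset where the true label is `x > 0.5` with 20% of labels flipped, and compares a simple threshold rule against a 1-nearest-neighbour "memorizer". All names (`make_data`, `fit_threshold`, `fit_memorizer`) are hypothetical and only for illustration.

```python
import random

def make_data(n, rng, noise=0.2):
    """x is uniform in [0, 1]; the true label is x > 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = x > 0.5
        if rng.random() < noise:
            y = not y  # label noise (an assumption of this toy setup)
        data.append((x, y))
    return data

def accuracy(predict, data):
    """Fraction of points where the prediction matches the label."""
    return sum(predict(x) == y for x, y in data) / len(data)

def fit_threshold(train):
    """Simple model: pick the cut point t (on a coarse grid) that best fits the training labels."""
    candidates = [i / 20 for i in range(21)]
    return max(candidates, key=lambda t: accuracy(lambda x: x > t, train))

def fit_memorizer(train):
    """Flexible model: 1-nearest-neighbour, which memorizes every training point, noise included."""
    def predict(x):
        return min(train, key=lambda p: abs(p[0] - x))[1]
    return predict

rng = random.Random(0)  # fixed seed so the run is repeatable
train, test = make_data(200, rng), make_data(2000, rng)

t = fit_threshold(train)
simple = lambda x: x > t
memorizer = fit_memorizer(train)

for name, model in [("threshold", simple), ("memorizer", memorizer)]:
    print(f"{name:10s} train={accuracy(model, train):.2f} test={accuracy(model, test):.2f}")
```

The memorizer fits its own training set perfectly because it has also memorized the flipped labels, so its score on fresh data drops. That gap between training and test accuracy is the symptom to watch for, and it is why starting with the simple model is a sensible default.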


PLS-regression: a Basic Tool of Chemometrics
Svante Wold, Michael Sjöström, Lennart Eriksson
Institute of Chemistry, Umeå University


Another paper on PLS, this one a little more current and a little more practical. Like Geladi’s paper on PLS, this paper goes in depth on PLS within the scope of chemistry and engineering, so it's right up my alley. After reading it, not all of my questions were answered, but I felt like I had a better grasp on the algorithm. One thing I really liked about this paper was its treatment of diagnostics and interpretation.
The paper is structured around an amino acid example. This serves as a good basis and testing ground, as the authors provide the raw data for anyone to test on. The power of this paper is in the last couple of sections, where the authors guide the reader through each step of interpreting the results, from the initial results to the essential plots. Each plot gets its own subsection; however, they are not all given the same importance. The explanations of some of them are very brief, restricted to only one or two paragraphs.
If you are only going to read one section of this paper, flip to the second-to-last page and read “Summary; How to develop and interpret a PLSR model.” Here the authors give a very quick overview which will get you on your feet and give you a basic understanding of what is going on. It makes a good reference as well.