Journal Club: Week of 11/13/2015

Got two more for you this week: one on machine learning and the other on multivariate analysis. Check them out.

Supervised Machine Learning: A Review of Classification Techniques
By S.B. Kotsiantis
University of Peloponnese (2007)

This paper reviews a subset of supervised machine learning algorithms with a focus on classification. Because of the vast number of algorithms out there, the author organizes the paper around the key features of each. First the author gives a brief overview of machine learning in general: why and how it is used. What I liked most about this paper is that even before any algorithms are mentioned, the author discusses general issues with classifiers and with algorithm selection. This prepares the reader and dispels the notion of a “silver bullet” algorithm.
The article is well organized. Kotsiantis starts with the most intuitive of machine learning algorithms, decision trees, and works his way up to more recent (well, for 2007 at least) techniques. Each section covers a number of techniques under its subheading; for example, Statistical Learning contains Naïve Bayes and Bayesian Networks. I liked this organization, as it eases the reader into more complex techniques. One thing the paper lacks is depth. Most techniques are rushed through and not fully explained, but the paper’s purpose is not to outline precise steps for implementing each technique; rather, it is to familiarize the reader with the existence of certain techniques.
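To make one of the reviewed techniques concrete, here is a minimal Gaussian Naïve Bayes sketch. The toy data, function names, and smoothing constant are my own for illustration, not from the paper:

```python
import math

def fit(X, y):
    """Estimate per-class priors and per-feature mean/variance."""
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = len(rows) / len(X)
        stats = []
        for j in range(len(X[0])):
            col = [r[j] for r in rows]
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
            stats.append((mean, var))
        model[c] = (prior, stats)
    return model

def predict(model, x):
    """Pick the class with the highest log-posterior."""
    best, best_score = None, float("-inf")
    for c, (prior, stats) in model.items():
        score = math.log(prior)
        for v, (mean, var) in zip(x, stats):
            # log of the Gaussian density for this feature
            score += -0.5 * math.log(2 * math.pi * var) \
                     - (v - mean) ** 2 / (2 * var)
        if score > best_score:
            best, best_score = c, score
    return best

X = [[1.0, 2.1], [1.2, 1.9], [3.8, 4.2], [4.1, 3.9]]
y = [0, 0, 1, 1]
model = fit(X, y)
print(predict(model, [1.1, 2.0]))  # a point near the class-0 cluster
```

The independence assumption (each feature modeled separately per class) is exactly what makes the technique fast, which is one of the trade-offs the paper's comparison table captures.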
Another criticism I have is that the paper feels a little dated. This is no fault of the author, of course, but a more recent review may be a worthwhile follow-up. There is a table in the paper comparing the different techniques in terms of speed, tolerance, and other criteria, which is very useful. However, it should be checked against newer results, as it may be outdated.

Partial Least Squares Regression: A Tutorial
By Paul Geladi and Bruce R Kowalski
Analytica Chimica Acta, 185 (1986) 1-17

Here is an oldie but a goodie. When I was first learning about Partial Least Squares (PLS, sometimes called projection onto latent structures), there was a vast number of papers, but none really drove the point home for me. So I went back to one paper that was constantly being cited: this paper from 1986. It provides a very clear tutorial on how to get PLS up and running, assuming you have an understanding of linear algebra. Starting with data preprocessing, the paper states what form your data needs to be in and how to get it into that form.
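That preprocessing amounts to mean-centering each column and scaling it to unit variance (autoscaling). A quick sketch, with my own variable names:

```python
import numpy as np

def autoscale(X):
    """Mean-center each column and scale it to unit variance."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    return (X - mean) / std, mean, std

X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
Xs, mean, std = autoscale(X)
# columns of Xs now have mean ~0 and unit standard deviation
```

Keeping `mean` and `std` around matters: new samples have to be transformed with the training-set statistics before prediction.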
The paper takes a detour, however. It first goes over existing methods like multiple linear regression and principal component regression before it begins to explain PLS. This was both good and bad for me: I was solely interested in PLS, but the other tutorials gave insight into, and quick rudimentary recipes for, those regression methods. Once past the detour, the paper dives into building the PLS model. Take care reading this section, as the explanation is sparse. Overall, it’s not the best tutorial, but it has two invaluable takeaways. The first is Figure 9, which gives a geometrical representation of all the inputs and outputs the PLS model uses, showing exactly the dimensions of each and how they relate to each other.
The other is the sample PLS algorithm. The appendix gives an almost pseudocode-like description of the PLS algorithm. Using it, I was able to get a PLS program up and running in less than an hour. The algorithm clearly shows every step that must be taken and exactly how to take it. This is the main reason I would recommend this paper. There are others out there that explain PLSR better, but this one allows for a rapid implementation of PLS.
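For a single response vector, a NIPALS-style PLS1 loop in the spirit of the appendix's algorithm looks like the following. This is my own sketch, not a transcription of the paper's listing; function and variable names are mine:

```python
import numpy as np

def pls1(X, y, n_components):
    """NIPALS-style PLS1 for mean-centered X (n x m) and y (n,)."""
    X = X.astype(float).copy()
    y = y.astype(float).copy()
    n, m = X.shape
    W = np.zeros((m, n_components))  # weight vectors
    P = np.zeros((m, n_components))  # X loadings
    q = np.zeros(n_components)       # y loadings
    for a in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)       # unit weight vector
        t = X @ w                    # score vector
        tt = t @ t
        p = X.T @ t / tt             # X loading for this component
        q[a] = y @ t / tt            # y loading for this component
        X -= np.outer(t, p)          # deflate X
        y -= t * q[a]                # deflate y
        W[:, a], P[:, a] = w, p
    # regression coefficients in the centered X space
    return W @ np.linalg.solve(P.T @ W, q)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta                         # noiseless toy relation
Xc = X - X.mean(axis=0)
yc = y - y.mean()
B = pls1(Xc, yc, 3)
# with all 3 components on noiseless full-rank data, PLS
# coincides with ordinary least squares, so B should recover beta
```

The shapes here (scores `t` of length n, loadings `p`, `q` per component) are exactly the quantities Figure 9 lays out geometrically, which is why that figure and this loop pair so well.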

-Marcello