Journal Club: Week of 11/06/2015

Welcome to the first week of my journal club! I’ve gotten into the habit of looking for new and exciting papers to read. I made it a goal to read at least one new paper each week, and I thought I’d share. The subject matter is not limited to optimization; it also covers data analysis techniques, machine learning, multivariate controls, and other topics. Most of these papers are available online for free.


Multivariate Analysis
by Herve Abdi
University of Texas at Dallas


This paper is best described as a catalog of multivariate techniques. It is by no means exhaustive, but it covers a decent number of techniques, with a brief blurb on the usage and statistics of each. The paper is organized by the amount and format of one's data. First the author looks at techniques for a single data set, then he expands to two data sets. The two data set section is split into two categories: the first assumes that one data set is used to predict the other, and the second assumes that they are simply different sets of variables. I like this organization because it achieves the author's goal of a catalog. When I am looking for possible techniques, I can first look at my data to quickly rule out the techniques that are not suitable.
When talking about the statistical techniques, the author goes into just enough detail. The reviews rarely run longer than 2-3 paragraphs. My one major criticism of this review is that it may be too brief. The author goes over how the techniques work, but barely touches on possible issues or, more importantly, the most appropriate usage. The discussion of usage is limited to 1 or 2 sentences at most. However, this may be by design, as the author mentions up front in the abstract that “choice of the proper technique for a given problem is often difficult.” Nevertheless, this article works as a jumping-off point. If one wants to know more about a technique, this paper at least gives a basis to expand upon.
Overall, the paper is short, but it provides enough insight for a reader to begin exploring possible options for multivariate analysis.


The paper can be found here.


Principal Component Analysis: Concept, Geometrical Interpretation, Mathematical Background, Algorithms, History, Practice
by K.H. Esbensen and P. Geladi


This chapter by Esbensen and Geladi fully guides the reader through the ins and outs of Principal Component Analysis (PCA). Because PCA is the basis and starting point for many multivariate methods, one needs a strong fundamental understanding of it. This chapter provides that and more. The chapter uses a geometrical interpretation of PCA, which helps the reader better understand what the algorithm does to decompose a set of variables and observations. Out of all the papers I’ve read on PCA, this chapter helped me the most.
This chapter includes an abundance of diagrams which step the reader through all the projections PCA makes on a data set. Esbensen and Geladi take a matrix of variables and observations, X, represented as a rectangle, and run it through the PCA algorithm. This algorithm decomposes the matrix X into two smaller matrices, the scores, T, and the loadings, P. This is represented as the rectangle X decomposing into two smaller rectangles, T and P. They then go on to represent the “master equation” of PCA in the same way. This allows the reader to quickly grasp how PCA works visually. This is reinforced in the next section, where PCA is represented as a change in coordinate axes. Finally, if geometric interpretations are not your thing, the authors include a simple algebraic approach, which leads into an algorithm for PCA. The algorithm is laid out briefly by the authors. I would have liked more thorough step-by-step guidance, but this is satisfactory and enough to get a basic PCA program up and running.
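To make the scores and loadings idea concrete, here is a minimal Python sketch of my own. It assumes the decomposition takes the common form X ≈ T Pᵀ + E on mean-centered data, and it uses the NIPALS iteration, which is my choice for illustration and not necessarily the algorithm laid out in the chapter. The function name `nipals_pca`, its parameters, and the random test data are all hypothetical.

```python
import numpy as np

def nipals_pca(X, n_components=2, tol=1e-10, max_iter=500):
    """Split mean-centered X into scores T, loadings P, and residual E.

    Illustrative NIPALS sketch (an assumption, not the chapter's code):
    centered X == T @ P.T + E.
    """
    X = np.asarray(X, dtype=float)
    E = X - X.mean(axis=0)                 # center each variable (column)
    n_obs, n_vars = E.shape
    T = np.zeros((n_obs, n_components))    # scores, one column per component
    P = np.zeros((n_vars, n_components))   # loadings, one column per component

    for a in range(n_components):
        # Start from the column of the residual with the largest variance.
        t = E[:, np.argmax(E.var(axis=0))].copy()
        for _ in range(max_iter):
            p = E.T @ t / (t @ t)          # regress columns of E on t
            p /= np.linalg.norm(p)         # scale loadings to unit length
            t_new = E @ p                  # project rows of E onto p
            converged = np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        T[:, a] = t
        P[:, a] = p
        E = E - np.outer(t, p)             # deflate: remove this component

    return T, P, E

# Made-up data: 50 observations of 4 correlated variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4))
T, P, E = nipals_pca(X, n_components=2)
print(T.shape, P.shape, E.shape)
```

Each pass pulls out one component and subtracts it from the data, so the "rectangle" picture falls out directly: T is tall and narrow, P is short and narrow, and whatever the chosen components do not explain is left in E.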
Finally, the chapter ends with an example and a discussion of limitations. The example shows sample outputs and an interpretation of the data, which I found very beneficial. However, the example section is the weakest part of the paper. I would have liked the authors to go into more detail on the analysis and on what conclusions you can draw; these are only briefly addressed (they may be covered in another section or chapter). I also would have liked the actual data set to play around with, but I realize this was probably a chapter excerpt shortened for space. Nevertheless, this chapter is my go-to for any questions or issues about how PCA works, though not for the analysis of PCA results.


Check out these two articles for an intro to multivariate analysis!
Two new articles next week!
-Marcello