Journal Club: Week of 12/4/2015

A Few Useful Things to Know about Machine Learning
Pedro Domingos
Department of Computer Science and Engineering, University of Washington

This was an excellent read. I highly suggest this paper to novices and experts alike. Pedro cuts through the mysticism and lays out what he calls “folk knowledge”: knowledge that would otherwise take years of machine learning practice to uncover. He breaks machine learning down into simple concepts and shows the reader how to deal with them. Do not be mistaken, this is not a tutorial. You will not learn any new algorithms or applications. You will only learn how to better use the ones you know. That being said, I believe it is best to go into this paper with a little background so you are not lost by what Pedro is explaining.
Pedro explores the major pitfalls that trip up people who are first learning machine learning as well as seasoned pros. I particularly liked his section on overfitting and the section on how to approach problems. “Start simple first” is a common piece of advice, but Pedro backs it up with examples and graphs showing how different methods perform. His advice on more data vs. a cleverer model is invaluable. I highly suggest reading this paper. It is a quick and powerful read.

 

PLS-regression: a Basic Tool of Chemometrics
Svante Wold, Michael Sjöström, Lennart Eriksson
Institute of Chemistry, Umeå University

 

Another paper on PLS, this one a little more current and a little more practical. Like Geladi’s paper on PLS, this paper goes in depth on PLS within the scope of chemistry and engineering, so it’s right up my alley. After reading it, not all of my questions were answered, but I felt like I had a better grasp of the algorithm. One thing I really liked about this paper was the diagnostics and the interpretation.
The paper is structured around an amino acid example. This serves as a good basis and testing ground, as the authors provide the raw data for anyone to test on. The power of this paper is in the last couple of sections, where the authors guide the reader through each step of interpreting the results, from the initial results to the essential plots. Each plot gets its own subsection; however, they are not all given the same importance. The explanations of some of them are very brief, restricted to only one or two paragraphs.
If you are only going to read one section of this paper, flip to the second-to-last page and read “Summary; How to develop and interpret a PLSR model.” Here the authors give a very quick overview that will get you on your feet and give you a basic understanding of what is going on. It makes a good reference as well.
-Marcello

FOLLOW THE MONEY: FEDERAL LEGISLATURE PART 4

A quick refresher for those just joining us. I took campaign donation data from followthemoney.org. This website makes campaign donations very easy to parse and work with. I gathered the data for all campaign donations to either Senators or Congressmen, regardless of whether they were elected or not. With this data I was able to see patterns with regard to political party, the candidate’s office, and other factors. In this part we will take a look at how the states compare to each other. First, let’s take a look at overall donations for 2014.

2014 Campaign Donations to Legislators Grand Total

Don’t try to pull too many grand conclusions from the above graph. As I mentioned when talking about winners and losers in elections, donations per candidate (or here, per capita) give more insight. The above graph is basically a population map. The more populated states show up in a darker green than the less populated states. This gives an unfair advantage to states like California and New York: to match their totals, people in less populated states would have to donate more per person. So in order to get a fairer comparison we need to normalize our donations. I have calculated donations per capita for each state.
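The normalization described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code behind the maps: the totals and populations below are rounded, made-up numbers, and the names `donations`, `population`, and `per_capita` are hypothetical.

```python
# Hypothetical, rounded figures for illustration only (not the real 2014 data).
donations = {"CA": 76_000_000, "NY": 60_000_000, "AK": 7_000_000, "NH": 6_500_000}
population = {"CA": 38_000_000, "NY": 20_000_000, "AK": 700_000, "NH": 1_300_000}


def per_capita(donations, population):
    """Return dollars donated per resident for every state present in both inputs."""
    return {
        state: total / population[state]
        for state, total in donations.items()
        if population.get(state)  # skip states with a missing or zero population
    }


rates = per_capita(donations, population)

# Rank states from the highest to the lowest donations per person.
ranked = sorted(rates.items(), key=lambda item: item[1], reverse=True)

for state, dollars in ranked:
    print(f"{state}: ${dollars:.2f} per person")
```

With these made-up numbers the large states have the biggest totals but fall to the bottom once the totals are divided by population, which is the same kind of reshuffling the per capita map shows.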

2014 Campaign Donations to Legislators Grand Total

That’s much better. As you can see, the two maps are wildly different, and this one does not resemble a population map in any way. States like NY, NJ, MA, and CA are no longer top tier, but rather toward the bottom. Interestingly enough, states that have fewer people in them seem to have much greater donations per person; Alaska is a notable example. Why do these states get way more contributions than others? One possible explanation is that some of these states are swing states. Swing states (like New Hampshire above) are very closely divided between the Republicans and the Democrats. These states should naturally garner more donations, as the races should be more exciting and volatile. Speaking of parties, which states gave more to the Democrats and which gave more to the Republicans?

The Elephants and the Donkeys

Nothing too surprising here. Most Republican states have more donations going toward Republican candidates, and the same holds for Democratic states. However, there are a few confused states. Arizona, Colorado, and New Mexico are generally considered Republican states, but the Democrats raised a lot more money there. The opposite goes for Wisconsin, Michigan, and Pennsylvania, which are typically Democratic states. This map also reinforces some geographical trends: the northeast coast and the west coast are the usual Democratic strongholds.
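The party comparison boils down to summing donations by state and party and then checking which party leads in each state. A minimal sketch, assuming each donation is a `(state, party, amount)` tuple; that record layout, the sample values, and the function names are assumptions made for illustration, not the format followthemoney.org uses.

```python
from collections import defaultdict

# Hypothetical records: (state, party, amount in dollars).
records = [
    ("AZ", "Democratic", 500.0),
    ("AZ", "Republican", 300.0),
    ("WI", "Democratic", 200.0),
    ("WI", "Republican", 450.0),
    ("AZ", "Democratic", 100.0),
]


def party_totals(records):
    """Sum donation amounts per state and party."""
    totals = defaultdict(lambda: defaultdict(float))
    for state, party, amount in records:
        totals[state][party] += amount
    return totals


def leading_party(totals):
    """For each state, return the party that received the most money."""
    return {
        state: max(by_party, key=by_party.get)
        for state, by_party in totals.items()
    }


totals = party_totals(records)
leaders = leading_party(totals)

for state in sorted(leaders):
    print(state, leaders[state], dict(totals[state]))
```

Coloring each state by its entry in `leaders` would give a two-color map of this kind; ties are broken arbitrarily here, which a real analysis would want to handle explicitly.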

A quick word on the interactive graphs above. These graphs were made using Plotly and Python. Plotly makes it very easy to make d3.js-type graphs and interactive web apps. Plotly recently went open source, which is great news for all of us. If you are looking to quickly make interactive graphs, Plotly should be your first stop (unless you are really good with d3). This ends the exploratory portion of Follow The Money; next up is the final report. Enjoy the interactive maps!
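For anyone who wants to try a similar map, here is a rough sketch of a state-level choropleth described as a plain Python dictionary in the layout Plotly understands. It is not the code used for the maps in this post; the helper name `choropleth_spec` and the sample values are hypothetical.

```python
def choropleth_spec(values_by_state, title):
    """Build a Plotly-style figure dictionary for a US state choropleth.

    values_by_state maps two-letter state codes to the number to plot.
    """
    states = sorted(values_by_state)
    return {
        "data": [
            {
                "type": "choropleth",
                "locationmode": "USA-states",  # locations are two-letter state codes
                "locations": states,
                "z": [values_by_state[state] for state in states],
                "colorscale": "Greens",  # darker green means a larger value
            }
        ],
        "layout": {
            "title": {"text": title},
            "geo": {"scope": "usa"},  # restrict the base map to the United States
        },
    }


# Hypothetical per capita values, for illustration only.
spec = choropleth_spec({"AK": 10.0, "NH": 5.0, "NY": 3.0, "CA": 2.0},
                       "Donations per capita (made-up numbers)")
print(spec["data"][0]["locations"])
```

If Plotly is installed, passing the dictionary to `plotly.graph_objects.Figure(spec)` and calling `.show()` on the result should render the interactive map.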

-Marcello
