This Ain’t a Scene… It’s an Arms Race


Gun control has become a hot topic recently in the United States. Because of the rise in firearm deaths, there have been a lot of studies showing how guns flow through America. I wondered about the larger weaponry: items like missiles, tanks, and jet fighters. Who is buying this heavy-duty weaponry? Or do governments just produce their own weapons?

My intuition led me to believe that most heavy weaponry would be produced by China and the USA and would be headed toward war zones like Syria, parts of Africa, and parts of the Middle East. These conflict zones surely need the most weaponry. To explore this hypothesis I needed data. Luckily for me, there is an entire database of heavy weaponry purchases and sales. The Stockholm International Peace Research Institute monitors major weapons acquisitions [1]. Using this database I could trace who the major players are and where arms are moving.

After a little cleanup I was able to make the plot at the top of the post. That plot shows the major trades (that were recorded) starting in 1975; however, the bulk of the trades are from the 2000s onward. There are a few different symbols flying around there. The picture below has an example of each icon. All icons were found at the link below [2].

arms symbols

Starting from the top left we have: Ships, Missiles, Radar Tech, Armored Vehicles, Air Defense, Hand Held Rockets, Aircraft, Military Tech, Engines, and Naval Weaponry. As you can see in the map above, lots of arms are bought and sold around the world.

Some of the more interesting points are where the arms are going and where they are coming from. The USA is a big exporter and importer of arms. As expected, a lot of arms flow into the Middle East. Hardly any heavy weaponry flows toward South America or, surprisingly, Africa (I guess I’ve seen Lord of War too many times). A good amount of arms are also making their way toward Southeast Asia and, not surprisingly, South Korea.

It would be interesting to explore this data further to see whether the next conflict arises near where many of the arms are flowing, or even whether past arms data coincided with the wars in Iraq and Afghanistan.

The map above was created in D3 and, as we know, I am a very new JavaScript programmer, so I relied heavily on the tutorial found here [3]. That post made it easier to get a map up and running and to make the animations and plotting smooth.

– Marcello

[1] http://www.sipri.org/databases/armstransfers/armstransfers

[2] http://www.freepik.com/free-icon/

[3] http://www.tnoda.com/blog/2014-04-02

Network of Medication Side Effects

I recently stumbled upon a database [1] of prescription and generic medicines that contains all of the side effects listed on their labels. As we all know from all of those prescription medication commercials (looking at you, Cialis), the side effects take up about half the commercial. I wondered if certain side effects always showed up together, kind of like how cough and cold are always packaged together. I downloaded and scrubbed the database. From there I broke it up into two groups, the map above and the map below.

The network above has all of the most LIKELY side effects from the medication. This was defined as occurring more than 60% of the time in people who took the medication. This narrowed the 200+ side effects to under 100. With my very rudimentary medical training, I grouped these effects into categories (bones, blood based, mental, etc.). The stroke width of the bond between two side effects is determined by how many times they show up together on a side effects list. I created the force graph in d3 with help from two great sources [2][3]. You can manipulate this web, and if you double click a point it highlights the neighbors it is usually paired with.
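
To make the bond width concrete, here is a minimal Python sketch of counting how often two side effects share a label. The labels are made up for illustration, the function name is hypothetical, and this is not the code behind the graph above.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(labels):
    """Count how often each pair of side effects appears on the same label."""
    counts = Counter()
    for effects in labels:
        # sorted(set(...)) gives every unordered pair one canonical key
        for pair in combinations(sorted(set(effects)), 2):
            counts[pair] += 1
    return counts

# Made-up labels, purely for illustration (not taken from the database)
labels = [
    ["nausea", "headache", "dizziness"],
    ["nausea", "headache"],
    ["headache", "rash"],
]
links = cooccurrence_counts(labels)
print(links[("headache", "nausea")])
```

Each count would then map to a stroke width when drawing the link between the two nodes.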

The network below shows the 50 most COMMON side effects. These side effects appeared most often on the labels of all the prescription meds. Because these side effects were so frequent, they were connected to everything and produced a rather boring glob of points (as you can see below now). I further restricted the bonds by keeping only links that appeared over 200 times. This produced a (slightly) less intricate web. You can move the slider to break and form bonds; weaker bonds break first. This network also allows for double clicking.
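
The restriction described above amounts to a threshold filter on the link counts. A small sketch, assuming the links live in a plain dictionary keyed by side-effect pair (the counts below are invented):

```python
def keep_strong_links(links, threshold):
    """Keep only links whose co-occurrence count is above the threshold."""
    return {pair: n for pair, n in links.items() if n > threshold}

# Hypothetical counts, for illustration only
links = {
    ("nausea", "vomiting"): 450,
    ("headache", "rash"): 120,
    ("dizziness", "nausea"): 210,
}
strong = keep_strong_links(links, 200)     # the cutoff used for the network
stronger = keep_strong_links(links, 300)   # like moving the slider up
print(sorted(strong), sorted(stronger))
```

Raising the threshold drops the weakest links first, which is the behavior the slider exposes.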

Pretty cool stuff. This might be useful for prediction, as certain side effects are always linked together. Future steps might include a more rigorous grouping than my less-than-informed medical opinion.

All the visualization credit goes to sources [2] and [3]; everything above is based on their work. They helped me immensely as I am a novice in JavaScript. Also a quick shoutout to source [4], as it was a complete pain to get d3 working with WordPress.

-Marcello

[1] http://sideeffects.embl.de/

[2] https://bl.ocks.org/mbostock/4062045

[3] http://www.coppelia.io/2014/07/an-a-to-z-of-extra-features-for-the-d3-force-layout/

[4] https://www.datamaplab.com/posts/embedding-javascript-visualization-wordpress/

Journal Club: Week of 1/15/16

Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods

William S. Cleveland; Robert McGill
Journal of the American Statistical Association

This paper is a little different from the previous papers I’ve read. It does not outline new optimization methods or review machine learning algorithms. This paper just looks at how we display data, which has become a huge field in recent years. Cleveland and McGill wrote this paper before the wave of data analysts and data scientists, back in 1984. However, their findings are still very relevant and super useful.
The paper outlines a study on perception. The authors sought to determine which charts and graphs are the most easily interpretable and most accurate. They broke down the interpretation and structure of graphs into 10 “elementary perceptual tasks” which describe features that graphs use to separate data. The ten tasks are as follows, ranked from best to worst (some tasks share a rank):
1. Position along a common scale
2. Position along non-aligned scales
3. Length, direction, angle
4. Area
5. Volume, curvature
6. Shading, color saturation
This is quite an interesting list, especially when looked at in terms of today’s graphics. One of my favorite visualizations is the choropleth map, which relies almost entirely on color saturation; however, this task fared the worst! Keeping these tasks in mind, the authors iterate through common graphs. Some scored better than others: bar charts topped most of their tests, while the widely hated pie chart scored toward the bottom.
This paper is full of graphs and charts to show their findings, along with examples of how some graphs fail and others succeed. A particularly interesting example is the distance between two curves. On the left they show a matrix of two curves and ask their participants to estimate the distance between the two curves at various points. On the right is the actual difference of the two curves. I found that even after reading the paper I would stumble on my perception of the two curves.
This paper is excellent. I highly suggest that anyone who makes graphs give it a quick read. The graphs look a little dated, but nevertheless contain tons of information. It even has some recommendations on common graphs to replace with graphs that better display information.
One thing the paper does not capture is the recent trend to make graphs as pretty as possible. There is an obvious trade-off that the creator must decide on. Do I want to make a pretty graph that entices clicks or a utilitarian graph that conveys the most information? Reading this paper brings us a little closer to a happy compromise.

Follow The Money

Campaign finances are becoming a prominent issue in today’s elections. We have candidates like Jeb Bush who are receiving record-breaking amounts of donations from private citizens and private companies alike. On the other hand we have candidates like Bernie Sanders who only receive small donations from citizens. Regardless of your opinion on where candidates should fall on that spectrum, campaign donations are an important part of US elections. Discussion of campaign donations is almost always about presidential candidates, but what about our legislators? The only time I ever hear about donations to legislators is when there is a huge scandal. Do they pull in as much money as presidential candidates? Do they receive more money from the average citizen or the average corporation? Do legislators of a certain party pull in more than another?

To answer these questions I needed data on campaign donations for all the federal legislators. Luckily for me, I am not the first to look for this data. There are quite a few places to go for this information, but I wanted a place with an easy-to-understand API and something reliable. This led me to followthemoney.org. There is a very soft “API” here, but it is nevertheless super useful and easy to parse. I took the data from the past 5 years for any candidate who ran for either the Senate or the House of Representatives. Using their API, I exported the data in csv format. From there the preprocessing and the analysis were all performed in Python (Anaconda distribution).
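
For readers who want to follow along, here is a rough sketch of loading such a csv export with Python's standard library. The column names and rows are invented stand-ins; the real followthemoney.org export may use different headers.

```python
import csv
import io

# Stand-in for an exported file. The column names and rows are invented
# for illustration and may not match the real followthemoney.org export.
sample_csv = io.StringIO(
    "Candidate,Office,Party,Election_Year,Total\n"
    "Candidate A,US Senate,Democratic,2014,1500000.00\n"
    "Candidate B,US House,Republican,2014,250000.50\n"
    "Candidate C,US House,Third-Party,2012,40.00\n"
)

def load_donations(handle):
    """Read the export and convert the numeric columns."""
    rows = []
    for row in csv.DictReader(handle):
        row["Total"] = float(row["Total"])
        row["Election_Year"] = int(row["Election_Year"])
        rows.append(row)
    return rows

donations = load_donations(sample_csv)
rows_2014 = [r for r in donations if r["Election_Year"] == 2014]
print(len(donations), len(rows_2014))
```

With a real export, `sample_csv` would be replaced by an opened file handle.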

Before we jump into the analysis we need to know a little more about campaign contributions themselves. There are federal contribution limits that restrict how much people (and corporations, parties, PACs, etc.) can donate. There are a few ways to get around these limits, however, and recent court decisions have made that easier.


The two recent major decisions that we need to know for this analysis are McCutcheon v. FEC and Citizens United v. FEC. Both of these decisions deal with how much people can contribute to candidates. Citizens United v. FEC [1] prohibited the government from restricting independent political expenditures by corporations and unions, non-profits included. This, however, is subtly different from direct campaign donations: this type of expenditure [2] endorses the candidate but is made independently of the candidate. We have to keep this in mind when discussing PACs, as this is a heavily used tactic to funnel large sums of money toward a candidate. The other decision was McCutcheon v. FEC [3], which removed the aggregate limits on campaign contributions. These decisions have brought in new spending and new ways to spend. It is more important

STATE OF THE UNION

Now that we know who can donate and how much they can donate, let’s see who is on the receiving end. One more important thing we need to know is when all of our candidates are up for election. As we all know, Senators serve six-year terms, with 1/3 of the Senate up for reelection every two years. Congressmen, on the other hand, serve only two years and are all up for election every two years. Elections fall on the even-numbered years, so our important data will fall on 2010, 2012, and 2014. All other years are reserved for special elections, like if a senator dies midterm. For this first part I am using the data from the 2014 elections.

In 2014, the average Senator/Congressman pulled in a little over a million dollars in campaign donations. This average is a little skewed, as some legislators pulled in under a thousand and some over 10 million. The range of the campaign donations is set by two men, Mitch McConnell and John Patrick Devine. Mitch McConnell, a Republican and more importantly the Majority Leader of the Senate, pulled in a whopping $30 million, while John Patrick (a Google search for him revealed no pertinent results) pulled in a less than stellar $40. Naturally, John Patrick lost his race, while Mitch McConnell is our current Majority Leader and has held his office in the Senate since 1985.
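
A quick way to see how a few huge totals skew the average is to compare the mean with the median. The numbers below are invented to mimic a long-tailed distribution; they are not the actual 2014 totals.

```python
from statistics import mean, median

# Invented totals in dollars, chosen only to mimic a long-tailed distribution
totals = [40, 900, 250_000, 600_000, 1_200_000, 30_000_000]

avg = mean(totals)    # pulled upward by the single $30 million campaign
mid = median(totals)  # the "typical" campaign in this toy list
print(f"mean: ${avg:,.0f}  median: ${mid:,.0f}")
```

When the mean sits far above the median like this, the median is usually the better summary of a typical candidate.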


Speaking of winners and losers, who makes more? Naturally, I believe that the average winner should pull in a great deal more than the average loser. And well, that’s pretty much how it goes. Winners bring in an average of $2 million while losers can only muster up about half a mill. This, however, excludes a group of politicians who chose to withdraw from the race. Those politicians close the gap, but not by much: they pull in about half as much as the winners, $1 million.


Before we dive into state-by-state trends, let’s see how the Senate does against the House of Representatives. The graph below sums up this subsection.


The House pulls in a lot more donations than the Senate. However, this may be due to the sheer number of people who run for the House. This brings us to a good point: donations are heavily influenced by the number of people who run and the number of people who donate. To avoid basing everything simply on population rather than underlying trends, most of the following graphs will be averages or per capita where necessary.

Keeping that in mind, let’s see how all 50 states line up. Below is a graph on donations per capita.

 

2014 Campaign Donations to Legislators Grand Total

That’s much better. As you can see, this is obviously not a population map. States like NY, NJ, MA, and CA are not top tier, but rather toward the bottom. Interestingly enough, states with fewer people seem to have much greater donations per person; Alaska is a notable example. Why do these states get way more contributions than others? One possible explanation is that some of these states are swing states. Swing states (like New Hampshire above) are very closely divided between the Republicans and the Democrats. These states should naturally garner more donations, as the races should be more exciting and volatile. In coarser terms, campaign money is more valuable in these states.

Before we go any further, we have to look at who’s donating. Let’s take a look nationwide at who is donating the most. Is it mostly large sums, or small donations?

PEOPLE, PACS, AND THINGS

Speaking of small donations, who actually donates to campaigns? I personally never have; my naïve and uninformed idea of campaign donations is just giant faceless corporations throwing money at candidates. Let’s take a peek at average Joes like you and me and how much they spend. Below you can see two maps of the US, one for 2012 and one for 2014. Hover over each state to see which citizen donated the most and how much they donated; the color scale lets you compare states to each other.

Top Donators 2012 & 2014
These two maps display the top donors for each state in 2012 and 2014. As you can see, in 2012 Texas and Connecticut dominated in terms of individual donors. These points may have skewed the data, however, as Linda from Connecticut was actually running in the campaign herself, personally funding her run. David from Texas was the lieutenant governor of Texas during that time. These do not seem like ordinary people.
2014 paints a more familiar (and relatable) picture. The donations are much lower than in 2012, but similar trends emerge. New York, California, and Texas are all toward the top in terms of individual donors, with “fly-over” states toward the bottom.

Now what about those big faceless corporations? Here are two more graphs, this time only for 2014. The map on the left shows the top industry for each state; the chart on the right shows the ten industries that donate the most nationwide.

Top Industries 2014

Again we have what looks to be a population map. It seems like states with the most people have the highest top donors, whether citizens or corporations. One thing that stood out to me was who the biggest donors were: real estate and medical professionals were the top players in most states. Much less surprising was that Oil & Gas donated the most where, you guessed it, there is oil and gas.
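
The two views above (top industry per state, top industries nationwide) boil down to two small aggregations. A sketch on invented records, assuming each donation is reduced to a (state, industry, dollars) tuple; the function names are hypothetical:

```python
from collections import defaultdict

# Invented (state, industry, dollars) records, for illustration only
records = [
    ("TX", "Oil & Gas", 900.0),
    ("TX", "Real Estate", 400.0),
    ("NJ", "Real Estate", 700.0),
    ("NJ", "Health Professionals", 650.0),
    ("TX", "Oil & Gas", 300.0),
]

def top_industry_by_state(records):
    """Return the industry with the largest donation total in each state."""
    totals = defaultdict(lambda: defaultdict(float))
    for state, industry, amount in records:
        totals[state][industry] += amount
    return {state: max(by_ind, key=by_ind.get) for state, by_ind in totals.items()}

def top_industries_nationwide(records, n=10):
    """Return the n industries with the largest totals across all states."""
    totals = defaultdict(float)
    for _, industry, amount in records:
        totals[industry] += amount
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]

by_state = top_industry_by_state(records)
nationwide = top_industries_nationwide(records, n=2)
print(by_state, nationwide)
```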

Finally, what about groups who donate based on ideology? Some examples of these groups are pro-Israel, pro-life/pro-choice, and environmental policy groups, as well as many others. The bar chart shows a nationwide average of which ideologies get the most money. The map shows the most popular ideology per state.

Top Ideologies 2014
Unfortunately the way followthemoney.org structures its data makes looking at ideologies a little boring. General Liberal and Conservative ideologies are grouped, and obviously these dominate nationwide. However, some other ideologies creep up behind these two powerhouses. Big issues like foreign policy and the environment garner some money. These ideologies do not donate nearly as much as even some of the smaller industries. Excluding Liberal and Conservative, ideologies donate tenfold less than industries.

 

PARTY FOUL

So far we have skipped over the two most important groups in American politics, the Republicans and the Democrats. How do the parties compare? Seeing that the country is pretty divided on party allegiance, I’d expect donations to each party to be relatively the same. I’d also expect that third-party candidates don’t pull in amounts of even the same order of magnitude as the two major parties.

Well, that seems about right. Democrats and Republicans pull in around the same amount each year, while third-party candidates are not even close. This was to be expected, as third-party candidates rarely have the same pull or presence as candidates from the two major parties.

Now what about statewide? During presidential elections most states are glossed over. This is because they are usually deeply entrenched in one party or the other. Below is a map of which party got each state’s electoral votes. Next to that is which party got more money in the 2014 elections.

Electoral Votes by Party

from: Politico.com

 

The Elephants and the Donkeys

The first graph, from Politico, shows which party each state voted for. The one below shows which party received more donations in each state. The two maps look quite similar. The east and west coasts mirror each other to an extent, and the Midwest also aligns with donations. Donations to legislators in each state may be a good predictor of where the electoral votes end up. Or, more likely, states that were going to vote for a certain party donate to that party more.

 

Some states receive a lot more attention than others when it comes time for presidential elections. Currently I am only looking at federal legislators’ donations, but I wonder if they reflect presidential politics as well. I will refer to certain states as swing states. These states are not as deeply entrenched as others. The swing states for 2014 were: Nevada, Colorado, Iowa, Wisconsin, Ohio, New Hampshire, Virginia, North Carolina, and Florida. The map below highlights the states that have the closest spending between the Democrats and the Republicans.

Battleground States

Most of the swing states have very similar donations between the two parties. Swing states like Virginia, Florida, and Nevada have very close donation totals. Virginia actually has the closest of all the states. On the other end, states like California, Texas, and New York have the greatest difference in donations. This makes sense as these states are deeply entrenched in one party; just look at Texas, where the donations are completely lopsided. There is some good news in this map: most states are relatively close when it comes to donations to both parties.
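
One simple way to score how "close" a state is, sketched below, is the gap between the two party totals relative to their sum. This metric is my assumption for illustration; it is not necessarily the one used for the map, and the state totals are invented.

```python
def party_gap(dem, rep):
    """Relative gap between party totals: 0 is even, 1 is completely one-sided."""
    total = dem + rep
    return abs(dem - rep) / total if total else 0.0

# Invented (Democratic, Republican) totals in dollars for three made-up states
state_totals = {
    "A": (5_000_000, 5_200_000),
    "B": (9_000_000, 1_000_000),
    "C": (2_000_000, 3_000_000),
}

# Closest races first
ranked = sorted(state_totals, key=lambda s: party_gap(*state_totals[s]))
print(ranked)
```

Using a relative gap rather than a raw dollar difference keeps large states from looking lopsided just because they raise more money overall.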

MONEY MONEY MONEY MONEY

Political donations are a critical component of the United States government. Looking at the donations, many of my previous assumptions were confirmed and many were discredited. However, one must keep a critical eye on the data presented. The analysis is only as good as the data collected. I believe it is essential to have reliable and vetted donation data, as it holds many insights. I’d like to thank followthemoney.org for their data and commitment. If you liked this analysis please check out their website and explore the data yourself! Maybe even consider donating!

 

-Marcello

 

[1] https://en.wikipedia.org/wiki/Citizens_United_v._FEC

[2] https://en.wikipedia.org/wiki/Independent_expenditure

[3] https://en.wikipedia.org/wiki/McCutcheon_v._FEC

Journal Club: Week of 12/4/2015

A Few Useful Things to Know about Machine Learning
Pedro Domingos
Department of Computer Science and Engineering, University of Washington

This was an excellent read. I highly suggest this paper to novices and experts alike. Pedro goes through all the mysticism and what he calls “folk knowledge” in this paper: knowledge that would take years of machine learning practice to uncover. Pedro breaks down machine learning into simple concepts and shows the reader how to deal with them. Do not be mistaken, this is not a tutorial. You will not learn any new algorithms or applications. You will only learn how to better use the ones you know. That being said, I believe it is best to go into this paper with a little background so you are not lost by what Pedro is explaining.
Pedro explores major pitfalls for people who are first learning machine learning as well as for seasoned pros. I particularly liked his section on overfitting and the section on how to approach problems. “Start simple first” is a common piece of advice, but Pedro backs it up with examples and graphs showing how different methods perform. His advice on more data vs. a cleverer model is invaluable. I highly suggest reading this paper; it is a quick and powerful read.

 

PLS-regression: a Basic Tool of Chemometrics
Svante Wold, Michael Sjostrom, Lennart Eriksson
Institute of Chemistry, Umeå University

 

Another paper on PLS, this one a little more current and a little more practical. Like Geladi’s paper on PLS, this paper goes in depth on PLS within the scope of chemistry and engineering, so it’s right up my alley. After reading it, not all of my questions were answered, but I felt like I had a better grasp of the algorithm. One thing I really liked about this paper was the diagnostics and the interpretation.
The paper is structured around an amino acid example. This serves as a good basis and testing ground, as they provide the raw data for anyone to test on. The power of this paper is in the last couple of sections. The authors guide the reader through each step of interpreting the results, from initial results to essential plots. Each plot gets its own subsection; however, they are not all given the same importance. The explanations of some of them are very brief, restricted to only one or two paragraphs.
If you are only going to read one section of this paper, flip to the second to last page and read “Summary; How to develop and interpret a PLSR model.” Here the authors give a very quick overview which will get you on your feet and give you a basic understanding of what is going on. It makes a good reference as well.
-Marcello

FOLLOW THE MONEY: FEDERAL LEGISLATURE PART 4

A quick refresher for those just joining us. I took campaign donation data from followthemoney.org. This website makes campaign donations very easy to parse and work with. I gathered the data for all campaign donations to either Senators or Congressmen regardless of whether they were elected or not. With this data I was able to see patterns with regard to political parties, candidates’ offices, and more. In this part we will take a look at how the states compare to each other. First let’s take a look at overall donations for 2014.

2014 Campaign Donations to Legislators Grand Total

Don’t try to pull too many grand conclusions from the above graph. Like I mentioned when talking about winners and losers in elections, donations per candidate (or here per capita) give more insight. The above graph shows what is basically a population map. The more populated states show up in a darker green than the less populated states. This gives an unfair advantage to states like California and New York: people in less populated states would have to donate more per person than people in more populated states to reach the same total. So in order to get a fairer comparison we need to normalize our donations. I have calculated donations per capita for each state.
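
The normalization is just total donations divided by state population. A minimal sketch with invented figures for two made-up states (the population source is left open here):

```python
def per_capita(donations_by_state, population_by_state):
    """Divide each state's donation total by its population."""
    return {
        state: donations_by_state[state] / population_by_state[state]
        for state in donations_by_state
        if population_by_state.get(state)  # skip missing or zero populations
    }

# Invented figures, for illustration only
donations = {"Big": 40_000_000, "Small": 6_000_000}
population = {"Big": 20_000_000, "Small": 750_000}

dollars_per_person = per_capita(donations, population)
print(dollars_per_person)
```

In this toy case the small state raises far less in total but four times as much per person, which is the kind of reversal the per capita map reveals.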

2014 Campaign Donations to Legislators Grand Total

That’s much better. As you can see, the maps are wildly different, and this one does not resemble a population map in any way. States like NY, NJ, MA, and CA are no longer top tier, but rather toward the bottom. Interestingly enough, states with fewer people seem to have much greater donations per person; Alaska is a notable example. Why do these states get way more contributions than others? One possible explanation is that some of these states are swing states. Swing states (like New Hampshire above) are very closely divided between the Republicans and the Democrats. These states should naturally garner more donations, as the races should be more exciting and volatile. Speaking of parties, which states gave more to the Democrats and which gave more to the Republicans?

The Elephants and the Donkeys

Nothing too surprising here. Most Republican states have more donations toward Republican candidates, and the same goes for Democratic states. However, there are a few confused states. Arizona, Colorado, and New Mexico are generally considered Republican states, but the Democrats raised a lot more money. The opposite goes for Wisconsin, Michigan, and Pennsylvania, typically Democratic states. This map reinforces some geographical trends: the northeast coast and west coast are the usual Democratic strongholds.

A quick word on the interactive graphs above. These graphs were made using Plotly and Python. Plotly makes it very easy to make d3.js-type graphs and interactive web apps. Recently Plotly went open source, which is great news for all of us. If you are looking to quickly make interactive graphs, Plotly should be your first stop (unless you are really good with d3). This ends the exploratory portion of Follow The Money; next up is the final report. Enjoy the interactive maps!

-Marcello


Journal Club: week of 11/20/2015

A Few Useful Things to Know about Machine Learning
Pedro Domingos
Department of Computer Science and Engineering, University of Washington

This was an excellent read. I highly suggest this paper to novices and experts alike. Pedro goes through all the mysticism and what he calls “folk knowledge” in this paper: knowledge that would take years of machine learning practice to uncover. Pedro breaks down machine learning into simple concepts and shows the reader how to deal with them. Do not be mistaken, this is not a tutorial. You will not learn any new algorithms or applications. You will only learn how to better use the ones you know. That being said, I believe it is best to go into this paper with a little background so you are not lost by what Pedro is explaining.

Pedro explores major pitfalls for people who are first learning machine learning as well as for seasoned pros. I particularly liked his section on overfitting and the section on how to approach problems. “Start simple first” is a common piece of advice, but Pedro backs it up with examples and graphs showing how different methods perform. His advice on more data vs. a cleverer model is invaluable. I highly suggest reading this paper; it is a quick and powerful read.

PLS-regression: a Basic Tool of Chemometrics
Svante Wold, Michael Sjostrom, Lennart Eriksson
Institute of Chemistry, Umeå University

Another paper on PLS, this one a little more current and a little more practical. Like Geladi’s paper on PLS, this paper goes in depth on PLS within the scope of chemistry and engineering, so it’s right up my alley. After reading it, not all of my questions were answered, but I felt like I had a better grasp of the algorithm. One thing I really liked about this paper was the diagnostics and the interpretation.

The paper is structured around an amino acid example. This serves as a good basis and testing ground, as they provide the raw data for anyone to test on. The power of this paper is in the last couple of sections. The authors guide the reader through each step of interpreting the results, from initial results to essential plots. Each plot gets its own subsection; however, they are not all given the same importance. The explanations of some of them are very brief, restricted to only one or two paragraphs.

If you are only going to read one section of this paper, flip to the second to last page and read “Summary; How to develop and interpret a PLSR model.” Here the authors give a very quick overview which will get you on your feet and give you a basic understanding of what is going on. It makes a good reference as well.

-Marcello

FOLLOW THE MONEY: FEDERAL LEGISLATURE PART 3

I took a quick look at candidate donations limited to New Jersey; now I’ve moved nationwide. Let’s see if the trends in New Jersey were typical of the whole nation or just Jersey. I restricted the data to just 2014 to make it a little more manageable. As always, let’s look at Dems versus Repubs.

all states leg party

Here we see the party breakdown, along with the elusive third party. If it wasn’t obvious already, the de facto two-party system completely eclipses all third-party hopes. Dems and Repubs each trump the cumulative third-party total by an order of magnitude. Moreover, Republican candidates across the nation raise more money than their Democratic counterparts. This caught me by surprise, as I thought totals would lean a little Democratic but be more or less even. Let’s take a peek at the office breakdown.

2014 was a big election year for the House, and a lesser year for the Senate. My prediction would put House campaign donations way ahead of the Senate.

all states leg office

Yup, that looks about right. Not as big a spread as I would have guessed, but this follows from the year’s context. One thing to note: with this dataset I kept all candidates, even if they lost. This should give a more complete look at ALL donations to candidates, not just the ones that have been elected. So I wonder who raised more, the winners or the losers?

win lose

The above graph is misleading. You may want to say that people who won their elections raised more money, and you would be right if you looked at it cumulatively. However, to get anything meaningful out of this graph we need to look at donations per candidate. It could be that there are simply more candidates who won than lost, leading to the spread.
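
Here is a small sketch of why the cumulative view can mislead. The candidates are invented so that winners and losers raise the same total, yet the per-candidate averages differ by a factor of two; the function name is hypothetical.

```python
from collections import defaultdict

def total_and_average(candidates):
    """Return {status: (total raised, average raised per candidate)}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for status, raised in candidates:
        sums[status] += raised
        counts[status] += 1
    return {s: (sums[s], sums[s] / counts[s]) for s in sums}

# Invented (status, dollars raised) pairs, for illustration only
candidates = [
    ("Won", 2_000_000), ("Won", 1_000_000),
    ("Lost", 500_000), ("Lost", 300_000), ("Lost", 100_000), ("Lost", 2_100_000),
]
summary = total_and_average(candidates)
print(summary)
```

Totals depend on how many candidates fall in each group, so the average per candidate is the fairer comparison.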

per poli

Now this is surprising: even per candidate, the politicians who were elected raised almost 5 times as much as those who lost. Out of the 1415 candidates, 936 of them lost and 474 of them won. Only 3 withdrew and 2 were “unknown”. Finally, let’s look at the industries again.

all state industry

 

Here we see uncoded donations eclipsing the rest of the other industries per usual. As a reminder, Uncoded actually includes PAC donations as well as individual donations. This is why uncoded always comes in as the largest category.

On a federal level it looks like New Jersey is pretty much in line with all the states. However, the whole point of getting data for every state is to be able to compare them. Stay tuned for part 4.

 

Marcello

P.S. Here’s a preview

statescolor

 

Journal Club: Week of 11/13/2015

Got two more for you this week. One on machine learning and the other on multivariate methods. Check them out.

Supervised Machine Learning: A Review of Classification Techniques
By S.B. Kotsiantis
University of Peloponnese (2007)

This paper serves as a review of a subset of supervised machine learning algorithms with a focus on classification. Because of the vast number of algorithms out there, the author breaks down the paper by key features of the algorithms. First the author gives a brief overview of machine learning in general, why and how it is used. What I liked most about this paper is that even before any algorithms are mentioned the author talks about general issues with classifiers and algorithm selection. This prepares the reader and removes the notion of the “silver bullet” algorithm.
The article is well organized. Kotsiantis starts with the most intuitive of machine learning algorithms, decision trees, and works his way up to newer (well, for 2007 at least) techniques. Each section goes over a multitude of techniques within the subheading; for example, the Statistical Learning section contains Naïve Bayes and Bayesian Networks. I liked this organization as it guides the reader into more complex techniques. One thing it lacks is depth. Most techniques are rushed over and not fully explained, but this paper’s purpose is not to outline precise steps to implement each technique but rather to familiarize the reader with the existence of certain techniques.
Another criticism I have of the paper is that it feels a little dated. This is no fault of the author of course, but nevertheless a more recent paper may be worthwhile to follow up with. There is a table in the paper comparing the different techniques in terms of speed, tolerance, and other parameters, which is very useful. However, it might need to be checked for accuracy as it might be outdated.

Partial Least Squares Regression: A Tutorial
By Paul Geladi and Bruce R Kowalski
Analytica Chimica Acta, 185 (1986) 1-17

Here is an oldie but a goodie. When I was first learning about Partial Least Squares (PLS, sometimes called projection onto latent structures) there was a vast number of papers, but none really drove the point home for me. I went back to one paper that was constantly being cited, this paper from 1986. It provides a very clear tutorial on how to get PLS up and running, and it assumes you have an understanding of linear algebra. Starting with data preprocessing, the paper states what form your data needs to be in and how to get it into that form.
The paper takes a detour, however. It first goes over existing methods like multiple linear regression and principal component regression before it begins to explain PLS. This was good and bad for me, as I was solely interested in PLS. Nevertheless, the other tutorials gave insight and quick, rudimentary ways of using other regression methods. The paper then dives straight into building the PLS model. Take care reading this section, as the explanation is sparse. Overall, it’s not the best tutorial, but it has two invaluable takeaways. The first is Figure 9, which shows a geometrical representation of all the outputs and inputs the PLS model uses. It shows exactly the dimensions of each and how they relate to each other.
The other is the sample PLS algorithm. In the appendix of the paper there is an almost pseudocode-like description of the PLS algorithm. Using this, I was able to get a PLS program up and running in less than an hour. The algorithm clearly shows every step that must be taken and exactly how to do it. This is the main reason why I would recommend this paper. There are others out there that explain PLSR better, but this paper allows for a rapid implementation of PLS.
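To give a feel for how short such a program can be, here is a minimal sketch in Python with NumPy. It is not a transcription of the paper’s appendix: it assumes a single response column (the PLS1 special case, which needs no inner iteration), and the function name `pls1_fit` and its return values are my own choices for illustration.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit single-response PLS (PLS1) with NIPALS-style deflation.

    Returns (coef, x_mean, y_mean) so that
    y_hat = (X_new - x_mean) @ coef + y_mean.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean  # mean-center both blocks

    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr                 # weight: direction of max covariance with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:              # nothing left to explain, stop early
            break
        w = w / norm
        t = Xr @ w                    # scores
        tt = t @ t
        p = Xr.T @ t / tt             # X loadings
        c = (yr @ t) / tt             # regression of y on the scores
        Xr = Xr - np.outer(t, p)      # deflate X
        yr = yr - c * t               # deflate y
        W.append(w)
        P.append(p)
        q.append(c)

    if not W:                         # degenerate case: y has no covariance with X
        return np.zeros(X.shape[1]), x_mean, y_mean

    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    # Map the latent-space regression back to the original variables.
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, x_mean, y_mean


# Synthetic data, generated only to exercise the function.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 4))
y_demo = X_demo @ np.array([2.0, -1.0, 0.0, 0.5]) + 0.1 * rng.normal(size=50)
coef, x_mean, y_mean = pls1_fit(X_demo, y_demo, n_components=2)
y_hat = (X_demo - x_mean) @ coef + y_mean
print("coefficients:", np.round(coef, 3))
```

One sanity check I find handy: when the number of components equals the number of (linearly independent) columns in X, the PLS coefficients should match ordinary least squares on the centered data.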

-Marcello

Follow The Money: Federal Legislature Part 2

In the last part we took a look at campaign donations to New Jersey state legislators. Now we are moving on up to the US House and Senate. The stakes are a little higher, the politicians have more power, and hopefully there are plenty of campaign donations to look at. Luckily for me, we have Followthemoney.org on our side.

All data for the following graphs was collected using followthemoney.org’s API. This made it easy to tabulate and graph all the recorded donations. First up is Democrats vs. Republicans.

fed leg party

This follows the state legislature pretty closely. Democrats stomp Republicans in terms of donations; however, this may be due to our data source rather than reality. 2014 and 2010 show close donation totals, while 2012 shows a blowout. 2013 seems to be completely missing Republican data. That, or only Democrats won.

One important qualification to make on this data set is that it only represents donations to candidates who won their elections. We need context for 2013: it is an off year for elections, so there must have been some special circumstance. Luckily Wikipedia is here to help out. Sadly, a senator passed away during this time and a special election was held. As we suspected, a Democratic candidate won. This may have contributed to the lopsided data. Now let’s see if office matters at all.

fed leg office

Depending on the year, it looks like office matters quite a bit. The special Senate election in 2013 accounted for all campaign donations that year. 2010 was the mirror image of 2013, completely dominated by House campaign donations. As you probably know, House seats are up every 2 years. In the data above, House donations are all in the same range except in 2013, when there was no House election. Senate elections, on the other hand, are also held every 2 years, but only 1/3 of the seats are up each time. New Jersey Senators were up for reelection in both 2012 and 2014 but not in 2010, explaining the lack of Senate donations that year. Finally, let’s look at industry donations in 2012.

fed leg industry

Here we see uncoded donations eclipsing all the other industries. After seeing uncoded in part 1, I investigated. Uncoded actually includes PAC donations as well as individual donations. This is why uncoded always comes in as the largest category. I did some quick calculations to see what % was from individuals like you and me and what % came from corporations and other PACs.

Individual  $  14,760,750.00
Non-Individual  $    1,412,439.00
Grand Total  $  16,173,189.00
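For anyone who wants the split as percentages, the figures in the table above can be plugged straight into a few lines of Python. The variable names are mine; the dollar amounts are the ones listed.

```python
# Dollar amounts copied from the table above.
individual = 14_760_750.00
non_individual = 1_412_439.00
total = individual + non_individual

# Each category's share of the grand total, in percent.
individual_share = 100 * individual / total
non_individual_share = 100 * non_individual / total

print(f"Grand total:    ${total:,.2f}")
print(f"Individual:     {individual_share:.1f}%")
print(f"Non-individual: {non_individual_share:.1f}%")
```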

Overwhelmingly, the donations stemmed from individuals: about 91% of the total. That is super surprising to me. There are a lot more visualizations I can do with this data, but before that, we have to go nationwide.

-Marcello

Find the data here: NJfedDon