





See original visualization here: http://setosa.io/blog/2014/07/26/markov-chains/
After finishing up my last post about modelling artists and their probability to release consecutive Best new music albums (see part 1 here), I got to thinking about what else I could use the data that I scraped. I had all album reviews from 2003 to present including the relevant metadata, artist, album, genre, author of the review, and date reviewed. I also had the order in which they were reviewed.Then, with Markov chains still fresh in my mind, I got to thinking, do albums get reviewed in a genre based pattern? Are certain genre’s likely to follow others?Using the JavaScript code from http://setosa.io/blog/2014/07/26/markov-chains/, I plugged in my labels (each of the genres) and the probability of state change (moving from one genre to another) which resulted in the 9 node chain at the top of the post. If you let the chain run a little while you will notice a few patterns. The most obvious pattern is that all roads lead to Rock. For each node the probability of the next album being a rock album is close to 50%. This is because not all genres are equally represented and also because of the way Pitchfork labels genres. Pitchfork can assign up to 5 genres to an album it reviews. With up to 5 possibilities to get a spot, some genres start to gain a lead on others. Rock, for instance, is tacked on to other genres more frequently than any other genre. This causes our markov chain to highly favor going to Rock rather than other genres like Global and Jazz which are not tacked onto other as frequently. So if you are the betting type, the next album Pitchfork will review is probably a rock album.-MarcelloSee original visualization here: http://setosa.io/blog/2014/07/26/markov-chains/
I am an avid Pitchfork reader, it is a great way to keep up to date on new music. Pitchfork lets me know what albums to listen to and what not to waste my time. It’s definitely one source I love to go to when I need something new.One way Pitchfork distills down all the music they review and listen to is to award certain albums (an more recently tracks) as “Best New Music.” Best New Music, or BNM as I’ll start calling it, is pretty self explanatory. BNM is awarded to albums (or reissues) that are recently released, but show an explemplary effort. BNM is loosely governed by scores (lowest BNM was a 7.8), but I noticed that I would see some of the same artists pop up over the years. This got me to wondering. If an artist gets a BNM is their next album more likely to be BNM or meh?We need data. Unfortunately Pitchfork doesn’t have an API and no one has developed a good one, so that lead me to scrape all the album info. Luckily, all album reviews are listed on this page http://pitchfork.com/reviews/albums/. To get them all I simply iterated through each page and scraped all new albums. I scraped the artist name, album name, genre, main author of the review, and year released. BNM started back in 2003 so I had a natural endpoint. In order to go easy on Pitchforks servers I built in a little rest between requests (don’t get to mad Pitchfork).Now that I have the data, how should I model it? We can think of BNM and “meh” as two possible options or “states” for albums (ignoring completely scores). Markov Chains allows us to model these states and how the artists flow through them. Each pass through the chains represents a new album being released. A conventional example is weather. Imagine there are only rainy days and sunny days. If it rained yesterday there may be a stronger probability that it might rain tomorrow, however the weather could also change to sunny, but at a lower probability. Same goes for sunny days. For my model, just replace sunny days with BNM and rainy days with meh.Sunny “S” ,Rainy “R”, and the probabilities of swapping or staying the course
With all my data, I was able to calculate the overall Markov models. I took all artists that that had at least 1 BNM album, 2 albums minimum, and at least 1 album after the BNM album. This insures that these probabilities actually mean anything. I can only tell what the probability of staying BNM is if you have at least one more album after your first BNM. Once I distilled all the artists down using the above criteria getting the probabilities was easy. I simply iterated through each artists discography, classifying the “state” change between them (meh to meh, meh to BNM, BNM to BNM, BNM to meh)
Finally, with all the numbers crunched I plugged them in to the visualization at the top. NOTE: all the visualizations were NOT created by me. I simply plugged in my calculated probabilities and labels. The original visualization along with a fantastic explanation of markov chains can be found at http://setosa.io/blog/2014/07/26/markov-chains/. The visualization and all the code behind it was created by him NOT me. As I said before I only supplied the probabilities.
If you look at the size of the arrows you can tell the relative probability of each state change. As you can see BNM are pretty rare and artists don’t stay that way for long (thin arrow). What is much more common, as you probably guessed, are meh albums leading to more meh albums (thick arrow). As you can see, it is more likely that an artist will produce a meh album after BNM. What is interesting is that it is more likely to release a BNM after a BNM than it is to go from meh to BNM These conclusions seem pretty obvious, in retrospect, however since we lumped all artists together we might be missing some nuance.
Now the above metrics are for all artists, but it it probably unfair to lump in Radiohead (who churns out BNM like its nothing) to the latest EDM artist. I redid my analysis only this time further splitting all the artists by their genre. Below are the three most interesting genres.
METAL
POP/R&B
RAP
A While back I went out to dinner with a bunch of my buddies from highschool. We inevitably started talking about all the people that went to our highschool and what they were doing today. Eventually we started talking about one of our friends that has actually became instagram famous. As the night waned, one of my friends came up to me and said “You know all of his/her instagram followers are fake.” I immediately went to their account and started clicking on some of the followers. Sure enough they started to look a little fishy. However, as a data scientist I wasn’t 100% convinced. Down the rabbit hole I went.
Gun control has become a hot topic recently in the United States. Due to the increase of deaths at the ends of firearms there have been a lot of studies showing how guns flow through America. I wondered what of the larger weaponry. Items like missiles, tanks, and jet fighters Who is buying these heavy duty weaponry? Or do governments just produce their own weapons?
Campaign finances are becoming a prominent issue in today’s elections. We have candidates like Jeb Bush who are receiving record breaking amounts of donations from private citizens and private companies alike. On the other hand we have candidates like Bernie Sanders who only receives small donations from citizens. Regardless of your opinion on which end of the spectrum candidates should behave toward campaign donations, they are nevertheless an important part of US elections. When discussing campaign donations it is almost always about presidential candidates, but what about our legislators. They only time I ever heard about donations to legislators is when there is a huge scandal. Do they pull in as much money as presidential candidates? Do they receive more money from the average citizen or the average corporation? Do legislators of a certain party pull in more than another?
To achieve this I needed data on campaign donations for all the federal legislators. Luckily for me I am not the first to look for this data. There are quite a few places to go to for this information, but I wanted a place with an easy to understand API and something reliable. This led me to followthemoney.org. Here there is a very soft “API”, but nevertheless super useful and easy to parse. I took the data for all legislators from the past 5 years for any candidate that ran for either the Senate or the House of Representatives. Using their API, I exported their data in csv format. From there the preprocessing and the analysis was all preformed in python (anaconda distribution).Before we jump into the analysis we need to know a little more about campaign contributions themselves. There are federal contribution limits imposed to limit how much people (and corporations, parties, PACS, etc..) can donate. There are a few ways to get around these limits however and recent legislature that has helped to facilitate that.That’s much better. As you can see this is obviously not a population map. States like NY, NJ, MA, and CA are not top tier, but rather toward the bottom. Interestingly enough, states that have less people in them seem to have much greater donations per person, Alaska is a notable example. Why do these states get way more contributions than others? One possible explanation are that some of theses states are swing states. Swing states (like New Hampshire above) are very closely divided between the Republicans and the Democrats. These states should naturally garnish more donations as the races should be more exciting and volatile. In coarser terms, campaign money is more valuable in these states.
Before we go any further, we have to go into whose donating, lets take a look nationwide as to who is donating the most. Is it mostly large sums, or small donations?PEOPLE, PACS, AND THINGSSpeaking of small donations, who actually donates to campaigns? I personally have never, my naïve and uninformed idea of campaign donations are just giant faceless corporations throwing money at candidates. Let’s take a peek at average joes like you and me and how much they spend. Below you can see two maps of the US, one for 2012 and one for 2014. Hover over each state to see which citizen donated the most and how much they donated, the color scale lets you compare states to each other.Now what about those big faceless corporations. Here are two more maps, however these are only for the year 2014. The map on the left shows the the top Industry for that state the chart on the right shows the top ten Industries that donate the most nationwide.
Again we have what looks to be a population map. It seems like states with the most people have the highest individual donators whether from citizens or corporations. One thing that stood out to me were the biggest donors. Real estate and medical professionals we the top players in most states. Much less surprising was that Oil & Gas donated the most where, you guessed it, there is Oil & Gas.
Finally, what about groups who donate based on different ideology? Some examples of these groups are pro-Israel, Pro-Life/Pro-Choice, environmental policy as well as many others. The bar chart on the left shows a nationwide average of which ideologies get the most money. The map on the left shows the most popular ideology per state.PARTY FOULSo far we have skipped over the two most important groups in American politics, the Republicans and the Democrats. How do the parties compare? Seeing that the country is pretty divided on party allegiance I’d expect donations to each party be relatively the same. One thing I’d also expect is that third party candidates don’t pull in even the same magnitude as the two major parties.
from: Politico.com
The first graph from Politico shows which party each state voted for. The one below is which party received more donation in each state. The two maps look quite similar. Both the east and west coast mirror each other to an extent. The midwest also aligns with donations. Donations to legislators in each state may be a good predictor into where the electoral votes end up. Or, more possibly, states that were going to vote for a certain party donate to that party more.
Some states receive a lot more attention than other when it comes time for presidential elections. Currently I am only looking at federal legislator’s donations, but I wonder if they reflect presidential politics as well. Certain states I will refer to as swing states. These states are not as deeply entrenched as others. The swing states for 2014 were: Nevada, Colorado, Iowa, Wisconsin, Ohio, New Hampshire, Virginia, North Carolina, and Florida. The map below highlights states that have the closest spending between the Democrats and the Republicans.Most of the swing states have very similar donations between the two parties. Swing states like Virginia, Florida, and Nevada have very close donations totals. Virginia actually has the closest out of all of the states. On the other end, states like California, Texas, and New York have the greatest difference in donations. This makes sense as these states are deeply entrenched in one party, just look at Texas the donations are completely lopsided. There is some good news in this map. Most states are relatively close when it comes to donations to both parties.
MONEY MONEY MONEY MONEYPolitical Donations are a critical component of the United States government. Looking at the donations many of my previous assumptions were confirmed and many were discredited. However, one must have a critical eye on the data presented. The analysis is only as good as the data collected. I believe it is integral to have reliable and vetted donation data as it holds many insights. I’d like to thank followthemoney.org for their data and commitment. If you liked this analysis please check out their website and explore the data yourself! Maybe even consider donating! -Marcello [1] https://en.wikipedia.org/wiki/Citizens_United_v._FEC[2] https://en.wikipedia.org/wiki/Independent_expenditure[3] https://en.wikipedia.org/wiki/McCutcheon_v._FECI took a quick look candidate donations limited to New Jersey, now I’ve moved nation wide. Lets see if the trends that were in New Jersey were typical of the whole nation or just Jersey. I restricted the data to just 2014 to make it a little more manageable. As always lets look at Dems verse Repubs.
Got two more for you this week. One on Machine Learning and the other on multivariate. Check them out.
Supervised Machine Learning: A Review of Classification TechniquesLast part we took a look at campaign donations to New Jersey State legislatures. Now we are moving on up to the US House and Senate. The stakes are a little higher, the politicians have more power, and hopefully full of campaign donations. Luckily for me we have Followthemoney.org on our side.
All data collected for the following graphs was using followthemoney.org’s API. This made it easy to tabulate and graph all the recorded donations. First up is Democrats Vs Republicans.Individual | $ 14,760,750.00 |
Non-Individual | $ 1,412,439.00 |
Grand Total | $ 16,173,189.00 |
Overwhelmingly the donations stemmed from Individuals. That is super surprising for me. There’s a lot more visualizations I can do with this data, but before that, we have to go nationwide.
-Marcellofind the data here:NJfedDon