Huge success! My visualization went viral. What NFL or CFL team do you want to be a fan of?

This is definitely the project of mine that attracted the most visitors. I built it as a subcontractor for a huge betting company. The idea (not mine!) was, I must say, brilliant and simple: gather some demographic and econometric data about the cities that have one or more teams in the NFL (National Football League) and the CFL (Canadian Football League), mash that up with some game data and push it all into a sortable table.

This is the final result so you can immediately see what we are talking about.

The Data

I was given a CSV file with one row for each city that has a team in the NFL or the CFL, along with various metrics such as:

  • Traffic index (the higher the number, the worse the traffic)
  • CO2 emissions
  • Price of a McDonald's meal (the lower the better)
  • Average precipitation
  • Ticket price
  • Number of touchdowns (the more the better), and so on…

I used pandas and Jupyter for the data cleaning and some pretty basic processing. The notebook itself isn’t particularly interesting and I might have left it hanging in my GitHub repo, but I will not show it here. There are just a couple of key takeaways from this part of the work.

The first job was to add a rank column for every metric column, holding each row's rank within that metric. This was pretty easy using the built-in pandas rank method:

nfl_data['touchdowns_rank'] = nfl_data['Touchdowns for'].rank(method='first', ascending = False)

This was repeated for every other column, making sure to pass ascending as True or False accordingly - for some “features” bigger is better, for others it’s the opposite.
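
The repetition described above can also be driven from a small mapping instead of one line per column. This is only a minimal sketch, not the code from my notebook: the sample numbers are made up, the "Ticket price" column name and the `RANK_SPEC` mapping are hypothetical, and only 'Touchdowns for' is a real column name from the data.

```python
import pandas as pd

# Made-up sample rows; only 'Touchdowns for' is a column name from the post.
nfl_data = pd.DataFrame({
    "Team": ["A", "B", "C"],
    "Touchdowns for": [40, 55, 31],         # more is better
    "Ticket price": [120.0, 95.0, 150.0],   # assumed: cheaper is better
})

# Hypothetical mapping: source column -> (rank column name, ascending flag).
# ascending=False gives rank 1 to the largest value, ascending=True to the smallest.
RANK_SPEC = {
    "Touchdowns for": ("touchdowns_rank", False),
    "Ticket price": ("ticket_rank", True),
}

for source, (rank_col, ascending) in RANK_SPEC.items():
    # method="first" breaks ties by order of appearance in the column.
    nfl_data[rank_col] = nfl_data[source].rank(method="first", ascending=ascending)

print(nfl_data[["Team", "touchdowns_rank", "ticket_rank"]])
```

Keeping the bigger-is-better decision in one mapping makes it harder to flip a flag by accident when a new metric is added.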

The second step was making a total rank column - a single column containing the sum of all the calculated ranks:

# The parentheses let the expression span several lines.
nfl_ranks['sum_of_ranks'] = (
    nfl_ranks['touchdowns_rank']
    + nfl_ranks['ticket_rank']
    + nfl_ranks['precipitation_rank']
    + nfl_ranks['beer_rank']
    + nfl_ranks['mcd_rank']
    + nfl_ranks['pollution_rank']
)

After ranking the teams by this sum of ranks column (rank 1 is the best in every category, so a lower sum means a better team overall), I obtained the total rank of the teams. Then I repeated the same for the Canadian Football League teams (CFL) and for the whole dataset, NFL and CFL combined.
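
The two steps above (summing the ranks, then ranking the sums) can be sketched as follows. This is an illustration under assumptions, not the notebook code: the three teams and their ranks are made up, selecting columns by the "_rank" suffix is my shortcut, and the tie-breaking method for the total is a guess.

```python
import pandas as pd

# Made-up per-category ranks for three hypothetical teams.
nfl_ranks = pd.DataFrame({
    "team": ["A", "B", "C"],
    "touchdowns_rank": [2, 1, 3],
    "ticket_rank": [2, 1, 3],
    "mcd_rank": [1, 2, 3],
})

# Collect the per-category rank columns before adding any derived columns.
rank_cols = [c for c in nfl_ranks.columns if c.endswith("_rank")]

# Row-wise sum of all per-category ranks.
nfl_ranks["sum_of_ranks"] = nfl_ranks[rank_cols].sum(axis=1)

# Rank 1 is best in every category, so the lowest sum is the best team overall.
nfl_ranks["total_rank"] = (
    nfl_ranks["sum_of_ranks"].rank(method="first", ascending=True).astype(int)
)

print(nfl_ranks[["team", "sum_of_ranks", "total_rank"]])
```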

Finally, I was left with three datasets: all_teams.csv, nfl.csv and CFL.csv, containing only the ranks for each category and for the total (overall). With this phase over, I did some testing and moved on to the second one.

I am intentionally omitting some less glamorous parts of the work: the processing of the team logos, manually fixing the city names etc.

The Visualization and the CSS/HTML

The data visualization in this case is very simple - order and color are used to convey quality (or rank) for different teams according to different categories. The column headers containing the category names are clickable elements that, once clicked, sort the data according to that feature. I must admit that we went back and forth about including an animation, but since the other requirement was to make the table responsive and to drop a couple of less important columns on small devices, we ended up without sorting animations, leaving just the color and the position.

What was interesting is that I was in charge of the whole HTML and CSS design and the deadline was very tight. I received some very good looking sketches and a great color palette from the company’s designer, and started dabbling with D3.js.

As usual, I will explain just the interesting bits. To use the color scheme in D3.js, I built a color scale from the three given colors, mapping the worst rank to the first color, the middle rank to the second and rank 1 to the third:

function makeColorScale(num_teams) {
  let middle = Math.round(num_teams/2);
  console.log("MAKING COLOR SCALE WITH:", num_teams, middle);
  return d3.scale
    .linear()
    .domain([num_teams, middle, 1])
    .range(["#b8327d", "#f4f4f4", "#689736"]);
}
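
To show what a three-stop linear scale like the one above computes, here is a rough Python sketch of piecewise-linear interpolation between the same three colors. It is an illustration only, not D3's implementation: the function names are mine, ranks are clamped to the range 1..num_teams (D3's linear scale does not clamp by default), and the rounding of intermediate colors may differ slightly from D3's.

```python
import math

def hex_to_rgb(color):
    """Convert '#rrggbb' to an (r, g, b) tuple of ints."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))

def rgb_to_hex(rgb):
    """Convert an iterable of three channel values back to '#rrggbb'."""
    return "#" + "".join(f"{round(channel):02x}" for channel in rgb)

def make_color_scale(num_teams, colors=("#b8327d", "#f4f4f4", "#689736")):
    """Return a function mapping a rank to a color (hypothetical helper)."""
    if num_teams < 3:
        raise ValueError("need at least 3 teams for three distinct stops")
    # floor(x + 0.5) mimics JavaScript's Math.round for positive numbers.
    middle = math.floor(num_teams / 2 + 0.5)
    worst, mid, best = (hex_to_rgb(c) for c in colors)

    def scale(rank):
        rank = max(1, min(num_teams, rank))  # clamp: a choice of this sketch
        # Pick the segment the rank falls in: worst..middle or middle..best.
        if rank >= middle:
            x0, c0, x1, c1 = num_teams, worst, middle, mid
        else:
            x0, c0, x1, c1 = middle, mid, 1, best
        t = (rank - x0) / (x1 - x0)  # 0 at the segment start, 1 at its end
        return rgb_to_hex(a + (b - a) * t for a, b in zip(c0, c1))

    return scale

scale = make_color_scale(32)
print(scale(1), scale(16), scale(32))  # prints: #689736 #f4f4f4 #b8327d
```

Ranks between the stops get a blend of the two neighbouring colors, which is what produces the gradual shift from the "good" green to the "bad" magenta down the table.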

For binding the sort events to the column headers:

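// Ids of the clickable column headers; each id is also the name of a rank column.
const columns = [
  "beer_rank",
  "touchdowns_rank",
  "total_rank",
  "mcd_rank",
  "pollution_rank",
  "precipitation_rank",
  "ticket_rank"
];

// Declare the loop variable so it does not leak into the global scope.
for (const col of columns) {
  sortCol(col);
}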

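// Attach a click handler to the header whose element id matches the column name.
function sortCol(id) {
  const selectedColumn = d3.select("#" + id);

  selectedColumn.on("click", function() {
    // change_state and parse are defined elsewhere in the page's source;
    // sort_key, reverse and league are shared state used by both.
    change_state(id);
    parse(sort_key, reverse, league);

    // Clear the sort arrow from every header before marking the clicked one.
    $(".arrow_up").removeClass("arrow_up");
    $(".arrow_down").removeClass("arrow_down");

    if (reverse == true) {
      $(this)
        .children("span")
        .addClass("arrow_up");
    } else {
      $(this)
        .children("span")
        .addClass("arrow_down");
    }
  });
}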

Finally, the page has a dropdown selector that lets the user switch datasets: NFL, CFL or all of them. Had I had a little more time, I would have implemented a filter over a single dataset, but the deadline was tight, so I made this simple, inelegant switch - you can see it in the source code.
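
For what it's worth, the "single dataset plus filter" idea could look roughly like this on the pandas side. It is a sketch under assumptions, not something I built: the "league" column, the sample rows and the `select_league` helper are all hypothetical, and the parameter names only echo the ones used in the click handler above.

```python
import pandas as pd

# Made-up combined dataset with an assumed "league" column.
all_teams = pd.DataFrame({
    "team": ["A", "B", "C"],
    "league": ["NFL", "NFL", "CFL"],
    "total_rank": [1, 3, 2],
})

def select_league(df, league="ALL", sort_key="total_rank", reverse=False):
    """Filter one combined table by league and sort it by the chosen rank column."""
    subset = df if league == "ALL" else df[df["league"] == league]
    # reverse=False lists the best rank (1) first; this meaning is an assumption.
    return subset.sort_values(sort_key, ascending=not reverse).reset_index(drop=True)

print(select_league(all_teams, "NFL"))
print(select_league(all_teams, "ALL", reverse=True))
```

One caveat: a filtered view still carries the ranks computed over the combined table, so per-league ranks would have to be recomputed on the subset to match what the three separate files contain.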

Takeaways

The combination of pandas/Jupyter for data processing and D3.js for web visualization is a killer. There is at least one book entirely dedicated to this stack and there are countless examples of its use on the web. I managed to experience at least a part of this power, although my back-end here is static, whereas the true power of Python + D3.js lies in the interactivity of dynamic data. Scott Murray’s book about D3.js, a couple of tutorials from The Net Ninja and some basic Flask/Django knowledge could give you a great head start for making interactive data driven web applications in no time.

As always, this project could have been handled differently, but not by much. The idea was straightforward and simple enough, and it brought so much attention and so many backlinks that I couldn’t keep track of them all when it came out.

URL: https://freethrow.github.io/NFL-sorter/index.html

