A2-ErnestoRamirez
I first decided to look for datasets based on things that I was interested in (Reddit, movies, YouTube, and Facebook, for example). After searching for hours, I wasn't able to find great datasets, at least not ones that were already in CSV. So I decided to try converting other data formats into one that Tableau could use.
I found this Facebook dataset[1]. It stored its data in MATLAB files. However, because the data were stored as sparse matrices, I couldn't export them to CSV with MATLAB's default function, so I abandoned that.
I found this YouTube dataset[2]. It contained too few fields for me to really look into, so I decided not to proceed with the dataset.
I settled on the NBA data from DraftExpress.com[3]. The data took a bit of wrangling using Data Wrangler[4], Microsoft's Notepad, and Excel. My process for getting the data into a CSV started with scraping information from the website and pasting it into Data Wrangler. I would then copy the CSV-formatted data and save it in a text editor as a CSV file. I repeated this process for multiple pages, then combined the resulting files in Excel. The combined data was used in Tableau.
The final dataset had the following fields for each team's season average as well as for the season averages of the top 100 players (players also had a name field): Team Name, Games Played, Points, Field Goals, Field Goal Attempts, Field Goal %, 2 Points, 2 Points Attempts, 2 Point %, 3 Points, 3 Points Attempts, 3 Point %, Free Throws Made, Free Throws Attempted, Free Throw %, Offensive Rebounds, Defensive Rebounds, Total Rebounds, Assists, Steals, Blocks, Turn Overs, and Personal Fouls.
My initial question was: "Are any two fields correlated when looking at a team's season average statistics?" I chose to use 2 Point % and 3 Point % instead of the average number made in an effort to normalize these variables. To try to answer this question, I decided to use a scatter plot matrix to quickly narrow my search. If any subplot seemed to show a correlation, I would perform a linear regression to check whether the two fields were actually correlated.
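The page-by-page combining step of the wrangling process described above was done by hand in Excel. As a rough sketch only, the same merge could be scripted with Python's standard `csv` module, assuming every page was saved with an identical header row. The function name `combine_csv_pages` and the sample rows are hypothetical placeholders, not values from the actual dataset.

```python
import csv
import io


def combine_csv_pages(pages):
    """Merge several CSV texts that share one header row into a single CSV text.

    `pages` is a list of strings, one per scraped page (hypothetical input).
    """
    header = None
    rows = []
    for text in pages:
        reader = csv.reader(io.StringIO(text))
        page_header = next(reader, None)
        if page_header is None:
            continue  # skip pages with no content at all
        if header is None:
            header = page_header  # remember the first header we see
        elif page_header != header:
            raise ValueError("pages do not share the same header row")
        rows.extend(row for row in reader if row)  # drop blank lines

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    if header is not None:
        writer.writerow(header)  # header is written once, not once per page
    writer.writerows(rows)
    return out.getvalue()


# Made-up placeholder rows, only to show the shape of the input.
page_1 = "Team Name,Games Played,Points\nTeam A,82,100.0\n"
page_2 = "Team Name,Games Played,Points\nTeam B,82,98.5\n"
combined = combine_csv_pages([page_1, page_2])
```

Keeping the header check strict means a page that was copied with different columns fails loudly instead of silently misaligning the fields that Tableau later reads.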
The graphs that stood out to me as having a possible correlation were: Wins & Field Goal %, Wins & 3 Point %, Field Goal % & 3 Point %, and Field Goal % & Assists. I performed a linear regression on each of these plots using Tableau and got the following r^2 values.
Wins & Field Goal %: 0.428
Wins & 3 Point %: 0.421
Field Goal % & 3 Point %: 0.361
Field Goal % & Assists: 0.439
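The r^2 values above came from Tableau. As a minimal sketch of what such a number measures, the snippet below computes r^2 for a straight-line least-squares fit between two fields using plain Python. For a simple linear regression with an intercept, r^2 equals the square of the Pearson correlation coefficient, which is what this code relies on. The function name `r_squared` and the sample numbers are illustrative assumptions, not the team data used here.

```python
def r_squared(xs, ys):
    """Return r^2 for the least-squares line fitted to the points (xs, ys)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross deviations from the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("r^2 is undefined when one variable is constant")
    # Squared Pearson correlation, equal to r^2 of the fitted line.
    return (sxy * sxy) / (sxx * syy)


# Made-up numbers: a perfectly linear pair and a noisier pair.
perfect = r_squared([1, 2, 3, 4], [2, 4, 6, 8])
noisy = r_squared([1, 2, 3, 4], [1, 3, 2, 4])
```

A value near 1 means the fitted line explains most of the variation in one field from the other, while values like those listed above leave more than half of the variation unexplained.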
It turned out that none of these plots had a very high correlation. Because the Golden State Warriors' Stephen Curry broke the record for most 3-pointers in a season, I decided to rephrase my question as: "Are any team statistics correlated with a team's average number of 3 Points made?" I used a small multiples plot to explore this.
The graphs that stood out to me as having a possible correlation were: 3 Points Made & Average Points Scored, 3 Points Made & Average # of Assists, 3 Points Made & 3 Point %, and 3 Points Made & # of Wins. I performed a linear regression on each of these plots using Tableau and got the following r^2 values.
3 Points Made & Average Points Scored: 0.348
3 Points Made & Average # of Assists: 0.097
3 Points Made & 3 Point %: 0.159
3 Points Made & # of Wins: 0.285
However, none of these plots had a significantly high correlation either. I figured that looking at an entire team's statistics would obscure an individual player's performance, so I decided to look specifically at the top 100 players of the NBA.
Final Visualization
So I rephrased my question as: "Are there correlations that help explain how Stephen Curry made so many 3-pointers in the 2015-16 regular season?" To explore this, I made a small multiples plot again.
Caption: Small Multiples Plot for the Top 100 Players of the NBA based on Average Points Scored.
The plot uses the top 100 players in the NBA (based on average points per game) as data points. In the plot, the correlation between 3 Point Shots attempted and average 3 Point Shots made per game stands out. The r^2 value for this plot is 0.92, showing a high correlation between the two factors. On reflection, this correlation is somewhat to be expected: the more shots a player takes, the more shots are likely to be made. The graph seems to suggest that other NBA players could also make a similar number of 3 Point Shots to Stephen Curry by taking more shots. While the plot does answer the final question posed, it does not discredit Stephen Curry for breaking the record this season, as there are many factors that play into a player being free to take a 3 Point shot.