A2-ErnestoRamirez

From cs448b-wiki

I first decided to look for datasets based on things that I was interested in (Reddit, movies, YouTube, and Facebook, for example). After searching for hours, I wasn't able to find great datasets, at least not ones that were already in CSV. So I decided to try converting other data formats into one that Tableau could use.

I found this Facebook dataset[1]. It stored its data in Matlab files. However, because the data was in sparse matrices, I couldn't export it to CSV via Matlab's default function, so I abandoned it.

I found this YouTube dataset[2]. It contained too few fields for me to really dig into, so I decided not to proceed with it.

I settled on looking at the NBA data from DraftExpress.com[3]. The data took a bit of wrangling using Data Wrangler[4], Microsoft's Notepad, and Excel. My process for getting the data into a CSV started with scraping information from the website and pasting it into Data Wrangler. I would then copy the CSV-formatted data and save it in a text editor as a CSV file. I repeated this process for multiple pages, then combined the results in Excel. The combined data was used in Tableau.

The final dataset had the following fields for each team's season average as well as for the top 100 players' season averages (players also had a name field): Team Name, Games Played, Points, Field Goals, Field Goal Attempts, Field Goal %, 2 Points, 2 Points Attempts, 2 Point %, 3 Points, 3 Points Attempts, 3 Point %, Free Throws Made, Free Throws Attempted, Free Throw %, Offensive Rebounds, Defensive Rebounds, Total Rebounds, Assists, Steals, Blocks, Turnovers, and Personal Fouls.

My initial question was: "Are any two fields correlated when looking at a team's season average statistics?" I chose to use 2 Point % and 3 Point % instead of the average amount made in an effort to normalize these variables. To try to answer this question, I decided to use a scatter plot matrix to quickly narrow my search. If any subplot looked like it showed a correlation, I would perform a linear regression to check whether it did.
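The step of combining the per-page CSV files, which I did by hand in Excel, could also be scripted. The following is only a minimal sketch under some assumptions: every page is exported with the same header row, and the function name `combine_csv_texts` and the sample rows are hypothetical and made up for illustration (they are not values from the DraftExpress data).

```python
import csv
import io


def combine_csv_texts(csv_texts):
    """Merge several CSV strings that share one header row.

    Returns a list of rows: the header once, followed by the data rows
    of every input in order. Blank lines are skipped.
    """
    combined = []
    header = None
    for text in csv_texts:
        # Parse the text and drop empty rows left by blank lines.
        rows = [row for row in csv.reader(io.StringIO(text)) if row]
        if not rows:
            continue
        if header is None:
            header = rows[0]
            combined.append(header)
        elif rows[0] != header:
            # Assumption: all pages were exported with identical columns.
            raise ValueError("CSV pages do not share the same header row")
        combined.extend(rows[1:])
    return combined


def write_csv(rows, path):
    """Write the combined rows to a CSV file that Tableau can open."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)


if __name__ == "__main__":
    # Made-up sample pages, only to show the shape of the input.
    page1 = "Team Name,Points\nTeam A,100.5\n"
    page2 = "Team Name,Points\nTeam B,98.2\n"
    for row in combine_csv_texts([page1, page2]):
        print(row)
```

Checking that the headers match before merging guards against the kind of copy-and-paste mistake that is easy to make when repeating the process across many pages.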

Ernesto Ramirez Exploritory SPM.png

The subplots that stood out to me as possibly correlated were: Wins & Field Goal %, Wins & 3 Point %, Field Goal % & 3 Point %, and Field Goal % & Assists. I performed a linear regression on each of these plots using Tableau and got the following r^2 values.

Wins & Field Goal %: 0.428
Wins & 3 Point %: 0.421
Field Goal % & 3 Point %: 0.361
Field Goal % & Assists: 0.439
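For reference, the r^2 of a simple linear regression (one predictor plus an intercept) equals the squared Pearson correlation between the two fields, so it can be reproduced outside Tableau. The sketch below is an illustration only: the function name `r_squared` is hypothetical, the sample numbers are made up rather than taken from my dataset, and I am assuming this matches the value Tableau reports for a linear trend line.

```python
def r_squared(xs, ys):
    """Return r^2 for a simple linear regression of ys on xs.

    For one predictor with an intercept, this is the squared
    Pearson correlation coefficient of the two variables.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("r^2 is undefined when a variable is constant")
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Made-up values, only to show the call.
    field_goal_pct = [44.1, 45.3, 46.0, 47.2, 48.7]
    wins = [30, 41, 38, 52, 60]
    print(round(r_squared(field_goal_pct, wins), 3))
```

An r^2 near 1 means a straight line explains most of the variation in one field given the other, while values like the ones listed above leave more than half of the variation unexplained.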

It turned out that none of these pairs was very highly correlated. Because the Golden State Warriors' Stephen Curry broke the record for most 3-pointers in a season, I decided to rephrase my question as: "Are any team statistics correlated with a team's average number of 3 Points made?" I used a small multiples plot to explore this.

Ernesto Ramirez 3 Point Small Multiple.png

The plots that stood out to me as possibly correlated were: 3 Points Made & Average Points Scored, 3 Points Made & Average # of Assists, 3 Points Made & 3 Point %, and 3 Points Made & # of Wins. I performed a linear regression on each of these plots using Tableau and got the following r^2 values.

3 Points Made & Average Points Scored: 0.348
3 Points Made & Average # of Assists: 0.097
3 Points Made & 3 Point %: 0.159
3 Points Made & # of Wins: 0.285

However, none of these plots showed a significantly high correlation either. I figured that looking at whole-team statistics would obscure an individual player's performance, so I decided to look specifically at the top 100 players in the NBA.


Final Visualization

So I rephrased my question to be: "Are there correlations that help explain how Stephen Curry made so many 3-pointers in the 2015-16 regular season?" To explore this, I made a small multiples plot again.

Ernesto Ramirez Player 3PT small multipls.png

Caption: Small multiples plot for the top 100 players in the NBA based on average points scored.

The plot uses the top 100 players in the NBA (based on average points per game) as data points. In the plot, the correlation between 3 Point Shots attempted and average 3 Point Shots made per game stands out. The r^2 value for this plot is 0.92, showing a high correlation between the two factors. On reflection, this correlation is somewhat to be expected: the more shots taken, the more shots are likely to be made. Something this graph seems to point out is that other NBA players could also make a similar number of 3 Point Shots to Stephen Curry by taking more shots. While the plot does answer the final question posed, it does not discredit Stephen Curry for breaking the record this season, as there are many factors that play into a player being free to take a 3 Point shot.