A2-EmmaTownleySmith

From cs448b-wiki

Choosing a Domain

I decided to start with Higher Education (specifically undergraduate education), since a large data set in this area could answer many potentially interesting questions. Because every legitimate higher ed institution is required to keep records of its enrollment and student data, I thought data sets in this area would be less likely to need extensive cleaning and altering.

Initial Question(s)

Some initial questions I had:

  • How has the cost of undergraduate education changed over time (inflation-adjusted)?
  • How has the rise in the cost of undergrad education affected its accessibility? (could potentially be viewed through the lens of median US household income)
  • How has the distribution of bachelor’s degree majors changed over time?
  • Related -- how have the jobs that people take out of undergrad changed over time?

Hunting for Data

The first place I looked was the National Center for Education Statistics’ (NCES) most recent data tables, to see whether any of their data sets were a good match for some of my initial questions.

I started with the XLS file for "Bachelor's degrees conferred by postsecondary institutions, by field of study: Selected years, 1970-71 through 2013-14." This data set is well-formatted and does a great job of answering my question about the distribution of degrees over time, so I decided to make my first visualization with Tableau.

Viz 1: National Distribution of Majors Over Time

I turned on the Data Interpreter (magic!) and set columns to “Field of Study” and rows to “Measure Values” -- the selected years available in the data. I ended up with this “small multiples” visualization of how each of the thirty-two listed majors has changed in popularity from 1970 to 2014. (Graphs Photoshop’d together by me from the initial Tableau view shown below.)

A2 tableau view.jpg
A2 viz1.jpg

It’s hard to see what’s going on here at this size… but what is evident is that some degrees have an interesting distribution history, and others, less so. (Degree titles added by me in Photoshop, for clarity.)

A2 change comparison.jpg

This helped me formulate my next question: what bachelor’s degree majors have changed the MOST in popularity over time, and how?

Viz 2: Percent Change in National Major Distribution Over Time

For this, I turned back to the data in Excel, which had the raw number of degrees conferred in each recorded year.

A2 raw data sample.jpg

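I calculated the percentage change between 1970 and 2014 for each degree using the following formula: ((Degrees given in 2014 - Degrees given in 1970) / Degrees given in 2014) * 100. Note that this divides by the 2014 count. The conventional percent change divides by the starting (1970) count, which means no decline can exceed -100%. Dividing by the 2014 count instead inflates the size of declines, especially for majors that shrank a lot.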
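The choice of denominator matters a lot for shrinking majors. Below is a small Python sketch that compares the formula above with the conventional percent change, which divides by the starting (1970) value. The function names and the counts (1,000 degrees in 1970, 150 in 2014) are hypothetical. They were picked to resemble a sharply declining major like Library Science and are not the NCES values.

```python
# Hypothetical illustration: two ways to compute percent change for one major.
# The counts below are made up, not taken from the NCES table.

def pct_change_vs_2014(count_1970, count_2014):
    """Formula as written in the text: divides by the 2014 (ending) count."""
    return (count_2014 - count_1970) / count_2014 * 100


def pct_change_vs_1970(count_1970, count_2014):
    """Conventional percent change: divides by the 1970 (starting) count."""
    return (count_2014 - count_1970) / count_1970 * 100


# A hypothetical major that fell from 1,000 degrees/year to 150 degrees/year.
print(round(pct_change_vs_2014(1000, 150), 1))  # -566.7
print(round(pct_change_vs_1970(1000, 150), 1))  # -85.0
```

Both formulas agree on the sign, so they sort majors into "grew" and "shrank" identically. Only the conventional one keeps declines on a bounded, comparable scale.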

All of the degrees had changed in popularity somewhat, so I didn’t get a lot of information out of the column of Excel numbers. I decided to make a quick scatter plot to see if that would help me parse…

A2 column sample.jpg
A2 excel scatter.jpg

But that wasn’t telling me much, other than showing me the one obvious outlier of the group. (For the curious, it was Library Science, with an almost 700% drop by this formula: from more than 1,000 degrees/year in 1970 to fewer than 200 degrees/year in 2014.) I plugged my Excel sheet back into Tableau for some help…

A2 tableau bar.jpg

Much better! I found the bar chart format that Tableau picked for me a lot easier to read, so I could start picking out some groups. I used a calculated field to look at which majors became more popular (green) vs. less popular (red):

A2 bars redgreen.jpg
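The red/green split amounts to labeling each major by the sign of its percent change. Below is a minimal Python sketch of that rule, assuming the percent changes have already been computed. The `popularity_label` helper and the example values are hypothetical, and the exact Tableau calculated field isn't shown in this write-up.

```python
# Hypothetical sketch: label each major by the sign of its percent change,
# mirroring a "more popular" (green) vs. "less popular" (red) calculated field.

def popularity_label(pct_change):
    """Return 'more popular' for growth, 'less popular' otherwise."""
    return "more popular" if pct_change > 0 else "less popular"


# Made-up percent changes, for illustration only.
changes = {"Major A": 250.0, "Major B": -40.0, "Major C": 0.0}
labels = {major: popularity_label(value) for major, value in changes.items()}
print(labels)
# {'Major A': 'more popular', 'Major B': 'less popular', 'Major C': 'less popular'}
```

Here a change of exactly 0% falls into "less popular". A third "no change" label would be an alternative if any major came out flat.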

I played with some of Tableau’s other viz forms too, but it seemed like the bar chart was most effective for showing what I was interested in… the difference in the percent changes.

A2 circletest.jpg

These results are not too surprising. Given advances in technology, it makes sense to me that Library Science is disappearing and Computer Science is seeing an increase in popularity. What surprised me was the relatively even distribution of “increasing popularity” among Computer Science, Fitness Studies, Homeland Security, Communication... At this point, I realized that I had another question.

Are degrees changing in popularity at Stanford differently from how they change in popularity nationally?

Viz 3: Is Stanford different from National?

Unfortunately, major data over time for Stanford is not readily available in any nice XLS/CSV or other such format. The only place to access it currently is through this visualization (http://stanfordvisualized.soraven.com/).

Stumped, I looked around for comparable data to see if I could answer a similar question: are degrees changing in popularity at Ivy League universities differently from how they change in popularity nationally? But alas, this dreamed-of dataset of Ivy League majors over time did not exist.

I decided to try to meaningfully narrow the scope of the question: which majors exist in both the Stanford dataset and the national dataset and were among the most popular in either 1970 or 2014?

Manually comparing the two lists of majors, I found that the Stanford and national datasets share:

  • Computer and information sciences
  • Biology
  • Psychology
  • English
  • History
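
Matching the two lists is essentially a set intersection. It is complicated by the fact that the two sources don't always name majors the same way. Below is a minimal Python sketch, assuming a hand-written alias table maps Stanford names onto national category names. The `shared_majors` function, the alias, and both example lists are illustrative, not the actual datasets.

```python
# Illustrative sketch: find majors present in both lists, using a hypothetical
# alias table to reconcile differently named categories.

def shared_majors(stanford, national, aliases):
    """Return national category names that also appear in the Stanford list.

    aliases maps a Stanford major name to its national category name; names
    without an alias are compared as-is (case-insensitively).
    """
    national_by_key = {name.lower(): name for name in national}
    shared = set()
    for major in stanford:
        key = aliases.get(major, major).lower()
        if key in national_by_key:
            shared.add(national_by_key[key])
    return sorted(shared)


# Partial, made-up lists; not the full Stanford or NCES category lists.
stanford = ["Computer Science", "Biology", "Psychology", "English", "History", "Symbolic Systems"]
national = ["Computer and information sciences", "Biology", "Psychology", "English", "History", "Library science"]
aliases = {"Computer Science": "Computer and information sciences"}

print(shared_majors(stanford, national, aliases))
# ['Biology', 'Computer and information sciences', 'English', 'History', 'Psychology']
```

The alias table is where the manual judgment lives. Deciding that two differently named majors are "the same" is the part a script can't do on its own.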

I manually gathered the 1970 and 2014 data for these few majors from the stanfordvisualized data above, and threw it into Tableau to create this chart to try and answer our earlier question about Stanford:

A2 stanford diff.jpg

This seemed to confirm my suspicions -- Stanford Computer Science grew on par with national trends, while other, historically more popular majors suffered losses.

However, this visualization seemed a little extreme, even misleading. The number of Stanford graduates in each major varies from year to year, so a percentage change between two endpoint years wasn’t really showing the “trend” I wanted to see. It was just making a statement: these majors have become much less popular since 1970.

Viz 4: Is Stanford different from National? (Common Data Set)

After a little more searching, I found Stanford’s Common Data Set, which includes the distribution of majors among each year’s graduates. Aha! These categories more or less line up with the national data’s major categories from 1999 onward. I wouldn't be able to compare the distribution all the way back to the '70s, but I felt 10+ years of data was still a worthwhile comparison.

For the national data set, I calculated each major category’s percentage of total graduates so it would be directly comparable to the Stanford Common Data Set. I manually collected the Stanford data from the Common Data Set HTML pages and added it to my national Excel sheet for comparison. I plugged it into Tableau and got this:

A2 first stanfordnatl graph.jpg
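Turning the national counts into percentages means dividing each major's count by the total number of bachelor's degrees conferred that year. Below is a minimal Python sketch with hypothetical counts and a hypothetical `to_percent_shares` helper. It assumes the dictionary covers every field, so that the sum is the year's total. If the source table has its own "Total" row, dividing by that is equivalent.

```python
# Hypothetical sketch: convert one year's raw degree counts into percentage
# shares, so national figures line up with per-major percentages.

def to_percent_shares(counts):
    """Map each major to its percentage of all degrees in `counts`."""
    total = sum(counts.values())
    # Multiplying before dividing keeps these example results exact.
    return {major: count * 100 / total for major, count in counts.items()}


# Made-up counts for a single year, not NCES values.
national_year = {"Major A": 500, "Major B": 300, "Major C": 200}
shares = to_percent_shares(national_year)
print(shares)  # {'Major A': 50.0, 'Major B': 30.0, 'Major C': 20.0}
```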

Too much information… and hard to see the trends. I decided to eliminate the majors that represented less than 5% of graduates to make it easier to see trends on recognizable majors like English, Social Sciences/History, and Computer Science.
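A 5% cutoff needs a rule for majors that cross the line in some years but not others. The Python sketch below keeps a major if it reaches the threshold in at least one year. That rule, the `majors_to_keep` helper, and the example shares are assumptions for illustration, since this write-up doesn't specify how borderline majors were handled.

```python
# Sketch under an assumed rule: keep a major if its share of graduates reaches
# the threshold in at least one year. The shares below are hypothetical.

def majors_to_keep(shares_by_year, threshold=5.0):
    """shares_by_year maps year -> {major: percent of graduates}."""
    keep = set()
    for year_shares in shares_by_year.values():
        for major, pct in year_shares.items():
            if pct >= threshold:
                keep.add(major)
    return sorted(keep)


shares_by_year = {
    2001: {"Major A": 12.0, "Major B": 4.0, "Major C": 3.0},
    2014: {"Major A": 10.0, "Major B": 6.5, "Major C": 2.0},
}
print(majors_to_keep(shares_by_year))  # ['Major A', 'Major B']
```

A stricter rule (at least 5% in every year) would drop Major B. The lenient rule keeps majors that grow past the cutoff, which matters when the goal is to show trends.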

Viz 5: Getting to Simplicity

[Alas, forgot to screenshot the '< 5% majors eliminated' graph. It wasn't that interesting anyway.]

Still a little too complex… what I really wanted to see was trends over time, and how Stanford did (or didn’t) diverge from national major trends. I went back to paper and did some thumbnail sketches of different possible ways to show my final visualization:

Majors experiments.jpg

I settled on the one in the bottom right -- a small multiples visualization of change over time by major. This one made the most sense because the major was the focal point I was really interested in… not the composition of majors in any given year.

I made one for Computer Science to test the idea, and thought it was telling an interesting story:

A2 first CS graph.jpg

So I made the small multiples into my final graph (Tableau + Photoshop). (With a fun final question: Could going to Stanford influence what you choose to study?)

I was personally surprised by the results. Inundated with School of Engineering information ever since I arrived at Stanford, I would have assumed that the share of Stanford graduates in the Humanities and Social Sciences would be lower than the national average. It turns out that a much larger share of Stanford graduates major in ‘Social Sciences or History’ than the national stats would predict!

Final Viz

Final test.jpg

Caption: Popularity of Select Majors in the U.S. and at Stanford University (2001-2014)

Description: Stanford, in the heart of Silicon Valley, has long been known as a technology and entrepreneurship hotspot. How does this come across in what Stanford students choose to study? We matched Stanford’s Common Data Set information about majors to national trends over the last decade. A chi-squared test revealed a statistically significant difference (p < 0.005) between the percentage of Stanford graduates and the percentage of national graduates for Social Sciences & History majors, and for Computer Science majors. For Biological/Life Sciences and English, there is no statistically significant difference. Of course, this is only part of the story -- students may gravitate toward certain popular majors like Computer Science upon admission because of the Silicon Valley context. Further study of incoming undergraduates’ intended majors vs. their graduating degrees would be needed to investigate.
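The caption doesn't spell out how the chi-squared test was set up. One common setup, sketched here as an assumption, is a 2x2 contingency table for a single major and year: graduates in the major vs. not, at Stanford vs. nationally. The standard-library Python sketch below computes Pearson's statistic without continuity correction. It takes the p-value for 1 degree of freedom from the identity P(X > x) = erfc(sqrt(x / 2)). The `chi_square_2x2` name and the counts are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for the table

                    in major   not in major
        Stanford       a            b
        National       c            d

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n  # count expected if no difference
            stat += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: 30 of 100 Stanford graduates vs. 10 of 100 national ones.
stat, p = chi_square_2x2(30, 70, 10, 90)
print(round(stat, 2), f"{p:.6f}")  # 12.5 0.000407
```

With national counts in the hundreds of thousands, even modest gaps in share can produce small p-values, so the size of the gap is worth reporting alongside p. SciPy's `scipy.stats.chi2_contingency` runs the same kind of test on a contingency table. Note that it applies Yates' continuity correction by default when there is 1 degree of freedom.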