A2-ZachMaurer
Contents
- 1 Original Intentions
- 2 Entry #1 - Examining Connectivity Growth
- 3 Entry #2 - Where is Connectivity Lagging?
- 4 Entry #3 - How frequently do these events occur? (Or are recorded?)
- 5 Entry #4 - What are the most fatal types of events?
- 6 Entry #5 - How are phone and internet access related to protests?
- 7 Entry #6 - Refining the Final Visualization
- 8 Entry #7 - Final Visualization: Have protests increased with internet and mobile access?
Original Intentions
I'm interested in humanitarian issues and wanted to work with a dataset directly related to this topic. I also knew that I wanted to learn about different tools for joining and manipulating datasets, so I started to think about interesting combinations of data. After browsing data.hdx.org (a repository for many UN and related datasets) for a while, I happened on an ACLED dataset on political violence in African countries. I decided to combine this with Mobile Phone Subscription data and Internet Use data from the World Bank for my assignment.
Based on these 3 datasets:
- ACLED Conflict Data (1997-2015)
- Mobile Phone Subscribers Per 100 People (1960-2014)
- Internet Users Per 100 People (1960-2013)
I set out to investigate the following questions:
- Have protests increased with internet and mobile access?
- Who are the primary perpetrators of violent acts?
- Has religion played an increasing role at different times?
- Has religion changed in its prevalence across Africa?
- Have incidents become more or less fatal?
Entry #1 - Examining Connectivity Growth
First, I decided to plot mobile phone and internet connectivity for each year. (Color represents different countries; I didn't provide a legend because it would be 54 entries long.)
At this step, I wanted to test my assumption that phone and internet prevalence was increasing over time. Furthermore, I wanted to get a sense of how uniform the rate of change was across different countries.
Some specific steps that I went through:
- Considered a number of different ways to manipulate the data, including some of the methods mentioned in class.
- Ended up choosing to learn a new set of tools: iPython Notebooks, pandas and Python3
- Really impressed by how quickly I was able to start using these different tools.
- Had to do some initial country name manipulation and matching to filter phone and internet datasets by African countries.
- Checked for any mismatches between labels and manually added to the list (e.g. “CONGO” vs. “CONGO, REPUBLIC OF”)
- Limited the date ranges with pandas.
- Had a very hard time figuring out how to display this in Tableau. Eventually, placing the pills labelled "Measure Names/Values" on the left axis caused the lines to be overlaid on each other.
- Noticed that mobile phone subscriptions per person exceeded 2 in some cases. This is not necessarily an error in the data, since in many developing countries individuals own multiple different phones for different uses.
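The name matching and date limiting described in the steps above could be sketched roughly as follows. This is a minimal illustration, not the original notebook: the column layout (one column per year), the `name_fixes` mapping, the `filter_african` helper, and all values are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical sample of a World Bank-style table: one row per country,
# one column per year. All values are invented for illustration.
phones = pd.DataFrame(
    {
        "Country Name": ["Congo, Rep.", "Kenya", "France"],
        "1996": [0.1, 0.0, 4.0],
        "1997": [0.2, 0.1, 10.0],
        "2014": [108.0, 74.0, 101.0],
        "2015": [None, None, None],
    }
)

# Hypothetical list of country names as they appear in the conflict data.
african_countries = ["CONGO", "KENYA"]

# Manually maintained fixes for labels that still differ after upper-casing.
name_fixes = {"CONGO, REP.": "CONGO"}


def filter_african(df, countries, fixes, first_year, last_year):
    """Normalize country labels, keep listed countries, trim the year range."""
    out = df.copy()
    out["country"] = out["Country Name"].str.upper().replace(fixes)
    out = out[out["country"].isin(countries)]
    year_cols = [
        c for c in out.columns
        if c.isdigit() and first_year <= int(c) <= last_year
    ]
    return out[["country"] + year_cols].reset_index(drop=True)


phones_africa = filter_african(phones, african_countries, name_fixes, 1997, 2014)
print(phones_africa)
```

Keeping the label fixes in a single dictionary makes it easy to add further mismatches as they turn up.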
Entry #2 - Where is Connectivity Lagging?
After producing the previous two visualizations, I noticed a sort of doubly-clustered growth pattern. The countries were very roughly split in half, with one group growing much faster than the other in terms of cell phone and internet connectivity.
I thought that this might be valuable information for later investigations, so I mapped the bottom 50% of the internet and phone connectivity datasets.
This required me to export two lists of countries, one from the phone dataset and one from the internet dataset, from Tableau to CSV. From there, I loaded the CSV files into an iPython Notebook and wrote a Python script to do a short set difference comparison on the lists.
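A set difference comparison like the one described here might look like the sketch below. The CSV contents, the `Country` column name, and the `read_countries` helper are placeholders for illustration. They are not the actual Tableau exports or results.

```python
import csv
import io

# Stand-ins for the two CSV exports; the country labels are placeholders.
phone_csv = "Country\nCountry A\nCountry B\nCountry C\n"
internet_csv = "Country\nCountry A\nCountry B\nCountry D\n"


def read_countries(text):
    """Return the set of country names found in a one-column CSV."""
    return {row["Country"].strip() for row in csv.DictReader(io.StringIO(text))}


phone_low = read_countries(phone_csv)
internet_low = read_countries(internet_csv)

# Countries lagging in only one dataset, and countries lagging in both.
only_phone = sorted(phone_low - internet_low)
only_internet = sorted(internet_low - phone_low)
both = sorted(phone_low & internet_low)

print(only_phone, only_internet, both)
```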
Some observations:
- Not surprising that Central African countries have the slowest growth in connectivity, given development conditions relative to other African countries.
- Interesting to note how internet growth was slow in the north but not the south and vice versa for phone growth.
Entry #3 - How frequently do these events occur? (Or are recorded?)
To answer this question, I had to group the data by year and by type of political event.
- In the graph below, it was interesting to note how "protests and riots" generally followed a similar trend to "violence against civilians" and "battles", and in particular how the number of protests has far eclipsed the other event types in the past couple of years.
Entry #4 - What are the most fatal types of events?
At this point, I wanted to get an overall sense of how violence was distributed across different countries and what the causes of political violence were in the aggregate. In pandas, I grouped each year's ACLED data by event type and created annual totals for the number of events that occurred and the number of fatalities caused by each type of event.
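As a rough sketch of this grouping step, the pandas snippet below counts events and sums fatalities per year and event type. The column names (`YEAR`, `EVENT_TYPE`, `FATALITIES`) are assumed to resemble the ACLED export, and the rows are invented for illustration.

```python
import pandas as pd

# Invented rows standing in for ACLED records; column names are assumptions.
events = pd.DataFrame(
    {
        "YEAR": [1997, 1997, 1997, 1998, 1998],
        "EVENT_TYPE": [
            "Battle", "Riots/Protests", "Battle", "Battle", "Riots/Protests",
        ],
        "FATALITIES": [10, 0, 5, 3, 1],
    }
)

# One row per (year, event type): number of events and total fatalities.
annual = (
    events.groupby(["YEAR", "EVENT_TYPE"])
    .agg(n_events=("FATALITIES", "size"), fatalities=("FATALITIES", "sum"))
    .reset_index()
)

print(annual)
```

The resulting long table (one row per year and event type) is a shape that plotting tools generally handle well.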
This process gave me the following five graphs.
- The first two are graphs of the data transformation described above. In the second graph, I've removed the fatalities data-points that likely corresponded to the Angola Civil War and Eritrea-Ethiopia War in '98-99 to get a better sense of the trends in the bulk of the data. It was astonishing to see the huge number of fatalities as a result of these events, relative to other historical records.
- The third graph was experimenting with a different representation of similar data.
- The fourth graph is a small multiple variation of the first two, where I re-organized the data in terms of Country. Although this graph was too large to view here (I've only included a crop), it was interesting to scan through and see how many countries experienced spikes at different times or had gradually increasing trends.
Entry #5 - How are phone and internet access related to protests?
I iterated on a number of different "big" questions for the final visualization for this assignment. However, partly because of the time I spent learning how to use pandas, I had to rule a number of them out because there wasn't enough time to do certain types of analysis. For example, I had thought I might be able to manually classify the actors involved in the ACLED conflict data by religious category to see how religious-political violence has changed over time. Unfortunately, there were close to 3000 unique groups, which made that task infeasible.
So, since my data was mostly in the proper format for answering questions related to phone/internet connectivity and protests, I started to think about the best way of displaying that information.
- Based on the previous graphs, I knew that displaying 50+ countries on a single graph was not particularly effective. However, focusing in on a single country seemed to ignore the scale of the dataset that I was working with. So, I decided to treat the entire dataset as a set of tuples mapping number of protests in a given year to internet/phone access. This choice started to give me some sort of relationship which I believed I could work on clarifying and displaying effectively.
- To produce this graph, I had to (1) pivot the original tables into a "stacked" form in pandas, (2) double-check that all country names matched, (3) linearly interpolate any missing phone/internet access values and then (4) manually copy and paste the values together in Excel so Tableau would digest my data properly.
- The next two graphs were experiments looking for any clustering around countries or year. I realized that a multi-hue color scheme would not work well for encoding aspects of this data because it's too jumbled.
- From the second graph, I realized that the Arab Spring protests in Egypt were a massive outlier in terms of frequency. This is certainly an interesting and relevant point for the overall topic, but since I was more concerned with overall trends related to phone and internet access, it seemed reasonable to treat it as an outlier data point.
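The reshaping steps listed above (stacking, interpolating, combining) could be sketched in pandas roughly as follows. The table layout, column names, and values are assumptions for illustration. The final `merge` is a swapped-in alternative to the manual copy and paste in Excel, not what was actually done for this assignment.

```python
import pandas as pd

# Invented wide table: one row per country, one column per year.
internet_wide = pd.DataFrame(
    {"country": ["A", "B"], 2010: [1.0, 2.0], 2011: [None, 4.0], 2012: [3.0, 6.0]}
).set_index("country")

# Linearly interpolate missing values along each country's row of years.
internet_filled = internet_wide.interpolate(axis=1, method="linear")

# Stack into long form: one row per (country, year).
internet_long = (
    internet_filled.stack()
    .rename("internet_per_100")
    .rename_axis(["country", "year"])
    .reset_index()
)

# Invented protest counts per country and year, already in long form.
protests = pd.DataFrame(
    {"country": ["A", "A", "B"], "year": [2010, 2011, 2012], "protests": [5, 8, 2]}
)

# Join on country and year so each row pairs a protest count with access.
combined = protests.merge(internet_long, on=["country", "year"], how="inner")

print(combined)
```

Each row of `combined` is then one (protests, access) pair for a given country and year, which is the tuple form described above.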
Entry #6 - Refining the Final Visualization
The two smaller thumbnails below are iterations on my final visualization. The main things that I learned at this point were:
- A color gradient could be used effectively to connote the passing of time (i.e. the "year" value in my data). This helps communicate the increasing trend of protests and connectivity over time.
- I tried sampling just the top 10 protesting countries to see if that reduced the noise near the origin. However, I came to the conclusion that this filtered out too many data points. Instead, I decided to just use an exponential trend line to reflect the relationship between the two. Hopefully, the shallower-than-expected slope of the trend line indicates that a number of values are densely clustered near the origin or x-axis.
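For reference, one common way to fit an exponential trend line of the form y = a * exp(b * x) is a least-squares line fit on log(y). The sketch below uses NumPy and synthetic data. It is not necessarily how Tableau computes its trend line, and the `fit_exponential` helper and the choice to drop non-positive values are assumptions for illustration.

```python
import numpy as np


def fit_exponential(x, y):
    """Fit y = a * exp(b * x) via a straight-line fit on log(y).

    Points with y <= 0 are dropped because their logarithm is undefined.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = y > 0
    slope, intercept = np.polyfit(x[keep], np.log(y[keep]), 1)
    return np.exp(intercept), slope


# Synthetic data generated from a known curve, so the fit can be checked.
x = np.array([0.0, 10.0, 20.0, 30.0])
y = 2.0 * np.exp(0.05 * x)

a, b = fit_exponential(x, y)
print(a, b)
```

Because the fit is done in log space, many small values near the x-axis pull the curve down, which is consistent with the conservative slope noted above.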
Entry #7 - Final Visualization: Have protests increased with internet and mobile access?
The annual frequency of protests in African countries has clearly increased alongside internet and mobile phone access over the past 18 years. Although the trend line shows a more conservative rate of change than expected, due to the concentration of data points near the origin and x-axis, there are a number of recent instances where greater phone and internet access have accompanied greater annual protest frequencies.
This conservative rate of change may suggest that internet and mobile phone access are not strong causal factors for increasing numbers of protests. Instead, they may only increase the scale and momentum that protest movements develop over time. This hypothesis could be supported by the presence of extreme outlier data points representing the 2013-2014 Arab Spring protests in Egypt and the housing protests in South Africa (excluded from view to show detail in the bulk of the data), some of which had almost five times as many protests in one year (~1800) as the top data point displayed on this graph.
To further understand the role of phone and internet connectivity in protest activity in African countries over time, it would be valuable to supplement this analysis with datasets estimating the participation rate for protests over time. I assumed that an exponential trend line would be appropriate because of the exponential number of connections that a phone/internet medium allows between individuals in a network. However, it would be important to investigate how the size of protests has changed over time in comparison to phone and internet access to understand this relationship in more detail.