Thanks to Kaggle and encyclopedia-titanica for the dataset.
This is the last question of Problem set 5. In this problem you will use real data from the Titanic to calculate conditional probabilities and expectations.
On April 15, 1912, the largest passenger liner built up to that time collided with an iceberg during her maiden voyage. When the Titanic sank, 1502 of the 2224 passengers and crew died. This sensational tragedy shocked the international community and led to better safety regulations for ships. One reason the shipwreck caused such loss of life was that there were not enough lifeboats for the passengers and crew. Although luck played some part in surviving the sinking, some groups of people were more likely to survive than others.
The titanic.csv file contains data for 887 of the real Titanic passengers. Each row represents one person. The columns describe different attributes about the person including whether they survived ($S$), their age ($A$), their passenger-class ($C$), their sex ($G$) and the fare they paid ($X$).
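Because each row is one person, one reasonable way to estimate conditional probabilities and expectations from the file is by counting rows. As a sketch, write $S_i$, $G_i$, $C_i$, $X_i$ for the values in row $i$ of $n$ rows and $\mathbf{1}[\cdot]$ for the indicator function. This indexing notation is introduced here for illustration and is not part of the original problem, and the two quantities shown are only examples:

$$P(S = 1 \mid G = g) \approx \frac{\sum_{i=1}^{n} \mathbf{1}[S_i = 1,\ G_i = g]}{\sum_{i=1}^{n} \mathbf{1}[G_i = g]}, \qquad E[X \mid C = c] \approx \frac{\sum_{i=1}^{n} X_i \, \mathbf{1}[C_i = c]}{\sum_{i=1}^{n} \mathbf{1}[C_i = c]}$$

In words: restrict to the rows that satisfy the condition, then take the fraction of those rows where the event occurs (for a probability) or the average of the quantity over those rows (for an expectation).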
[Question 12] Write a program in C, C++, Java or Python that reads the data file and finds the answers to the following questions:
You only have to submit your answers, not your program. As such, you could get away with calculating these statistics by hand, but use a program anyway. This is a warm-up for problem set 6, where you will write machine learning algorithms (in C, C++, Java or Python) that read data and perform more advanced calculations.
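As a starting point, here is a minimal Python sketch of such a program, not a reference solution. It reads the file with the standard `csv` module and estimates a conditional probability or expectation by filtering rows and counting. The function names are hypothetical, and the column names in the usage comment (`Survived`, `Sex`, `Pclass`, `Fare`) are assumptions about the header of titanic.csv; check the first line of your copy of the file and adjust them.

```python
import csv


def load_rows(path):
    """Read the CSV file into a list of dicts, one dict per passenger."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def conditional_probability(rows, event, given):
    """Estimate P(event | given) as count(event and given) / count(given)."""
    matching = [r for r in rows if given(r)]
    if not matching:
        raise ValueError("no rows satisfy the conditioning event")
    return sum(1 for r in matching if event(r)) / len(matching)


def conditional_expectation(rows, value, given):
    """Estimate E[value | given] as the mean of value(r) over rows where given(r) holds."""
    matching = [r for r in rows if given(r)]
    if not matching:
        raise ValueError("no rows satisfy the conditioning event")
    return sum(value(r) for r in matching) / len(matching)


# Example usage. The column names are ASSUMPTIONS; csv.DictReader returns
# every field as a string, so compare against strings and convert numbers.
#
# rows = load_rows("titanic.csv")
# p = conditional_probability(
#     rows,
#     event=lambda r: r["Survived"] == "1",
#     given=lambda r: r["Sex"] == "female",
# )
# mean_fare = conditional_expectation(
#     rows,
#     value=lambda r: float(r["Fare"]),
#     given=lambda r: r["Pclass"] == "1",
# )
```

Passing the event and the condition as small functions keeps the counting logic in one place, so each new question only needs a new pair of lambdas.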
Aside: In making this problem I learned that there were somewhere between 80 and 153 passengers from present-day Lebanon (then part of the Ottoman Empire) on the Titanic. That would be roughly 4% to 7% of the 2224 people aboard.
See if you can find something surprising in the dataset. Can you predict p? Can you find interesting correlations?