Data Manipulation Guide

This page provides some basic instructions on how to manipulate different data formats.

Running SQL queries on .csv format

This will be very similar to Assignment 2 SQL set up code

Converting from .xlsx to .csv format

Converting from .csv to SQLite .db format

Doing this is a bit more involved than converting .xlsx files to .csv files, but you're going to be learning some pretty handy skills. Or you may decide that you don't really want to deal with this (although we highly recommend going through the exercise). In that case, there are online CSV to SQLite database converters, like this one.

Table Schema

To convert .csv files to a format consumable by SQLite, we need to first create the schema of the table. The schema basically contains information about the table name, the field names and field types inside the table. To visualize this, let's take the Football dataset that we've been using.

First, you have to have some understanding of the data. Recall that the Football dataset has 7 columns: Year, Week, Home, HomeScore, Away, AwayScore and Prediction. Back in spreadsheet days, we used to be able to just load the CSV in Google Sheets or Microsoft Excel and have them figure out what the data types of each columns are. Unfortunately, that's not the case anymore.

The schema of the Football table is written as follows:


    CREATE TABLE Football(year INT, week INT, home TEXT, homescore INT, away TEXT, awayscore INT, prediction REAL);
    

The CREATE TABLE syntax, and where to write the table name, field names and field types should hopefully be self-explanatory.

SQLite Prompt

Now that we undertand our table schema, how do we create that neat Football.db file that we have used in the Jupyter notebooks? Using our running example of using Football.csv, follow the instructions below.

  1. Download the Football.csv file into some directory (i.e. Downloads).
  2. Open Terminal if you're on Mac OS X, or Command Prompt if you're on Windows.
  3. Navigate to the directory where you downloaded the .csv file. Assuming you've downloaded it in your Downloads folder, for Macs, you'd typically write cd ~/Downloads, and for Windows, you'd write cd Downloads.
  4. Type sqlite3 Football.db, and you will be presented with the SQLite prompt. It will look something like this:
    
        SQLite version 3.8.10.2 2015-05-20 18:17:19
        Enter ".help" for usage hints.
        sqlite>
            
    If you're presented with errors that say the command cannot be found, then you might need to add sqlite3 to your PATH. This might happen if you use Windows. You can read this page to understand how to add it to the PATH, or you can go to office hours and we'll help you get set up.
  5. Type (or copy-paste without the "sqlite>" part) the following:
    
        sqlite> CREATE TABLE Football(year INT, week INT, home TEXT, homescore INT, away TEXT, awayscore INT, prediction REAL);
        sqlite> .separator ","
        sqlite> .import Football.csv Football
          
  6. Now, if you've remembered, the Football.csv file has a header containing the field names, but we don't want that in our table. Alas, it's already been loaded inside the table. When you try to query the table, it awkwardly shows the table:
    
        sqlite> .header on
        sqlite> .mode column
        sqlite> SELECT * FROM Football LIMIT 3;
        year        week        home        homescore   away        awayscore   prediction
        ----------  ----------  ----------  ----------  ----------  ----------  ----------
        Year        Week        Home        HomeScore   Away        AwayScore   Prediction
        1998        1           Green_Bay   38          Detroit     19          9.5
        1998        1           Chicago     23          Jacksonvil  24          -8.5
          
  7. The solution is simply to delete the first row, or remove the first line of the .csv before importing it in. Here, we go with the first solution.
    
        sqlite> DELETE FROM Football WHERE year = 'Year';
        sqlite> SELECT * FROM Football LIMIT 3;
        year        week        home        homescore   away        awayscore   prediction
        ----------  ----------  ----------  ----------  ----------  ----------  ----------
        1998        1           Green_Bay   38          Detroit     19          9.5
        1998        1           Chicago     23          Jacksonvil  24          -8.5
        1998        1           Minnesota   31          Tampa Bay   7           3.5
          
  8. And now, your database should be good. You can use the .exit command to exit the session (there's a dot before the word "exit", like the other commands). The Football.db file should have been generated at the same directory, and you can use the database like you've used it for Assignment 2.

The keen reader will have some questions on the instructions above - Feel free to ask them on Piazza!