Data Update notes - 3/16/11

The new data is now in the schema named 'mimir'. The 'minerva' schema will be renamed 'mimir_pre_1_2011'.

Updated Datasets
The following tables have have been updated to contain more recent data:
courtesy_appointments, dissertations, dissertation_members, pta/pta_roles, spo_projects/spo_project_segments/spo_roles, student_programs/students, and woods and biox (combined into new centers_membership table)

Note that we are still waiting for an update to the FSI tables. Also note that there will likely be an additional round of person_id cleanup in the next few weeks, focusing mainly on splitting incorrectly merged person entities (this is just cleanup though and will not changed most ids).

Person Entity Improvements
In the previous update, a less strict procedure for normalizing names and resolving person entities was followed, which resulted in a much larger number of person_ids than it should have. We've removed these person_ids and re-matched all those names - in the previous update, there were 70,281 unique names and 56,521 person_ids, whereas now there are 37,368 unique names and 28,955 person_ids.

The process for performing the name resolution has been improved, and additional tools have been added to correct cases where two person_ids need to be merged into one, and where a single person_id needs to be split into separate entities. However, this means that some person_ids have been reassigned or changed. Any table that involves person_ids should be regenerated or rechecked against the people table to make sure the person_ids still exist.

Also, records that contain bogus or empty person name fields have been assigned a person_id of 0. You likely want to exclude these from any queries you write.

Database Structure Changes
There have been a number of structural changes to the database in order to be more consistent and more reliable in terms of maintaining person_ids.

    removed biox, woods, fsi_all, fsi_aparc, fsi_cddrl, fsi_chp_pcor, fsi_cisac, fsi_iis tables
  • merged these into single centers_membership table with same structure as old tables with additional 'center' column containing original table name
    dissertation_members -
  • removed student_person_id
  • removed __student_name
  • added 'Advisor LDesc'
    removed table faculty
    faculty_courses -
  • added employee_id
    faculty_years
  • added employee_id
  • added birth_date
  • added birth_year
    govcomm_stanford
  • changed financial_year to fiscal_year
  • added __name
    postdoc_advisors
  • added postdoc_id
    postdoc_degrees
  • changed id to postdoc_degree_id
  • added postdoc_id
    pta
  • removed __award_owner_name
  • removed __award_owner_emplid
  • removed __award_owner_person_id
  • removed __award_manager_name
  • removed __award_manager_person_id
  • removed __project_owner_name
  • removed __project_owner_emplid
  • removed __project_owner_person_id
  • removed __project_manager_name
  • removed __project_manager_person_id
  • removed __task_owner_name
  • removed __task_owner_emplid
  • removed __task_owner_person_id
  • removed __task_manager_name
  • removed __task_manager_person_id
  • (these were all moved into pta_roles
    added pta_roles table
  • links to pta on pta_id or project_id, award_id, task_id
  • person_id
  • role
    removed table spo
  • (should have been removed last round)
    spo_oracle
  • changed id to spo_oracle_id
    spo_oracle_employee
  • change id to spo_oracle_employee_id
  • changed name to __name
    spo_projects
  • change __school to __school_name
  • changed __department to __department_name
    spo_sufin
  • changed sufin_id to spo_sufin_id
  • changed __dept_name to __department_name
  • changed spo_project_id to spo_number
    spo_sufin_employee
  • changed id to spo_sufin_employee_id
  • changed spo_project_id to spo_number
  • changed name to __name
    student_programs
  • added student_id (links to students)
  • removed person_id
    students
  • changed name to __student_name
    sufin
  • changed __primary_investigtor_name to __primary_investigator_name