Your genome is 3 billion letters, driving some 30 trillion cells, for 3 billion seconds. Why this computational analysis and not that? What did I just find? Who cares? Unidentified caller from Stockholm at 3 in the morning?
We will introduce you to genomic data: what it looks like, how to get it, and some of the most (and least) interesting things you can do with it.
Topics include the human genome parts list, the COVID-19 genome parts list, genome sequencing technologies, and a taste of the three main forces of life (neutral, negative, and positive selection) via, respectively: population genomics and paternity testing; medical AI (disease) genomics, where you could really help very sick kids from your keyboard; and comparative (evolutionary) genomics (bats, cats, rats, gnats, SARS-CoV-2). And maybe a dash of cryptogenomics and genomic privacy.
Get a taste of Machine Learning, Natural Language Processing, Cryptography and even Genomics in the service of humanity.
A background in biology, ML, or NLP is purely optional. See the class Explore page for more details.
All course materials will be available via this website and Piazza, not Canvas.
Prerequisites: CS106 or equivalent (i.e., some programming experience in any language).
For example, you should be able to read a string from a file, count some patterns in it, and print the counts (see the tutorials from previous offerings, linked below).
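To illustrate the expected level, here is a minimal Python sketch that reads a DNA sequence from a text file, counts each letter and the occurrences of a short pattern, and prints the counts. The input format (a sequence possibly split over several lines, with optional `>` header lines) and the example pattern `CG` are assumptions made for illustration; this is not code from the course tutorials.

```python
import sys
from collections import Counter


def read_sequence(path):
    """Read a sequence from a text file, skipping '>' header lines and joining the rest."""
    with open(path) as handle:
        lines = [line.strip() for line in handle if not line.startswith(">")]
    return "".join(lines).upper()


def count_overlapping(sequence, pattern):
    """Count occurrences of pattern in sequence, allowing overlaps ('AA' occurs twice in 'AAA')."""
    count = 0
    start = sequence.find(pattern)
    while start != -1:
        count += 1
        start = sequence.find(pattern, start + 1)
    return count


def main(argv):
    if len(argv) != 2:
        sys.exit(f"usage: python3 {argv[0]} SEQUENCE_FILE")
    sequence = read_sequence(argv[1])
    # Per-letter counts (A, C, G, T, and anything else such as N).
    for letter, n in sorted(Counter(sequence).items()):
        print(f"{letter}\t{n}")
    # Hypothetical example pattern: the CG dinucleotide.
    print(f"CG\t{count_overlapping(sequence, 'CG')}")


if __name__ == "__main__":
    main(sys.argv)
```

Overlapping counts are used because Python's built-in `str.count` skips overlapping matches, which matters for patterns such as `AA`.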
This course is cross-listed as DBIO273A and BIOMEDIN273A. Write to Gill if you want to help get it cross-listed elsewhere.
Mondays and Wednesdays 11:30AM-12:50PM.
The course will be taught entirely online.
Link for Zoom
Attendance is not taken, but lectures will not be recorded.
As a Stanford student, you also have free access to many biomedical journals. To access all the biomedical resources Stanford pays for from off campus, you can install a browser extension and a shortcut that let you search and access Lane Library online resources directly using your SUNetID. Many of the terms we teach are also well defined on Wikipedia.
All course communication, including announcements and other private course resources, will be handled via Piazza. You can enroll through our Piazza class page.
Auditors are welcome. Please sign up for Piazza as well. Email us if you want to be added to the class mailing list.
Office: Via Zoom
Office hours: Email for appointment
Phone: (650) 723-7666
Office: Link for Zoom
Office hours: Tuesdays and Thursdays 12PM-1PM PST (except the week of 1/18, when they move to Friday 1/22, 4PM-6PM PST)
All code must be executable on the Stanford student machines (i.e., cardinal, myth, or rice). Jupyter notebooks are allowed for Homework 4 and the final exam. Your README must explain how to run your code, and your code must run without user modification: for example, if your code takes a file as input, the path or file name should not be hard-coded but passed in on the command line. All files must be named appropriately, and the name of your submitted zip file must include your name. Be as detailed as possible in your README to ensure that you get all the points.
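A minimal sketch of a submission that follows these rules, assuming Python: every input path comes from the command line through the standard-library `argparse` module instead of being hard-coded, so a grader can run it on any file without editing the source. The script name `line_count.py` and the line-counting task are hypothetical placeholders.

```python
import argparse


def count_lines(path):
    """Return the number of lines in the file at path."""
    with open(path) as handle:
        return sum(1 for _ in handle)


def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        description="Print the number of lines in each input file.")
    # Input files are positional arguments, never hard-coded paths.
    parser.add_argument("inputs", nargs="+", help="one or more input files")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    for path in args.inputs:
        print(f"{path}\t{count_lines(path)}")


if __name__ == "__main__":
    main()
```

The README would then document a command such as `python3 line_count.py file1.txt file2.txt`; `python3 line_count.py --help` prints the usage message that `argparse` generates.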
If you are registered with the Office of Accessible Education (OAE), please email your accommodation letter to the class staff email () at the beginning of the quarter.
All homework assignments are individual; you may not work in groups. You may discuss ideas and compare final numeric outputs (e.g., the number of lines in a file), but no part of your final code may be shared with other students. In your submitted writeup (e.g., README), you must list the names of your collaborators. You may not share any part of your submissions with each other until grades are returned. We take honor code violations seriously; violations will be reported to the Office of Community Standards.
We may make mistakes when grading your homework. If you find one, please send an email to to ask for a regrade. We will regrade your entire homework, and your grade may go up or down as a result. You cannot redo your homework after grades have been returned, and we will not accept any more submissions after grades have been sent out.
The take-home exam must be done independently. You may not discuss it with anyone.
The base course directory, /afs/ir.stanford.edu/class/cs273a, is reachable from the cardinal and myth machines. Executables from the source tree are in its bin directory and are machine-dependent. If you add "/afs/ir.stanford.edu/class/cs273a/bin/@sys" to your PATH variable, AFS expands @sys to your machine's system type, so the correct version of each executable will run (see the text processing tutorial).
There are course schedules and materials available from the Winter 2019/2020, Winter 2018/2019, Winter 2017/2018, Autumn 2016/2017, Autumn 2015/2016, Autumn 2014/2015, Autumn 2013/2014, Autumn 2011/2012, Autumn 2010/2011, Autumn 2009/2010, Autumn 2008/2009, Autumn 2007/2008, and Spring 2006/2007 versions of the course. Also see the Winter 2012/13 class of CS173.
| Date | Tutorial |
|---|---|
| 1/13 | Introductory Biology Primer |
| 1/20 | Introduction to Text Processing |
| 1/27 | Introduction to the UCSC Genome Browser |
| # | Date | Topic | Materials / Assignments |
|---|---|---|---|
| 1 | 1/11 | Gill: Class overview | Lecture PDF |
| 2 | 1/13 | CA: Biology primer | Lecture PDF |
| 3 | 1/18 | MLK Day (no class) | |
| 4 | 1/20 | CA: Text processing primer | HW1 assigned |
| 5 | 1/25 | Gill: Protein Coding Genes | Lecture PDF |
| 6 | 1/27 | CA: Genome browser primer | Lecture PDF |
| 7 | 2/1 | Gill: RNA Genes, Gene Enrichment | HW1 due; HW2 assigned |
| 8 | 2/3 | Gill: Gene Regulation I | Lecture PDF |
| 9 | 2/8 | Gill: Gene Regulation II | Lecture PDF |
| 10 | 2/10 | Gill: Gene Regulation III, Repeats I | HW2 due; HW3 assigned |
| 11 | 2/15 | Presidents' Day (no class) | |
| 12 | 2/17 | Gill: Repeats II | Lecture PDF |
| 13 | 2/22 | Gill: Simple Repeats, Sequencing | Lecture PDF |
| 14 | 2/24 | Gill: Sequencing II & Assembly | HW3 due; HW4 assigned |
| 15 | 3/1 | Gill: Molecular Evolution, Population Genetics | Lecture PDF |
| 16 | 3/3 | Gill: Pop Gen II, Genetic Disease | Lecture PDF |
| 18 | 3/10 | Gill | HW4 due; take-home final exam released |
| 21 | 3/19 | No class | Take-home final exam due |