CS224W Analysis of Networks

Mining and Learning with Graphs

Course Information

Course description

Networks are a fundamental tool for modeling complex social, technological, and biological systems. Coupled with the emergence of online social networks and large-scale data availability in biological sciences, this course focuses on the analysis of massive networks which provide many computational, algorithmic, and modeling challenges.

The course will cover recent research on the structure and analysis of such large networks and on models and algorithms that abstract their basic properties. We will explore how to practically analyze large-scale network data and how to reason about it through models for network structure and evolution.

Syllabus

Topic Lectures
Graph concepts, models and measurements 1-3
Network Construction and Inference 4
Link Analysis: PageRank and SimRank 5
Network Motifs: Structural roles in Networks 6
Community Detection 7-8
Link Prediction 9
Node Representation Learning: Node2Vec 10
Information Cascades 11-12
Influence & Outbreak Detection 13-14
Network Robustness 15
Network Evolution 16
Node Centrality and Anomaly Detection 17
Knowledge Graphs and Metapaths 18
Message passing and Node classification 19
Graph Convolutional Neural Networks 20

Prerequisites

Students are expected to have the following background:

  • Knowledge of basic computer science principles, sufficient to write a reasonably non-trivial computer program (e.g., CS107 or CS145 or equivalent are recommended).
  • Familiarity with the basic probability theory (CS109 or Stat116 are sufficient but not necessary.)
  • Familiarity with the basic linear algebra (any one of Math 51, Math 103, Math 113, or CS 205 would be much more than necessary.)

The recitation sessions in the first weeks of the class will give an overview of the expected background.

Course materials

There is no official text for this course. Notes and reading assignments will be posted periodically on the course web site. The following books are recommended as optional reading:

Course handouts and other reading materials can be downloaded here.


Coursework

The coursework for the course will consist of:

Homework

The idea for the problem sets is to practice some skills that will be required for the project. The homework will contain written questions and questions that require some programming. Specifically, we will be working both on mathematical models of networks and analyzing real network data, for example, how to find stable sets in structural balance theory or how to optimally seed the network to maximize the influence. Second, we will also work with network datasets to get a flavor of types of questions one asks in network analysis. For example, using citation data, create a small citation network, compute degree distributions, clustering coefficients, node centralities, etc.

Questions: We try very hard to make questions unambiguous, but some ambiguities may remain. Ask (i.e., post a question on Piazza) if confused, or state your assumptions explicitly. Reasonable assumptions will be accepted in case of ambiguous questions.

Assignment submission: All students (SCPD and non-SCPD) submit their assignments via GradeScope by 11:59PM PT on the due date. (We will allow a small 15 minute grace period, but beyond that and late periods, all deadlines are final.) You can typeset or scan your assignment. Make sure that you answer each sub-question on a separate page. That is, one question per page regardless of the answer length. Also, attach a signed cover sheet to the end of your submission.

Register for GradeScope with your Stanford email and student ID if possible and use the course code 9ZZ2XY to sign up for the course.

Please make sure to tag each part correctly on GradeScope so it is easier for us to grade. There will be a 1 point deduction for each mistagged page.

Students also need to upload their code at http://snap.stanford.edu/submit. Put all the code for a single question into a single file and upload it.

Regrade policy: We take great care to ensure that grading is fair and consistent. Since we will always use the same grading procedure, any grades you receive are unlikely to change significantly. However, if you feel that your work deserves a regrade, please submit a request on GradeScope within one week of receiving your grade.

Before requesting a regrade, please prepare a clear and concise argument for your stance by doing the following:

  • Re-read relevant sections of papers, the notes, and the text (where applicable).
  • Read carefully the comments we provide on your work and consider their meaning.
And then submit your regrade request via GradeScope.

We reserve the right to regrade the entirety of any homework for which any regrade is requested.

For grading questions, please talk to us during office hours.

Course Project

One of CS224W main goals is to prepare you to apply state-of-the-art network analysis tools and algorithms to an application. If you are interested in research, CS224W will also leave you well-qualified to do network science research. The class final project will offer you an opportunity to do exactly this.

Students can (and are strongly encouraged) to work in teams of up to 3 people. If you have a project of such large scope and ambition that it cannot be done by a team of only three persons, you can propose doing a project in a team of four.

Your first task is to pick a project topic. If you are looking for project ideas, please come to either Prof. Leskovec or the TAs' office hours, and we'd be happy to brainstorm and suggest some project ideas. Also check the datasets page.

In the meantime, there can be three kinds of course projects:

  • Experimental evaluation of algorithms and models on an interesting dataset;
  • A theoretical project that considers a model, an algorithm or a network property (measure) and derives a rigorous result about it;
  • A scalable implementation of an algorithm for processing massive graphs.

Ideally, projects will be a mix of the three types of projects outlined above. The project should contain at least some amount of mathematical analysis, and some experimentation on real or synthetic data. You can also check previous course project reports.

Many fantastic class projects come from students picking either an application/dataset that they're interested in, or picking some sub-field of network analysis that they want to explore more, and working on that as their project. If you haven't worked on a research project before but would like to, you can also use this as an opportunity to try your hand at it. (Just be sure to ask us for help if you're uncertain how to best get started.)

Alternatively, if you're already working on a research project that network analysis might be applicable to, then working out how to apply network analysis techniques to it will often make a very good project topic. Similarly, if you currently work in industry and have an application on which networks might help, that could also make a great project.

A very good CS224W project will comprise a publishable or nearly-publishable piece of work. Each year, some number of students continue working on their projects after completing CS224W, and submit their work to a conference or journal.

Projects will be evaluated based on:

  • The technical quality of the work: Does the technical material make sense? Are the methods tried reasonable? Are the proposed algorithms or applications clever and interesting? Do the authors convey novel insights about the problem and/or algorithms?
  • Significance: Did the authors choose an interesting or a "real" problem to work on, or only a small "toy" problem? Is this work likely to be useful and/or have impact?
  • The novelty of the work, and the clarity of the write-up.

Lastly, a few words of advice: many of the best class projects come from students working on topics that they're excited about. So, pick something that you can get excited and passionate about! Be brave rather than timid, and do feel free to propose ambitious things that you're excited about. Finally, if you're not sure what would or would not make a good project, please also feel strongly encouraged to either email us or come to office hours to talk about project ideas.

There are four deliverables (Click the respective deliverable to know more):

Previous course project reports:

Project proposal

This assignment consists of two parts. The first part is a 2-3 page reaction paper to several published research papers, and the second part is a 1-2 page proposal for the project you want to pursue for this class. Two parts should be related: your reactions to published papers should inform the project you will work on.

Part 1: Reaction paper component (2-3 pages). The course is based on material from the last few years. This means that most of it in form of research papers, which raise lot of interesting issues that have yet to be explored. The goal of the reaction paper section is that students familiarize themselves more in depth with the material covered in class, do reading beyond what was covered in class.

Students will pick three (or more) papers that are clearly related to course topics (e.g., mentioned in class or as optional readings on course handouts page; if in doubt check with course staff about the papers you aim to read). These papers should go beyond material that was covered in detail in a lecture (i.e., don't only discuss required readings or the textbook). Students should carefully read the papers and write a short (approximately 2-3 pages) reaction paper about the content of the chosen papers. You should be thinking beyond what you read, and not just take other people's work for granted. The reaction part of the paper should address the following questions:

  • Summary
    • What is main technical content of the papers?
    • How do papers relate to the topics presented in the course?
    • What is the connection between the papers you are discussing?
  • Critique
    • What are strengths and weaknesses of the papers and how they be addressed?
    • What were the authors missing?
    • Was anything particularly unrealistic?
  • Brainstorming (that then leads to the project proposal)
    • What are promising further research questions in the direction of the papers?
    • How could they be pursued?
    • An idea of a better model for something? A better algorithm? A test of a model or algorithm on a dataset or simulated data?

Reaction papers should not just be summaries of the papers you read. The last two bullets should form the most substantial part of the document. Answering these questions can be a very good way to explore a potential project topic. The reaction paper should be concluded with a section with a description of some promising further research directions and questions, and how could they be pursued.

In prior versions of the course, the reaction paper has been a very good way to explore a potential project topic.

Part 2: Project proposal (1-2 pages). The project proposal component should build on the reaction paper component. The purpose of the reaction paper is to survey the related work and identify what are strengths and weaknesses of the papers and how they may be addressed. The proposal should then focus on what are some promising further research directions and questions: How precisely do you plan to pursue them? What methods/data do you plan to use? You should try to provide a concrete proposal for a model or algorithm that potentially extends or improves the topics discussed in the papers you've read.

When writing the proposal you should try to answer the following questions:

  • What is the problem you are solving?
  • What data will you use (how will you get it)?
  • What work do you plan to do the project?
  • Which algorithms/techniques/models you plan to use/develop? Be as specific as you can!
  • Who will you evaluate your method? How will you test it? How will you measure success?
  • What do you expect to submit/accomplish by the end of the quarter?

Some other points to note:

  • The project should contain at least some amount of mathematical analysis, and some experimentation on real or synthetic data
  • The result of the project will typically be a 8 page paper, describing the approach, the results, and the related work.

We strongly encourage you to work in groups of 3 people. It is hard for us to balance the grading based on the group size. This means that projects will be graded about the same regardless of how many people are in the group -- working in groups is strongly encouraged!

Here are some examples of past reaction papers / project proposals:

Project Milestone

  • Think of this as a draft of your final report but without your major results.
  • We expect that you have completed 40% of the project
  • Provide a complete picture of your project even if certain key parts have not yet been implemented/solved.
  • Include the parts of your project which have been completed so far, such as:
    • Thorough introduction of your problem
    • Review of the relevant prior work
    • Description of the data collection process
    • Description of any initial findings or summary statistics from your dataset
    • Description of any mathematical background necessary for your problem
    • Formal description of any important algorithms used
    • Description of general difficulties with your problem which bear elaboration
  • Make sure to at least outline the parts which have not yet been completed so that it is clear specifically what you plan to do for the final version.
  • Recommended length 3-5 pages

Examples of past milestone reports:

Project Report

The final project report should be a 5-10 page paper, describing the introduction, related work, approach, results and conclusion. We will not accept reports longer than 10 pages. At the end of the report, you should also highlight the contributions of individual team members to the project (in the format outlined below). The project report should contain at least some amount of mathematical analysis, and some experimentation on real or synthetic data.

Course staff will use the following guidelines when grading your final project write-ups. Keep in mind however, that if there is a good reason why your project doesn't match the rubric below, we will take that into consideration when grading your report. For example, we recognize that purely theoretical or pure data analysis projects may not fit the rubric below perfectly, and that depending on your project you may want swap the ordering of certain sections. But hopefully all projects can be roughly mapped to the criteria below:

  • Introduction/Motivation/Problem Definition (15%)
    • What is it that you are trying to solve/achieve and why does it matter.
  • Related Work (10%)
    • How does your project relate to previous work. Please give a short summary on each paper you cite and include how it is relevant.
  • Model/Algorithm/Method (30%)
    • This is where you give a detailed description of your primary contribution. It is especially important that this part be clear and well written so that we can fully understand what you did.
  • Results and findings (35%)
    • How do you evaluate your solution to whatever empirical, algorithmic or theoretical question you have addressed and what do these evaluation methods tell you about your solution. It is not so important how well your method performs but rather how interesting and clever your experiments and analysis are.
    • We are interested in seeing a clear and conclusive set of experiments which successfully evaluate the problem you set out to solve. Make sure to interpret the results and talk about what can we conclude and learn from your evaluations. Even if you have a theoretical project you should have something here to demonstrate the validity or value of your project (for example, proofs or runtime analysis).
  • Style and writing (10%)
    • Overall writing, grammar, organization and neatness.

Unlike the project proposal and milestone, we plan to assign individual scores to team members for the final project report. We observed that there is a skewed distribution of work in some of the teams and would like to take that into account when we are grading. Your score for the final report will now be a function of two aspects:

  1. The criteria outlined above for your final report
  2. Your contribution to the project relative to that of your team members.

In order to do be able to assign such individual scores, we want you to write down a brief summary of the individual contributions of each of the team members in the format outlined below at the end of each report:

Example:
----------------
John: Plotting graphs during data analysis, crawling the data, preliminary data analysis
Mary: Problem formulation, writing up the report, coming up with the algorithm
Chris: Coding up the algorithm, running tests, tabulating final results
---------------

If you fail to outline individual contributions at the end of your report, we will assign equal score to all the team members.

Here are some of the best project reports from last years:

Project Poster Session

The goal of the poster session is to give you a chance to see what your classmates have been working on, so make sure to go around and explore the posters. TAs will walk around and talk to you. At least two TAs will stop by your poster to talk to you.

General info
  • Time: 3:30-6:30PM, Tuesday, Dec. 11th
  • Location: TBD
  • All groups which have at least one non-SCPD member are expected to present
  • One group member should be at the poster at all times (while other group members can explore other posters)

Honor code and submission policy

The following paragraphs apply both to any material submitted for this course (homework, project proposal, project milestone, and so on).

Honor code: We strongly encourage students to form study groups. Students may discuss and work on homework problems in groups. However, each student must write down the solutions independently, and without referring to written notes from the joint session. In other words, each student must understand the solution well enough in order to reconstruct it by him/herself. In addition, each student should write on the problem set the group of people with whom she/he collaborated.

Further, since we occasionally reuse problem set questions from previous years, we expect students not to copy, refer to, or look at the solutions in preparing their answers. It is an honor code violation to intentionally refer to a previous year's solutions or code. This applies both to the official solutions and to solutions that you or someone else may have written up in a previous year.

Late submissions: Each student will have a total of two late periods to use for homeworks and project submissions (except the final project report, for which the deadline is strict.) One late period expires midnight on the day before the next class: this means that if the assignment is due on Thursday then the first late period expires at 11:59pm on the next Monday. Once late periods are exhausted, any write-up turned in late will be penalized 50% per late period. However, no write-up (homework and project related documents) will be accepted more than one late period after its due date.

Please start submitting to Gradescope 30 minutes before the deadline. In previous quarters, Gradescope has been very unresponsive at midnight. In case you cannot successfully submit before 11:59pm, we will offer a 15 minute grace period for you to email us your assignment. This is a hard deadline and we will accept no assignments after this time.

GRADING

The grading will be based on the following weighting scheme:

  • 30% on 3 Homework (9.6% each) + 1 setup homework (1%)
  • 30% on the Exam
  • 40% on the Final Project
    • The final project grade is computed as follows:
      0.2*P + 0.2*M + 0.5*F + 0.1*S,
      where the letter denotes a grade for (each grade is on a range 0...100):
    • P: Proposal paper grade
    • M: Project milestone Report grade
    • F: Final project write-up grade
    • S: Poster presenting the work
  • Course participation (Piazza, SNAP code contributions, etc.) as extra credit

© Copyrights SNAP Stanford. All Rights Reserved.

Created with Spot template by TemplateMag