Stanford CS 293 / EDUC 473 | Empowering Educators via Language Technology

Requirements

Class Participation (5%)

Class participation grades are based on whether you productively contribute to the classroom discussions in lecture. This grade also reflects your contributions when presenting and leading discussion as a discussant. Finally, it takes into account how actively you contribute to the success of your project alongside your teammates.

Reading Quizzes (10%)

We will begin most classes with a short, in-class reading quiz (about 3 questions; ~5 minutes). The goal is not to trick you; it’s to help everyone arrive ready for discussion and to give us a quick signal about what landed (and what didn’t) in the readings.

Quizzes are based on the required readings, and the questions will be straightforward if you did the reading.

Your lowest three quiz scores are automatically dropped (this includes absences and late arrivals). Because quizzes happen at the very start of class, we can’t offer make-ups. Please arrive on time.
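For readers who want the arithmetic spelled out, the drop rule can be sketched in Python as below. This is an illustration, not the course's grading script. It assumes each quiz is scored as a fraction between 0 and 1, that a quiz missed through absence or late arrival is recorded as 0, and that the quiz component is the mean of the remaining scores. `quiz_average` and `n_drop` are hypothetical names.

```python
def quiz_average(scores: list[float], n_drop: int = 3) -> float:
    """Mean quiz score after dropping the n_drop lowest scores.

    Assumptions (hypothetical bookkeeping): each score is a fraction in [0, 1],
    and a quiz missed through absence or late arrival is recorded as 0.0.
    """
    if len(scores) <= n_drop:
        raise ValueError("need more quiz scores than the number dropped")
    kept = sorted(scores)[n_drop:]  # discard the n_drop lowest scores
    return sum(kept) / len(kept)


# Two absences (0.0) and one weaker quiz (2/3) are exactly the three dropped.
print(quiz_average([1.0, 0.0, 2 / 3, 1.0, 0.0, 1.0]))  # prints 1.0
```

Because missed quizzes count as zeros, they are the first scores to be dropped; a student who misses more than three quizzes keeps the extra zeros in the average.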

Discussant (10%)

Each student will be a discussant for assigned readings once during the quarter. Students will work in groups of 2 to serve as discussants for a particular class. Your goal as discussant is to prepare and facilitate an engaged and stimulating discussion for about 30-45 minutes total, depending on the available time. Be creative!

Your responsibilities as discussants include:

First, briefly summarize the reading in no more than 3 minutes.
Then, share one or more questions or comments based on the reading that can serve as a launching point for discussion.
Facilitate an interactive class discussion (see below).

Grading rubric (10 points total):

Preparation & grasp of the reading (0–3): Accurate understanding; identifies the paper’s core contribution, assumptions, and stakes.
Concise summary (0–1): Clear, ≤3 minutes, sets up the discussion rather than re-telling the whole paper. If part of the reading was already covered in lecture or a talk, focus your summary on the parts that were not.
Discussion design (0–3): Thoughtful prompts/activities that surface tradeoffs (e.g., evidence, validity, ethics, implementation constraints, stakeholders).
Facilitation (0–2): Keeps the room engaged; asks follow-ups; manages time; balances voices; productively redirects when needed.
Inclusivity & professionalism (0–1): Creates space for multiple perspectives; respectful tone; supports a constructive classroom culture.

The best discussants plan for engaging interaction (polls, small-group prompts, structured debate, role play, “design review,” etc.) and come in with a couple of must-hit questions.

Assignments (50% Total)

There will be four assignments during the course, which will jointly build towards your final project.

At the beginning of the course, you can select a team of 1-3 students to work with on the assignments as well as the final project. You will work with the same team for all assignments. In our experience, groups of 3 lead to the best outcomes, so we encourage you to form a team of that size. Each project team will be assigned a mentor (a member of the teaching team), who will provide feedback on all of the team's project-related work and will generally be available to help.

Please discuss your project idea with the instructors or TAs early in the course.

You have a choice among four datasets for your assignments/final project. These datasets span classroom discourse, student writing, curriculum-grounded math reasoning, and teacher–AI interaction:

NCTE dataset: A large collection of 4th–5th grade elementary math classroom transcripts (1,660 observations, ~45–60 minutes each) with rich metadata; the accompanying paper describes links between discourse features and instructional quality/learning outcomes.
PERSUADE dataset: A corpus of student persuasive essays (14,000+), annotated with argumentative/rhetorical elements and effectiveness ratings, plus holistic quality scores and student/demographic metadata (e.g., grade level).
MathFish dataset: A curriculum-grounded math dataset connecting ~9.9K math problems to fine-grained K–12 standards (with an additional set of 385 standards descriptions), designed for evaluating whether models can verify and tag problems with the standards they truly align to.
CoTeach.AI interaction logs: Chat logs between teachers and the CoTeach AI assistant (curriculum-aligned support for IM® v.360 by Illustrative Mathematics®), capturing how educators ask for lesson adaptations, scaffolds, practice problems, and related instructional planning support.

We will share evaluation criteria for each assignment in advance.

Assignment 0 (5%)

Goal: Form your team, identify a problem you care about in teaching/learning, and ground your project in a real instructional context. The instructions are linked from the syllabus.

Assignment 1 (15%)

Goal: Choose a dataset and do a first close reading of your data and its context. You should leave this assignment with a clear understanding of what’s in the data and a plan for what to measure.

Data provenance & context: Where does the data come from? Who produced it, under what conditions, and what are the key limitations/biases?
Data quality audit: What jumps out in the interactions? Missingness, noise, transcription artifacts, selection effects, demographic blind spots, etc.
Exploratory analyses: At least one manual qualitative pass (e.g., memoing on 10–20 examples) and at least one computational/LLM-assisted exploration (with careful spot-checking).
Educator involvement: Run a short think-aloud or guided review (30–45 minutes) where your educator buddy reacts to representative excerpts: What seems authentic vs. off? What moments look instructionally strong/weak? What would they attend to that you might miss?

Deliverables:

A short write-up (3–5 pages, PDF): recap/elaboration of problem motivation, related work pointers (brief), data description, what you noticed in exploration, and what needs cleaning/normalization.
A cleaned/organized version of your dataset (or a well-documented subset) plus reproducible scripts/notebook for preprocessing.
Attached to the write-up, include a “dataset card” / documentation of your cleaned/organized data sample, describing the key fields (that you plan to use in your project), sampling, known gaps, privacy/ethics considerations, and any preprocessing you performed.

Assignment 2 (15%)

Goal: Build/implement and validate a measure of some dimension of instructional or learning quality that matters for your use case (e.g., eliciting reasoning, responsiveness, equity of participation, conceptual clarity, feedback quality, student agency).

You will choose one of two tracks:

Track 1 (replication): Implement an established approach developed and validated in prior work (e.g., lexical/structural features, rubric-based labeling, an existing classifier, or a prompting + scoring pipeline). Focus on transparency, robustness, and careful validation on your data context.
Track 2 (new method): Propose and evaluate a more novel approach. You should compare against at least one reasonable baseline and provide a clear validation story (reliability, construct validity, error analysis, and limitations).

You will be asked to triangulate across multiple data sources:

Bottom-up + top-down: Combine qualitative insight (what patterns matter in examples?) with a theoretically grounded operational definition and measurement approach.
Educator involvement: At minimum, incorporate practitioner judgment in one of these ways: labeling a small set, ranking/preference judgments, rubric feedback on outputs, or structured critique of failure cases.
Link to an outcome: Show how your measure relates to another outcome you care about (even a proxy) related to instructional quality and student learning. The appropriate outcome will depend on what is relevant and available in your dataset.

Deliverable: A 4–6 page write-up (PDF) with clear research questions, methods, validation/evaluation, and an error analysis that identifies where your measure fails and why:

For Track 1: Include final results related to measure validation. Include a brief ethics note on what could go wrong if the measure were deployed.
For Track 2: Include preliminary results related to measure development (e.g. data annotation, interrater agreement). Include plans for baseline comparison.

We will provide advice on all of the above, as well as scripts for measure development.

Assignment 3 (15%)

The goals for this assignment will differ by the track chosen for Assignment 2.

For Track 1:

Goal: Use your measurement work to design a support (tool, workflow, or intervention) that helps someone do something better in a real instructional context.

Design + rationale: What user (teacher/tutor/student/coach) are you supporting, in what setting, and why is your support likely to help?
Prototype: A lightweight but concrete artifact (e.g., a promptable assistant, dashboard, feedback generator, rubricing tool, annotation helper, etc.).
Co-design / feedback: At least one structured session with your educator buddy (or target users) to iterate on the design (goals, constraints, language, and usability).
Evaluation plan: Define “success” and how you would measure it (usage, perceived usefulness, calibration, impact on practice, equity considerations, risks).

Deliverables: (i) a short demo-ready prototype, (ii) a 3–5 page write-up with design goals, user feedback, iteration notes, and an evaluation plan, and (iii) a brief discussion of risks/guardrails and responsible deployment.

For Track 2:

Deliverables: an expanded version of the write-up from Assignment 2 (8-10 pages total) describing the final details of measure development, validation, and results. Include a brief ethics note on what could go wrong if the measure were deployed.

Quality of Peer Feedback (5%)

You will be asked to provide feedback to peers on assignments 1-3.

Grading: Peer feedback is graded on quality, not on whether your suggestions are “correct.” Strong feedback is (i) specific, (ii) evidence-based (points to a place in the write-up/code), (iii) actionable (clear next steps), and (iv) respectful. Each round, you should include at least two concrete strengths and two high-impact suggestions.

Final Presentation (10%)

Each team will give a final presentation during the last week of the quarter. Pitch it to a broad audience that includes practitioners, policy-makers, and researchers (e.g., imagine giving it at the AI education summit at Stanford).

Format: ~12 minutes talk + 3 minutes Q&A (strict).
Content: problem & use case; data; what you built/measured; validation/evaluation; error analysis; educator input (what changed based on practitioner feedback); and ethics/limitations. Avoid overloading the talk with technical jargon.

Grading rubric: clarity and framing (30%), technical soundness and evidence [legible to non-technical audiences] (40%), thoughtful reflection on limitations/ethics (15%), and quality of communication/demo (15%).

Final Paper (10%)

Your final paper will closely build on your assignments. This paper should be up to 10 pages long including references, and should adhere to the formal requirements and stylistic expectations for research contributions in computational social science / NLP.

Unlike the assignments, which are free-form, the final paper must use one of the following templates:

If you have any questions about organizing your paper, please talk to the instructors. This handout by Chris Potts can provide helpful guidelines for presenting your research to an NLP audience, and it is helpful even if your work is targeted at a different audience (e.g. learning sciences).

There are two required paper sections that are special to our course:

Ethical Considerations: Please include an explicit discussion of any potential ethical issues, such as the ethical implications of the project, the use of the data, and potential applications of your work. Here are some recommendations from ACL's ethics guidelines: "Ethical questions may arise when working with a variety of types of computational work with language, including (but not limited to) the collection and release of data, inference of information or judgments about individuals, real-world impact of the deployment of language technologies, and environmental consequences of large-scale computation."
Authorship statement: At the end of your paper (after the 'Acknowledgments' section in the template), please include a brief authorship statement explaining how the individual authors contributed to the project. You are free to include whatever information you deem important to convey. For an example, see the second page, right column, of this guidance for PNAS authors (p. 12). We are requiring this largely because we think it is good policy in general. This statement is required even for single-authored papers, because we want to know whether your project is a collaboration with people outside of the class. Only in extreme cases, and after discussion with the team, would we consider giving separate grades to team members based on this statement.

Final Grades

This is the system we will use at the end of the quarter to map numerical final grades to letter grades. No curve is applied, and there are no other factors shaping the mapping from weighted averages to letter grades.

Grade range	Letter grade
≥ 100	A+
≥ 94	A
≥ 90	A−
≥ 87	B+
≥ 84	B
≥ 80	B−
≥ 77	C+
≥ 74	C
≥ 70	C−
≥ 67	D+
≥ 64	D
≥ 60	D−
< 60	No pass
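The following Python sketch shows how the component weights listed in this section combine into a weighted average and how that average maps onto the table above. It is illustrative only. It assumes every component is scored on a 0–100 scale, the dictionary keys and function names are hypothetical, and it does not model any reweighting for students who take the course for 2 units and skip the discussant role.

```python
# Component weights in percent, as listed in the Requirements section.
WEIGHTS = {
    "participation": 5,
    "reading_quizzes": 10,
    "discussant": 10,
    "assignment_0": 5,
    "assignment_1": 15,
    "assignment_2": 15,
    "assignment_3": 15,
    "peer_feedback": 5,
    "final_presentation": 10,
    "final_paper": 10,
}
assert sum(WEIGHTS.values()) == 100  # the weights cover the whole grade

# (minimum weighted average, letter grade) pairs from the table, highest first.
CUTOFFS = [
    (100, "A+"), (94, "A"), (90, "A−"),
    (87, "B+"), (84, "B"), (80, "B−"),
    (77, "C+"), (74, "C"), (70, "C−"),
    (67, "D+"), (64, "D"), (60, "D−"),
]


def weighted_average(scores: dict[str, float]) -> float:
    """Weighted average of component scores, each assumed to be on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(average: float) -> str:
    """Return the first letter whose cutoff the average meets or exceeds."""
    for cutoff, letter in CUTOFFS:
        if average >= cutoff:
            return letter
    return "No pass"


# Hypothetical student: 90 on every component except 80 on the final paper.
example = {name: 90 for name in WEIGHTS}
example["final_paper"] = 80
average = weighted_average(example)
print(average, letter_grade(average))  # prints 89.0 B+
```

Keeping the weights as whole percentages means that integer component scores produce an exact sum before the final division, so an average sitting exactly on a cutoff (for example, 94) is not nudged below it by floating-point rounding.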

Units

If you are taking the course for 3 or 4 units, the amount of work is the same. If you are taking the course for 2 units, you can skip being a discussant.

Academic Honesty

Please familiarize yourself with Stanford's honor code. We will adhere to it and follow through on its penalty guidelines.

It is expected that you accurately represent your own work and the work of others in this class. Ideas should be your own. Please see the course AI Policy below.

AI Policy

Because this course is about productive uses of language technology in teaching and learning, we encourage you to use AI tools thoughtfully — as you would any other powerful tool — while keeping your work rigorous, transparent, and clearly your own.

Encouraged uses: brainstorming and outlining; coding help and debugging; generating test cases; rewriting for clarity; summarizing notes you personally wrote; creating lightweight prototypes; helping you identify edge cases or alternative explanations.
Use with caution: literature review and citation discovery. AI tools frequently hallucinate references and misrepresent findings. If you use AI to help find related work, you must independently verify every citation and claim by reading the original sources.
Not allowed: fabricating data/results; generating “citations” you have not verified; submitting AI-generated text as if it were your own original writing without disclosure; using AI to impersonate a human participant or to generate fake interview/think-aloud content.

Disclosure requirement (mandatory): Every assignment and the final paper must include a short AI Use Statement (2–6 sentences) describing whether you used AI, which tool(s), and for what purpose (e.g., “debugging,” “rewriting,” “prompt-based scoring”), plus what you did to validate/spot-check outputs. When AI meaningfully contributes to an artifact (e.g., generated code blocks, prompts, rubrics, or evaluation outputs), include the relevant prompts and settings in an appendix or link to a reproducible log.

Honor Code note: Failing to disclose substantive AI use is an academic integrity violation. When in doubt, disclose.

Late Days

Each student will have a total of 4 free late (calendar) days applicable to assignments except the final project paper. Final project papers cannot be turned in late under any circumstances.
Free late days can be used at any time, no questions asked. Each 24 hours or part thereof that an assignment is late uses up one full late day. Once these late days are exhausted, any assignment turned in late will be penalized 10% per late day.
If a group's assignment is late n days, then each group member is charged n late days.
Late days are never transferable between students, even students in the same group.
Late days do not apply to the final submission of the course project.
Quizzes do not have late days as they will be taken in class.
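The late-day rules above can be sketched in Python as follows. This is a hypothetical illustration, not an official calculator. It assumes the 10% penalty is a fraction of the assignment's score, applied per late day not covered by free late days and capped at 100%. It does not model the final paper or quizzes, which cannot use late days. The function names and the example dates are made up for illustration.

```python
import math
from datetime import datetime, timedelta

FREE_LATE_DAYS = 4       # free late days per student for the quarter
PENALTY_PER_DAY = 0.10   # assumed fraction of the score lost per uncovered late day


def late_days_used(deadline: datetime, submitted: datetime) -> int:
    """Each 24 hours, or part thereof, past the deadline counts as one late day."""
    if submitted <= deadline:
        return 0
    return math.ceil((submitted - deadline) / timedelta(hours=24))


def charge_student(days_remaining: int, late_days: int) -> tuple[int, float]:
    """Charge one student for a submission that is `late_days` days late.

    Returns (free late days left afterwards, penalty fraction). Assumption:
    the penalty is PENALTY_PER_DAY per late day not covered by free days,
    capped at 1.0 (the whole score).
    """
    covered = min(days_remaining, late_days)
    penalized = late_days - covered
    return days_remaining - covered, min(1.0, PENALTY_PER_DAY * penalized)


# Hypothetical example: a group submits 25 hours late, which counts as 2 late days.
deadline = datetime(2026, 1, 20, 17, 0)
n = late_days_used(deadline, deadline + timedelta(hours=25))
# Every group member is charged n days against their own, non-transferable balance.
balances = {"student_a": FREE_LATE_DAYS, "student_b": 1}
for name, remaining in balances.items():
    print(name, charge_student(remaining, n))
# student_a (2, 0.0)
# student_b (0, 0.1)
```

Charging each member separately mirrors the rule that late days are never transferable: the same late group submission can be free for one teammate and penalized for another.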

Policy on Submitting Related Final Projects to Multiple Classes

On the one hand, we want to encourage you to pursue unified interdisciplinary projects that weave together themes from multiple classes. On the other hand, we need to ensure that final projects for this course are original and involve a substantial new effort.

To try to meet both these demands, we are adopting the following policy on joint submission: if your final project for this course is related to your final project for another course, you are required to submit both projects to us by our final project due date. If we decide that the projects are too similar, your project will receive a failing grade. To avoid this extreme outcome, we strongly encourage you to stay in close communication with us if your project is related to another you are submitting for credit, so that there are no unhappy surprises at the end of the term. Since there is no single objective standard for what counts as "different enough", it is better to play it safe by talking with us.

Fundamentally, we are saying that combining projects is not a shortcut. In a sense, we are in the same position as professional conferences and journals, which also need to watch out for multiple submissions. You might have a look at the ACL/NAACL policy, which strives to ensure that any two papers submitted to those conferences make substantially different contributions – our goal here as well.

Regrade

It is very important to us that all assignments are properly graded. The teaching staff works extremely hard to grade fairly and to turn around assignments quickly. We know that you work hard, and we respect that. Occasionally, mistakes happen, and it's important to us to correct them. If you believe there is an error in your assignment grading, please submit an explanation in writing to the staff within seven days of receiving the grade. We will regrade the entire assignment to ensure quality.

No regrade requests will be accepted orally, and no regrade requests will be accepted more than seven days after receipt of the assignment. Regrade requests must be respectful; we will not consider any regrade requests containing disrespectful language.

Names and Pronouns

Use the names and pronouns (e.g., they/them, she/her, he/him, just a name, or something else) indicated by your classmates for themselves. If you don’t want to share a set of pronouns for yourself, that is perfectly acceptable, too. If your name or pronouns change during the course, we invite you to share this with us and/or other students, so we may talk with you and refer to your ideas in discussion as you would wish.

CS 293/EDUC 473 Logistics

Stanford / Winter 2025-26