Stanford CS 293 / EDUC 473 | Empowering Educators via Language Technology

Schedule

Note: This schedule is tentative and subject to change.
🔎 means that the paper will be a core part of the lecture.
🌟 means that the paper will be the focus of the reading discussion.

Week	Date	Lecture	Reading	Assignment
1	Jan 6 Tuesday	Class Introduction [slides]	Required Readings: HAI post from last year's class: "Language Models in the Classroom: Bridging the Gap Between Technology and Teaching" Optional Readings: U.S. Department of Education, Office of Educational Technology. Artificial Intelligence and the Future of Teaching and Learning: Insights and Recommendations, Washington, DC, 2023. Litman, D. (2016, March). Natural language processing for enhancing teaching and learning. In Proceedings of the AAAI conference on artificial intelligence (Vol. 30, No. 1). CRPE's Think Forward: AI Learning Forum. Wicked Opportunities: Leveraging AI to Transform Education, 2024. Common Sense Media. Generative AI in K–12 Education: Challenges and Opportunities, 2024. Digital Promise. An Ethical and Equitable Vision of AI in Education: Learning Across 28 Exploratory Projects, 2024.	A0 out. Form teams, find a teacher buddy! Sign up for a reading discussion by Friday, Jan 9.
1	Jan 8 Thursday	Discovery & Exploration in Educational Language Data [slides]	Required Reading: 🔎 Nguyen, D., Liakata, M., DeDeo, S., Eisenstein, J., Mimno, D., Tromble, R., & Winters, J. (2020). How We Do Things With Words: Analyzing Text as Social and Cultural Data. Frontiers in Artificial Intelligence, 3. Dowell, N., & Kovanovic, V. (2022). Modeling educational discourse with natural language processing. Education, 64, 82. Optional Reading: Hovy, D. (2020). Text analysis in Python for social scientists: Discovery and exploration. Cambridge University Press. Dan Jurafsky and James H. Martin (2021). Speech & language processing. Chapters 2, 6, 8, 18, 21, 23, 25, 26.	A1 out
2	Jan 13 Tuesday	Discovery & Exploration in Educational Language Data Parsing, Lexical Analyses [slides]	Required Reading: 🔎 🌟 Liu, J., & Cohen, J. (2021). Measuring teaching practices at scale: A novel application of text-as-data methods. Educational Evaluation and Policy Analysis, 43(4), 587-614. Optional Reading: 🔎 Lucy, L., Demszky, D., Bromley, P., & Jurafsky, D. (2020). Content analysis of textbooks via natural language processing: Findings on gender, race, and ethnicity in Texas US history textbooks. AERA Open, 6(3), 2332858420940312. Markowitz, D. M., Kittelman, A., Girvan, E. J., Santiago-Rosario, M. R., & McIntosh, K. (2023). Taking Note of Our Biases: How Language Patterns Reveal Bias Underlying the Use of Office Discipline Referrals in Exclusionary Discipline. Educational Researcher, 0(0). Monroe, B. L., Colaresi, M. P., & Quinn, K. M. (2008). Fightin’ Words: Lexical Feature Selection and Evaluation for Identifying the Content of Political Conflict. Political Analysis, 16(04), 372–403.	-
2	Jan 15 Thursday	Centering Teachers in the Design & Development of Tools Guest Visit by Dan Meyer	ASU GSV-keynote: The Difference Between Great AI and Great Teaching Required Reading: 🔎 Laurence Holt. The 5 Percent Problem. 🔎 Dan Meyer What Happened at the AEI Debate on AI in Education This Week 🔎 Dan Meyer. Teachers: “These AI Resources Are Not Classroom-Ready.” Optional Reading: Dan Meyer. Teachers Hate These Kinds of Paperwork. Can AI Help?	-
3	Jan 20 Tuesday	Discovery & Exploration in Educational Language Data Topic Modeling, Clustering, Grounded Exploration [slides]	Required Reading: 🌟🔎 Kubsch, M., Krist, C., & Rosenberg, J. M. (2023). Distributing epistemic functions and tasks—A framework for augmenting human analytic power with machine learning in science education research. Journal of Research in Science Teaching, 60(2), 423–447. Optional Reading: Nelson, L. K. (2020). Computational Grounded Theory: A Methodological Framework. Sociological Methods & Research, 49(1), 3–42. 🔎 Chang, J., Gerrish, S., Wang, C., Boyd-Graber, J. L., & Blei, D. M. (2009). Reading Tea Leaves: How Humans Interpret Topic Models. Advances in Neural Information Processing Systems, 288–296. Chew, R., Bollenbacher, J., Wenger, M., Speer, J., & Kim, A. (2023). LLM-Assisted Content Analysis: Using Large Language Models to Support Deductive Coding. arXiv. Chen, N.-C., Drouhard, M., Kocielnik, R., Suh, J., & Aragon, C. R. (2018). Using Machine Learning to Support Qualitative Coding in Social Science: Shifting the Focus to Ambiguity. ACM Transactions on Interactive Intelligent Systems, 8(2), 1–20. McCarthy, A. D., & Dore, G. M. D. (2023, July). Theory-grounded computational text analysis. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (pp. 1586-1594). Alvero, A. J., Giebel, S., Gebre-Medhin, B., Antonio, A. L., Stevens, M. L., & Domingue, B. W. (2021). Essay content and style are strongly related to household income and SAT scores: Evidence from 60,000 undergraduate applications. Science Advances, 7(42), eabi9031.	A0 due Monday at 5pm
3	Jan 22 Thursday	Designing NLP Tools for Empowering Teachers in the Real World Q&A with Rakiya Brown from TeachFX	Required Reading: 🔎 🌟 Jacobs, J., Suresh, A., Booth, B. M., Sumner, T., Bush, J., Brown, C., & D’Mello, S. K. (2025). Automating feedback from recorded instructional observations: Using AI to detect and support dialogic teaching. In S. Kelly (Ed.), Research Handbook on Classroom Observation. Edward Elgar Publishing. Optional Reading: Jacobs, J., Scornavacco, K., Harty, C., Suresh, A., Lai, V., & Sumner, T. (2022). Promoting rich discussions in mathematics classrooms: Using personalized, automated feedback to support reflection and instructional change. Teaching and Teacher Education, 112, 103631. Van Camp, A., Vitale, J., & Lloyd, B. (2025). Next generation classroom observations: Leveraging AI to maximize the scalability and effectiveness of performance feedback for teachers. In Research Handbook on Classroom Observation (pp. 366-381). Edward Elgar Publishing. Demszky, D., Liu, J., Hill, H. C., Sanghi, S., & Chung, A. (2025). Automated feedback improves teachers’ questioning quality in brick-and-mortar classrooms: Opportunities for further enhancement. Computers & Education, 227, 105183.	A2 out
4	Jan 27 Tuesday	Guest visit by Brian Veprek and Theofilos Strinopoulos, Google LearnLM team	Required Reading: 🔎 LearnLM Team (2024). LearnLM: Improving Gemini for Learning. arXiv preprint arXiv:2412.16429. 🔎 Learn LM Team, Eedi (2025). AI tutoring can safely and effectively support students: An exploratory RCT in UK classrooms. Optional Reading: Jurenka, I., Kunesch, M., McKee, K. R., Gillick, D., Zhu, S., Wiltberger, S., ... & Ibrahim, L. (2024). Towards responsible development of generative AI for education: An evaluation-driven approach. arXiv preprint arXiv:2407.12687.	A1 due Monday at 5pm
4	Jan 29 Thursday	Using NLP/Multimodal data for Educational Measurement Data Annotation Guest lecture by Mei Tan	Required Reading: 🔎 🌟 Artstein, R., & Poesio, M. (2008). Inter-coder agreement for computational linguistics. Computational linguistics, 34(4), 555-596. --> Don't get bogged down by details of each metric, rather focus on the problems that motivate these methods. 🔎 Tan, M., & Demszky, D. (2025). Do As I Say: What Teachers’ Language Reveals About Classroom Management Practices. Educational Researcher. --> Only methods section is required; skim rest if you're curious 🔎 Cole, R. (2024). Inter-rater reliability methods in qualitative case study research. Sociological Methods & Research, 53(4), 1944-1975. --> Skim, just to contrast the NLP view; if short on time, just read the tables! Optional Reading: Davani, A. M., Díaz, M., & Prabhakaran, V. (2022). Dealing with disagreements: Looking beyond the majority vote in subjective annotations. Transactions of the Association for Computational Linguistics, 10, 92-110. Hills, O. H. L. (2023). Leveraging Human Feedback to Scale Educational Datasets: Combining Crowdworkers and Comparative Judgement (arXiv:2305.12894). arXiv.	-
5	Feb 3 Tuesday	Using NLP/Multimodal data for Educational Measurement [slides]	Required Reading (discussants can pick any): 🔎 Chandler, C., Raju, R., Reitman, J. G., Penuel, W. R., Ko, M., Bush, J. B., Biddy, Q., & D’Mello, S. K. (2025). Improving the Generalizability of Models of Collaborative Discourse. Proceedings of the 18th International Conference on Educational Data Mining (EDM 2025), pp. 215–227. International Educational Data Mining Society. 🔎 Neshaei, S. P., Davis, R. L., Mejia-Domenzain, P., Nazaretsky, T., & Käser, T. (2025). Bridging the Data Gap: Using LLMs to Augment Datasets for Text Classification. Proceedings of the 18th International Conference on Educational Data Mining (EDM 2025), pp. 119–132. International Educational Data Mining Society. 🔎 Dutulescu, A., Ruseti, S., Dascalu, M., & McNamara, D. (2025). One Model to Score Them All: Unified Scoring of Learning Strategies with LLMs. Proceedings of the 18th International Conference on Educational Data Mining (EDM 2025), pp. 496–502. International Educational Data Mining Society. 🔎 Park, S., Shariff, D., Samadi, M. A., Nixon, N., & D’Mello, S. (2025). From Discourse to Dynamics: Understanding Team Interactions Through Temporally Sensitive NLP. Proceedings of the 18th International Conference on Educational Data Mining (EDM 2025), pp. 410–417. International Educational Data Mining Society. 🔎 Siedahmed, A., Ocumpaugh, J., Ferris, Z., Kodwani, D., Heffernan, N., & Worden, E. (2025). Nonstandard English and the Automated Scoring of Open-Ended Math Problems. Proceedings of the 18th International Conference on Educational Data Mining (EDM 2025), pp. 254–264. International Educational Data Mining Society. Optional Readings: Hou, R., Bühler, B., Fütterer, T., Bozkir, E., Gerjets, P., Trautwein, U., & Kasneci, E. (2025). Multimodal Assessment of Classroom Discourse Quality: A Text-Centered Attention-Based Multi-Task Learning Approach. Proceedings of the 18th International Conference on Educational Data Mining (EDM 2025), pp. 241–253. International Educational Data Mining Society. Beigman Klebanov, B., Suhan, M., & Mikeska, J. N. (2025). Towards evaluating teacher discourse without task-specific fine-tuning data. Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Full Papers, pp. 192–200. National Council on Measurement in Education (NCME). Ormerod, C., & Kehat, G. (2025). Long context Automated Essay Scoring with Language Models. Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Full Papers, pp. 35–42. National Council on Measurement in Education (NCME). We recommend browsing the proceedings of NCME AIME-Con (the first conference on using AI for measurement in education) and the 2025 Educational Data Mining Conference.	-
5	Feb 5 Thursday	Using Generative AI to Support Teachers AI Feedback Guest visit by Jennifer Meyer, University of Vienna	Required Reading: 🔎 Daumiller, M., & Meyer, J. (2025). Advancing feedback research in educational psychology: Insights into feedback processes and determinants of effectiveness. Contemporary Educational Psychology, 102390. Optional Reading: 🔎 Steiss, J., Tate, T., Graham, S., Cruz, J., Hebert, M., Wang, J., ... & Olson, C. B. (2024). Comparing the quality of human and ChatGPT feedback of students’ writing. Learning and Instruction, 91, 101894. 🔎 Weidlich, J., Gotsch, F., Schudel, K., Marusic-Würscher, C., Mazzarella, J., Bolten, H., ... & Merki, K. M. (2025). Teacher, peer, or AI? Comparing effects of feedback sources in higher education. Computers and Education Open, 100300. 🔎 Ruwe, T., & Kuklick, L. (2025). Quality counts? Examining the role of feedback provider and feedback quality on students' feedback perceptions. British Journal of Educational Technology. 🔎 Meyer, J., Jansen, T., Schiller, R., Liebenow, L. W., Steinbach, M., Horbach, A., & Fleckenstein, J. (2024). Using LLMs to bring evidence-based feedback into the classroom: AI-generated feedback increases secondary students’ text revision, motivation, and positive emotions. Computers and Education: Artificial Intelligence, 6, 100199.	A3 out
6	Feb 10 Tuesday	Using Generative AI to Support Tutors Guest talk by Rene Kizilcec, Cornell	Required Readings: 🔎 🌟 Browse this website and read this draft Taxonomy 🔎 Hicke, Y., Geathers, J., Vu, K., Sewell, J., Cardie, C., Talwalkar, J., ... & Kizilcec, R. (2025). MedSimAI: simulation and formative feedback generation to enhance deliberate practice in medical education. arXiv preprint arXiv:2503.05793. Optional Readings: 🔎 Geathers, J., Alvero, A. J., & Kizilcec, R. F. (2025, July). ChitterChatter: Curriculum-Aligned AI Speaking Partners for Language Learning Classrooms. In Proceedings of the Twelfth ACM Conference on Learning@ Scale (pp. 346-350). 🔎 Wang, R. E., Ribeiro, A. T., Robinson, C. D., Loeb, S., & Demszky, D. (2024). Tutor copilot: A human-ai approach for scaling real-time expertise. arXiv preprint arXiv:2410.03017.	-
6	Feb 12 Thursday	In-Class Project Work Session	No readings	-
7	Feb 17 Tuesday	Modeling Approaches & Synthetic Students (aka Simulation)	Required Reading (discussants can choose any): 🔎 Zheyuan Zhang, Daniel Zhang-Li, Jifan Yu, Linlu Gong, Jinchang Zhou, Zhanxin Hao, Jianxiao Jiang, Jie Cao, Huiqin Liu, Zhiyuan Liu, Lei Hou, and Juanzi Li. 2025. Simulating Classroom Education with LLM-Empowered Agents. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 10364–10379, Albuquerque, New Mexico. Association for Computational Linguistics. 🔎 Liu, Q., Shakya, R., Jovanovic, J., Khalil, M., & de la Hoz‐Ruiz, J. (2025). Ensuring privacy through synthetic data generation in education. British Journal of Educational Technology, 56(3), 1053-1073. 🔎 Khalil, M., Vadiee, F., Shakya, R., & Liu, Q. (2025, March). Creating artificial students that never existed: Leveraging large language models and CTGANs for synthetic data generation. In Proceedings of the 15th International Learning Analytics and Knowledge Conference (pp. 439-450). Optional Reading: 🔎 Perczel, J., Chow, J., & Demszky, D. (2025). TeachLM: Post-training llms for education using authentic learning data. arXiv preprint arXiv:2510.05087. Liu, Y., Bhandari, S., & Pardos, Z. A. (2025). Leveraging LLM respondents for item evaluation: A psychometric analysis. British Journal of Educational Technology, 56(3), 1028-1052. Yue, M., Lyu, W., Mifdal, W., Suh, J., Zhang, Y., & Yao, Z. (2025). MathVC: An LLM-Simulated Multi-Character Virtual Classroom for Mathematics Education (arXiv:2404.06711). arXiv.	A2 due Monday at 5pm
7	Feb 19 Thursday	Practitioner-Centered Design of Teacher Support Tools [slides]	Required Reading: Nicholson, R., Bartindale, T., Kharrufa, A., Kirk, D., & Walker-Gleaves, C. (2022). Participatory Design Goes to School: Co-Teaching as a Form of Co-Design for Educational Technology. CHI Conference on Human Factors in Computing Systems, 1–17. Hutchins, N. M., & Biswas, G. (2024). Co‐designing teacher support technology for problem‐based learning in middle school science. British Journal of Educational Technology, 55(3), 802-822. Optional Reading: Wang, D., Bian, C., & Chen, G. (2024). Using explainable AI to unravel classroom dialogue analysis: Effects of explanations on teachers’ trust, technology acceptance and cognitive load. British Journal of Educational Technology, 55(6), 2530–2556. Lee, V. R., Clarke-Midura, J., Shumway, J., & Recker, M. (2022). “Design for Co-Design” in a Computer Science Curriculum Research-Practice Partnership.	-
8	Feb 24 Tuesday	In-Class Project Work Session	-	-
8	Feb 26 Thursday	Generative Language Models for Pre-service Teacher Training Student Simulations Guest talk by Julie Cohen, University of Virginia	Required Reading: 🔎 Cohen, J., Wong, V., Krishnamachari, A., & Berlin, R. (2020). Teacher coaching in a simulated environment. Educational Evaluation and Policy Analysis, 42(2), 208-231. 🌟 🔎 Jamie N. Mikeska, Aakanksha Bhatia, Shreyashi Halder, Tricia Maxwell, Beata Beigman Klebanov, Benny Longwill, Kashish Behl, and Calli Shekell. 2025. Generative AI Teaching Simulations as Formative Assessment Tools within Preservice Teacher Preparation. In Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Full Papers, pages 212–220, Wyndham Grand Pittsburgh, Downtown, Pittsburgh, Pennsylvania, United States. National Council on Measurement in Education (NCME). Markel, J. M., Opferman, S. G., Landay, J. A., & Piech, C. (2023). GPTeach: Interactive TA Training with GPT Based Students. Optional Readings: Shaikh, O., Chai, V., Gelfand, M. J., Yang, D., & Bernstein, M. S. (2024). Rehearsal: Simulating Conflict to Teach Conflict Resolution (arXiv:2309.12309). arXiv.	-
9	Mar 3 Tuesday	Deploying NLP Tools to Empower Teachers Lesson Planning Guest visit by Riz Malik, Coteach.ai	Required Reading: Malik, R., Abdi, D., Wang, R., & Demszky, D. (2025). Scaffolding middle school mathematics curricula with large language models. British Journal of Educational Technology, 56(3), 999-1027. Malik, R., Hao, R. L., Kacholia, R., & Demszky, D. (2025, July). Mathematikz: A dataset and benchmark for mathematical diagram generation. In Proceedings of the Twelfth ACM Conference on Learning@ Scale (pp. 95-104).	A3 due Monday at 5pm
9	Mar 5 Thursday	Frontiers and Open Questions	Watch the closing session of the AI education summit.	-
10	Mar 10 Tuesday	Final Presentations	No reading	-
10	Mar 12 Thursday	Final Presentations	No reading	Final paper due Monday, Mar 16 at 5pm

Stanford / Winter 2025-26
