Research

Research – BAISed

Our Research

Uncovering and documenting how AI systems perpetuate educational inequity through experimental evidence and rigorous analysis

Evidence of Bias

Inner-City Public School
2.82
Average Score
Elite Private School
3.25
Average Score

Same essay, different school descriptor = 15% score difference

Implicit Bias Runs Deep

Our research reveals that AI language models don’t just reflect bias—they amplify it. Even when developers implement guardrails against explicit bias, subtle patterns persist.

When we tested ChatGPT with identical student essays but varied the contextual information—such as school type, behavioral history, or even music preferences—the AI consistently assigned different scores based on these irrelevant factors.

This isn’t just about numbers. It’s about how these biases could impact real students’ educational opportunities and self-perception.

Published Research

Analyzing AI-Generated Feedback: A Mixed Methods Examination of ChatGPT’s Responses to Student Writers

Nicole Oster and Melissa Warr

Society for Information Technology and Teacher Education (SITE) (2025)

We looked at how ChatGPT gives feedback to student writers and found that it sometimes changes its tone depending on a student’s background. Specifically, when students were described as Hispanic, the AI used more authoritative language—offering feedback that sounded more like commands than suggestions. This shows that even when AI seems helpful, it can quietly reinforce biases, and we need to stay alert and thoughtful in how we use it in education. .

Uncovering the Hidden Curriculum of AI: A Reflective Technology Audit for Teacher Educators

Melissa Warr and Marie Heath

Journal of Teacher Education (2025)

Not only do AI tools show bias when grading student work, the text feedback they give to students shows disturbing patterns. We explore how generative AI can carry hidden biases that reflect and reinforce unfair social patterns—especially in schools. Our findings show that not only can AI show bias in grading, but also these tools may vary patterns of feedback text in response to racial descriptions of students. We believe it’s crucial for educators to recognize these patterns so we can use AI in ways that are more fair and thoughtful.

Implicit Bias in Large Language Models: Experimental Proof and Implications for Education

Melissa Warr, Nicole Jakubczyk Oster, and Roger Isaac

Journal of Research on Technology in Education (2024)

We provide experimental evidence of implicit racial bias in ChatGPT 3.5 in the context of educational tasks. Our findings indicate that descriptions of students as Black or White lead to significantly higher scores compared to race-neutral or Hispanic descriptors, suggesting that ChatGPT’s outputs are influenced by racial information.

Is ChatGPT Racially Biased? The Case of Evaluating Student Writing

Melissa Warr, Margarita Pivovarova, Punya Mishra, and Nicole Oster

Preprint (2024)

By manipulating racial descriptors in prompts, we assessed differences in scores given by two ChatGPT models. Results show statistically significant differences in essay scores when hypothetical student race is mentioned, with patterns varying by ChatGPT version and prompt order.

Beat Bias? Personalization, Bias, and Generative AI

Melissa Warr

SITE 2024 Conference Proceedings

This paper explores how bias presents itself when LLMs attempt to personalize learning experiences. Through experimental studies, we demonstrate that even seemingly innocent personalization factors like music preferences can trigger biased responses in AI evaluation of student work.

Use Our Research

Help create more equitable AI in education by testing systems and sharing findings

Test AI Bias Design Critical AI Activities
Scroll to Top