Eve Fleisig

PhD student at UC Berkeley | AI Ethics + NLP

I'm a fifth-year PhD student in computer science at UC Berkeley, advised by Dan Klein. My research lies at the intersection of natural language processing (NLP) and AI ethics: how can we design language models that we trust to benefit all users, without perpetuating societal harms? To do so, I work on training and evaluating language models to serve complex distributions of users with varied needs. This includes learning from informative disagreement among users, evaluating discrimination against users who speak differently, and designing frameworks for LLMs that serve populations with many perspectives.

Previously, I earned a BSE in computer science with a minor in linguistics at Princeton University, advised by Christiane Fellbaum.

My research is supported by an NSF Graduate Research Fellowship and a Berkeley Chancellor's Fellowship.

I'm on the academic job market this year! You can find my research, teaching, and diversity statements here.

Recent News

✈️ I'm attending EMNLP and NeurIPS 2025. Come say hi!
💬 [Dec '25] Invited panelist for the NeurIPS '25 Science of Benchmarking tutorial.
💬 [Nov '25] Invited panelist for NLPerspectives at EMNLP '25.
💬 [Oct '25] Invited to the 2025 RCAIS doctoral consortium.
✈️ [Oct '25] Visiting Dirk Hovy at the Milan NLP lab this fall.
💬 [Aug '25] Invited talk at Edinburgh NLP on GRACE.
🏆 [July '25] My 3-minute thesis won 2nd place at the LSA Summer Institute.
🏆 [May '25] Our AdvScore paper won Outstanding Paper at NAACL '25.
💬 [May '24] Invited talk at Stanford NLP on Linguistic Bias in ChatGPT.

Selected Work

Please see Google Scholar or Semantic Scholar for an up-to-date list.

Balancing Quality and Variation: Spam Filtering Distorts Data Label Distributions

Eve Fleisig*, Matthias Orlikowski*, Philipp Cimiano, Dan Klein

GRACE: A Granular Benchmark for Evaluating Model Calibration against Human Calibration
Yoo Yeon Sung*, Eve Fleisig*, Yu Hou, Ishan Upadhyay, Jordan Boyd-Graber (ACL 2025).

Is your benchmark truly adversarial? AdvScore: Evaluating Human-Grounded Adversarialness
Yoo Yeon Sung, Maharshi Gor, Eve Fleisig, Ishani Mondal, Jordan Boyd-Graber (NAACL 2025 - Outstanding Paper Award).

[ PAPER ] [ TWITTER ]

Linguistic Bias in ChatGPT: Language Models Reinforce Dialect Discrimination

Eve Fleisig*, Genevieve Smith*, Madeline Bossi*, Ishita Rustagi*, Xavier Yin*, Dan Klein (EMNLP 2024).

[ PAPER ] [ TWITTER ] [ BLOG ]

Mapping Social Choice Theory to RLHF

Jessica Dai, Eve Fleisig (R2FM @ ICLR 2024).

[ ARXIV ] [ TWITTER ]

When the Majority is Wrong: Modeling Annotator Disagreement for Subjective Tasks

Eve Fleisig, Rediet Abebe, Dan Klein (EMNLP 2023).

[ PAPER ] [ TWITTER ]

Incorporating Worker Perspectives into MTurk Annotation Practices for NLP

Olivia Huang, Eve Fleisig, Dan Klein (EMNLP 2023 - Outstanding Paper Award).

Ghostbuster: Detecting Text Ghostwritten by Large Language Models

Vivek Verma, Eve Fleisig, Nicholas Tomlin, Dan Klein (NAACL 2024).

[ PAPER ] [ TWITTER ] [ BLOG ]

The Perspectivist Paradigm Shift: Assumptions and Challenges of Capturing Human Labels

Eve Fleisig, Su Lin Blodgett, Dan Klein, Zeerak Talat (NAACL 2024).

Hedges and Apologies in ChatGPT Responses to African-American English

Eve Fleisig (NWAV 2023).

[ SLIDES ] [ ABSTRACT ]

First Tragedy, then Parse: History Repeats Itself in the New Era of Large Language Models

Naomi Saphra, Eve Fleisig, Kyunghyun Cho, Adam Lopez (NAACL 2024).

[ PAPER ] [ TWITTER ]

FairPrism: Evaluating fairness-related harms in text generation

Eve Fleisig, Aubrie Amstutz, Chad Atalla, Su Lin Blodgett, Hal Daumé III, Alexandra Olteanu, Emily Sheng, Dan Vann, Hanna Wallach (ACL 2023).

[ PAPER ] [ DATASET ]

Mitigating Gender Bias in Machine Translation through Adversarial Learning

Eve Fleisig, Christiane Fellbaum
Outstanding Senior Thesis Award; Sigma Xi Book Award

Mentorship

I've mentored some wonderful undergraduate students, including Samuel Ghezae, Olivia Huang (→Citadel), Harbani Jaggi (→Applied Intuition), Kayla Lee (→YC startup founder), Kashyap Murali (→Anthropic), Vyoma Raman (→Stanford), Mahathi Ryali, Zaina Shaik (→Amazon), Vivek Verma (→OpenAI), and Xavier Yin (→CMU).

I am not currently taking on new undergraduates for research projects. However, I maintain a resource guide for students interested in NLP research, and I'm happy to chat about research or anything else!

Interested in chatting or collaborating?

Reach out to me at efleisig :at: berkeley :dot: edu

Miscellaneous

My surname is pronounced /'flʌɪsɪg/ ("fly"-sihg).

I'm a member of ACF, a volunteer-run organization that produces high-quality collegiate quizbowl tournaments.

🇦🇷¡Siempre estoy feliz de charlar con otros latinoamericanos!
