Abstract
Automatic question generation (AQG) systems that apply AI to create formative practice items at scale have proven effective for providing students with a learning-by-doing approach in e-textbook environments, overcoming the barriers to scaling this learning-science-based method. This paper investigates whether a large language model (LLM) can improve the selection of answer terms in fill-in-the-blank questions (currently chosen by a rule-based system) by attending to both domain relevance and sentence-level nuance. Drawing on a dataset of more than 1.3 million student-question sessions, an explanatory logistic regression model tested the causal hypothesis that, conditional on pre-treatment features, questions for which the LLM and the rule-based system agree on the answer blank will receive more favorable ratings from students. Results reveal that agreement corresponds to a 31% decrease in the likelihood of a thumbs-down rating, controlling for previously identified causal factors. Rather than replacing the rule-based system, LLM-rule agreement serves as a signal for questions students perceive as higher quality. These findings offer initial evidence that incorporating an LLM-based agreement filter into an established AQG pipeline can enhance question quality while preserving factual accuracy.
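
As a minimal sketch of the kind of specification described above (the covariate set and variable names here are illustrative assumptions, not taken from the paper), the analysis can be read as a standard logistic regression with an agreement indicator:

\[ \log\frac{\Pr(\text{ThumbsDown}_i = 1)}{1 - \Pr(\text{ThumbsDown}_i = 1)} = \beta_0 + \beta_1\,\mathrm{Agree}_i + \boldsymbol{\gamma}^{\top}\mathbf{x}_i \]

where \(\mathrm{Agree}_i\) indicates LLM-rule agreement on the answer blank and \(\mathbf{x}_i\) collects the pre-treatment controls. If the reported 31% decrease is expressed on the odds scale, it would correspond to an odds ratio of roughly \(e^{\beta_1} \approx 0.69\) for agreeing questions.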