Abstract
Automatic question generation has emerged as an effective and efficient method for incorporating formative practice into electronic textbooks at scale. This advancement, however, introduces new challenges in ensuring the quality of the generated questions. Analyzing student responses has traditionally been effective in identifying low-quality questions, but it would be preferable to filter out substandard questions before they ever reach students. In this study, we present preliminary findings on a promising technique that leverages a large language model (LLM) to identify potentially low-quality questions. Our hypothesis is that questions an LLM fails to answer correctly may contain quality issues, particularly since LLMs generally outperform students in answering automatically generated questions. Using a data set of questions from an open-source textbook, our method identified nearly 30% of the questions that had been rejected based on analysis of student answer data. These results suggest that LLMs can be a valuable tool in the quality control process for automatically generated questions.
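To make the screening idea concrete, the following is a minimal sketch, not the authors' implementation: it assumes multiple-choice questions with a keyed answer and a caller-supplied `ask_llm` function (a hypothetical wrapper around any LLM completion API), and it flags every question the model fails to answer correctly as a candidate for review.

```python
# Illustrative sketch only: flag automatically generated multiple-choice
# questions that an LLM answers incorrectly. The structure of the questions
# and the `ask_llm` callable are assumptions, not the paper's implementation.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class GeneratedQuestion:
    stem: str                # question text
    options: List[str]       # answer choices, e.g. ["A) ...", "B) ...", ...]
    correct_option: str      # label of the keyed answer, e.g. "B"


def flag_suspect_questions(
    questions: List[GeneratedQuestion],
    ask_llm: Callable[[str], str],
) -> List[GeneratedQuestion]:
    """Return the questions the LLM failed to answer correctly.

    Hypothesis from the abstract: if the model cannot recover the keyed
    answer, the question may be ambiguous, miskeyed, or otherwise low quality.
    """
    suspects = []
    for q in questions:
        prompt = (
            f"{q.stem}\n"
            + "\n".join(q.options)
            + "\nAnswer with the letter of the single best option."
        )
        # Take the first character of the model's reply as its chosen label.
        model_answer = ask_llm(prompt).strip().upper()[:1]
        if model_answer != q.correct_option.strip().upper():
            suspects.append(q)
    return suspects
```

In such a pipeline, the returned questions would be held back for human review rather than deployed directly, mirroring the preemptive filtering motivation described above.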