Abstract

Large language models (LLMs) offer new opportunities to support deeper learning through open-ended, formative practice. This paper investigates two novel types of automatically generated questions: compare-and-contrast prompts and student-authored exam questions. These question types are integrated into an e-reader platform alongside conventional fill-in-the-blank items. To enable meaningful interaction with these open-ended tasks, an LLM is used to generate personalized feedback grounded in textbook content. A dataset of more than 90,000 student-question interactions is analyzed to evaluate how these new question types perform in terms of engagement, difficulty, persistence, and non-genuine responses, and how students interact with the LLM-generated feedback. Results are compared between contexts in which questions were assigned as part of a course and those in which they were used voluntarily. Assigned usage dramatically increases engagement and improves performance on most metrics. To understand how students respond to the feedback itself, the timing of the student’s second attempt and its textual overlap with the initial LLM-generated feedback are examined, revealing distinct patterns of reflection, revision, and potential feedback reuse. These results highlight both the promise and the complexity of using LLMs to expand the cognitive scope of automated formative practice while maintaining pedagogical value at scale.