Abstract

Advances in artificial intelligence and automatic question generation (AQG) have made it possible to generate the volume of formative practice questions needed to engage students in learning by doing. These automatically generated (AG) questions can be integrated with textbook content in a courseware environment so that students can practice as they read. Scaling this learning-by-doing method is a valuable pursuit, as it has been shown to cause better learning outcomes (i.e., the doer effect). However, it is also necessary to ensure that AG questions perform as well as human-authored (HA) questions. Previous studies found that AG and HA questions were essentially equivalent with respect to student engagement, difficulty, and persistence. While those performance metrics expanded existing AQG research, this paper extends that work by evaluating question discrimination using student data from a university Neuroscience course. It is found that AG questions also perform as well as HA questions with respect to discrimination.