Abstract

The Content Improvement Service (CIS) is a platform-level adaptive system that monitors millions of automatically generated formative practice questions, available to students as a study feature in thousands of textbooks. The CIS was designed to do what is not feasible for humans: use real-time data to monitor question performance at enormous scale and determine whether a change is required. To make its decisions, the CIS uses multiple types of data and analyses, including question difficulty (mean score) and student feedback (ratings). In this paper, we outline the decisions the CIS makes using both methods. We also show how human investigation of these analyses can identify trends and insights useful to automatic question generation systems.