



Distilling knowledge from a large supervised model into a lightweight model is a widely successful approach for producing compact yet powerful models in semi-supervised settings where only a limited amount of labeled data is available. However, in large-scale applications, the teacher tends to provide many incorrect soft labels that impair the student's performance. Furthermore, the sheer size of the teacher limits the number of soft labels that can be queried, due to prohibitive computational and financial costs. The difficulty of simultaneously achieving efficiency (i.e., minimizing soft-label queries) and robustness (i.e., training only on correct soft labels) hinders the widespread applicability of knowledge distillation to many modern tasks. In this paper, we present a parameter-free approach with provable guarantees for querying the soft labels of points that are both informative and correctly labeled by the teacher. Central to our work is a game-theoretic formulation that explicitly accounts for the inherent trade-off between the informativeness and the correctness of input instances. We establish bounds on the expected performance of our approach that hold even for worst-case distillation instances. We present empirical evaluations on popular benchmarks demonstrating the improved distillation performance enabled by our approach relative to state-of-the-art active learning and active distillation methods.

