Beyond the Hint: Using Self-Critique to Constrain LLM Feedback in Conversation-Based Assessment

Abstract

Large language models used in conversation-based assessment often provide inappropriate hints that compromise validity. This paper demonstrates that self-critique, a simple prompt engineering technique, effectively constrains this behavior. In experiments with synthetic conversations and high school math data, self-critique reduced the rate of inappropriate hints from 65.9% to 6.1%. The technique balances student engagement against the need for fair comparisons and requires no model fine-tuning.