Large language models are known for solving complex tasks, but they often miss the mark in simple conversations. This happens because most LLMs are trained and evaluated using one-off prompts rather than through back-and-forth dialogues. As a result, they may skip asking clarifying questions or make incorrect assumptions, which affects real-world usability.
To improve this, Microsoft Research introduced CollabLLM, a new training method that focuses on how models perform across full conversations. Instead of training on single responses, CollabLLM simulates multi-turn dialogues with user models, helping the AI learn when to ask questions, how to adjust tone, and how to engage effectively. Rewards are given based on how well each response contributes to the overall success of the conversation, using measures like goal completion and user engagement.
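The idea of scoring a response by its contribution to the whole conversation, rather than in isolation, can be sketched in a few lines. This is a minimal toy illustration, not CollabLLM's actual implementation: the user simulator, the assistant policy, and the conversation scorer are all hypothetical stubs standing in for real LLM calls, and the reward here is a deliberately simple proxy.

```python
def simulate_user_turn(conversation):
    # Stand-in for an LLM-based user simulator that generates the next user turn.
    return "user: could you clarify the second point?"

def assistant_reply(conversation):
    # Stand-in for the policy model sampling its next response.
    return "assistant: sure, here is more detail."

def score_conversation(conversation):
    # Toy conversation-level reward: favor dialogues where the assistant
    # asked a clarifying question at some point. A real system would use
    # task-specific measures like goal completion and engagement.
    asked = any("?" in turn for turn in conversation if turn.startswith("assistant:"))
    return 1.0 if asked else 0.2

def multiturn_reward(history, candidate, horizon=2, samples=4):
    """Estimate a candidate response's value by rolling the dialogue
    forward with the user simulator and averaging the final score."""
    total = 0.0
    for _ in range(samples):
        convo = history + [candidate]
        for _ in range(horizon):
            convo.append(simulate_user_turn(convo))
            convo.append(assistant_reply(convo))
        total += score_conversation(convo)
    return total / samples

history = ["user: help me draft a project summary."]
clarifying = "assistant: what audience is the summary for?"
direct = "assistant: here is a generic summary."
# Under this toy reward, the clarifying candidate scores higher.
print(multiturn_reward(history, clarifying) > multiturn_reward(history, direct))
```

Even in this stripped-down form, the key property comes through: the candidate that asks a question earns more reward only because of how the simulated conversation unfolds afterward, which is the signal a single-response objective cannot see.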
Through reinforcement learning methods such as PPO and DPO, the model updates itself based on feedback from these simulated interactions. In tests, including a 201-person user study on document co-creation, CollabLLM led to better outcomes than standard training methods, with improved interaction ratings and faster task completion.