OpenAI trains new "CriticGPT" to critique GPT-4 outputs

OpenAI researchers have introduced CriticGPT, an AI model designed to identify mistakes in code generated by ChatGPT. This model aims to improve the alignment of AI systems with human expectations through Reinforcement Learning from Human Feedback (RLHF). CriticGPT acts as an assistant to human trainers, analyzing code and pointing out errors to make it easier for humans to spot mistakes. The model was trained on a dataset of code samples with intentional bugs to recognize and flag coding errors.

In experiments, CriticGPT demonstrated its ability to catch both inserted bugs and naturally occurring errors in ChatGPT’s output. The model’s critiques were preferred over those generated by ChatGPT itself in 63 percent of cases involving natural bugs, as it produced fewer unhelpful “nitpicks” and false positives. The researchers also developed a technique called Force Sampling Beam Search (FSBS) to help CriticGPT write more detailed code reviews.

Although CriticGPT shows promise, it has limitations: it was trained on short ChatGPT answers, and it cannot eliminate confabulations entirely. The model is most effective at identifying errors that can be pinpointed to one specific location in the code, whereas real-world mistakes can be spread across multiple parts of an answer, which makes them harder to flag. OpenAI plans to integrate CriticGPT-like models into its RLHF labeling pipeline to assist trainers in evaluating AI outputs. However, the researchers caution that extremely complex tasks may still be challenging for human evaluators, even with AI assistance.

Source link: https://arstechnica.com/information-technology/2024/06/openais-criticgpt-outperforms-humans-in-catching-ai-generated-code-bugs/