
OpenAI trains new “CriticGPT” to critique GPT-4 outputs

OpenAI researchers have introduced CriticGPT, an AI model designed to identify mistakes in code generated by ChatGPT. This model aims to improve the alignment of AI systems with human expectations through Reinforcement Learning from Human Feedback (RLHF). CriticGPT acts as an assistant to human trainers, analyzing code and pointing out errors to make it easier for humans to spot mistakes. The model was trained on a dataset of code samples with intentional bugs to recognize and flag coding errors.
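The "intentional bugs" training setup described above can be sketched as a simple data-construction step: take a known-good code sample, tamper with it in a recorded way, and pair the buggy version with a reference critique for the critic to learn from. The function and field names below are illustrative assumptions, not OpenAI's actual pipeline.

```python
# Minimal sketch of building one (buggy code, reference critique) training
# pair by inserting a known bug into a correct snippet. All names here are
# hypothetical, chosen for illustration only.

def insert_bug(code: str, correct: str, buggy: str, description: str) -> dict:
    """Tamper a known-good snippet and record what was broken."""
    assert correct in code, "tamper target must appear in the sample"
    return {
        "buggy_code": code.replace(correct, buggy, 1),
        "reference_critique": description,
    }

# Example: introduce an off-by-one error into a correct slice.
sample = "def total(xs):\n    return sum(xs[:len(xs)])\n"
record = insert_bug(
    sample,
    correct="xs[:len(xs)]",
    buggy="xs[:len(xs) - 1]",
    description="The slice drops the final element, so the sum is wrong.",
)
# A critic model would then be trained to produce the reference critique
# when shown record["buggy_code"].
```

Training on such pairs gives the critic dense supervision: every example comes with a ground-truth description of exactly what is wrong.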

In experiments, CriticGPT demonstrated its ability to catch both inserted bugs and naturally occurring errors in ChatGPT’s output. The model’s critiques were preferred over those generated by ChatGPT itself in 63 percent of cases involving natural bugs, as it produced fewer unhelpful “nitpicks” and false positives. The researchers also developed a technique called Force Sampling Beam Search (FSBS) to help CriticGPT write more detailed code reviews.
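The selection idea behind FSBS, as reported, is to trade off critique thoroughness against nitpicking. A toy version of that trade-off: sample several candidate critiques, score each by a reward-model score plus a length term, and keep the best. The scoring function, weights, and candidates below are invented for illustration and are not OpenAI's implementation.

```python
# Toy sketch of the comprehensiveness-vs-nitpicking trade-off attributed
# to FSBS: pick the candidate critique that maximizes a reward score plus
# a small bonus per word. Everything here is a hypothetical stand-in.

def select_critique(candidates, reward_fn, length_bonus=0.05):
    """Return the critique maximizing reward + length_bonus * word count."""
    def score(critique: str) -> float:
        return reward_fn(critique) + length_bonus * len(critique.split())
    return max(candidates, key=score)

# Hypothetical reward model: prefers critiques that identify the real bug.
def toy_reward(critique: str) -> float:
    return 1.0 if "off-by-one" in critique else 0.0

candidates = [
    "Looks fine.",
    "There is an off-by-one error in the loop bound.",
    "Variable naming could be nicer.",
]
best = select_critique(candidates, toy_reward)
# best == "There is an off-by-one error in the loop bound."
```

Raising `length_bonus` favors longer, more exhaustive critiques; lowering it penalizes padding, which mirrors the nitpick/false-positive balance the researchers describe.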

Although CriticGPT shows promise, it has limitations: it was trained on short ChatGPT answers, and it cannot eliminate confabulations entirely. The model is most effective at flagging errors that can be pinpointed to a single code location, which poses a challenge for real-world mistakes spread across multiple parts of an answer. OpenAI plans to integrate CriticGPT-like models into its RLHF labeling pipeline to assist trainers in evaluating AI outputs. However, the researchers caution that extremely complex tasks may still be challenging for human evaluators, even with AI assistance.


Source link: https://arstechnica.com/information-technology/2024/06/openais-criticgpt-outperforms-humans-in-catching-ai-generated-code-bugs/
