The UK’s AI Safety Institute has found that AI systems are highly susceptible to basic jailbreaks, with some models generating harmful outputs even without any attempt to bypass their safeguards. Under relatively simple attacks, the models tested complied with between 98 and 100 percent of harmful questions. The evaluation measured both compliance and the correctness of the harmful information elicited, using attacks that embedded harmful questions in prompt templates or used multi-step procedures to generate prompts. Without attacks, compliance rates were comparatively low, though they reached up to 28 percent for some models on a private set of harmful questions. The study also found that the attacks did not significantly reduce the correctness of responses to benign questions.

The institute plans to extend its testing to other AI models and to develop more robust evaluation metrics to improve the safety and reliability of AI systems. With offices in London and plans to open one in San Francisco, it aims to strengthen its relationship with the US AI Safety Institute and to collaborate with leading AI companies such as Anthropic and OpenAI.
Source link: https://www.computing.co.uk/news/4212708/uk-government-report-reveals-ai-systems-vulnerability
AI systems’ vulnerability exposed in UK Government report
![UK Government report reveals AI systems' vulnerability](https://i0.wp.com/webappia.com/wp-content/uploads/2024/05/AI-safety-370x229.jpg?fit=370%2C229&quality=89&ssl=1)