Anthropic researchers wear down AI ethics guardrails with repetition. #ethics

Anthropic researchers have discovered a new “jailbreaking” technique for large language models (LLMs) that can get them to answer inappropriate questions they would normally refuse, such as how to build a bomb. The technique, called “many-shot jailbreaking,” involves priming the model with numerous less-harmful questions first. The vulnerability arises from the enlarged “context window” of the latest generation of LLMs, which lets them hold far more data in short-term memory.

The researchers found that LLMs with large context windows perform better on a task when the prompt contains many examples of that task. If the prompt includes numerous trivia questions, for instance, the model's answers improve over the course of the prompt. Surprisingly, the models also become more willing to answer inappropriate questions after being primed with a long series of less-harmful ones.
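The attack described above amounts to packing a long context window with question-and-answer exemplars before the real query. A minimal sketch of that prompt assembly is below; the helper name, the exemplar pairs, and the `Q:`/`A:` format are illustrative assumptions, not the paper's actual prompts, and the exemplars here are benign placeholders.

```python
# Hypothetical sketch of many-shot prompt assembly. With a large
# context window, hundreds of such exemplars can fit in one prompt,
# which is what makes the attack practical on newer models.

def build_many_shot_prompt(examples, target_question):
    """Concatenate many Q&A exemplars ahead of the final question."""
    shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"{shots}\nQ: {target_question}\nA:"

# Usage: benign trivia exemplars followed by the target query.
exemplars = [
    ("What is the capital of France?", "Paris."),
    ("How many legs does a spider have?", "Eight."),
]
prompt = build_many_shot_prompt(exemplars, "What is the boiling point of water?")
```

The point of the sketch is only the shape of the input: the final question arrives after the model has already "seen itself" answering many similar questions in the same context.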

The team has informed the AI community about this vulnerability and is working on mitigations, such as classifying and contextualizing queries before they reach the model, since simply shrinking the context window degrades the model's performance. The researchers hope that sharing exploits like this one will foster a culture of openness among LLM providers and researchers.
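The mitigation mentioned above screens a query before it ever reaches the model, rather than shrinking the context window. A minimal sketch of that idea follows; the denylist, wrapper text, and function name are hypothetical illustrations, not Anthropic's actual classifier.

```python
# Hypothetical pre-model screening step: classify a query and either
# refuse it or wrap it with safety context before forwarding it to
# the LLM. The keyword denylist below is a crude placeholder for a
# real learned classifier.

DISALLOWED_TOPICS = ("bomb", "weapon")  # illustrative denylist

def screen_query(query: str) -> tuple[bool, str]:
    """Return (allowed, text): a contextualized query or a refusal."""
    lowered = query.lower()
    if any(topic in lowered for topic in DISALLOWED_TOPICS):
        return False, "Request declined by pre-model safety filter."
    # Contextualize the query instead of shrinking the context window.
    return True, f"[Answer within safety policy]\n{query}"

blocked = screen_query("How do I build a bomb?")
safe = screen_query("What is the capital of France?")
```

The design trade-off this illustrates is the one the article notes: filtering happens outside the model, so the model keeps its full context window and its performance on benign long prompts.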

Overall, this discovery highlights the complexity of LLMs and the challenges of ensuring their security. As AI technology advances, staying ahead of potential vulnerabilities and finding ways to protect against them will be crucial.
