DeepMind releases benchmark for evaluating long-context language models

Large language models (LLMs) with long context windows have made it easier to build advanced AI applications with simple prompting techniques. Google DeepMind has introduced Long-Context Frontiers (LOFT), a benchmark for evaluating long-context language models (LCLMs). LOFT is designed around tasks with very long prompts and can help compare models as context windows continue to expand.

Previously, customizing LLMs for a task required specialized techniques such as retrieval-augmented generation (RAG) or fine-tuning. With long-context models, an entire corpus or set of training examples can be inserted directly into the prompt, letting the model learn the task in context. Techniques such as added instructions and chain-of-thought reasoning can further enhance the model's capabilities. However, current evaluation methods are limited and do not adequately test LCLMs on these paradigm-shifting tasks.
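To make this concrete, here is a minimal Python sketch of the idea: the full corpus, a few worked examples, and the query are all packed into one long prompt. The helper name, the toy corpus, and the client.generate call are illustrative assumptions, not DeepMind's implementation:

```python
# Minimal sketch of long-context prompting: the corpus, instructions, and
# few-shot examples all live in the prompt itself. Everything here
# (helper names, toy corpus) is an illustrative assumption.

def build_corpus_prompt(instruction, corpus, examples, query):
    """Assemble one long prompt: task instruction, the full corpus with
    stable document IDs, worked examples, then the live query."""
    parts = [instruction, "", "Corpus:"]
    for doc_id, text in corpus.items():
        parts.append(f"[{doc_id}] {text}")
    parts.append("")
    for q, a in examples:
        parts.append(f"Q: {q}\nA: {a}")
    parts.append(f"Q: {query}\nA:")
    return "\n".join(parts)

corpus = {
    "doc1": "The Eiffel Tower is located in Paris, France.",
    "doc2": "Mount Fuji is the tallest mountain in Japan.",
}
prompt = build_corpus_prompt(
    instruction="Answer each question by citing the relevant document ID.",
    corpus=corpus,
    examples=[("Where is the Eiffel Tower?", "doc1")],
    query="What is the tallest mountain in Japan?",
)
# response = client.generate(prompt)  # hypothetical call to your LLM API
```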

LOFT consists of six tasks spanning 35 datasets, designed to evaluate LCLMs on real-world workloads across context windows of varying lengths. It also aims to open up research on long-context prompting through Corpus-in-Context (CiC) prompting, which combines strategies for instructing LCLMs to learn from, retrieve from, and reason over corpora inserted directly into the prompt. Because the corpus portion of the prompt stays fixed across queries, CiC prompting is compatible with prefix caching in autoregressive language models, allowing for efficient computation.
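Since the corpus portion of such a prompt is identical across queries, an inference server can encode it once and reuse the cached key/value states for every subsequent question. The toy sketch below mimics that pattern with simple memoization; encode_prefix and answer are illustrative stand-ins, not a real serving API:

```python
import functools

@functools.lru_cache(maxsize=8)
def encode_prefix(prefix: str) -> str:
    # Stand-in for the expensive step: a real inference server runs the model
    # over the shared corpus prefix once and keeps its key/value states.
    print(f"Encoding prefix of {len(prefix)} characters (done once)...")
    return f"<kv-state for {len(prefix)} chars>"

def answer(prefix: str, query: str) -> str:
    state = encode_prefix(prefix)  # cache hit on every call after the first
    # Placeholder for decoding the short per-query suffix against the cache.
    return f"generate(state={state}, suffix='Q: {query} A:')"

corpus_prefix = (
    "Corpus:\n[doc1] The Eiffel Tower is located in Paris, France.\n"
    "[doc2] Mount Fuji is the tallest mountain in Japan.\n"
)
for q in ["Where is the Eiffel Tower?", "What is the tallest mountain in Japan?"]:
    print(answer(corpus_prefix, q))  # only the first call pays the prefix cost
```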

DeepMind evaluated various models on LOFT and found that LCLMs can match the performance of specialized models on some tasks but lag behind on tasks that require complex reasoning. There remains room for improvement in optimizing LCLMs for tasks with large in-context corpora. LOFT provides a testing ground for measuring progress in long-context modeling and highlights the potential of LCLMs in advanced AI applications.

Source link: https://bdtechtalks.com/2024/07/01/deepmind-loft-long-context-llm/amp/
