#Gemma: Latin for precious, a symbol of beauty and strength. #PreciousGem

Gemma is Here ✨ | by Aashi Dutt | Feb 2024

Gemma is a family of generative language models for text and code, built on the transformer decoder architecture and trained at scale on distributed systems. Gemma uses rotary positional embeddings (RoPE) in each layer and, to reduce model size, shares embedding weights between its input and output layers. It replaces the standard ReLU non-linearity with the GeGLU activation function and uses RMSNorm as its normalization layer. Gemma models are not multimodal; they are trained primarily on English data and use a SentencePiece tokenizer for compatibility with Gemini.
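To make two of these tweaks concrete, here is a minimal PyTorch sketch of a GeGLU feed-forward block and RMSNorm. The module names, dimensions, and epsilon value are illustrative assumptions, not Gemma's actual implementation:

```python
import torch
import torch.nn as nn

class GeGLU(nn.Module):
    """GeGLU feed-forward block: the input is projected twice; one
    projection is passed through GELU and gates the other (sketch)."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.proj = nn.Linear(d_model, 2 * d_ff, bias=False)
        self.out = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate, value = self.proj(x).chunk(2, dim=-1)
        return self.out(nn.functional.gelu(gate) * value)

class RMSNorm(nn.Module):
    """RMSNorm: rescales activations by their root mean square with a
    learned gain; unlike LayerNorm it does not subtract the mean."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        inv_rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * inv_rms * self.weight
```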

The models are fine-tuned with supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). The instruction-tuned variants are also trained on examples annotated with extra control tokens that indicate roles and delineate turns in a conversation. Gemma models have been tested for memorization and exhibit lower exact-memorization rates than PaLM models, although they retain a comparable amount of information from their training data.
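As a sketch of that turn formatting, the snippet below builds a dialogue prompt from the control tokens described in the Gemma technical report (`<start_of_turn>`, `<end_of_turn>`); the helper name and exact whitespace are assumptions, so check the model card for the canonical formatter:

```python
def format_turns(messages):
    """messages: list of (role, text) pairs, with role in {"user", "model"}."""
    parts = []
    for role, text in messages:
        parts.append(f"<start_of_turn>{role}\n{text}<end_of_turn>\n")
    parts.append("<start_of_turn>model\n")  # cue the model to reply
    return "".join(parts)

prompt = format_turns([("user", "What does 'gemma' mean in Latin?")])
```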

The report also examines the risk that aligned language models memorize training data and reveal it through adversarial attacks, and it emphasizes responsible development as the Gemma family advances.

Gemma models show strong performance on existing benchmarks and are openly available for use. They can be accessed through platforms such as Kaggle notebooks, Colab, Hugging Face, NVIDIA NeMo, and TensorRT-LLM. Overall, the responsible development and best use of Gemma models are highlighted as goals for the AI community.
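For the Hugging Face route, a minimal sketch of loading and prompting a Gemma checkpoint with the `transformers` library is below. The `google/gemma-2b-it` model ID and generation settings are assumptions for illustration; access requires accepting the license on the Hub, and `device_map="auto"` needs the `accelerate` package:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b-it"  # assumed instruction-tuned 2B checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Tokenize a prompt, generate a short completion, and decode it.
inputs = tokenizer("Explain what RMSNorm does.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```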

Source link: https://aashi-dutt3.medium.com/gemma-is-here-e0bdbf7e1fcb?source=rss——ai-5
