
Pre-training mini versions of LLMs: GPT and Llama3 #NLP

Pre-training Mini Versions of LLMs — GPT and Llama3 | by Subrata Goswami | Jun, 2024

The article walks through three small models – nano_gpt, torch_gpt, and mini_llama3 – which are scaled-down versions of LLMs with different parameter counts and setups. Each model is fully contained in a single file and has its own approach to tokenization, embedding, and data flow. The training code and parameter initialization are largely shared across the three, and because the models are so small they can be trained together and pre-trained easily on a consumer-grade GPU. All three are pre-trained on the shakespeare_char dataset.

The models use different tokenizers and embedding layers; in particular, mini_llama3 incorporates the ColumnParallelLinear, RowParallelLinear, and VocabParallelEmbedding layers from Meta's fairscale package. mini_llama3 also trains in bf16 and produces a smaller checkpoint than the other two models.

The article compares validation loss against iteration number for the three models: mini_llama3 converges quickly but may be overfitting, and further parameter search and tuning are expected to improve its validation loss. Example generations from each of the three models are also included.
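The summary does not reproduce any of the model code, but the fairscale layers it names suggest roughly how a mini_llama3-style block could be wired up. The sketch below is an assumption-laden illustration in the style of Meta's Llama reference code, not the article's actual implementation: the dimensions, the FeedForward module, and the single-process gloo setup are all invented for the example.

```python
# A minimal sketch, assuming a single process with model-parallel size 1.
# Dimensions (dim=128, hidden_dim=256, vocab_size=65) and module names are
# illustrative assumptions, not taken from the article.
import os

import torch
import torch.distributed as dist
import torch.nn.functional as F
from fairscale.nn.model_parallel.initialize import initialize_model_parallel
from fairscale.nn.model_parallel.layers import (
    ColumnParallelLinear,
    RowParallelLinear,
    VocabParallelEmbedding,
)

# fairscale's parallel layers expect an initialized process group, even when
# running a single process with model-parallel size 1.
os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="gloo", rank=0, world_size=1)
initialize_model_parallel(1)


class FeedForward(torch.nn.Module):
    """SwiGLU feed-forward block in the style of Meta's Llama reference code."""

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        # Column-parallel projections split the output dimension across
        # model-parallel ranks; the row-parallel projection merges it back.
        self.w1 = ColumnParallelLinear(dim, hidden_dim, bias=False,
                                       gather_output=False, init_method=lambda x: x)
        self.w3 = ColumnParallelLinear(dim, hidden_dim, bias=False,
                                       gather_output=False, init_method=lambda x: x)
        self.w2 = RowParallelLinear(hidden_dim, dim, bias=False,
                                    input_is_parallel=True, init_method=lambda x: x)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w1(x)) * self.w3(x))


# The character-level vocabulary of the shakespeare_char dataset is ~65 tokens.
embed = VocabParallelEmbedding(65, 128, init_method=lambda x: x)
ffn = FeedForward(dim=128, hidden_dim=256)

tokens = torch.randint(0, 65, (1, 16))      # (batch, sequence)
out = ffn(embed(tokens))                    # (1, 16, 128)
print(out.shape)
```

On top of a block like this, the bf16 training the article mentions would typically be layered with torch.autocast, though the exact mechanism used in mini_llama3 is not described in the summary.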

Source link: https://whatdhack.medium.com/pre-training-mini-versions-of-llms-gpt-and-llama3-7cf69ac00280?source=rss——llm-5
