Microsoft's TinyStories work, a precursor to its later small language models such as the Phi family, trained a small language model on a synthetic dataset, also called TinyStories, designed to preserve the diversity and qualitative richness of natural language. The training data was generated by GPT-3.5 and GPT-4 rather than scraped from the web, specifically to avoid repetitive, near-duplicate training examples. Despite being far smaller and more constrained, the resulting model exhibited behaviors characteristic of much larger language models.

The creators focused on data quality above all. They started with a vocabulary of roughly 3,000 words, split evenly between nouns, verbs, and adjectives, then asked a large language model to write a children's story combining one randomly chosen word from each category, repeating the process to generate millions of tiny stories (a sketch of this generation loop follows below). This approach allowed the model to be trained in less than a day on a single GPU. The study underscores the value of a principled framework for generating synthetic training data, showing that it can yield both effective training and diverse outputs.
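To make the data-generation recipe concrete, here is a minimal Python sketch of the sampling-and-prompting loop the article describes. It assumes the OpenAI Python client as the story generator; the word lists and prompt wording are illustrative stand-ins, not the actual TinyStories vocabulary or prompts.

```python
import random
from openai import OpenAI  # assumes the official OpenAI Python client (v1.x)

# Illustrative stand-in word lists; the real TinyStories vocabulary
# is far larger (the article cites ~3,000 words split evenly
# between nouns, verbs, and adjectives).
NOUNS = ["dog", "ball", "tree", "river", "cake"]
VERBS = ["jump", "find", "share", "build", "sing"]
ADJECTIVES = ["happy", "tiny", "brave", "shiny", "quiet"]

# Hypothetical prompt wording, in the spirit of the paper's setup.
PROMPT_TEMPLATE = (
    "Write a short story that a 3-year-old could understand. "
    "The story must use the noun '{noun}', the verb '{verb}', "
    "and the adjective '{adjective}'."
)

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def generate_tiny_story() -> str:
    """Sample one word per category and ask the model for a story."""
    prompt = PROMPT_TEMPLATE.format(
        noun=random.choice(NOUNS),
        verb=random.choice(VERBS),
        adjective=random.choice(ADJECTIVES),
    )
    response = client.chat.completions.create(
        model="gpt-4",  # the article names GPT-3.5 and GPT-4 as generators
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    # Repeating this loop millions of times yields a diverse corpus,
    # since each story is seeded by a different random word triple.
    print(generate_tiny_story())
```

The random word triple is the key design choice: it forces variety into the generated corpus, so the synthetic stories do not collapse into near-duplicates of one another.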
Source: https://cobusgreyling.medium.com/tinystories-4ce620e569a4
Microsoft’s Small Language Model: Tiny Stories in 9 Words
![TinyStories. The Small Language Model from Microsoft… | by Cobus Greyling | Jun, 2024](https://i0.wp.com/webappia.com/wp-content/uploads/2024/06/1nuVLWkbwzhwyPdhU26G5Lg.png?fit=758%2C1085&quality=80&ssl=1)