Microsoft's TinyStories work, a precursor to its later small language models such as the Phi family, trained a small language model on a synthetic dataset, also called TinyStories, designed to preserve the diversity and qualitative richness of natural language. The training data was generated by GPT-3.5 and GPT-4 rather than scraped from the web, specifically to avoid repetitive, near-duplicate training examples. Despite being far smaller and more constrained, the resulting model exhibited behaviors characteristic of much larger language models.

The creators focused on data quality above all. They started with a vocabulary of roughly 3,000 words, split evenly between nouns, verbs, and adjectives, then asked a large language model to write a children's story combining one randomly chosen word from each category, repeating the process to generate millions of tiny stories (a sketch of this generation loop follows below). This approach allowed the model to be trained in less than a day on a single GPU. The study underscores the value of a principled framework for generating synthetic training data, showing that it can yield both effective training and diverse outputs.
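To make the data-generation recipe concrete, here is a minimal Python sketch of the sampling-and-prompting loop the article describes. It assumes the OpenAI Python client as the story generator; the word lists and prompt wording are illustrative stand-ins, not the actual TinyStories vocabulary or prompts.

```python
import random
from openai import OpenAI  # assumes the official OpenAI Python client (v1.x)

# Illustrative stand-in word lists; the real TinyStories vocabulary
# is far larger (the article cites ~3,000 words split evenly
# between nouns, verbs, and adjectives).
NOUNS = ["dog", "ball", "tree", "river", "cake"]
VERBS = ["jump", "find", "share", "build", "sing"]
ADJECTIVES = ["happy", "tiny", "brave", "shiny", "quiet"]

# Hypothetical prompt wording, in the spirit of the paper's setup.
PROMPT_TEMPLATE = (
    "Write a short story that a 3-year-old could understand. "
    "The story must use the noun '{noun}', the verb '{verb}', "
    "and the adjective '{adjective}'."
)

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def generate_tiny_story() -> str:
    """Sample one word per category and ask the model for a story."""
    prompt = PROMPT_TEMPLATE.format(
        noun=random.choice(NOUNS),
        verb=random.choice(VERBS),
        adjective=random.choice(ADJECTIVES),
    )
    response = client.chat.completions.create(
        model="gpt-4",  # the article names GPT-3.5 and GPT-4 as generators
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    # Repeating this loop millions of times yields a diverse corpus,
    # since each story is seeded by a different random word triple.
    print(generate_tiny_story())
```

The random word triple is the key design choice: it forces variety into the generated corpus, so the synthetic stories do not collapse into near-duplicates of one another.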
Source: https://cobusgreyling.medium.com/tinystories-4ce620e569a4
Microsoft’s Small Language Model: Tiny Stories in 9 Words
![TinyStories. The Small Language Model from Microsoft… | by Cobus Greyling | Jun, 2024](https://i0.wp.com/webappia.com/wp-content/uploads/2024/06/1nuVLWkbwzhwyPdhU26G5Lg.png?fit=758%2C1085&quality=80&ssl=1)