

[ML Story] Fine-tune Vision Language Model on custom dataset | by Nitin Tiwari | Apr, 2024

The era of LLMs is marked by new language models emerging frequently, such as Google’s Gemini and Gemma, Meta’s Llama 3, and Microsoft’s Phi-3. These tech giants are opening up some of these models to the developer community, allowing for fine-tuning for specific use cases. One such model is the Idefics2-8B Vision Language Model by Hugging Face, which supports multi-modality and can answer questions about images, describe visual content, and more.

Fine-tuning a Vision Language Model on a custom dataset involves preparing the data, loading the dataset, configuring LoRA adapters, creating a data collator, setting up training parameters, and starting the training process. Techniques like LoRA and QLoRA make fine-tuning large models efficient by reducing the number of trainable parameters, which conserves memory and speeds up training.
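The parameter savings behind LoRA can be sketched in a few lines of NumPy. This is a conceptual illustration with toy dimensions, not the actual Idefics2-8B layer sizes or the `peft` library's implementation: instead of updating a full `d_out × d_in` weight matrix, only two small low-rank factors are trained.

```python
import numpy as np

# Toy sizes for illustration; a real 8B model's layers are far larger.
d_in, d_out, rank = 512, 512, 8

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))        # frozen pretrained weight
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))                   # trainable up-projection, zero-init

def lora_forward(x):
    """Forward pass with the adapter: equivalent to (W + B @ A) @ x."""
    return W @ x + B @ (A @ x)

x = rng.standard_normal(d_in)
# Because B is zero-initialized, the adapter starts as an exact no-op:
assert np.allclose(lora_forward(x), W @ x)

full_params = d_out * d_in           # 262,144 if fine-tuning W directly
lora_params = rank * (d_in + d_out)  # 8,192 trainable parameters instead
print(f"trainable fraction: {lora_params / full_params:.3%}")
```

With rank 8, the adapter trains about 3% of the parameters a full update of this layer would require; the gap widens further at realistic model dimensions, which is what makes fine-tuning an 8B model feasible on a single GPU.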

By following these steps, developers can fine-tune models like Idefics2-8B for specific tasks such as visual question answering. Training on a custom dataset can yield better task-specific results than the base model, although the extent of fine-tuning may be limited by available hardware, GPU memory in particular.


Source link: https://tiwarinitin1999.medium.com/ml-story-fine-tune-vision-language-model-on-custom-dataset-8e5f5dace7b1?source=rss——large_language_models-5

