
Training an image-free computer vision system with LLM-generated illustrations #ComputerVision

Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) conducted a study to test the visual knowledge of large language models (LLMs) that are trained purely on text. They found that LLMs have a solid understanding of the visual world and can generate complex images using image-rendering code. The researchers also observed that LLMs can refine these images when prompted to review and correct their own rendering code.

To assess the visual knowledge of LLMs, the CSAIL team created a “vision checkup” using their “Visual Aptitude Dataset” to test the models’ abilities to draw, recognize, and self-correct visual concepts. They trained a computer vision system using the final drafts of these illustrations generated by LLMs.

The researchers found that LLMs can generate code for different shapes, objects, and scenes, and use this code to render simple digital illustrations. The study showed that LLMs can iteratively improve the rendering code to enhance the visual quality of the images they generate.
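To make this concrete, below is a minimal, hypothetical sketch in Python of the kind of image-rendering code an LLM might emit when asked to draw a simple scene, together with an illustrative self-correction loop. The `draw_house` and `query_llm` helpers and the prompts are assumptions for illustration only, not the paper's actual pipeline, prompts, or models.

```python
# Hypothetical sketch: an LLM asked to "draw a house" might emit
# rendering code like draw_house() below; a follow-up prompt could then
# ask it to critique and revise that code. This does not reproduce the
# CSAIL study's actual prompts or models.
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
import matplotlib.patches as patches


def draw_house(path="house.png"):
    """Render a simple 'house' illustration from shape primitives."""
    fig, ax = plt.subplots(figsize=(2, 2))
    ax.add_patch(patches.Rectangle((0.2, 0.1), 0.6, 0.4, color="peru"))            # walls
    ax.add_patch(patches.Polygon([(0.15, 0.5), (0.5, 0.8), (0.85, 0.5)],
                                 color="firebrick"))                               # roof
    ax.add_patch(patches.Rectangle((0.45, 0.1), 0.1, 0.2, color="saddlebrown"))    # door
    ax.set_xlim(0, 1); ax.set_ylim(0, 1); ax.axis("off")
    fig.savefig(path, dpi=64)
    plt.close(fig)


def query_llm(prompt: str) -> str:
    """Placeholder for a text-only LLM call; assumed for illustration only."""
    raise NotImplementedError("plug in your own model or API here")


def self_correct(code: str, rounds: int = 2) -> str:
    """Ask the (hypothetical) LLM to critique and revise its own rendering code."""
    for _ in range(rounds):
        code = query_llm("Improve this drawing code so the image looks better:\n" + code)
    return code


if __name__ == "__main__":
    draw_house()  # produces house.png without any model call
```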

The researchers believe that combining the latent visual knowledge of LLMs with other AI tools, such as diffusion models, could lead to more advanced generative and artistic capabilities. The study demonstrates that LLMs, even without multimodal pre-training, can produce synthetic data for training computer vision systems that can outperform systems trained on datasets of authentic photos.
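As a rough sketch of the "train on synthetic illustrations" idea, the example below procedurally renders simple circle and square images and fits a small classifier to them using generic Pillow, NumPy, and scikit-learn calls. It is only an assumption about the general workflow, standing in for LLM-generated rendering code as the data source; it is not the authors' training setup or dataset.

```python
# Minimal sketch: train a vision classifier purely on procedurally
# rendered illustrations (no real photos).
import numpy as np
from PIL import Image, ImageDraw
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

SIZE = 32
rng = np.random.default_rng(0)


def render(shape: str) -> np.ndarray:
    """Rasterize a single randomly placed circle or square as a flat feature vector."""
    img = Image.new("L", (SIZE, SIZE), color=255)
    draw = ImageDraw.Draw(img)
    x, y = rng.integers(4, 16, size=2)
    s = rng.integers(8, 14)
    box = [int(x), int(y), int(x + s), int(y + s)]
    if shape == "circle":
        draw.ellipse(box, fill=0)
    else:
        draw.rectangle(box, fill=0)
    return np.asarray(img, dtype=np.float32).ravel() / 255.0


# Build a purely synthetic dataset of illustrations.
shapes = ["circle", "square"]
X = np.stack([render(shapes[i % 2]) for i in range(400)])
y = np.array([i % 2 for i in range(400)])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("accuracy on held-out synthetic images:", clf.score(X_te, y_te))
```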

The research is published on the arXiv preprint server, and the team believes that leveraging the visual knowledge of LLMs could enhance the capabilities of AI systems in various applications.

Source link: https://techxplore.com/news/2024-06-image-free-vision-real-photos.html
