
Training an image-free computer vision system with LLM-generated illustrations #ComputerVision

Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) conducted a study to test the visual knowledge of large language models (LLMs) that are trained purely on text. They found that LLMs have a solid understanding of the visual world and can generate complex images using image-rendering code. The researchers also observed that LLMs can refine these images when prompted to review and correct their own rendering code.

To assess the visual knowledge of LLMs, the CSAIL team created a “vision checkup” using their “Visual Aptitude Dataset” to test the models’ abilities to draw, recognize, and self-correct visual concepts. They trained a computer vision system using the final drafts of these illustrations generated by LLMs.

The researchers found that LLMs can generate code for different shapes, objects, and scenes, and use this code to render simple digital illustrations. The study showed that LLMs can iteratively improve the rendering code to enhance the visual quality of the images they generate.
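To make this concrete, below is a minimal, hypothetical sketch in Python of the kind of image-rendering code an LLM might emit when asked to draw a simple scene, together with an illustrative self-correction loop. The `draw_house` and `query_llm` helpers and the prompts are assumptions for illustration only, not the paper's actual pipeline, prompts, or models.

```python
# Hypothetical sketch: an LLM asked to "draw a house" might emit
# rendering code like draw_house() below; a follow-up prompt could then
# ask it to critique and revise that code. This does not reproduce the
# CSAIL study's actual prompts or models.
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
import matplotlib.patches as patches


def draw_house(path="house.png"):
    """Render a simple 'house' illustration from shape primitives."""
    fig, ax = plt.subplots(figsize=(2, 2))
    ax.add_patch(patches.Rectangle((0.2, 0.1), 0.6, 0.4, color="peru"))            # walls
    ax.add_patch(patches.Polygon([(0.15, 0.5), (0.5, 0.8), (0.85, 0.5)],
                                 color="firebrick"))                               # roof
    ax.add_patch(patches.Rectangle((0.45, 0.1), 0.1, 0.2, color="saddlebrown"))    # door
    ax.set_xlim(0, 1); ax.set_ylim(0, 1); ax.axis("off")
    fig.savefig(path, dpi=64)
    plt.close(fig)


def query_llm(prompt: str) -> str:
    """Placeholder for a text-only LLM call; assumed for illustration only."""
    raise NotImplementedError("plug in your own model or API here")


def self_correct(code: str, rounds: int = 2) -> str:
    """Ask the (hypothetical) LLM to critique and revise its own rendering code."""
    for _ in range(rounds):
        code = query_llm("Improve this drawing code so the image looks better:\n" + code)
    return code


if __name__ == "__main__":
    draw_house()  # produces house.png without any model call
```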

The researchers believe that combining the latent visual knowledge of LLMs with other AI tools, such as diffusion models, could lead to more advanced generative and artistic capabilities. The study demonstrates that LLMs, even without multimodal pre-training, can produce synthetic data for training computer vision systems that can outperform systems trained on datasets of authentic photos.
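As a rough sketch of the "train on synthetic illustrations" idea, the example below procedurally renders simple circle and square images and fits a small classifier to them using generic Pillow, NumPy, and scikit-learn calls. It is only an assumption about the general workflow, standing in for LLM-generated rendering code as the data source; it is not the authors' training setup or dataset.

```python
# Minimal sketch: train a vision classifier purely on procedurally
# rendered illustrations (no real photos).
import numpy as np
from PIL import Image, ImageDraw
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

SIZE = 32
rng = np.random.default_rng(0)


def render(shape: str) -> np.ndarray:
    """Rasterize a single randomly placed circle or square as a flat feature vector."""
    img = Image.new("L", (SIZE, SIZE), color=255)
    draw = ImageDraw.Draw(img)
    x, y = rng.integers(4, 16, size=2)
    s = rng.integers(8, 14)
    box = [int(x), int(y), int(x + s), int(y + s)]
    if shape == "circle":
        draw.ellipse(box, fill=0)
    else:
        draw.rectangle(box, fill=0)
    return np.asarray(img, dtype=np.float32).ravel() / 255.0


# Build a purely synthetic dataset of illustrations.
shapes = ["circle", "square"]
X = np.stack([render(shapes[i % 2]) for i in range(400)])
y = np.array([i % 2 for i in range(400)])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("accuracy on held-out synthetic images:", clf.score(X_te, y_te))
```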

The research is published on the arXiv preprint server, and the team believes that leveraging the visual knowledge of LLMs could enhance the capabilities of AI systems in various applications.

Source link: https://techxplore.com/news/2024-06-image-free-vision-real-photos.html
