A deep dive into a massive video model #AIModel

In this episode of the AI + a16z podcast, Luma Chief Scientist Jiaming Song joins a16z General Partner Anjney Midha to discuss his career in video models and the release of Luma’s Dream Machine 3D video model. The model demonstrates reasoning capabilities, which the summary attributes to training on a large volume of high-quality video data. Jiaming explains the “bitter lesson” of training generative models: progress comes more from using additional compute than from building in priors. He describes the shift toward deep learning features in language and vision tasks and contrasts the limits of language data with the abundance of visual data. Scaling up data for language models is hard, he argues, because sources of high-quality language data are limited. He suggests that language is itself a kind of prior when set against the richer data signals available from the physical world. The discussion closes with the future of multimodal models and the potential for more compute to enhance AI capabilities.

Source link: https://a16z.com/podcast/beyond-language-inside-a-hundred-trillion-token-video-model/