Elon Musk’s xAI has unveiled its first multimodal model. In addition to understanding text, it can process documents, diagrams, charts, screenshots and photographs. Grok-1.5 Vision will soon be available to early testers and existing Grok users.

“Grok-1.5V can compete with current frontier multimodal models in many areas, from multidisciplinary reasoning to understanding documents, science diagrams, charts, screenshots and photographs,” according to a blog post from the company.

Today’s announcement comes shortly after xAI unveiled its upgraded chatbot model, Grok-1.5, in late March.

Grok-1.5V’s capabilities are demonstrated through seven examples, including translating a whiteboard sketch of a flowchart into Python code, writing a bedtime story based on a child’s drawing, explaining a meme, converting a table to CSV format, and even detecting whether a deck needs replacing due to rotted wood.

xAI claims its multimodal model outperformed GPT-4V, Claude 3 Sonnet, Claude 3 Opus and Gemini Pro 1.5 in head-to-head tests. Of particular note, Grok-1.5V beat its peers on RealWorldQA, xAI’s benchmark designed to evaluate real-world spatial understanding.

The initial release of RealWorldQA consists of over 700 images, each paired with a question and answer, including anonymized images taken from vehicles and other real-world samples. xAI will make RealWorldQA available to the public under a Creative Commons license.

Musk’s AI company has continued to make advances since introducing its Grok chatbot in November 2023, as it seeks to compete with OpenAI and other market leaders. Grok-1.5V arrives just over one month after xAI made Grok open source. The progress hasn’t come without controversy, however: this month researchers revealed the chatbot could instruct users in criminal activities.

Still, xAI continues its pursuit of creating “beneficial [artificial general intelligence] capable of understanding the universe.” The company promises “significant” updates to Grok’s multimodal understanding and generation capabilities over the coming months.

venturebeat.org
