Microsoft Unveils a Small Phi-3 Vision Multimodal Model


In April 2024, Microsoft introduced its first AI model in the open-source Phi-3 family: Phi-3 Mini. Now, nearly a month later, the company has unveiled a new compact multimodal model called Phi-3 Vision. During Build 2024, Microsoft also presented two more Phi-3 family models: Phi-3 Small (7B) and Phi-3 Medium (14B). All these models are available under the open-source MIT license.

The Phi-3 Vision model has 4.2 billion parameters, making it relatively lightweight. Its release is a notable milestone: openly licensed multimodal models from major corporations like Microsoft are still rare. The model supports a 128K-token context length and can process images alongside text. And unlike Google's PaliGemma model, which isn't designed for conversational use, Phi-3 Vision is tuned for chat-style interaction.

Microsoft says the Phi-3 Vision model was trained on publicly available, high-quality educational and code data, supplemented by synthetic data covering math, reasoning, general knowledge, charts, tables, diagrams, and slides.

Despite its smaller size, the Phi-3 Vision model outperforms models such as Claude 3 Haiku, LLaVA, and Gemini 1.0 Pro on several multimodal benchmarks, and it comes close to OpenAI’s GPT-4V. Microsoft says developers can use Phi-3 Vision for general image understanding, OCR tasks, chart and table interpretation, and more.
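For developers who want to experiment, here is a minimal sketch of how a model like this can be called through the Hugging Face transformers library. Note the specifics are assumptions, not details confirmed in this article: the repository id microsoft/Phi-3-vision-128k-instruct, the <|image_1|> image placeholder in the chat template, and the example image URL are all based on how Microsoft typically publishes its Phi models.

```python
# Minimal sketch of querying Phi-3 Vision about an image.
# Assumptions: the Hugging Face repo id, the <|image_1|> placeholder,
# and the image URL are illustrative, not confirmed by this article.
from PIL import Image
import requests
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3-vision-128k-instruct"  # assumed repo id

# trust_remote_code is needed because the model ships custom code
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Fetch an example image (hypothetical URL) to ask about
url = "https://example.com/sales_chart.png"
image = Image.open(requests.get(url, stream=True).raw)

# The image is referenced in the prompt via a numbered placeholder
messages = [
    {"role": "user", "content": "<|image_1|>\nSummarize the data in this chart."},
]
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# The processor combines the text prompt and the image into model inputs
inputs = processor(prompt, [image], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)

# Drop the prompt tokens so only the model's answer is decoded
answer_ids = output_ids[:, inputs["input_ids"].shape[1]:]
response = processor.batch_decode(answer_ids, skip_special_tokens=True)[0]
print(response)
```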

If you want to try out the Phi-3 Vision model, it is available in Azure AI Studio.
