Microsoft Research Unveils Lens: Efficient Image Generator Using 800M Detailed Captions

By Garp Editorial, June 9, 2026 (updated June 20, 2026)

1 min read Section: AI News Publisher: Garp

Microsoft Research has presented Lens, a text-to-image model with just 3.8 billion parameters that matches much larger rivals on benchmarks at a fraction of the training cost. The secret sauce: 800 million detailed image captions generated by GPT-4.1, used in place of vague web alt-text.

Code and weights are available under an open-source license.
