ERNIE Image: Baidu's Open-Weight Text-to-Image Model
ERNIE Image is an open text-to-image model built on an 8B single-stream Diffusion Transformer. It's specifically trained for the cases that trip up most image generators — legible in-image text, structured layouts, and complex multi-object prompts — and runs on a single consumer GPU.
The straight answer — model architecture, what it's built for, and where it fits.
ERNIE Image is an open-source text-to-image generation model developed by the ERNIE team at Baidu. It uses an 8-billion-parameter single-stream Diffusion Transformer (DiT) and ships with a lightweight Prompt Enhancer that expands short user inputs into richer, structured descriptions before generation.
The model is designed for practical deployment: it runs on a single consumer GPU with 24 GB of VRAM, not a cluster. Despite the compact parameter count, it reaches state-of-the-art performance among open-weight text-to-image models across several benchmarks.
It's released under Apache 2.0. That means the weights are free to download, use commercially, fine-tune, and redistribute — with no API dependency and no usage quota.
Six capabilities from the official model documentation — and what each one means in practice.
Render Dense, Layout-Sensitive Text Inside Images
ERNIE Image performs particularly well on long-form, layout-sensitive text — the kind that breaks most diffusion models. Posters with real headlines, infographics with data labels, and UI-like mockups with readable copy all come out clean.
LongTextBench 0.9733
Follow Complex Prompts Involving Multiple Objects
The model handles prompts with multiple objects, detailed spatial relationships, and knowledge-intensive descriptions — and doesn't collapse them into a generic output. GENEval 0.8856 puts it ahead of Qwen-Image and competitive with larger open-weight models.
GENEval 0.8856
Generate Posters, Comics, and Multi-Panel Compositions
Structured visual tasks are where ERNIE Image stands out among open-weight models. Posters, comic panels, storyboards, and multi-panel compositions come out with consistent layout logic — not just good-looking subjects dropped onto a canvas.
Cover Realistic, Design-Oriented, and Stylized Outputs
The model isn’t locked into one visual register. Realistic photography, clean design-oriented imagery, and distinctive stylized aesthetics are all within its range. You’re not choosing between “photo” and “art” mode — it handles both.
Run on a Consumer GPU — No Cloud Required
The full model runs on a single GPU with 24 GB of VRAM: RTX 3090, RTX 4090, or A10G. That's local inference with no API dependency and no per-image cost. Self-hosting the checkpoint also means you control the data pipeline end to end.
24 GB VRAM · Single GPU
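The 24 GB figure is consistent with the parameter count. A back-of-envelope sketch, assuming half-precision weight storage (an assumption; the source does not state the dtype):

```python
# Rough memory estimate for the DiT weights alone (illustrative;
# assumes 2-byte bf16/fp16 storage, which the source does not specify).
params = 8e9                  # 8B-parameter Diffusion Transformer
bytes_per_param = 2           # bf16 / fp16
weights_gb = params * bytes_per_param / 1024**3
print(f"weights = {weights_gb:.1f} GB")  # → weights = 14.9 GB
```

Roughly 15 GB of weights leaves headroom on a 24 GB card for activations and the text encoder, which is why a single RTX 3090 or 4090 suffices.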
Expand Short Prompts with the Built-In Prompt Enhancer
A lightweight Prompt Enhancer ships alongside the main DiT. It takes brief user inputs and rewrites them into richer, structured descriptions before the model generates. The result: better output from short prompts, without prompt engineering overhead.
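The real Prompt Enhancer is a learned model that ships with the weights, and its API is not documented here. Purely as an illustration of the rewriting step (short prompt in, structured description out), a toy stand-in might look like:

```python
def enhance(prompt: str) -> str:
    """Toy stand-in for the Prompt Enhancer: expand a terse prompt into a
    richer, structured description before it reaches the DiT. The real
    enhancer is a trained model, not a template."""
    return (
        f"{prompt}, detailed composition, coherent layout, "
        "balanced lighting, legible in-image text where relevant"
    )

expanded = enhance("poster for a jazz festival")
print(expanded)
```

The point is only the shape of the transform: generation always sees the expanded description, so short user prompts still produce structured output.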
Where to Download and Run ERNIE Image
Official weights on Hugging Face, ComfyUI workflow on GitHub — both under Apache 2.0.
Download from Hugging Face — Official Weights
The official checkpoint is hosted at baidu/ERNIE-Image on Hugging Face under Apache 2.0. Both the main SFT model and the Turbo variant are available. The Prompt Enhancer ships as a separate safetensors file in the same repository.
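A minimal fetch sketch using the standard `huggingface_hub` API. The repo id is the one stated above; individual file names inside the repo are not assumed:

```python
REPO_ID = "baidu/ERNIE-Image"  # repo id as stated on the model card

def fetch_weights(local_dir: str = "./ernie-image") -> str:
    """Download the full repo snapshot (SFT and Turbo checkpoints plus
    the Prompt Enhancer safetensors) and return the local path."""
    # Imported lazily so the sketch is readable without the package installed.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
```

Calling `fetch_weights()` pulls every file in the repository; pass `allow_patterns` to `snapshot_download` if you only want one variant.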
Run in ComfyUI with the Official Workflow Template
ComfyUI added Day-0 support for ERNIE Image in April 2026. Load the safetensors checkpoint, add the Prompt Enhancer node, and it integrates with any standard ComfyUI pipeline. The workflow template is published on GitHub.
Two variants ship in the same release. Here's what's different and when to pick each one.
ERNIE Image SFT — Full Quality, 50-Step Generation
The SFT model is the standard release — 50 denoising steps, full instruction fidelity, and the strongest benchmark scores. Use it for final renders where text accuracy, layout precision, and output quality are non-negotiable.
Steps 50 · GENEval 0.8856 · LongTextBench 0.9733 · Best for: final renders
ERNIE Image Turbo — 8-Step Drafts for Fast Iteration
ERNIE Image Turbo is a distilled variant trained with DMD (Distribution Matching Distillation) and reinforcement learning. It cuts generation down to 8 steps — fast enough to preview 20+ compositions before committing to a final render. Output quality is lower than SFT but sufficient for client reviews and direction exploration.
Steps 8 · Speed ~6× faster · Training DMD + RL · Best for: drafts, iteration
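The "~6×" figure follows directly from the step counts, assuming per-image latency scales roughly linearly with denoising steps (scheduler overhead and the Prompt Enhancer pass are ignored in this approximation):

```python
# 50-step SFT vs 8-step Turbo: step-count ratio approximates the speedup.
sft_steps, turbo_steps = 50, 8
speedup = sft_steps / turbo_steps
print(f"~{speedup:.2f}x fewer denoising steps")  # → ~6.25x fewer denoising steps
```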
ERNIE Image SFT vs Turbo — feature comparison
                 ERNIE Image SFT    ERNIE Image Turbo
Steps            50                 8
Speed            Baseline           ~6× faster
Best for         Final renders      Drafts, iteration
GENEval          0.8856             Lower
LongTextBench    0.9733             Lower
Training method  SFT                DMD + Reinforcement Learning
Available on     Hugging Face       Hugging Face
Real Images Generated with ERNIE Image
Every image below was generated from a text prompt using ERNIE Image — from cinematic portraits to structured posters and bilingual compositions.
Female Dark Knight
Car at Sunset Field
Celestial Moon Messenger
Rooftop Assassin in Rain
Desert Nomad Hunter
Sea Witch Cave
Rainy Izakaya Street — bilingual text rendering
Japanese Summer Park
Phone Illustration Blend
Browser LLM Intro — structured layout
Alphabet of Careers — poster with dense text rendering
Is ERNIE Image free?
Yes. ERNIE Image is released under Apache 2.0 — the weights are free to download and use. Commercial use, fine-tuning, and redistribution are all permitted with no separate license purchase. There is no API quota or usage cap on the self-hosted version.
How does ERNIE Image compare to FLUX.1 or Midjourney?
For in-image text and structured layout generation, ERNIE Image leads most open-weight competitors, and its GENEval score of 0.8856 makes it competitive with FLUX.1 on instruction following. Midjourney produces stronger stylized aesthetics but is closed-source with no self-hosting option. ERNIE Image is the stronger choice when text accuracy and layout control matter more than visual style range.
Can I use ERNIE Image outputs commercially?
Yes. The Apache 2.0 license permits commercial use of both the model weights and generated outputs. Ads, product imagery, print, and resale are all allowed. No additional commercial license is needed.
What GPU do I need to run ERNIE Image locally?
ERNIE Image requires a GPU with 24 GB of VRAM for the full SFT model; the RTX 3090, RTX 4090, and A10G all work. The Turbo variant is faster and has lower memory requirements. The model runs on a single GPU; no multi-GPU setup is needed.
Does ERNIE Image work with ComfyUI?
Yes. ComfyUI added official Day-0 support for ERNIE Image in April 2026. The model loads as a standard safetensors checkpoint. Baidu published a workflow template on GitHub that includes the Prompt Enhancer node. It's compatible with standard ComfyUI custom nodes.
What languages can I use for prompts?
ERNIE Image supports English, Chinese, and Japanese prompts. In-image text renders cleanly in English and Chinese within the same generation pass. Benchmark scores are comparable across languages (OneIG-EN 0.5750 vs OneIG-ZH 0.5543), so there is no meaningful quality gap between English and Chinese.
Official ERNIE Image Resources
Everything in one place — model weights, code, documentation, and the online demo.