Enhance Image Generation with Presets
Goal: Create a preset that wraps a text model around the openrouter:image_generation server tool. The text model rewrites vague user requests into detailed image prompts, then calls the tool. You get better images from a single API call.
Outcome: A reusable preset slug (@preset/your-slug) that any app can call like a model. The user sends “make a watermelon hippo,” and the text model crafts a detailed visual description (materials, lighting, composition, style) before generating the image.
Before you start
You need:
- An OpenRouter API key (for creating the preset and making requests)
- A decision on which text model orchestrates (rewrites prompts) and which image model generates
Use these references for exact schemas:
Each request through this preset makes two model calls: one to the text model (for prompt refinement) and one to the image model (for generation). The text model call is cheap; the image generation cost depends on the image model you configure. Check pricing on the image model’s page before routing production traffic.
How the pattern works
A standard image generation call looks like this: your user says “make a watermelon hippo,” and the image model gets exactly that. The result is decent but literal, because the image model has no creative direction.
The difference is visible. Here’s the same concept, with and without prompt enhancement:
The bare prompt produces a literal interpretation. The preset’s orchestrator expands it into a detailed scene with materials, lighting, and environment before the image model touches it.
The preset pattern inserts a text model as an orchestrator:
- User sends “make a watermelon hippo” to
@preset/your-slug - The text model (guided by your system prompt) rewrites it into something like: “A photorealistic hippopotamus sculpted entirely from watermelon rind and flesh, dark green outer rind with lighter stripes forming the body, exposed sections revealing vibrant pink-red watermelon flesh, black seeds dotting the surface like natural skin texture, standing in a shallow river at golden hour, warm sunlight catching water droplets on the glossy rind”
- The text model calls
openrouter:image_generationwith the enhanced prompt - OpenRouter generates the image and returns the URL to the text model
- The text model responds with the image and (optionally) explains what it created
The text model handles the creative interpretation. The image model handles the rendering. Each does what it’s best at.
Step 1: Create the preset via API
The fastest way to create the preset is to POST a request body to the preset creation endpoint. This captures the model, system prompt, tools, and parameters in one call:
The response confirms your preset was created:
The response shown above is abbreviated. The full response includes additional fields like workspace_id, description, and timestamps. See the Presets API reference for the complete schema.
You can also create or edit presets from the Presets dashboard, which has a visual server tools editor.
Step 2: Use the preset
Send requests to your preset slug as if it were a model:
The text model receives your message, crafts a detailed prompt, calls the image generation tool, and returns the result. Your application code doesn’t know or care about the orchestration happening inside.
What comes back
The response looks like a normal chat completion. The text model’s message contains the generated image URL (typically as a markdown image or inline URL) plus any commentary it added:
The image URL in the response is temporary. Download or cache it if you need it long-term.
Customization
Choosing the orchestrator model
The orchestrator rewrites prompts and decides when/how to call the tool. Good picks:
The orchestrator cost is typically small (a few hundred tokens of prompt rewriting). The image generation cost dominates.
Choosing the image model
Configure the image model in the tool’s parameters.model field. google/gemini-3.1-flash-image (Nano Banana 2) is a strong default: fast, cheap ($0.50/M input), and high quality. See available image models for all current options and pricing.
Tuning the system prompt
The system prompt controls how aggressively the orchestrator rewrites. Some patterns:
- Faithful expansion: “Expand the user’s request with visual detail while preserving their stated intent. Don’t override their style choices.”
- Brand-consistent: “All images should use our brand palette (navy #1a1a2e, gold #e6b800, cream #faf0e6). Apply a clean, modern illustration style.”
- Domain-specific: “You are generating product photography prompts. Focus on lighting setup, background material, camera angle, and product placement.”
Image tool parameters
The tool’s parameters object accepts model plus all image_config fields (quality, aspect_ratio, size, background, output_format, output_compression, moderation). See the image generation server tool reference for the full list and model-specific defaults.
Using different image models for different scenarios
The image model is fixed per preset (the orchestrator can’t switch it mid-request). Create separate presets for different quality tiers and let your application pick the right one.
Updating the preset without redeploying
One of the key benefits: change the image model, tune the system prompt, or adjust parameters from the Presets dashboard or via the API. No code deploy needed. Your application keeps calling @preset/image-enhancer and picks up the new config immediately.
To create a new version via API, POST to the same endpoint again with your updated config. The latest version becomes active automatically.
Next steps
- Explore available image models and compare pricing, quality, and speed
- Read about presets for version management, team sharing, and the preset field merge behavior
- Use the OpenRouter TypeScript SDK or Python SDK for typed preset interactions in your app
- Add web search to the preset’s tools so the orchestrator can reference current visual trends or specific art styles
- Track image generation costs with the Analytics guide to monitor per-preset spend
- Handle multi-turn conversations where the user refines images iteratively (pass previous messages to the preset to maintain context)

