This document describes the limitations of DALL-E 3 for image generation in the Intelligent Content Generation Accelerator and the workarounds implemented to achieve product-seeded marketing image generation.
DALL-E 3 only accepts text prompts. Unlike newer models such as GPT-image-1, DALL-E 3 does not support:
- Image-to-image generation
- Reference/seed images as input
- Image editing or inpainting with image inputs
This means you cannot directly pass a product image to DALL-E 3 and ask it to create a marketing image featuring that product.
| Capability | DALL-E 3 | GPT-image-1 |
|---|---|---|
| Text prompts | ✅ | ✅ |
| Image input | ❌ | ✅ |
| Image editing | ❌ | ✅ |
| Inpainting | ❌ | ✅ |
| Multiple images per request | 1 only | 1-10 |
| Output format | URL or base64 | base64 only |
To work around DALL-E 3's text-only limitation, we use GPT-5 Vision to generate detailed text descriptions of product images during the product ingestion process.
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Product Image │────▶│ GPT-5 Vision │────▶│ Text Description│
│ (Blob Storage) │ │ (Auto-analyze) │ │ (CosmosDB) │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│
▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Marketing Image │◀────│ DALL-E 3 │◀────│ Combined Prompt │
│ (Output) │ │ (Generate) │ │ (Desc + Brief) │
└─────────────────┘ └─────────────────┘ └─────────────────┘
When a product image is uploaded to Blob Storage, the ProductIngestionService automatically:
- Sends the image to GPT-5 Vision
- Generates a detailed text description including:
- Product appearance (colors, shapes, materials)
- Key visual features
- Composition and positioning
- Style and aesthetic qualities
- Stores the description in CosmosDB alongside product metadata
async def generate_image_description(image_url: str) -> str:
"""Generate detailed text description of product image using GPT-5 Vision."""
response = await openai_client.chat.completions.create(
model="gpt-5",
messages=[
{
"role": "user",
"content": [
{
"type": "text",
"text": """Describe this product image in detail for use in marketing image generation.
Include: colors, materials, shape, key features, style, and positioning.
Be specific enough that an image generator could recreate a similar product."""
},
{
"type": "image_url",
"image_url": {"url": image_url}
}
]
}
],
max_tokens=500
)
return response.choices[0].message.contentThe ImageContentAgent combines the stored product description with:
- Creative brief visual guidelines
- Brand guidelines (colors, style, composition rules)
- Scene/context requirements
async def generate_marketing_image(
product: Product,
creative_brief: CreativeBrief,
brand_guidelines: BrandGuidelines
) -> bytes:
"""Generate marketing image using DALL-E 3 with product context."""
prompt = f"""
Create a professional marketing image for a retail campaign.
PRODUCT (maintain accuracy):
{product.image_description}
SCENE:
{creative_brief.visual_guidelines}
BRAND STYLE:
- Primary color: {brand_guidelines.primary_color}
- Style: {brand_guidelines.image_style}
- Composition: Product centered, 30% negative space
REQUIREMENTS:
- Professional lighting
- Clean, modern aesthetic
- Suitable for {creative_brief.deliverable}
"""
response = await openai_client.images.generate(
model="dall-e-3",
prompt=prompt,
size="1024x1024",
quality="hd",
n=1
)
return response.data[0].url-
Product Representation: The generated product in the marketing image may not be an exact match to the original product. DALL-E 3 interprets the text description and creates its own version.
-
Brand-Specific Details: Logos, specific patterns, or unique design elements may not be accurately reproduced.
-
Color Matching: While we include color descriptions, exact color matching is not guaranteed.
| Use Case | Suitability |
|---|---|
| Lifestyle/contextual marketing images | ✅ Excellent |
| Social media campaign visuals | ✅ Excellent |
| Concept mockups | ✅ Good |
| Product-in-scene compositions | ✅ Good |
| Exact product photography replacement | ❌ Not recommended |
| Catalog/technical images | ❌ Not recommended |
GPT-image-1 (currently in limited access preview) will enable true image-to-image generation:
# Future implementation with GPT-image-1
async def generate_marketing_image_with_seed(
product_image_path: str,
scene_description: str,
brand_style: str
) -> bytes:
"""Generate marketing image seeded with actual product photo."""
response = await openai_client.images.edit(
model="gpt-image-1",
image=open(product_image_path, "rb"), # Actual product image as input
prompt=f"""
Create a marketing image featuring the product shown.
Scene: {scene_description}
Brand Style: {brand_style}
Maintain product accuracy.
""",
size="1024x1024",
quality="high",
input_fidelity="high" # Preserve product details
)
return base64.b64decode(response.data[0].b64_json)- Visit GPT-image-1 Access Request
- Complete the application form
- Wait for approval (typically 1-2 weeks)
- Update the
ImageContentAgentto use the Image Edit API
When GPT-image-1 access is granted:
- Update
AZURE_DALLE_MODELenvironment variable togpt-image-1 - Modify
ImageContentAgentto useimages.edit()instead ofimages.generate() - Update Blob Storage retrieval to pass actual image bytes
- Test with sample products before production deployment
For best results with the text-based workaround:
- Be Specific: Include exact colors, materials, and dimensions
- Describe Unique Features: Highlight what makes the product distinctive
- Include Context: Mention typical use cases or settings
- Avoid Ambiguity: Use precise terminology
A sleek wireless Bluetooth headphone in matte black finish with
rose gold accents on the ear cup rims and headband adjustment
sliders. Over-ear cushions in premium memory foam covered with
soft protein leather. The headband features a padded top section
with subtle brand embossing. The left ear cup has touch-sensitive
controls visible as a circular touch pad. Cable port and power
button are positioned on the bottom edge of the right ear cup.
Overall aesthetic is premium, modern, and minimalist.
All generated images are validated by the ComplianceAgent for:
- Brand color adherence
- Prohibited visual elements
- Appropriate imagery for target audience
- Required disclaimers (added as text overlay if needed)
Images with compliance violations are flagged with appropriate severity levels before user review.