SPICE: A Synergistic, Precise, Iterative, and Customizable Image Editing Workflow

Examples

SPICE produces high-quality, highly complex image edits while requiring little user expertise. It is a versatile workflow that supports a range of editing needs.

Multi-Purpose Editing

SPICE handles inpainting, outpainting, structural editing, and detail enhancement. The example below shows 40 editing steps performed iteratively to build an image from an empty canvas. Because of its complexity, the image cannot feasibly be generated in a single text-to-image generation step. The characters are inspired by the video game Darkest Dungeon 2.

Gestures

SPICE can generate or fix complicated gestures. In the example below, the user wants the Chinese number gesture for 6. The left image is generated by gpt4o. Although the prompt explicitly asks for the Chinese number gesture for 6, gpt4o fails to interpret the prompt and generates a gesture for 5. The right image is edited by SPICE, and its gesture is correct.

Generate Text

SPICE can generate text. In the example below, the user wants to add the word "SCRANTON" to the clock tower. The left image is the original photo. The right image is edited by SPICE.

Fix Text

SPICE can also fix text. The image below was generated using the prompt from one of OpenAI's official examples. The left image is generated by gpt4o and contains misspelled words (VIGLATORS and TOM-AWAY). The right image is edited by SPICE, with the misspellings fixed.

Add Occluded Objects

SPICE can handle complicated object occlusions. In the example below, the user wants to add a black backpack onto the bench. The left image is the original image, and the right image is generated by SPICE. Models that are known to fail on this task include gpt4o, Gemini 2.0 Flash, Doubao SeedEdit, UltraEdit, MGIE, and MagicQuill.

Fix Failures

SPICE can fix its own failures. In the example below, the user wants to change the background from a river to a desert. The left image is the failed result after the first editing step. The right image is the fixed result after three additional editing steps.

Adapt to Different Styles

Finally, SPICE adapts to any art style that is supported by a base model or a LoRA. The two examples below were both iteratively generated and refined by SPICE. The characters are from Touhou Project and Hades 2.


Introduction