SPICE: A Synergistic, Precise, Iterative, and Customizable Image Editing Workflow

Kenan Tang1,*, Yanhong Li2,*, Yao Qin1

1University of California, Santa Barbara
2University of Chicago
*Equal Contribution

Introduction

SPICE is an image editing workflow that delivers state-of-the-art editing quality and supports most popular diffusion models (Flux Dev, SDXL, SD 1.5, and more). It offers a great user experience for beginners and professionals alike.

Examples

SPICE produces high-quality results on complex image edits while requiring little user expertise. It is a versatile workflow that supports a wide range of editing needs.

Multi-Purpose Editing

SPICE handles inpainting, outpainting, structural editing, and detail enhancement. The example below shows how an image is built from an empty canvas through 40 iterative editing steps. Given its complexity, the image could not plausibly be generated in a single text-to-image step. The characters are inspired by the video game Darkest Dungeon 2.

Gestures

SPICE can generate or fix complicated hand gestures. In the example below, the user wants to generate the Chinese number gesture for 6. The left image was generated by GPT-4o: although the prompt explicitly asks for the Chinese number gesture for 6, GPT-4o misinterprets the prompt and produces a gesture for 5. The right image was edited by SPICE and shows the correct gesture.

Generate Text

SPICE can generate text. In the example below, the user wants to add the word "SCRANTON" to the clock tower. The left image is the original photo. The right image is edited by SPICE.

Fix Text

SPICE can also fix text. The image below was generated from the prompt in one of OpenAI's official examples. The left image, generated by GPT-4o, contains misspelled words (VIGLATORS and TOM-AWAY). The right image was edited by SPICE, which fixed the misspellings.

Add Occluded Objects

SPICE can handle complicated object occlusions. In the example below, the user wants to add a black backpack onto the bench. The left image is the original image, and the right image was generated by SPICE. Models known to fail on this task include GPT-4o, Gemini 2.0 Flash, Doubao SeedEdit, UltraEdit, MGIE, and MagicQuill.

Fix Failures

SPICE can fix its own failures. In the example below, the user wants to change the background from a river into a desert. The left image is a failed result after the first editing step. The right image is the fixed result after 3 additional editing steps.

Adapt to Different Styles

Last but not least, SPICE adapts to any art style that is supported by a base model or a LoRA. Both examples were iteratively generated and refined by SPICE. The characters are from Touhou Project and Hades 2.

Methods

SPICE is easy to use. To edit an image, a user sketches a hint and a mask and provides a prompt describing the final image; no prompt engineering or prompt enhancement is required. Given these inputs, a two-stage denoising process generates a localized edit in the desired region.
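The inputs described above, and the way a mask keeps each edit local across many iterative steps, can be sketched in Python. This is a minimal illustration under stated assumptions, not SPICE's implementation: `EditRequest`, `composite`, `run_edits`, and `paste_hint` are hypothetical names, images are tiny grayscale grids stored as nested lists, and `edit_fn` is a placeholder for SPICE's actual two-stage denoising, which is not reproduced here. The sketch only shows that pixels outside a step's mask are carried over unchanged, and that each step starts from the previous result.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical representation: tiny grayscale images as nested lists.
Image = List[List[float]]
Mask = List[List[int]]  # 1 = region to edit, 0 = region to preserve


@dataclass
class EditRequest:
    """Hypothetical container for the three user inputs of one edit."""
    hint: Image   # user-drawn sketch of the desired content
    mask: Mask    # user-drawn mask marking where the edit may happen
    prompt: str   # description of the final image


# Placeholder for SPICE's two-stage denoising (assumption: not modeled here).
EditFn = Callable[[Image, EditRequest], Image]


def composite(original: Image, edited: Image, mask: Mask) -> Image:
    """Take edited pixels inside the mask and original pixels elsewhere."""
    if not (len(original) == len(edited) == len(mask)):
        raise ValueError("image, edit, and mask must have the same height")
    result: Image = []
    for row_o, row_e, row_m in zip(original, edited, mask):
        if not (len(row_o) == len(row_e) == len(row_m)):
            raise ValueError("image, edit, and mask must have the same width")
        result.append([e if m else o for o, e, m in zip(row_o, row_e, row_m)])
    return result


def run_edits(image: Image, requests: List[EditRequest], edit_fn: EditFn) -> Image:
    """Apply edits one after another; each step starts from the previous result."""
    for request in requests:
        edited = edit_fn(image, request)
        image = composite(image, edited, request.mask)
    return image


def paste_hint(image: Image, request: EditRequest) -> Image:
    # Toy stand-in for the denoiser: simply returns the user's hint.
    return request.hint


canvas = [[0.0, 0.0], [0.0, 0.0]]
steps = [
    EditRequest(hint=[[9.0, 9.0], [9.0, 9.0]], mask=[[1, 0], [0, 0]], prompt="first edit"),
    EditRequest(hint=[[5.0, 5.0], [5.0, 5.0]], mask=[[0, 0], [0, 1]], prompt="second edit"),
]
print(run_edits(canvas, steps, paste_hint))  # [[9.0, 0.0], [0.0, 5.0]]
```

Because every step is composited through its own mask, the first edit survives the second one untouched, which is the property that makes long iterative sessions (such as the 40-step example) controllable.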

Tutorial

Please refer to our GitHub repository for further instructions on installation and editing.

Citation

@misc{tang2025spicesynergisticpreciseiterative,
  title={SPICE: A Synergistic, Precise, Iterative, and Customizable Image Editing Workflow},
  author={Kenan Tang and Yanhong Li and Yao Qin},
  year={2025},
  eprint={2504.09697},
  archivePrefix={arXiv},
  primaryClass={cs.GR},
  url={https://arxiv.org/abs/2504.09697},
}