Image editing is worth a single LoRA! 0.1% training data for fantastic image editing! Training released! Surpasses GPT-4o in ID persistence! Official ComfyUI workflow release! Only 4GB VRAM is enough to run!

River-Zhang/ICEdit

In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer

ReLER, CCAI, Zhejiang University; Harvard University
Corresponding Author

Image Editing is worth a single LoRA! We present In-Context Edit, a novel approach that achieves state-of-the-art instruction-based editing using just 0.5% of the training data and 1% of the parameters required by prior SOTA methods. The first row illustrates a series of multi-turn edits, executed with high precision, while the second and third rows highlight diverse, visually impressive single-turn editing results from our method.

📖 For more visual results, check out our project page

📢 Attention All: Incorrect ComfyUI Workflow Usage Alert — Read Now!

  • We have released our official ComfyUI workflow for proper usage! Check our repository and have a try!

  • You need to add the fixed pre-prompt "A diptych with two side-by-side images of the same scene. On the right, the scene is exactly the same as on the left but {instruction}" before entering the edit instruction; otherwise you may get bad results! (This is mentioned in the paper. The code for the Hugging Face Gradio demo already embeds this prompt, so there you can simply enter the edit instruction without additional setup.)
  • The input image must be resized to a width of 512 pixels (there is no restriction on the height).
  • Please use the normal LoRA, not the MoE-LoRA, because the MoE-LoRA cannot be loaded correctly by the ComfyUI LoRA loader.
  • 🔥💐🎆 Welcome to share your creative workflows (such as combining Redux, ACE, etc.) in the Issues section and showcase the results! We will include references so that more people can see your creativity.
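
As a minimal sketch of the pre-prompt rule above, the following Python helper wraps an edit instruction in the fixed diptych template. The template string is quoted from the note above; the helper name `build_edit_prompt` and the empty-instruction check are hypothetical and not part of the ICEdit code.

```python
# Fixed pre-prompt quoted from the note above; {instruction} marks the slot
# where the edit instruction is placed.
DIPTYCH_TEMPLATE = (
    "A diptych with two side-by-side images of the same scene. "
    "On the right, the scene is exactly the same as on the left but {instruction}"
)


def build_edit_prompt(instruction: str) -> str:
    """Return the full prompt for one edit instruction (hypothetical helper)."""
    cleaned = instruction.strip()
    if not cleaned:
        raise ValueError("instruction must not be empty")
    return DIPTYCH_TEMPLATE.format(instruction=cleaned)


if __name__ == "__main__":
    print(build_edit_prompt("make the girl wear pink sunglasses"))
```

This is only needed in workflows that do not already embed the pre-prompt; the official workflow and the Gradio demo are described as embedding it.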

🎆 News

👑 Feel free to share your results in this Gallery!

  • [2025/5/7] 🌟 We updated the notes on using the ComfyUI workflow to help avoid unsatisfactory results!
  • [2025/5/6] 🔥 ICEdit currently ranks 2nd on the overall/weekly trending list of Hugging Face space. Thank you all for your support and love!🤗
  • [2025/5/5] 🌟 Heartfelt thanks to Datou for creating a fantastic ComfyUI workflow on OpenArt! 🚀 Have a try!
  • [2025/5/2] 🌟 Heartfelt thanks to judian17 for crafting an amazing ComfyUI-nunchaku demo! Only 4GB VRAM GPU is enough to run with ComfyUI-nunchaku!🚀 Dive in and give it a spin!
  • [2025/4/30] 🔥 We release the Huggingface Demo 🤗! Have a try!
  • [2025/4/30] 🔥 We release the paper on arXiv!
  • [2025/4/29] We release the project page and demo video! Code will be made available next week. Happy Labor Day!

🎈 Tutorial on Bilibili or Youtube

🎨ComfyUI Workflow

Official ComfyUI-workflow

We have released our official ComfyUI workflow in this repository for correct usage of our model! We have embedded the prompt "A diptych with two side-by-side images of the same scene ... but" into our nodes and you just need to input the edit instructions such as "make the girl wear pink sunglasses". We also add a high resolution refinement module for better image quality! The total VRAM consumption is about 14GB. Use this workflow and the ICEdit-normal-lora to fulfill your creative ideas!

We have created a dedicated repository for the workflow, which you can install directly in ComfyUI. Open the manager tab, click 'Install via Git URL', and paste the following URL. For more details, please refer to this issue.

URL: https://github.com/hayd-zju/ICEdit-ComfyUI-official

Great thanks to 月下Hugo for making a Chinese tutorial on how to use our official workflow!

ComfyUI-workflow for increased editing success rate

Thanks to T8star! He made a tutorial (Youtube and bilibili) and a creative workflow (OpenArt and RunningHub) that can greatly increase the editing success rate (about 100%)! Give it a try!

ComfyUI-nunchaku

We extend our heartfelt thanks to @judian17 for crafting a ComfyUI workflow that facilitates seamless usage of our model. Explore this excellent workflow to effortlessly run our model within ComfyUI. Only 4GB VRAM GPU is enough to run with ComfyUI-nunchaku!

This workflow incorporates high-definition refinement, yielding remarkably good results. Moreover, integrating this LoRA with Redux enables outfit changes to a certain degree. Once again, a huge thank you to @judian17 for his innovative contributions!

ComfyUI-workflow

Thanks to Datou, a workflow of ICEdit in ComfyUI can also be downloaded here. Try it with the normal lora ckpt.

⚠️ Tips

If an edit fails, please try again with a different seed!

  • Our base model, FLUX, does not inherently support a wide range of styles, so a large portion of our dataset involves style transfer. As a result, the model may sometimes inexplicably change your artistic style.

  • Our training dataset mostly targets realistic images. For non-realistic images, such as anime or blurry pictures, the editing success rate drops, which can also affect the final image quality.

  • While the success rates for adding objects, modifying color attributes, applying style transfer, and changing backgrounds are high, the success rate for object removal is relatively lower due to the low quality of the removal dataset we use.

The current model is the one used in the experiments in the paper, trained with only 4 A800 GPUs (total batch_size = 2 x 2 x 4 = 16). In the future, we will enhance the dataset, scale up training, and release a more powerful model.
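
The batch-size arithmetic above can be sketched as follows. Only the product 2 x 2 x 4 = 16 and the 4 GPUs come from the text; reading the first two factors as a per-GPU batch size and a gradient-accumulation step count is an assumption of this sketch, and the function name is hypothetical.

```python
def effective_batch_size(per_gpu_batch: int, grad_accum_steps: int, num_gpus: int) -> int:
    """Total samples contributing to one optimizer step (hypothetical helper)."""
    return per_gpu_batch * grad_accum_steps * num_gpus


# Assumption: first factor = per-GPU batch, second = accumulation steps.
# Stated in the text: 4 A800 GPUs and a total batch size of 16.
total = effective_batch_size(per_gpu_batch=2, grad_accum_steps=2, num_gpus=4)
print(total)
```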

⚠️ Clarification

We've noticed numerous web pages related to ICEdit, including https://icedit.net/, https://icedit.org/. Kudos to those who built these pages!

However, we'd like to emphasize two important points:

💼 Installation

Conda environment setup

conda create -n icedit python=3.10
conda activate icedit
pip install -r requirements.txt
pip install -U huggingface_hub

Download pretrained weights

If you can connect to Hugging Face, you don't need to download the weights manually. Otherwise, download the weights to a local directory.

Note: Due to some cooperation permission issues, we have had to temporarily withdraw the weights and code of the MoE-LoRA. What is currently released is the ordinary LoRA, which still performs strongly. If you urgently need the MoE-LoRA weights from the original paper, please email the author.

Inference in bash (w/o VLM Inference-time Scaling)

Now you can have a try!

Our model can only edit images with a width of 512 pixels (there is no restriction on the height). If you pass in an image with a width other than 512 pixels, the model will automatically resize it to 512 pixels.
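
To preview the size an image would have at width 512, here is a minimal sketch that scales the height proportionally. Keeping the aspect ratio and rounding the height down to a multiple of 8 are assumptions of this sketch (the text above only states the 512-pixel width); `target_size` is a hypothetical helper, not a function from the repository.

```python
TARGET_WIDTH = 512  # width stated in the text above


def target_size(width: int, height: int, multiple: int = 8) -> tuple[int, int]:
    """Scale (width, height) so the width becomes 512, keeping the aspect ratio.

    Rounding the height down to a multiple of `multiple` is an assumption of
    this sketch, not something the README specifies.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scaled = height * TARGET_WIDTH // width              # proportional height, floored
    scaled = max(multiple, scaled // multiple * multiple)  # snap down to a multiple
    return TARGET_WIDTH, scaled


if __name__ == "__main__":
    print(target_size(1024, 1536))
```

With Pillow, the returned tuple could be passed to `Image.resize` if you prefer to resize images yourself before editing.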

If the model fails to generate the expected result, try changing the --seed parameter. Inference-time scaling with a VLM can also improve the results considerably.

python scripts/inference.py --image assets/girl.png \
                            --instruction "Make her hair dark green and her clothes checked." \
                            --seed 304897401
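
Since retrying with a different seed is the suggested remedy for failed edits, the sketch below builds one command per seed. It uses only the `--image`, `--instruction`, and `--seed` flags shown above; the helper name and the extra seed values are hypothetical. It does not know where the script writes its output, so check whether successive runs overwrite each other.

```python
import shlex


def seed_sweep_commands(image: str, instruction: str, seeds: list[int]) -> list[list[str]]:
    """Build one argv list per seed for scripts/inference.py (hypothetical helper)."""
    return [
        [
            "python", "scripts/inference.py",
            "--image", image,
            "--instruction", instruction,
            "--seed", str(seed),
        ]
        for seed in seeds
    ]


if __name__ == "__main__":
    commands = seed_sweep_commands(
        "assets/girl.png",
        "Make her hair dark green and her clothes checked.",
        [304897401, 1, 2],  # the last two seeds are arbitrary examples
    )
    for argv in commands:
        # Print shell-quoted commands; subprocess.run(argv, check=True) would execute them.
        print(shlex.join(argv))
```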

Editing a 512×768 image requires 35 GB of GPU memory. If you need to run on a system with 24 GB of GPU memory (for example, an NVIDIA RTX3090), you can add the --enable-model-cpu-offload parameter.

python scripts/inference.py --image assets/girl.png \
                            --instruction "Make her hair dark green and her clothes checked." \
                            --enable-model-cpu-offload

If you have downloaded the pretrained weights locally, pass their paths during inference, as in:

python scripts/inference.py --image assets/girl.png \
                            --instruction "Make her hair dark green and her clothes checked." \
                            --flux-path /path/to/flux.1-fill-dev \
                            --lora-path /path/to/ICEdit-normal-LoRA

Inference in Gradio Demo

We provide a gradio demo for you to edit images in a more user-friendly way. You can run the following command to start the demo.

python scripts/gradio_demo.py --port 7860

As with the inference script, add the --enable-model-cpu-offload parameter to run the demo on a system with 24 GB of GPU memory. If you have downloaded the pretrained weights locally, pass their paths as well. All three flags after --port below are optional:

python scripts/gradio_demo.py --port 7860 \
                              --flux-path /path/to/flux.1-fill-dev \
                              --lora-path /path/to/ICEdit-normal-LoRA \
                              --enable-model-cpu-offload

To run the demo on a system with 10 GB of GPU memory, download the GGUF models from FLUX.1-Fill-dev-gguf and t5-v1_1-xxl-encoder-gguf and pass them as parameters, as in:

python scripts/gradio_demo.py --port 7861 \
                              --flux-path models/flux.1-fill-dev \
                              --lora-path models/ICEdit-normal-LoRA \
                              --transformer models/flux1-fill-dev-Q4_0.gguf \
                              --text_encoder_2 models/t5-v1_1-xxl-encoder-Q8_0.gguf \
                              --enable-model-cpu-offload

Then you can open the link in your browser to edit images.
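
The memory tiers described above can be summarized in a small selector. This is a sketch that treats the quoted figures (35 GB, 24 GB, 10 GB) as hard cutoffs, which is a simplification: the 35 GB figure was given for editing a 512×768 image with the inference script. The function name is hypothetical, and the GGUF file paths are the ones from the example command above.

```python
def demo_flags(vram_gb: float) -> list[str]:
    """Pick extra gradio_demo.py flags for a given amount of GPU memory (hypothetical helper)."""
    if vram_gb >= 35:
        return []  # enough memory to run without offloading
    if vram_gb >= 24:
        return ["--enable-model-cpu-offload"]
    if vram_gb >= 10:
        # Quantized GGUF weights plus CPU offload, as in the 10 GB example above.
        return [
            "--transformer", "models/flux1-fill-dev-Q4_0.gguf",
            "--text_encoder_2", "models/t5-v1_1-xxl-encoder-Q8_0.gguf",
            "--enable-model-cpu-offload",
        ]
    raise ValueError("below 10 GB, consider the ComfyUI-nunchaku workflow instead")


if __name__ == "__main__":
    print(["python", "scripts/gradio_demo.py", "--port", "7860", *demo_flags(24)])
```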

Gradio Demo: just input the instruction and wait for the result!

There is also a Chinese-language YouTube tutorial on how to install and use ICEdit, created by softicelee2. It's definitely worth a watch!

💼 Windows one-click package

Many thanks to gluttony-10, a well-known Bilibili uploader! He made a tutorial (Youtube and Bilibili) on how to install our project on Windows, as well as a one-click package for Windows. Just unzip it and it's ready to use. The package is quantized, takes up only 14GB of space, and supports 50-series graphics cards.

Download link: Google Drive or Baidu Wangpan (refer to the comment section of the video)

🔧 Training

Find more details here: Training Code

💪 To Do List

  • Inference Code
  • Inference-time Scaling with VLM
  • Pretrained Weights
  • More Inference Demos
  • Gradio demo
  • Comfy UI demo (by @judian17, compatible with nunchaku, support high-res refinement and FLUX Redux. Only 4GB VRAM GPU is enough to run!)
  • Comfy UI demo with normal lora (by @Datou in openart)
  • Official ComfyUI workflow
  • Training Code
  • LoRA for higher image resolution (768, 1024)

💪 Comparison with Commercial Models

Compared with commercial models such as Gemini and GPT-4o, our method is comparable or even superior in terms of character ID preservation and instruction following. It is also open source, with lower cost, faster speed (it takes about 9 seconds to process one image), and strong performance.

Bibtex

If this work is helpful for your research, please consider citing the following BibTeX entry.

@misc{zhang2025ICEdit,
      title={In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer}, 
      author={Zechuan Zhang and Ji Xie and Yu Lu and Zongxin Yang and Yi Yang},
      year={2025},
      eprint={2504.20690},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2504.20690}, 
}
