This project is an interactive tool that transforms simple sketches into detailed images using AI models. Users can draw freehand sketches and convert them into realistic or artistic images in various styles.
- Interactive Drawing: Built with Excalidraw for a natural drawing experience
- Model Selection: Choose between two AI models with different speed and quality trade-offs
- Multiple Styles: Generate images in five visual styles (Photorealistic, Anime, Oil Painting, Watercolor, and Detailed Sketch)
- Responsive Design: Works on tablets and desktop devices
- GPU Acceleration: Utilizes NVIDIA GPUs when available for faster image generation
Prerequisites:

- Docker and Docker Compose
- NVIDIA GPU with CUDA support (optional, but recommended for faster performance)
Installation:

1. Clone the repository:

   ```bash
   git clone https://github.com/aihpi/sketch2image.git
   cd sketch2image
   ```
2. Run the setup script:

   ```bash
   chmod +x setup.sh
   ./setup.sh
   ```
3. Access the application (a quick smoke test in Python follows these steps):
   - Frontend: http://localhost:3000
   - Backend API: http://localhost:8000/api
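
Once the containers are up, you can check that both services respond. This is a minimal smoke test, not part of the project: the URLs come from the list above, and the assumption that the backend root returns any HTTP response at all is ours.

```python
import requests

# Minimal smoke test for the two services started by the setup script.
# Any HTTP response (even a 404) confirms the container is reachable.
for name, url in [("frontend", "http://localhost:3000"),
                  ("backend", "http://localhost:8000/api")]:
    try:
        response = requests.get(url, timeout=5)
        print(f"{name}: HTTP {response.status_code}")
    except requests.exceptions.ConnectionError:
        print(f"{name}: not reachable at {url}")
```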
Usage:

1. Draw your sketch on the Excalidraw canvas:
   - Keep lines clear and distinct for best results
   - Simple sketches work better than highly detailed ones
2. Select an AI model (a minimal generation sketch using these models appears after these steps):
   - SD 1.5 + ControlNet Scribble: Faster generation (5-15s on GPU)
   - SDXL + T2I-Adapter Sketch: Higher quality but slower (10-30s on GPU)
3. Choose an output style:
   - Photorealistic: Realistic images with photographic details
   - Anime: Cartoon/anime style with simplified shapes and bold colors
   - Oil Painting: Artistic oil painting look with rich textures
   - Watercolor: Soft watercolor art style with gentle color blending
   - Detailed Sketch: Enhanced detailed sketch with improved linework and shading
4. Add a description:
   - Describe what you're drawing for better results
   - Example: "a cat sitting on a windowsill"
   - Include key details you want emphasized
5. Click "Generate Image":
   - Wait for the AI to process your sketch (5-30 seconds)
   - The generated image will appear on the right side
6. Manage your results:
   - Download your image using the download button
   - Use the "Reset All" button to start over with a new sketch
For best results:
- Start with a simple sketch with clear outlines
- Try both models to see which best captures your vision
- Experiment with different styles
- Use specific descriptions that emphasize important elements
- For complex subjects, break them down into simpler components
[Examples: simple sketches transformed into the various output styles]

Known limitations:
- Sketch Clarity: The system works best with clear, simple line drawings; complex or ambiguous sketches may produce unexpected results.
- Generation Time: Processing time increases with sketch complexity and varies by hardware.
- Style Consistency: Some styles work better with certain subjects than others; the "anime" style, for example, may not produce consistently anime-looking artwork for every sketch.
- Unusual Subjects: The models may struggle with abstract or highly unusual sketches that don't resemble common objects.
- Resolution: Output images are fixed at 512×512 pixels.
- Model Limitations:
  - Both models occasionally ignore certain elements in very complex sketches.
  - Both models sometimes misinterpret the scale or perspective of sketched objects.
Configuration:

You can modify the application settings by editing the `.env` file or the `docker-compose.yml` file:

- `MODEL_ID`: The default model to use
- `NUM_INFERENCE_STEPS`: Number of diffusion steps
- `GUIDANCE_SCALE`: Controls how closely the output follows the prompt
- `OUTPUT_IMAGE_SIZE`: Size of the generated image
- `DEVICE`: Set to "cuda" for GPU or "cpu" for CPU processing
Troubleshooting:

- Slow Generation: Try using the ControlNet Scribble model instead of T2I-Adapter
- Poor Results: Simplify your sketch and provide a clear description
- Container Errors: Check Docker logs with `docker-compose logs`
- GPU Not Detected: Ensure NVIDIA drivers and Docker GPU support are correctly installed (a quick check is shown below)
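
A quick, generic way to confirm that PyTorch inside the backend container can see the GPU (this is standard PyTorch, not project-specific tooling):

```python
import torch

# Prints whether CUDA is usable and, if so, which GPU PyTorch sees.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```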
Acknowledgments:

- Excalidraw for the drawing interface
- Hugging Face for hosting the pre-trained models
- ControlNet & T2I-Adapter for the sketch-to-image technology
KI-Servicezentrum Berlin-Brandenburg is funded by the Federal Ministry of Education and Research under the funding code 01IS22092.