Add Latte: Latent Diffusion Transformer for Video Generation #7223

### Model/Pipeline/Scheduler description

Latte is a text-to-video diffusion transformer (similar to Sora) that builds on the DiT and PixArt-alpha image-generation models.

The implementation is already based on diffusers (see [latte_t2v.py](https://github.com/Vchitect/Latte/blob/main/models/latte_t2v.py)), so adding it here should be straightforward.

### Open source status

- [X] The model implementation is available.
- [X] The model weights are available (Only relevant if addition is not a scheduler).

### Provide useful links for the implementation

- Official repo: https://github.com/Vchitect/Latte
- Model on Hugging Face: https://huggingface.co/maxin-cn/Latte
- Paper: https://arxiv.org/abs/2401.03048v1
- Project page: https://maxin-cn.github.io/latte_project/
