Hi, I'm fine-tuning Sana-1.6B on 4K images (4096×4096) and run out of memory (OOM) during vae_encode on an H20 GPU with 96 GB, even with mixed precision enabled and a batch size of 1.
My questions:
1. How much GPU memory is needed to train Sana-1.6B on 4K images?
2. Would using tiled VAE encoding during training affect performance?
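For context, here is a minimal sketch of what "tiled VAE encoding" means in question 2, plus a back-of-envelope size for one activation tensor. It assumes a 32× spatial downsampling factor, which matches the DC-AE used by Sana. `encode_tile` is a hypothetical stand-in (plain 32×32 average pooling), not the real encoder. The tile size, the overlap, and the 128-channel width passed to `feature_map_gib` are illustrative assumptions, not Sana's actual numbers.

```python
import numpy as np

DOWN = 32  # assumed spatial downsampling factor (Sana's DC-AE is f32)


def feature_map_gib(channels: int, h: int, w: int, bytes_per_elem: int = 2) -> float:
    """Size in GiB of one (channels, h, w) activation tensor; 2 bytes = bf16/fp16."""
    return channels * h * w * bytes_per_elem / 2**30


def encode_tile(x: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the VAE encoder: per-channel DOWN x DOWN average
    pooling. x has shape (C, H, W) with H and W divisible by DOWN."""
    c, h, w = x.shape
    return x.reshape(c, h // DOWN, DOWN, w // DOWN, DOWN).mean(axis=(2, 4))


def tile_starts(size: int, tile: int, stride: int) -> list[int]:
    """Tile offsets covering [0, size); the last tile is placed flush with the edge."""
    if size <= tile:
        return [0]
    starts = list(range(0, size - tile, stride))
    starts.append(size - tile)
    return starts


def tiled_encode(x: np.ndarray, tile: int = 1024, overlap: int = 256) -> np.ndarray:
    """Encode overlapping tiles independently and average latents where tiles overlap."""
    _, h, w = x.shape
    for n in (h, w, tile, overlap):
        assert n % DOWN == 0, "sizes must be multiples of DOWN to keep latents aligned"
    assert 0 <= overlap < tile, "overlap must be smaller than the tile"
    stride = tile - overlap
    out = None
    weight = np.zeros((h // DOWN, w // DOWN))
    for y0 in tile_starts(h, tile, stride):
        for x0 in tile_starts(w, tile, stride):
            # Only one tile's activations are alive at a time, which is the memory win.
            z = encode_tile(x[:, y0:y0 + tile, x0:x0 + tile])
            if out is None:
                out = np.zeros((z.shape[0], h // DOWN, w // DOWN))
            ly, lx = y0 // DOWN, x0 // DOWN
            out[:, ly:ly + z.shape[1], lx:lx + z.shape[2]] += z
            weight[ly:ly + z.shape[1], lx:lx + z.shape[2]] += 1.0
    return out / weight


if __name__ == "__main__":
    # One 128-channel feature map at full 4K resolution vs. one 1024x1024 tile
    # (128 channels is an illustrative assumption, not DC-AE's real width).
    print(feature_map_gib(128, 4096, 4096), feature_map_gib(128, 1024, 1024))
    img = np.random.default_rng(0).standard_normal((3, 512, 512)).astype(np.float32)
    z = tiled_encode(img, tile=256, overlap=64)
    print(z.shape, np.allclose(z, encode_tile(img)))
```

With the pooling stand-in, the tiled and full-image latents agree up to floating-point rounding, because each latent pixel depends only on its own 32×32 patch. A real encoder has a larger receptive field and may contain attention layers, so latents near tile seams can differ from a full-image encode. Whether that difference measurably affects fine-tuning is exactly what question 2 asks.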
Thanks!