OOM when finetuning Sana-1.6M on 4K images — memory requirement & tiled VAE?

Hi, I'm finetuning Sana-1.6M on 4K images (4096×4096) and hit an out-of-memory (OOM) error on an H20-96G GPU during vae_encode. Mixed precision is enabled and the batch size is 1.

My questions:
1. How much memory is needed to train Sana-1.6M on 4K images?
2. Would using tiled VAE encoding during training affect performance?
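
To make question 2 concrete, here is a rough sketch of the tile layout I have in mind. The function names, the 1024-pixel tile size, and the 128-pixel overlap are placeholders I made up, not anything taken from the Sana codebase. The idea would be to run each box through the VAE encoder separately and blend the latents in the overlapping regions.

```python
def tile_starts(size: int, tile: int, overlap: int) -> list[int]:
    """Start offsets of tiles of length `tile` that cover [0, size).

    Neighbouring tiles share at least `overlap` pixels.
    """
    if tile >= size:
        return [0]  # a single tile already covers the whole axis
    stride = tile - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than tile")
    starts = list(range(0, size - tile, stride))
    starts.append(size - tile)  # shift the last tile back so it ends at the border
    return starts


def tile_boxes(height: int, width: int, tile: int = 1024, overlap: int = 128):
    """Return (top, left, bottom, right) pixel boxes covering the image."""
    return [
        (top, left, min(top + tile, height), min(left + tile, width))
        for top in tile_starts(height, tile, overlap)
        for left in tile_starts(width, tile, overlap)
    ]


# Hypothetical settings for one 4096x4096 training image.
boxes = tile_boxes(4096, 4096, tile=1024, overlap=128)
print(len(boxes), boxes[0], boxes[-1])
```

With these placeholder settings the encoder would only ever see 1024×1024 inputs, at the cost of 25 forward passes per image instead of one.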

Thanks!