Description
When I run SV4D's quickstart, I get a CUDA out-of-memory error, even though SV3D runs successfully on the same machine. Following suggestions from others, I reduced decoding_t from its default of 14 down to 1, but it still fails. I am using a 40 GB A100. Has anyone encountered a similar problem and managed to solve it? I would be very grateful.
Here is the full error output:
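As a first thing to try: the error message at the bottom of the traceback itself suggests tuning the allocator via PYTORCH_CUDA_ALLOC_CONF to reduce fragmentation (note that 35.65 GiB is reserved but only 31.96 GiB actually allocated). A minimal sketch, assuming you set it before torch initializes CUDA; the value 512 is an arbitrary starting point, not a recommendation from the SV4D authors:

```python
import os

# Must be set before torch allocates any CUDA memory (ideally before
# `import torch`). Caps the size of splittable cached blocks, which can
# reduce fragmentation-induced OOMs when reserved >> allocated.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"
```

Equivalently, it can be set on the command line without touching the script: `PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512 python scripts/sampling/simple_video_sample_4d.py ...`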
(sv4d) [zhoushengxiao@gpu3 SV4D]$ python scripts/sampling/simple_video_sample_4d.py --input_path assets/test_video1.mp4 --output_folder outputs/sv4d
Reading assets/test_video1.mp4
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
Initialized embedder #0: FrozenOpenCLIPImagePredictionEmbedder with 683800065 params. Trainable: False
Initialized embedder #1: VideoPredictionEmbedderWithEncoder with 83653863 params. Trainable: False
Initialized embedder #2: ConcatTimestepEmbedderND with 0 params. Trainable: False
Initialized embedder #3: ConcatTimestepEmbedderND with 0 params. Trainable: False
Initialized embedder #4: ConcatTimestepEmbedderND with 0 params. Trainable: False
Restored from checkpoints/sv3d_p.safetensors with 0 missing and 0 unexpected keys
/mnt/lustre/GPU3/home/zhoushengxiao/workspace/codes/SV4D/sv4d/lib/python3.10/site-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
Initialized embedder #0: FrozenOpenCLIPImagePredictionEmbedder with 683800065 params. Trainable: False
Initialized embedder #1: VideoPredictionEmbedderWithEncoder with 83653863 params. Trainable: False
Initialized embedder #2: ConcatTimestepEmbedderND with 0 params. Trainable: False
Initialized embedder #3: ConcatTimestepEmbedderND with 0 params. Trainable: False
Initialized embedder #4: VideoPredictionEmbedderWithEncoder with 83653863 params. Trainable: False
Initialized embedder #5: VideoPredictionEmbedderWithEncoder with 83653863 params. Trainable: False
Restored from checkpoints/sv4d.safetensors with 0 missing and 0 unexpected keys
Sampling anchor frames [ 4 8 12 16 20]
Traceback (most recent call last):
File "/mnt/lustre/GPU3/home/zhoushengxiao/workspace/codes/SV4D/scripts/sampling/simple_video_sample_4d.py", line 236, in <module>
Fire(sample)
File "/mnt/lustre/GPU3/home/zhoushengxiao/workspace/codes/SV4D/sv4d/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/mnt/lustre/GPU3/home/zhoushengxiao/workspace/codes/SV4D/sv4d/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/mnt/lustre/GPU3/home/zhoushengxiao/workspace/codes/SV4D/sv4d/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/mnt/lustre/GPU3/home/zhoushengxiao/workspace/codes/SV4D/scripts/sampling/simple_video_sample_4d.py", line 170, in sample
samples = run_img2vid(
File "/mnt/lustre/GPU3/home/zhoushengxiao/workspace/codes/SV4D/scripts/demo/sv4d_helpers.py", line 705, in run_img2vid
samples = do_sample(
File "/mnt/lustre/GPU3/home/zhoushengxiao/workspace/codes/SV4D/scripts/demo/sv4d_helpers.py", line 764, in do_sample
c, uc = model.conditioner.get_unconditional_conditioning(
File "/mnt/lustre/GPU3/home/zhoushengxiao/workspace/codes/SV4D/sv4d/lib/python3.10/site-packages/sgm/modules/encoders/modules.py", line 183, in get_unconditional_conditioning
c = self(batch_c, force_cond_zero_embeddings)
File "/mnt/lustre/GPU3/home/zhoushengxiao/workspace/codes/SV4D/sv4d/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/mnt/lustre/GPU3/home/zhoushengxiao/workspace/codes/SV4D/sv4d/lib/python3.10/site-packages/sgm/modules/encoders/modules.py", line 132, in forward
emb_out = embedder(batch[embedder.input_key])
File "/mnt/lustre/GPU3/home/zhoushengxiao/workspace/codes/SV4D/sv4d/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/mnt/lustre/GPU3/home/zhoushengxiao/workspace/codes/SV4D/sv4d/lib/python3.10/site-packages/sgm/modules/encoders/modules.py", line 1019, in forward
out = self.encoder.encode(vid[n * n_samples : (n + 1) * n_samples])
File "/mnt/lustre/GPU3/home/zhoushengxiao/workspace/codes/SV4D/sv4d/lib/python3.10/site-packages/sgm/models/autoencoder.py", line 472, in encode
z = self.encoder(x)
File "/mnt/lustre/GPU3/home/zhoushengxiao/workspace/codes/SV4D/sv4d/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/mnt/lustre/GPU3/home/zhoushengxiao/workspace/codes/SV4D/sv4d/lib/python3.10/site-packages/sgm/modules/diffusionmodules/model.py", line 584, in forward
h = self.down[i_level].block[i_block](hs[-1], temb)
File "/mnt/lustre/GPU3/home/zhoushengxiao/workspace/codes/SV4D/sv4d/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/mnt/lustre/GPU3/home/zhoushengxiao/workspace/codes/SV4D/sv4d/lib/python3.10/site-packages/sgm/modules/diffusionmodules/model.py", line 134, in forward
h = nonlinearity(h)
File "/mnt/lustre/GPU3/home/zhoushengxiao/workspace/codes/SV4D/sv4d/lib/python3.10/site-packages/sgm/modules/diffusionmodules/model.py", line 49, in nonlinearity
return x * torch.sigmoid(x)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 6.33 GiB (GPU 0; 39.38 GiB total capacity; 31.96 GiB already allocated; 3.20 GiB free; 35.65 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
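For what it's worth, the OOM happens in the conditioner's VAE encode (sgm/modules/encoders/modules.py line 1019), which already slices the video into chunks of `n_samples` frames, so `decoding_t` (which only affects decoding) cannot help here. A hedged sketch of the general workaround, i.e. encoding the conditioning frames in smaller chunks; `encode_in_chunks` is a hypothetical helper, not part of the SV4D codebase, and assumes the encoder exposes an `encode(x)` method like the sgm autoencoder does:

```python
import torch

def encode_in_chunks(encoder, frames, chunk_size=1):
    """Encode `frames` (a [N, ...] tensor) a few at a time.

    Smaller chunk_size trades speed for lower peak VRAM; no_grad
    avoids keeping activations alive, which matters for inference.
    """
    latents = []
    with torch.no_grad():
        for i in range(0, frames.shape[0], chunk_size):
            latents.append(encoder.encode(frames[i : i + chunk_size]))
    return torch.cat(latents, dim=0)
```

In the SV4D code path above, the analogous change would be reducing the effective batch passed to `self.encoder.encode(...)` in `VideoPredictionEmbedderWithEncoder.forward`; whether the repo exposes a flag for this is something to check in the script's arguments.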