I am using a server with an A800 GPU.
I created an empty test.py, copied the code from the USAGE section into it, and ran python test.py in bash,
but it runs very slowly, and meanwhile GPU utilization is 0%.
The output is:
Non-A100 GPU detected, using math or mem efficient attention if input tensor is on cuda
/opt/conda/lib/python3.11/contextlib.py:105: FutureWarning: torch.backends.cuda.sdp_kernel()
is deprecated. In the future, this context manager will be removed. Please see torch.nn.attention.sdpa_kernel()
for the new context manager, with updated signature.
self.gen = func(*args, **kwds)
sampling loop time step: 3%|███▉
I also tried changing the code in attend.py
to force flash attention on, but the result was the same.