Hello
flash_attn provides support for CUDA 12.1 but not for CUDA 12.4.
If I install it with pip, it causes the following error at inference time:
RuntimeError: Failed to import transformers.models.llama.modeling_llama because of the following error (look up to see its traceback):
/lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found
So how did you install and use flash_attn in your case?
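
For reference, this is a minimal diagnostic sketch I would run to confirm the mismatch between the prebuilt wheel and the local toolchain (it only assumes torch is already installed; the printed values below are illustrative):

```python
# Minimal sketch: check whether the installed flash_attn wheel matches the
# local CUDA toolkit and GLIBC version (assumes torch is installed).
import platform
import torch

print("torch:", torch.__version__)        # e.g. 2.x.y+cu121
print("torch CUDA:", torch.version.cuda)  # CUDA version torch was built with
print("GLIBC:", platform.libc_ver())      # e.g. ('glibc', '2.31')

# If the prebuilt flash_attn binary was linked against a newer GLIBC than the
# system provides, the import fails with the GLIBC_2.32 error shown above.
try:
    import flash_attn
    print("flash_attn:", flash_attn.__version__)
except (ImportError, OSError) as e:
    print("flash_attn import failed:", e)
```

If the GLIBC versions disagree, building the wheel locally (e.g. `pip install flash-attn --no-build-isolation`, as suggested in the flash-attn README) might avoid pulling a prebuilt binary linked against a newer GLIBC, but I have not confirmed this on CUDA 12.4 yet.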