Description
OS
Linux
GPU Library
AMD ROCm
Python version
3.12
Pytorch version
2.8.0
Model
dakkidaze/Cydonia-22B-v1.3-4.5bpw-h6-exl2
Describe the bug
ExLlamaV2 versions newer than 0.2.8 seem to start generating nonsense very quickly.
Example outputs I got (both with the model listed above, using exactly the same request):
Old version (0.2.8):
Once upon a time, in a lush green meadow, there lived a curious little rabbit named Benny. Benny loved exploring the meadow, hopping from one patch of clover to another, nibbling on the sweet leaves. One sunny morning, as Benny was enjoying his breakfast, he noticed a sleek black cat lounging under a nearby tree. (it goes on to write several paragraphs of coherent text)
New version (0.3.1):
Once upon a time, a cat named Tom and a rabbit named named named named (continues to write the same word until the length limit is hit)
Regarding models: the bug manifests differently depending on the model, and I'm not sure whether it affects all models or only Mistral 22B. Mistral 22B-based finetunes (I tested several) start repeating the same word, while Llama 3 8B returns generally coherent text most of the time (with the appropriate instruct template). Rocinante 12B also appears unaffected.
Reproduction steps
Clone TabbyAPI into 2 folders with different names.
Inside both, create a venv and clone exllamav2.
Install PyTorch into each venv via
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.3
(I tested on a system with ROCm 6.4 installed, but torch built for 6.3 seems to be compatible with 6.4; if my memory serves me well, this issue was also present on at least ROCm 6.3 and 6.2.4.)
In repo 1 (old):
For TabbyAPI, check out commit 3960612d38b231017cd72e5fd19db855fe3bd371.
In exllamav2, check out v0.2.8.
Build both via pip install . (exllamav2 first, then tabby).
In repo 2 (new):
Leave both at their latest versions and build via pip install . (exllamav2 first, then tabby).
Launch the old one, generate a response with a deterministic sampler preset (I used SillyTavern), and close it.
Launch the new one and generate a response with the same prompt and preset.
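The setup steps above can be sketched as a shell script. This is only an illustration of my process, not the exact commands I ran: the folder names are made up, and the repository URLs are my assumption of the upstream locations; adjust the ROCm wheel index for your system.

```shell
# Sketch of the two-repo reproduction setup (folder names illustrative).

# Repo 1: known-good versions
git clone https://github.com/theroyallab/tabbyAPI tabby-old
cd tabby-old
git checkout 3960612d38b231017cd72e5fd19db855fe3bd371
python3 -m venv venv && source venv/bin/activate
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.3
git clone https://github.com/turboderp-org/exllamav2
cd exllamav2 && git checkout v0.2.8 && pip install . && cd ..
pip install .   # build TabbyAPI after exllamav2
deactivate
cd ..

# Repo 2: latest versions (no checkouts)
git clone https://github.com/theroyallab/tabbyAPI tabby-new
cd tabby-new
python3 -m venv venv && source venv/bin/activate
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.3
git clone https://github.com/turboderp-org/exllamav2
cd exllamav2 && pip install . && cd ..
pip install .
```

After building, launch each server in turn and send the same prompt with a deterministic sampler preset to compare outputs.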
I have encountered this bug several times when trying to move to newer versions of exllamav2; 0.2.8 is the latest version I'm confident works fine.
Expected behavior
In both cases, a coherent result should be generated.
Logs
No response
Additional context
No response
Acknowledgements
- I have looked for similar issues before submitting this one.
- I understand that the developers have lives and my issue will be answered when possible.
- I understand the developers of this program are human, and I will ask my questions politely.