- [2025/05/16] LED-Merging has been accepted to the ACL 2025 main conference 🎉🎉🎉
- [2025/06/13] Code has been released 🔥🔥🔥
  - Merge code
  - Inference code
  - Project page
git clone https://github.com/MqLeet/LED-Merging.git
cd LED-Merging
conda create -n led python==3.12
conda activate led
pip install -r requirements.txt
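Optionally, sanity-check the environment. The lines below assume PyTorch and vLLM are pulled in by requirements.txt (vLLM is used for generation later); adjust if your setup differs:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import vllm; print(vllm.__version__)"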
The pretrained models we use can be found here:
The benchmarks we use:
- Code generation benchmark: bigcode-evaluation-harness
- Safety benchmark: HarmBench
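If you do not already have the benchmark code locally, cloning the upstream repositories is typically sufficient (URLs shown for reference; use the versions pinned by this repo if provided):
git clone https://github.com/bigcode-project/bigcode-evaluation-harness.git
git clone https://github.com/centerforaisafety/HarmBench.git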
We use Llama3-8B as an example to demonstrate the workflow of LED-Merging:
To compute the importance scores defined in Equation 1 of the paper, run the commands below:
cd locate/
bash scripts/locate_inst.sh
Then, to select important neurons based on the importance scores, run the commands below:
mask_pattern=11
model_type="llama3"
rates=(0.1 0.4 0.5)
python mask_generate.py ${rates[0]} ${rates[1]} ${rates[2]} $mask_pattern $model_type
Finally, disjoint the conflicting neurons and perform model merging:
model_type="llama3"
base_model="llama3-base"
orders=(safety math code)
fuse_types=(o o o)
lambdas=(1.0 1.0 1.0)
fuse_o=11
alphas=(0.9)
mask_rates=(0.1 0.4 0.5)  # must match the rates used in mask_generate.py above
python merge_llms.py \
--models_to_merge llama3-instruct llama3-math llama3-code \
--pretrained_model_name $base_model \
--merging_method_name top_merging \
--scaling_coefficient ${alphas[0]} \
--mask_apply_method task_arithmetic \
--fuse_rates ${mask_rates[0]} ${mask_rates[1]} ${mask_rates[2]} \
--orders ${orders[0]} ${orders[1]} ${orders[2]} \
--lambdas ${lambdas[0]} ${lambdas[1]} ${lambdas[2]} \
--fuse_types ${fuse_types[0]} ${fuse_types[1]} ${fuse_types[2]} \
--fuse_patterns $fuse_o $fuse_o $fuse_o \
--model_ft_name $model_type \
--model_base_name $base_model
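After merging, the merged checkpoint is written under save_merge_llms/. The exact subdirectory depends on your configuration; the path below is the one referenced later in this README:
ls save_merge_llms/llama3-base/llama3-instruct_llama3-math/task_arithmetic_scaling_coefficient_0.9/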
First, add an entry for the merged model to HarmBench/configs/model_configs/models.yaml, like this:
llama3_8b_base_inst_math_code_task_0.9:
  model:
    model_name_or_path: save_merge_llms/llama3-base/llama3-instruct_llama3-math/task_arithmetic_scaling_coefficient_0.9/
    dtype: float16
    chat_template: llama-3
  num_gpus: 1
  model_type: open_source
- HarmBench
Generate responses:
# HarmBench
cd HarmBench
## generate harmbench
export MKL_SERVICE_FORCE_INTEL=1
chat_template="llama3"
dataset="harmbench"
behaviors_path="./data/behavior_datasets/harmbench_behaviors_text_all.csv"
test_cases_path="./data/behavior_datasets/harmbench_behaviors_text_all_results/test_cases.json"
trust_remote_code="True"
max_new_tokens=512
model_name_or_path="save_merge_llms/llama3-base/llama3-instruct_llama3-math/task_arithmetic_scaling_coefficient_0.9/"
model_name="model_name"
python -u generate_completions.py \
--model_name $model_name \
--behaviors_path $behaviors_path \
--test_cases_path $test_cases_path \
--save_path $save_path \
--chat_template $chat_template \
--max_new_tokens $max_new_tokens --dataset $dataset --model_name_or_path $model_name_or_path \
--trust_remote_code --generate_with_vllm
cd ./completions
python merge.py harmbench/$model_name.json
cd ../
Evaluation:
dataset="harmbench"
cls_path="/path/to/HarmBench-Llama-2-13b-cls"
behaviors_path="./data/behavior_datasets/harmbench_behaviors_text_all.csv"
eval_type="harmbench"
completions_path= "/path/to/generated/reponses"
python -u evaluate_completions.py \
--cls_path $cls_path \
--behaviors_path $behaviors_path \
--completions_path $completions_path \
--save_path $save_path \
--eval_type $eval_type
- SORRY-Bench
Generate responses:
dataset="sorrybench"
test_cases_path="./data/sorrybench/test_cases.json"
behaviors_path="./data/behavior_datasets/harmbench_behaviors_text_all.csv"
chat_template="llama3"
model_name="model_name"
model_name_or_path="save_merge_llms/llama3-base/llama3-instruct_llama3-math/task_arithmetic_scaling_coefficient_0.9/"
python -u generate_completions.py \
--model_name $model_name \
--behaviors_path $behaviors_path \
--test_cases_path $test_cases_path \
--save_path $save_path \
--chat_template $chat_template \
--max_new_tokens $max_new_tokens --dataset $dataset --model_name_or_path $model_name_or_path \
--trust_remote_code --generate_with_vllm
Evaluation:
eval_type="sorrybench"
cls_path="/path/to/ft-mistral-7b-instruct-v0.2-sorry-bench-202406"
behaviors_path="./data/behavior_datasets/harmbench_behaviors_text_all.csv"
completions_path= "/path/to/generated/reponses"
python -u evaluate_completions.py \
--cls_path $cls_path \
--behaviors_path $behaviors_path \
--completions_path $completions_path \
--save_path $save_path \
--eval_type $eval_type \
--include_advbench_metric
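The utility evaluations below run inference with vLLM; set the number of GPUs to use before running them (example value, adjust to your hardware):
num_gpus=1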
- GSM8K
python inference_llms.py --dataset_name gsm8k --finetuned_model_path /path/to/your/merged/model --tensor_parallel_size $num_gpus
python test_gsm8k_rule.py --files /path/to/generated/responses
- MATH
python inference_llms.py --dataset_name MATH --finetuned_model_path /path/to/your/merged/model --tensor_parallel_size $num_gpus
python test_math_rule.py --files /path/to/generated/responses
- MBPP
python inference_llms.py --dataset_name mbpp --finetuned_model_path /path/to/your/merged/model --tensor_parallel_size $num_gpus
accelerate launch --num_processes $num_gpus ./bigcode-evaluation-harness/main.py --tasks mbpp --allow_code_execution --load_generations_path /path/to/generated/responses
- HumanEvalPack
python inference_llms.py --dataset_name human_eval_pack --finetuned_model_path /path/to/your/merged/model --tensor_parallel_size $num_gpus
accelerate launch --num_processes $num_gpus ./bigcode-evaluation-harness/main.py --tasks humanevalfixdocs-python --allow_code_execution --load_generations_path /path/to/generated/responses
If you find LED-Merging useful for your research and applications, please kindly cite our paper using this BibTeX:
@article{ma2025led,
  title={LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint},
  author={Ma, Qianli and Liu, Dongrui and Chen, Qian and Zhang, Linfeng and Shao, Jing},
  journal={arXiv preprint arXiv:2502.16770},
  year={2025}
}
Our code is built upon MergeLLM, MergeLM, and alignment-attribution-code. The benchmarks we use are bigcode-evaluation-harness and HarmBench.
We also build on the pretrained backbones and SFT LLMs listed above. Thanks to all the contributors for their great work!