
[ACL 2025 Main] LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint

🔥 News

  • [2025/05/16] LED-Merging has been accepted to ACL 2025 main conference 🎉🎉🎉
  • [2025/06/13] Code has been released 🔥🔥🔥

📝 TODO

  • Merge code
  • Inference code
  • Project page

🛠️ Getting Started

1. Setup

git clone https://github.com/MqLeet/LED-Merging.git
cd LED-Merging
conda create -n led python==3.12
conda activate led
pip install -r requirements.txt

2. Model Preparation

The pretrained backbones and SFT LLMs we use are listed below:

| Pretrained Backbones | SFT LLMs |
| --- | --- |
| Meta-Llama-3-8B | Meta-Llama-3-8B-Instruct |
| | MAmmoTH2-8B-Plus |
| | Replete-Coder-Llama3-8B |
| Mistral-7B | Mistral-7B-Instruct |
| | MetaMath-Mistral-7B |
| Llama-2-13B | WizardLM-13B |
| | WizardMath-13B |
| | llama-2-13b-code-alpaca |
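
To fetch these checkpoints programmatically, here is a minimal sketch using huggingface_hub (the Hub repo IDs are assumptions; verify the exact IDs on the Hugging Face Hub before downloading):

# Sketch: download the Llama3-8B backbone and SFT checkpoints.
# Repo IDs are assumptions -- check the exact IDs on the Hub.
from huggingface_hub import snapshot_download

MODELS = [
    "meta-llama/Meta-Llama-3-8B",            # pretrained backbone
    "meta-llama/Meta-Llama-3-8B-Instruct",   # instruct/safety SFT
    "TIGER-Lab/MAmmoTH2-8B-Plus",            # math SFT
    "Replete-AI/Replete-Coder-Llama3-8B",    # code SFT
]

for repo_id in MODELS:
    snapshot_download(repo_id=repo_id, local_dir=f"models/{repo_id.split('/')[-1]}")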

3. Dataset / Benchmark Preparation

🚀 Run and Evaluation

We use Llama3-8B as an example to demonstrate the workflow of LED-Merging:

🔎 Locate

To compute the importance scores defined in Equation 1 of the paper, run:

cd locate/
bash scripts/locate_inst.sh
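
For intuition, one plausible form of such a score (an assumption in the style of Wanda from alignment-attribution-code, not necessarily the paper's exact Equation 1) weights each parameter's magnitude by the norm of its input activations:

# Illustrative sketch of a weight-times-activation importance score.
# See Equation 1 in the paper for the exact definition used by LED-Merging.
import torch

def importance_scores(weight: torch.Tensor, activations: torch.Tensor) -> torch.Tensor:
    # weight: (out_features, in_features); activations: (num_tokens, in_features)
    act_norm = activations.norm(p=2, dim=0)      # per-input-channel L2 norm
    return weight.abs() * act_norm.unsqueeze(0)  # elementwise |W| * ||X||_2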

🗳️ Elect

Next, elect the important neurons based on these scores:

mask_pattern=11
model_type="llama3"
rates=(0.1 0.4 0.5)  # election rates (one per task)

python mask_generate.py ${rates[0]} ${rates[1]} ${rates[2]} $mask_pattern $model_type
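
Conceptually, election keeps the top `rate` fraction of parameters per task by importance score and turns them into a binary mask; a minimal sketch with a hypothetical helper name, not the actual mask_generate.py internals:

# Sketch of neuron election: keep the top `rate` fraction of entries by
# score as a binary mask. `elect_mask` is a hypothetical helper.
import torch

def elect_mask(scores: torch.Tensor, rate: float) -> torch.Tensor:
    k = max(1, int(rate * scores.numel()))
    threshold = torch.topk(scores.flatten(), k).values.min()
    return (scores >= threshold).to(torch.uint8)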

🪡 Disjoint and Merge

Finally, disjoint the conflicting neurons and perform the model merging:

model_type="llama3"
base_model="llama3-base"
orders=(safety math code)
fuse_types=(o o o o)
lambdas=(1.0 1.0 1.0)
fuse_o=11
alphas=(0.9)

python merge_llms.py \
    --models_to_merge llama3-instruct llama3-math llama3-code \
    --pretrained_model_name $base_model \
    --merging_method_name top_merging \
    --scaling_coefficient ${alphas[0]} \
    --mask_apply_method task_arithmetic \
    --fuse_rates ${mask_rates[0]} ${mask_rates[1]} ${mask_rates[2]} \
    --orders ${orders[0]} ${orders[1]} ${orders[2]} \
    --lambdas ${lambdas[0]} ${lambdas[1]} ${lambdas[2]} \
    --fuse_types ${fuse_types[0]} ${fuse_types[1]} ${fuse_types[2]} \
    --fuse_patterns $fuse_o $fuse_o $fuse_o \
    --model_ft_name $model_type \
    --model_base_name $base_model
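
At a high level, the disjoint step removes overlaps between the elected masks (earlier tasks in `orders` take priority) before the masked task vectors are added back to the base model. A simplified sketch, not the actual merge_llms.py logic:

# Simplified sketch of disjoint masking + masked task arithmetic.
# All names here are hypothetical; see merge_llms.py for the real code.
import torch

def disjoint_merge(base, task_vectors, masks, scaling=0.9):
    # base: {name: tensor} of base weights; task_vectors[i] = finetuned_i - base;
    # masks[i]: {name: 0/1 tensor} elected per task, in priority order.
    merged = {name: t.clone() for name, t in base.items()}
    claimed = {name: torch.zeros_like(t, dtype=torch.bool) for name, t in base.items()}
    for tv, mask in zip(task_vectors, masks):
        for name in merged:
            m = mask[name].bool() & ~claimed[name]  # drop neurons already claimed
            merged[name] += scaling * tv[name] * m
            claimed[name] |= m                      # mark these neurons as taken
    return merged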

🧪 Inference

Test on SafetyGuards

First, add an entry for your merged model to HarmBench/configs/model_configs/models.yaml, like this:

llama3_8b_base_inst_math_code_task_0.9:
  model:
    model_name_or_path: save_merge_llms/llama3-base/llama3-instruct_llama3-math/task_arithmetic_scaling_coefficient_0.9/
    dtype: float16
    chat_template: llama-3
  num_gpus: 1
  model_type: open_source
1. HarmBench

Generate responses:
# HarmBench
cd HarmBench

## Generate completions on HarmBench
export MKL_SERVICE_FORCE_INTEL=1
chat_template="llama3"
dataset="harmbench"
behaviors_path="./data/behavior_datasets/harmbench_behaviors_text_all.csv"
test_cases_path="./data/behavior_datasets/harmbench_behaviors_text_all_results/test_cases.json"
trust_remote_code="True"
max_new_tokens=512
model_name_or_path="save_merge_llms/llama3-base/llama3-instruct_llama3-math/task_arithmetic_scaling_coefficient_0.9/"
model_name="model_name"
save_path="./completions/harmbench/${model_name}.json"  # matches the merge.py step below

python -u generate_completions.py \
    --model_name $model_name \
    --behaviors_path $behaviors_path \
    --test_cases_path $test_cases_path \
    --save_path $save_path \
    --chat_template $chat_template \
    --max_new_tokens $max_new_tokens --dataset $dataset --model_name_or_path $model_name_or_path \
    --trust_remote_code --generate_with_vllm

cd ./completions
python merge.py harmbench/$model_name.json
cd ../

Evaluation:

dataset="harmbench"
cls_path="/path/to/HarmBench-Llama-2-13b-cls"
behaviors_path="./data/behavior_datasets/harmbench_behaviors_text_all.csv"
eval_type="harmbench"
completions_path= "/path/to/generated/reponses"

python -u evaluate_completions.py \
    --cls_path $cls_path \
    --behaviors_path $behaviors_path \
    --completions_path $completions_path \
    --save_path $save_path \
    --eval_type $eval_type
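
The evaluator writes per-behavior classifier labels; the attack success rate (ASR) can then be read off with a few lines. The JSON layout assumed below (behavior -> list of {"label": 0/1} records) follows HarmBench's output convention; adapt the keys if your results differ:

# Sketch: compute ASR from the evaluation results file.
# The result layout is an assumption; adjust to your actual output.
import json, sys

with open(sys.argv[1]) as f:
    results = json.load(f)
labels = [case["label"] for cases in results.values() for case in cases]
print(f"ASR: {sum(labels) / len(labels):.2%}")
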
2. SORRY-Bench

Generate responses:

dataset="sorrybench"
test_cases_path="./data/sorrybench/test_cases.json"
behaviors_path="./data/behavior_datasets/harmbench_behaviors_text_all.csv"
chat_template="llama3"
model_name="model_name"
model_name_or_path="save_merge_llms/llama3-base/llama3-instruct_llama3-math/task_arithmetic_scaling_coefficient_0.9/"



python -u generate_completions.py \
    --model_name $model_name \
    --behaviors_path $behaviors_path \
    --test_cases_path $test_cases_path \
    --save_path $save_path \
    --chat_template $chat_template \
    --max_new_tokens $max_new_tokens --dataset $dataset --model_name_or_path $model_name_or_path \
    --trust_remote_code --generate_with_vllm

Evaluation:

eval_type="sorrybench"
cls_path="/path/to/ft-mistral-7b-instruct-v0.2-sorry-bench-202406"
behaviors_path="./data/behavior_datasets/harmbench_behaviors_text_all.csv"
completions_path= "/path/to/generated/reponses"


python -u evaluate_completions.py \
    --cls_path $cls_path \
    --behaviors_path $behaviors_path \
    --completions_path $completions_path \
    --save_path $save_path \
    --eval_type $eval_type \
    --include_advbench_metric

Test on Math

1. GSM8K

python inference_llms.py --dataset_name gsm8k --finetuned_model_path /path/to/your/merged/model --tensor_parallel_size $num_gpus
python test_gsm8k_rule.py --files /path/to/generated/responses

2. MATH

python inference_llms.py --dataset_name MATH --finetuned_model_path /path/to/your/merged/model --tensor_parallel_size $num_gpus
python test_math_rule.py --files /path/to/generated/responses
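
Both rule-based scorers compare extracted final answers; a minimal sketch of the idea (hypothetical helpers, mirroring test_gsm8k_rule.py only at a high level):

# Sketch of rule-based GSM8K scoring: take the last number in the
# response as the prediction and compare it with the reference answer.
import re

def extract_answer(text: str) -> str | None:
    nums = re.findall(r"-?\d[\d,]*\.?\d*", text)
    return nums[-1].replace(",", "") if nums else None

def is_correct(response: str, reference: str) -> bool:
    pred, gold = extract_answer(response), extract_answer(reference)
    return pred is not None and pred == gold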

Test on Code

1. MBPP

python inference_llms.py --dataset_name mbpp --finetuned_model_path /path/to/your/merged/model --tensor_parallel_size $num_gpus
accelerate launch --num_processes $num_gpus ./bigcode-evaluation-harness/main.py --tasks mbpp --allow_code_execution --load_generations_path /path/to/generated/responses

2. HumanEvalPack

python inference_llms.py --dataset_name human_eval_pack --finetuned_model_path /path/to/your/merged/model --tensor_parallel_size $num_gpus
accelerate launch --num_processes $num_gpus ./bigcode-evaluation-harness/main.py --tasks humanevalfixdocs-python --allow_code_execution --load_generations_path /path/to/generated/responses
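
Note that --load_generations_path expects a JSON file containing, for each problem, a list of candidate generations (a list of lists of strings); a toy example of writing that layout:

# Sketch: write generations in the layout bigcode-evaluation-harness
# expects for --load_generations_path (contents here are toy examples).
import json

generations = [
    ["def add(a, b):\n    return a + b"],  # candidates for problem 0
    ["def sub(a, b):\n    return a - b"],  # candidates for problem 1
]
with open("generations_mbpp.json", "w") as f:
    json.dump(generations, f)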

✒️ Citation

If you find LED-Merging useful for your research or applications, please cite our paper with the following BibTeX:

@article{ma2025led,
  title={LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint},
  author={Ma, Qianli and Liu, Dongrui and Chen, Qian and Zhang, Linfeng and Shao, Jing},
  journal={arXiv preprint arXiv:2502.16770},
  year={2025}
}

❤️ Acknowledgement

Our code is built upon MergeLLM, MergeLM, and alignment-attribution-code. The benchmarks we use are bigcode-evaluation-harness and HarmBench.

We also build on the pretrained backbones and SFT LLMs listed above. Thanks to all the contributors for their great work!
