arxiv:2410.10801

Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning

Published on Oct 14, 2024
Abstract

A study compares model merging techniques for enhancing performance and safety in multilingual large language models, showing significant improvements through objective-based and language-based merging methods.

AI-generated summary

Large Language Models (LLMs) have been adopted and deployed worldwide for a broad variety of applications. However, ensuring their safe use remains a significant challenge. Preference training and safety measures often overfit to harms prevalent in Western-centric datasets, and safety protocols frequently fail to extend to multilingual settings. In this work, we explore model merging in a diverse multi-task setting, combining safety and general-purpose tasks within a multilingual context. Each language introduces unique and varied learning challenges across tasks. We find that objective-based merging is more effective than mixing data, with improvements of up to 8% and 10% in general performance and safety respectively. We also find that language-based merging is highly effective -- by merging monolingually fine-tuned models, we achieve a 4% increase in general performance and 7% reduction in harm across all languages on top of the data mixtures method using the same available data. Overall, our comprehensive study of merging approaches provides a useful framework for building strong and safe multilingual models.
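The core operation discussed above, merging monolingually fine-tuned models, can be illustrated with the simplest merging technique: a linear average of model weights ("model souping"). The paper's actual objective-based and language-based merging methods are not specified here; the sketch below is a hypothetical minimal version using plain dicts of floats in place of real checkpoint tensors, and the function name `merge_state_dicts` is illustrative, not from the paper.

```python
def merge_state_dicts(state_dicts, weights=None):
    """Return the (optionally weighted) linear average of parameter dicts.

    Each dict maps a parameter name to its value; a uniform average is
    used when no merging weights are given.
    """
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    return {
        key: sum(w * sd[key] for w, sd in zip(weights, state_dicts))
        for key in state_dicts[0]
    }

# Toy stand-ins for three monolingually fine-tuned checkpoints.
model_en = {"layer.weight": 1.0, "layer.bias": 0.0}
model_fr = {"layer.weight": 3.0, "layer.bias": 0.5}
model_ar = {"layer.weight": 2.0, "layer.bias": 1.0}

# Uniform merge: each parameter is averaged across the three models.
merged = merge_state_dicts([model_en, model_fr, model_ar])
print(merged)
```

With real checkpoints the same idea applies per tensor (e.g. averaging entries of a PyTorch `state_dict`); non-uniform `weights` let one upweight particular objectives or languages when merging.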


Models citing this paper: 23
Datasets citing this paper: 0
Spaces citing this paper: 63
Collections including this paper: 3