
Microsoft and Beihang Unveil MoRA: Efficient LLM Fine-Tuning

By Yasmeeta Oon

May 30, 2024

In a groundbreaking development, researchers from Microsoft and Beihang University have unveiled a new technique for fine-tuning large language models (LLMs) that significantly reduces the cost traditionally associated with this process. This innovative method, termed MoRA (Matrix Optimization for Rank Adaptation), stands out as a parameter-efficient fine-tuning (PEFT) technique, potentially revolutionizing the way developers approach LLM adaptation.

MoRA aims to address some of the inherent limitations found in other popular PEFT techniques, such as low-rank adaptation (LoRA). As enterprise adoption of PEFT methods continues to rise, MoRA promises to be a crucial addition to the toolkit of LLM application developers, especially for tasks that require the model to acquire new knowledge.

Comparison of Fine-Tuning Techniques

Feature                      | Classic Fine-Tuning | LoRA              | MoRA
---------------------------- | ------------------- | ----------------- | -------------
Parameter Update Method      | Full Parameter Set  | Low-Rank Matrices | Square Matrix
Memory Requirement           | High                | Low               | Moderate
Performance on Simple Tasks  | High                | High              | High
Performance on Complex Tasks | High                | Moderate          | High
Cost                         | High                | Low               | Low

Traditional fine-tuning methods require updating all of an LLM's parameters, a task that becomes prohibitively expensive and slow for models with billions of parameters. PEFT techniques mitigate this by training only a small number of parameters, whether a subset of the model's own weights or small added adapter modules, which sharply reduces memory requirements and makes fine-tuned models far easier to store and deploy.

LoRA has garnered popularity for representing the update to a full-rank weight matrix as the product of two much smaller low-rank matrices, so only those small matrices need to be trained. This approach has proven effective for tasks such as text classification and instruction tuning. However, LoRA faces challenges with more complex tasks that require the model to expand its knowledge base, such as mathematical reasoning and continual pre-training.
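
To make the mechanics concrete, here is a minimal PyTorch sketch of a LoRA-style update; the dimensions, initialization, and single weight matrix are illustrative assumptions, not values from the paper:

```python
import torch

# Hypothetical sizes: d is the hidden dimension, r the adapter rank.
d, r = 4096, 8

W = torch.randn(d, d)         # frozen pretrained weight (not trained)
A = torch.randn(r, d) * 0.01  # trainable low-rank factor, r x d
B = torch.zeros(d, r)         # trainable low-rank factor, d x r (zero-init,
                              # so training starts from the pretrained model)

x = torch.randn(1, d)
# Forward pass: the frozen weight plus the rank-r update B @ A.
y = x @ (W + B @ A).T

# Trainable parameters: 2 * d * r = 65,536, versus d * d ~= 16.8M for full
# fine-tuning of this one matrix; the update's rank is at most r = 8.
```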

Several studies have highlighted that LoRA’s low-rank updating mechanism may limit LLMs’ ability to effectively learn and retain new knowledge. The rank of the LoRA adapter, being significantly smaller than the full rank of the model, restricts its capacity to store new information through fine-tuning.

To overcome these limitations, the researchers developed MoRA, which uses a single square matrix instead of LoRA's pair of low-rank matrices. The idea is to arrange the trainable parameters so that the update achieves the highest possible rank for a given parameter budget within the model's original dimensions. Unlike a LoRA adapter, however, the MoRA adapter's input and output dimensions do not match those of the original model, so it cannot simply be folded into the same matrix multiplication.

To bridge this gap, the researchers devised a compression/decompression function that transforms inputs between the two spaces, allowing MoRA to be seamlessly integrated into LLMs of varying sizes. The square weight matrix endows MoRA with a stronger capacity to learn new knowledge compared to a LoRA model of equivalent size.
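
The paper describes several concrete compression/decompression operators; the sketch below uses a simple reshape-based scheme purely for illustration. It spends the same trainable-parameter budget as the LoRA sketch above (65,536 parameters) but allows an update rank of up to 256 instead of 8:

```python
import torch

d = 4096     # hypothetical hidden dimension
r_hat = 256  # side of the square matrix; r_hat**2 = 65,536 matches the
             # LoRA budget above (2 * d * r with r = 8)

W = torch.randn(d, d)          # frozen pretrained weight
M = torch.zeros(r_hat, r_hat)  # trainable square matrix (zero-init)

x = torch.randn(1, d)
# "Compress": reshape the d-dim input into d // r_hat chunks of size r_hat.
chunks = x.view(-1, r_hat)     # shape (16, 256)
# Apply the square matrix to each chunk; the update can have rank up to r_hat.
updated = chunks @ M.T         # shape (16, 256)
# "Decompress": flatten the chunks back to the model's hidden size.
delta = updated.reshape(1, d)

y = x @ W.T + delta
```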

The researchers conducted comparisons between equally sized LoRA and MoRA models across various tasks and settings. On memorization tasks, MoRA significantly outperformed LoRA, achieving performance closer to that of a fully fine-tuned model while using fewer parameters and training steps.

  • MoRA demonstrated significant improvements over LoRA with the same number of trainable parameters, benefiting from high-rank updating.
  • In instruction tuning and mathematical reasoning tasks, MoRA’s performance was nearly on par with LoRA.
  • For continual pre-training in biomedical and financial domains, MoRA outperformed LoRA, thanks to its high-rank updating, which aids in memorizing new knowledge.
  • Increasing the rank of the MoRA adapter can eliminate the performance gap between PEFT and full fine-tuning in mathematical reasoning tasks, albeit with higher training and storage costs.

Fine-tuning is a critical use case for enterprise LLM applications. By enhancing the capabilities and accuracy of LLMs on proprietary knowledge, fine-tuning enables companies to deploy smaller models for tasks that previously required expensive, state-of-the-art models. Currently, LoRA and its variants are the gold standard for parameter-efficient fine-tuning, supported by a robust ecosystem of tools and platforms.

For instance, S-LoRA is a framework that allows developers to run thousands of LoRA adapters on a single GPU, facilitating applications that require numerous fine-tuned LLMs, such as models customized for individual user content. The researchers at Microsoft and Beihang University have made an open-source implementation of MoRA available, which is compatible with LoRA. This compatibility can significantly benefit enterprise applications looking to integrate new knowledge into base models.
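
As a small taste of that ecosystem, the sketch below attaches LoRA adapters to a Hugging Face model with the peft library; the base model and hyperparameters are arbitrary examples, not a MoRA-specific setup:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Any causal LM works here; gpt2 is just a small, convenient example.
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Attach rank-8 LoRA adapters to GPT-2's attention projection ("c_attn").
config = LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"], lora_dropout=0.05)
model = get_peft_model(model, config)

# Reports the trainable fraction (on the order of 0.2% of all parameters here).
model.print_trainable_parameters()
```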

  • MoRA reduces the cost of fine-tuning large language models.
  • It uses a square matrix for parameter updates, achieving higher rank within the model’s dimensions.
  • MoRA outperforms LoRA in complex tasks such as continual pre-training.
  • The technique is suitable for enterprise applications, enhancing LLM capabilities on proprietary knowledge.
  • Open-source implementation of MoRA is compatible with LoRA, facilitating integration into existing workflows.

By addressing the limitations of existing PEFT techniques, MoRA represents a significant advancement in the field of fine-tuning large language models. Its ability to efficiently learn and retain new knowledge positions it as a valuable tool for developers and enterprises aiming to optimize their LLM applications.


