Using Multi-LoRA
If you have multiple fine-tuned versions of the same base model (for example, you have fine-tuned the same model for different use cases, applications, or prototyping/experimentation), you can share a single base model deployment across those LoRA models to achieve higher utilization. We call this feature Multi-LoRA. It is an alternative to the pattern described in deploying a fine-tuned model using an on-demand deployment, where a single deployment serves a single LoRA model. Multi-LoRA comes with performance tradeoffs, so we recommend it only if you need to serve multiple fine-tunes of the same base model and are willing to trade some performance for higher deployment utilization.
To use Multi-LoRA, first create a deployment of your base model and pass the `--enable-addons` flag. Then, once the deployment is ready, deploy each LoRA model, providing the ID of this deployment.
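As a rough sketch, the two steps above might look like the following from the command line. Note that this is illustrative only: the command names, model paths, and the exact way the deployment ID is passed are assumptions, so check your platform's CLI reference for the precise invocations. Only the `--enable-addons` flag is taken from this guide.

```shell
# Step 1 (assumed command shape): create a base-model deployment with
# addons enabled so it can host LoRA models.
firectl create deployment <base-model-id> --enable-addons

# Step 2 (assumed command shape): once the deployment is ready, load a
# LoRA model onto it by passing the deployment ID from step 1.
firectl load-lora <fine-tuned-model-id> --deployment <deployment-id>

# Repeat step 2 for each additional LoRA fine-tuned from the same base model.
```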
You can deploy several LoRA models onto the same deployment this way.