If you have multiple fine-tuned versions of the same base model (e.g. you have fine-tuned the same model for different use cases, applications, or prototyping/experimentation), you can share a single base model deployment across these LoRA models to achieve higher utilization. We call this feature Multi-LoRA. It is an alternative to the pattern described in deploying a fine-tuned model using an on-demand deployment, where a single deployment serves a single LoRA model. Multi-LoRA comes with performance tradeoffs, so we recommend it only if you need to serve multiple fine-tunes of the same base model and are willing to trade performance for higher deployment utilization.

To use Multi-LoRA, first create a deployment of your base model, passing the `--enable-addons` flag:

```bash
firectl create deployment "accounts/fireworks/models/<MODEL_ID of base model>" --enable-addons
```
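As a concrete sketch, the base model ID below is illustrative (substitute your own), and polling readiness with firectl's `get deployment` subcommand is an assumption about your firectl version:

```bash
# Illustrative base model ID; substitute your own.
firectl create deployment "accounts/fireworks/models/llama-v3p1-8b-instruct" --enable-addons

# Poll the deployment until it reports ready (assumes a `get deployment`
# subcommand; use the deployment ID printed by the create command above).
firectl get deployment <DEPLOYMENT_ID>
```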

Once the deployment is ready, load the LoRA onto it by passing this deployment's ID:

```bash
firectl load-lora <FINE_TUNED_MODEL_ID> --deployment <DEPLOYMENT_ID>
```
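For example, with a hypothetical account and fine-tuned model name (loaded onto the deployment created above):

```bash
# Hypothetical fine-tuned model name; use your own model and deployment IDs.
firectl load-lora accounts/my-account/models/my-support-assistant \
  --deployment <DEPLOYMENT_ID>
```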

You can load several LoRA models onto the same deployment this way.
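At inference time, each loaded LoRA is addressed by its own model name. As a sketch, here is what querying one of them through the OpenAI-compatible chat completions endpoint could look like (the account and model names are the hypothetical ones from the example above):

```bash
# Query one of the LoRAs loaded onto the shared deployment; the model
# name is hypothetical -- use your fine-tuned model's full name.
curl https://api.fireworks.ai/inference/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $FIREWORKS_API_KEY" \
  -d '{
    "model": "accounts/my-account/models/my-support-assistant",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```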