You can also upload and deploy LoRA models fine-tuned outside of Fireworks. See importing fine-tuned models for details.
Single-LoRA deployment
Deploy your LoRA fine-tuned model with a single command that delivers performance matching the base model. This streamlined approach, called live merge, eliminates the previous two-step process and provides better performance compared to multi-LoRA deployments.

Quick deployment
Deploy your LoRA fine-tuned model with one simple command, as shown below. Your deployment will be ready to use once it completes, with performance that matches the base model.
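A minimal sketch using the `firectl` CLI; the account and model IDs are placeholders, so substitute your own:

```bash
# Deploy a LoRA fine-tuned model in one step (live merge).
# "my-account" and "my-lora-model" are placeholder IDs.
firectl create deployment accounts/my-account/models/my-lora-model
```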
Deployment with the Build SDK
You can also deploy your LoRA fine-tuned model using the Build SDK:
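A minimal sketch, assuming the Build SDK's `LLM` class and its `apply()` method; the model name, deployment ID, and `deployment_type="on-demand"` value are illustrative:

```python
from fireworks import LLM

# Deploy a LoRA fine-tuned model to a dedicated on-demand deployment.
# "my-account" and "my-lora-model" are placeholder IDs.
llm = LLM(
    model="accounts/my-account/models/my-lora-model",
    deployment_type="on-demand",
    id="my-lora-deployment",  # any simple string works as the ID
)
llm.apply()  # create the deployment
```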
The `id` parameter can be any simple string; it does not need to follow the format `accounts/account_id/deployments/model_id`.

Multi-LoRA deployment
If you have multiple fine-tuned versions of the same base model (e.g., you’ve fine-tuned the same model for different use cases, applications, or prototyping), you can share a single base model deployment across these LoRA models to achieve higher utilization.

Multi-LoRA deployment comes with performance tradeoffs. We recommend using it only if you need to serve multiple fine-tunes of the same base model and are willing to trade performance for higher deployment utilization.
Deploy with CLI
1. Create base model deployment
Deploy the base model with addons enabled:
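A sketch, assuming `firectl create deployment` with an `--enable-addons` flag; the base model name is an example:

```bash
# Create a base-model deployment that can host LoRA addons.
firectl create deployment accounts/fireworks/models/llama-v3p1-8b-instruct \
  --enable-addons
```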
2. Load LoRA addons
Once the deployment is ready, load your LoRA models onto the deployment, as shown below. You can load multiple LoRA models onto the same deployment by repeating the command with different model IDs.
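A sketch, assuming `firectl load-lora` with a `--deployment` flag; both IDs are placeholders:

```bash
# Load a LoRA addon onto the base-model deployment.
# Repeat with different model IDs to load multiple LoRAs.
firectl load-lora my-lora-model --deployment my-base-deployment
```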
Deploy with the Build SDK
You can also use multi-LoRA deployment with the Build SDK:
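A minimal sketch, again assuming the `LLM` class and `apply()`; the model names and deployment IDs are placeholders, and any addon-enablement setting on the base deployment is omitted here:

```python
from fireworks import LLM

# Base-model deployment that the LoRA addons will share.
base = LLM(
    model="accounts/fireworks/models/llama-v3p1-8b-instruct",
    deployment_type="on-demand",
    id="my-base-deployment",
)
base.apply()

# LoRA addon that targets the base deployment via base_id.
lora = LLM(
    model="accounts/my-account/models/my-lora-model",
    deployment_type="on-demand-lora",
    base_id="my-base-deployment",
)
lora.apply()
```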
When using `deployment_type="on-demand-lora"`, you need to provide the `base_id` parameter, which references the deployment ID of your base model deployment.

When to use multi-LoRA deployment
Use multi-LoRA deployment when you:

- Need to serve multiple fine-tuned models based on the same base model
- Want to maximize deployment utilization
- Can accept some performance tradeoff compared to single-LoRA deployment
- Are managing multiple variants or experiments of the same model
Serverless deployment
For quick experimentation and prototyping, you can deploy your fine-tuned model to shared serverless infrastructure without managing GPUs.

Not all base models support serverless addons. Check the list of models that support serverless with LoRA to confirm your base model is supported.
Deploy to serverless
Load your fine-tuned model into a serverless deployment:
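A sketch, assuming that `firectl load-lora` without a `--deployment` flag targets shared serverless infrastructure; the model ID is a placeholder:

```bash
# Load a fine-tuned model as a serverless LoRA addon (no dedicated deployment).
firectl load-lora my-lora-model
```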
Key considerations

- No hosting costs: Deploying to serverless is free; you only pay per-token usage costs
- Rate limits: The same rate limits apply as for serverless base models
- Performance: Lower performance than on-demand deployments and the base model
- Automatic unloading: Unused addons may be automatically unloaded after a week
- Limit: Deploy up to 100 fine-tuned models to serverless
For production workloads requiring consistent performance, use on-demand deployments instead.