Fireworks' latest and most performant function-calling model. FireFunction V2 is based on Llama 3 and trained to excel at function calling as well as chat and instruction following. See the blog post for more details: https://fireworks.ai/blog/firefunction-v2-launch-post
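As a sketch of how a function-calling request to this model might look, the snippet below builds a request body for Fireworks' OpenAI-compatible chat completions endpoint. The `get_weather` tool schema and the user message are illustrative assumptions, not part of this model card; sending the request additionally requires a Fireworks API key.

```python
import json

# Serverless model identifier and endpoint as used by Fireworks'
# OpenAI-compatible API.
MODEL = "accounts/fireworks/models/firefunction-v2"
ENDPOINT = "https://api.fireworks.ai/inference/v1/chat/completions"

def build_request(user_message: str) -> dict:
    """Build a function-calling request body for FireFunction V2.

    The tool definition follows the OpenAI function-calling schema;
    get_weather is a hypothetical example tool.
    """
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": user_message}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "description": "Get the current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
    }

# Inspect the request body that would be POSTed to ENDPOINT.
body = build_request("What's the weather in Oslo?")
print(json.dumps(body, indent=2))
```

The same request shape works whether the model is served serverless or on a dedicated deployment; only the `model` field changes.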
FireFunction V2 is a fine-tuned model (LoRA). LoRAs are smaller models that have a corresponding base model and can only be deployed on a serverless or dedicated deployment for that base model. Compared to inference prices for base models, there is no extra charge for using LoRAs. See the docs for details.
FireFunction V2 can be fine-tuned on your data to create a model with better response quality. Fireworks uses low-rank adaptation (LoRA) to train a model that can be served efficiently at inference time.
See the Fine-tuning guide for details.
On-demand deployments allow you to use FireFunction V2 on private GPUs. Deploy FireFunction V2 (and up to 100 other LoRAs) to any of your on-demand deployments of accounts/fireworks/models/llama-v3-70b-instruct-hf at no additional cost.
See the On-demand deployments guide for details.