If you have multiple fine-tuned versions of the same base model (e.g. you have fine-tuned the same model for different use cases, applications, or prototyping/experimentation), you can share a single base model deployment across these LoRA models to achieve higher utilization. We call this feature Multi-LoRA. It is an alternative to the pattern described in deploying a fine-tuned model using an on-demand deployment, where a single deployment serves a single LoRA model. Multi-LoRA comes with performance tradeoffs, so we recommend it only if you need to serve multiple fine-tunes of the same base model and are willing to trade performance for higher deployment utilization.

To use Multi-LoRA, first create a deployment of your base model, passing the `--enable-addons` flag:

```bash
firectl create deployment "accounts/fireworks/models/<MODEL_ID of base model>" --enable-addons
```
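As a concrete sketch, the base model ID below is illustrative (substitute your own), and polling readiness with firectl's `get deployment` subcommand is an assumption about your firectl version:

```bash
# Illustrative base model ID; substitute your own.
firectl create deployment "accounts/fireworks/models/llama-v3p1-8b-instruct" --enable-addons

# Poll the deployment until it reports ready (assumes a `get deployment`
# subcommand; use the deployment ID printed by the create command above).
firectl get deployment <DEPLOYMENT_ID>
```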

Once the deployment is ready, load the LoRA onto it by passing this deployment's ID:

```bash
firectl load-lora <FINE_TUNED_MODEL_ID> --deployment <DEPLOYMENT_ID>
```
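For example, with a hypothetical account and fine-tuned model name (loaded onto the deployment created above):

```bash
# Hypothetical fine-tuned model name; use your own model and deployment IDs.
firectl load-lora accounts/my-account/models/my-support-assistant \
  --deployment <DEPLOYMENT_ID>
```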

You can load several LoRA models onto the same deployment this way.
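At inference time, each loaded LoRA is addressed by its own model name. As a sketch, here is what querying one of them through the OpenAI-compatible chat completions endpoint could look like (the account and model names are the hypothetical ones from the example above):

```bash
# Query one of the LoRAs loaded onto the shared deployment; the model
# name is hypothetical -- use your fine-tuned model's full name.
curl https://api.fireworks.ai/inference/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $FIREWORKS_API_KEY" \
  -d '{
    "model": "accounts/my-account/models/my-support-assistant",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```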