Foreword
This tutorial demonstrates how to use the Fireworks AI Python SDK with a few toy examples. First, we will use the LLM class to make a simple request to various models and compare the outputs. Then, we will try to fine-tune a model to learn information it has never seen before. This tutorial will cost $10 to run due to the on-demand model deployments.
1. Setup
To get started with the Fireworks AI Python SDK, you need to install the `firectl` CLI tool and create an API key.
1. Install our CLI tool `firectl` to interact with the Fireworks AI platform.
2. Sign in to Fireworks by running the sign-in command shown in the sketch below. A browser window will open to the Fireworks AI login page. Once you log in, your machine will be authenticated.
3. Create an API key by running the corresponding command, then copy the value of the `Key` field into your `FIREWORKS_API_KEY` environment variable.
4. Install the Fireworks AI Python SDK.
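A minimal sketch of the commands for these steps. The `firectl` installation method depends on your platform, so follow the Fireworks installation docs for that part; double-check the `firectl` subcommands against the current CLI reference before relying on them.

```bash
# Install firectl (method varies by platform; see the Fireworks installation docs).

# Sign in to Fireworks (opens a browser window for authentication).
firectl signin

# Create an API key and export it so the SDK can pick it up.
firectl create api-key
export FIREWORKS_API_KEY="<paste the Key value here>"

# Install the Fireworks AI Python SDK.
pip install fireworks-ai
```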
2. Call a language model using the LLM() class
Now that your machine is set up with credentials and the SDK, let's make sure you are ready to make your first LLM call and go over some of the nuances of this SDK.
1. Create a new file called `main.py` and import the Fireworks AI SDK.
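A minimal sketch of the import, assuming the `fireworks-ai` package installed above exposes the `LLM` class from the top-level `fireworks` module.

```python
# main.py
from fireworks import LLM
```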
2. Instantiate the `LLM` class. The `LLM` class accepts a `model` argument that you can use to specify the model you want to use. For this tutorial, we will use the Llama 4 Maverick model (a sketch follows below).

When creating an LLM instance, you can specify the deployment type as either `"serverless"`, `"on-demand"`, or `"auto"`. If you pass `"auto"`, the SDK will try to use serverless hosting if available; otherwise it will create an on-demand deployment. In the other cases, the SDK will try to create a deployment of the specified type and will throw an error if that type is not available for the model you selected. The SDK will try to reuse existing deployments for the same model when possible; see Resource management for more details.
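A minimal sketch of the instantiation, assuming Llama 4 Maverick is available serverlessly; the exact model identifier is an assumption, so confirm it in the Fireworks model library.

```python
# main.py
from fireworks import LLM

# Model id is an assumption; check the Fireworks model library for the current name.
llm = LLM(
    model="accounts/fireworks/models/llama4-maverick-instruct-basic",
    deployment_type="serverless",
)
```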
With great power comes great responsibility! Be careful with the `deployment_type` parameter, especially with `"auto"` and `"on-demand"`. While the SDK will try to make the most cost-effective choice for you and put sensible autoscaling policies in place, it is possible to unintentionally create many deployments that lead to unwanted spend, especially when working with non-serverless models.

When using `deployment_type="on-demand"`, you must provide an `id` parameter to uniquely identify your deployment. This is required to prevent accidental creation of multiple deployments.

When using `deployment_type="on-demand"` or `deployment_type="on-demand-lora"`, you must call `.apply()` to apply the deployment configuration to Fireworks. This is not required for serverless deployments. When using `deployment_type="auto"`, the SDK will automatically handle deployment creation, but if it falls back to an on-demand deployment, you may need to call `.apply()` explicitly. If you do not call `.apply()`, you are expected to set up the deployment through the deployment page at https://app.fireworks.ai/dashboard/deployments.
3. Make a request to the LLM. The `LLM` class is OpenAI compatible, so you can use the same chat completion interface to make a request to the LLM.
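A minimal sketch of a chat completion request using the OpenAI-compatible interface described above.

```python
# main.py
response = llm.chat.completions.create(
    messages=[{"role": "user", "content": "Name three uses for a paperclip."}],
)
print(response.choices[0].message.content)
```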
4. The great thing about the SDK is that you can use your favorite Python constructs to work with LLMs. For example, we can call a few LLMs in a loop and see how they respond, or test different temperature values to see how the model's behavior changes. Both variations are sketched below.
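Two minimal sketches for this step. The model identifiers are assumptions; substitute models that are available serverlessly on your account.

```python
# main.py -- compare a few models in a loop (model ids are assumptions)
prompt = [{"role": "user", "content": "Summarize the plot of Hamlet in one sentence."}]

for name in [
    "accounts/fireworks/models/llama4-maverick-instruct-basic",
    "accounts/fireworks/models/llama-v3p1-8b-instruct",
    "accounts/fireworks/models/qwen2p5-72b-instruct",
]:
    model = LLM(model=name, deployment_type="serverless")
    reply = model.chat.completions.create(messages=prompt)
    print(f"{name}:\n{reply.choices[0].message.content}\n")
```

And the same idea applied to sampling temperature on a single model:

```python
# main.py -- sweep temperature on one model
for temperature in (0.0, 0.7, 1.5):
    reply = llm.chat.completions.create(messages=prompt, temperature=temperature)
    print(f"temperature={temperature}:\n{reply.choices[0].message.content}\n")
```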
3. Fine-tune a model
The Build SDK makes fine-tuning a model a breeze! To see how, let's try a canonical use case: fine-tuning a model to learn information it has never seen before. To do this, we will use the TOFU (Task of Fictitious Unlearning) dataset. The dataset consists of ~4,000 question-answer pairs based on autobiographies of 200 fictitious authors. Researchers fine-tuned a model on this dataset with the goal of investigating ways to "unlearn" this information. For our toy example, however, we will only focus on the first step: embedding these nonsense facts into an LLM.
1. Install the required dependencies. You will need the `datasets` library from Hugging Face to load the dataset.
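The install is a one-liner (assuming pip):

```bash
pip install datasets
```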
2. Load and prepare the dataset. We must convert it to the format expected by the fine-tuning service, which is a list of chat completion messages following the OpenAI chat completion format.
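A minimal sketch of the conversion, assuming the TOFU dataset is hosted on the Hugging Face Hub as `locuslab/TOFU` with `question`/`answer` columns; verify the dataset id, config name, and columns on its dataset card.

```python
# tofu.py
from datasets import load_dataset

# Dataset id, config, and column names are assumptions; check the TOFU dataset card.
tofu = load_dataset("locuslab/TOFU", "full", split="train")

rows = [
    {
        "messages": [
            {"role": "user", "content": example["question"]},
            {"role": "assistant", "content": example["answer"]},
        ]
    }
    for example in tofu
]
```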
3. We can then create a `Dataset` object and upload it to Fireworks using the `Dataset.from_list()` method.
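A minimal sketch of this step; `Dataset.from_list()` is the method named above, and `rows` is the chat-format list built in the previous sketch.

```python
# tofu.py
from fireworks import Dataset

# Wrap the prepared chat-format rows in a Fireworks Dataset and upload it.
dataset = Dataset.from_list(rows)
```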
4. Now we can create a base model and fine-tune it on the dataset. Let's try fine-tuning Qwen2.5 7B Instruct. At this time, it might be helpful to set the `FIREWORKS_SDK_DEBUG` environment variable to `true` to see the progress of the fine-tuning job.

Qwen2.5 7B Instruct is not available serverlessly, so the SDK will create an on-demand deployment with a scale-down window of 5 minutes. This will incur some costs.
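A rough sketch of this step under stated assumptions: the model id, the `create_supervised_fine_tuning_job()` call, its parameters, and the `wait_for_completion()` helper are assumptions about the Build SDK's fine-tuning interface, so check the SDK reference for the exact names and signatures before running it.

```python
# tofu.py
# Run `export FIREWORKS_SDK_DEBUG=true` in your shell first to see training progress.
from fireworks import LLM

# Base model id is an assumption; confirm it in the Fireworks model library.
base_model = LLM(
    model="accounts/fireworks/models/qwen2p5-7b-instruct",
    deployment_type="on-demand",
    id="tofu-tutorial",  # required for on-demand deployments
)
base_model.apply()  # create the on-demand deployment

# Method name and hyperparameters below are assumptions about the fine-tuning API.
job = base_model.create_supervised_fine_tuning_job(
    "tofu-finetune",
    dataset=dataset,
    epochs=1,
    learning_rate=1e-4,
)
job.wait_for_completion()  # assumed helper that blocks until training finishes
```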
5. Now we can test the fine-tuned model. If everything worked out correctly, the model should answer with the fictitious biographical facts it learned from the TOFU dataset.
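A minimal sketch of querying the fine-tuned model. The `output_llm` attribute used to get a handle on the result is hypothetical, so check the SDK reference for how to obtain the fine-tuned LLM from the job; the question is a placeholder for any question from the TOFU dataset.

```python
# tofu.py
# `output_llm` is a hypothetical attribute; see the SDK reference for the real accessor.
fine_tuned = job.output_llm
fine_tuned.apply()  # deploy the LoRA addon on top of the base deployment

question = "..."  # substitute any question from the TOFU dataset
answer = fine_tuned.chat.completions.create(
    messages=[{"role": "user", "content": question}],
)
print(answer.choices[0].message.content)
```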
Just like we did in the previous section, you can try iterating over different models and fine-tuning hyperparameters like `epochs` and `learning_rate` to experiment with different fine-tuning jobs!
6. You'll notice that despite using two models in this tutorial, we only actually created a single deployment. This is the power of the Build SDK's smart resource management in action! Rather than creating a separate deployment for the LoRA addon, the SDK simply updated the base model deployment we created to support LoRA addons and then deployed our fine-tuned model on top.

Feel free to send more requests to either model. By default, the SDK sets a scale-to-zero window of 5 minutes, which stops billing after an extended period of inactivity. However, it's good practice to delete deployments you're not using as a precaution against unexpected bills. You can call `base_model.delete_deployment(ignore_checks=True)` to delete the deployment, bypassing the check that triggers if you've used the deployment recently.
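A one-line cleanup sketch using the call named above, assuming `base_model` is the `LLM` instance created for the base deployment:

```python
# tofu.py -- clean up the on-demand deployment when you're done
base_model.delete_deployment(ignore_checks=True)
```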