Stable Diffusion 3
Fireworks has partnered with Stability AI to provide blazing-fast image generation using SD3, the latest and most advanced generative image model yet.
Featured Models
These models are deployed at industry-leading speeds to excel at production tasks.
Image Models
All currently deployed image models.
Model Name | Model Description | Model ID
---|---|---
Stable Diffusion 3 (Serverless) | The most capable text-to-image model produced by Stability AI, with greatly improved performance in multi-subject prompts, image quality, and spelling. The Stable Diffusion 3 API is provided by Stability and the model is powered by Fireworks. Unlike other models on the Fireworks playground, you'll need a Stability API key to use this model. To use the API directly, see https://platform.stability.ai/docs/api-reference#tag/Generate/paths/~1v2beta~1stable-image~1generate~1sd3/post | accounts/stability/models/sd3
Stable Diffusion XL (Serverless) | Image generation model produced by Stability AI. | accounts/fireworks/models/stable-diffusion-xl-1024-v1-0
Playground v2 1024 (Serverless) | Playground v2 is a diffusion-based text-to-image generative model, trained from scratch by the research team at playground.com. | accounts/fireworks/models/playground-v2-1024px-aesthetic
Playground v2.5 1024 (Serverless) | Playground v2.5 is a diffusion-based text-to-image generative model and the successor to Playground v2. | accounts/fireworks/models/playground-v2-5-1024px-aesthetic
Segmind Stable Diffusion 1B (SSD-1B) (Serverless) | Image generation model distilled from Stable Diffusion XL 1.0 and 50% smaller. | accounts/fireworks/models/SSD-1B
Japanese Stable Diffusion XL (Serverless) | Japanese Stable Diffusion XL (JSDXL) is a Japanese-specific SDXL model that accepts prompts in Japanese and generates Japanese-style images. | accounts/fireworks/models/japanese-stable-diffusion-xl
Stable Diffusion 3 Turbo (Serverless) | A distilled, few-step version of Stable Diffusion 3, the newest image generation model from Stability AI, which equals or outperforms state-of-the-art text-to-image systems such as DALL-E 3 and Midjourney v6 in typography and prompt adherence, based on human preference evaluations. Stability AI has partnered with Fireworks AI, the fastest and most reliable API platform in the market, to deliver Stable Diffusion 3 and Stable Diffusion 3 Turbo. To use the API directly, see https://platform.stability.ai/docs/api-reference#tag/Generate/paths/~1v2beta~1stable-image~1generate~1sd3/post | accounts/stability/models/sd3-turbo
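As a rough sketch of what a direct call to the Stability endpoint linked above might look like: the snippet below builds the request headers and form fields, then posts them as multipart form data. Field names (`prompt`, `model`, `output_format`) and the `Bearer` auth scheme follow Stability's v2beta API reference, but treat them as assumptions and verify against the linked docs before relying on them.

```python
# Sketch of calling the Stability SD3 endpoint referenced above.
# Field names are assumptions based on Stability's v2beta API reference.
import os

API_URL = "https://api.stability.ai/v2beta/stable-image/generate/sd3"

def build_request(prompt: str, model: str = "sd3", output_format: str = "png"):
    """Assemble headers and form fields for one generation call."""
    headers = {
        # A Stability API key is required (see the note in the table above).
        "authorization": f"Bearer {os.environ.get('STABILITY_API_KEY', '')}",
        "accept": "image/*",  # ask for raw image bytes in the response
    }
    data = {"prompt": prompt, "model": model, "output_format": output_format}
    return headers, data

if __name__ == "__main__":
    import requests  # third-party; pip install requests

    headers, data = build_request("a lighthouse at dawn", model="sd3-turbo")
    # files={"none": ""} forces requests to encode multipart/form-data,
    # which this endpoint expects even for text-only generation.
    resp = requests.post(API_URL, headers=headers, files={"none": ""}, data=data)
    resp.raise_for_status()
    with open("lighthouse.png", "wb") as f:
        f.write(resp.content)
```

The same request shape should work for `sd3` and `sd3-turbo` by switching the `model` field.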
Language Models
Serverless models are hosted by Fireworks — no need to configure hardware or deploy models. Usage is billed per token.
Model Name | Model Description | Model ID | Context
---|---|---|---
FireLLaVA-13B (Serverless) | Vision-language model accepting both image and text inputs (a single image is recommended), trained on OSS-model-generated training data and open-sourced on Hugging Face at fireworks-ai/FireLLaVA-13b. | accounts/fireworks/models/firellava-13b | 4,096
FireFunction V1 (Serverless) | Fireworks' open-source function calling model. | accounts/fireworks/models/firefunction-v1 | 32,768
Mixtral MoE 8x7B Instruct (Serverless) | Mistral MoE 8x7B Instruct v0.1 model with Sparse Mixture of Experts, fine-tuned for instruction following. | accounts/fireworks/models/mixtral-8x7b-instruct | 32,768
Mixtral MoE 8x22B Instruct (Serverless) | Mistral MoE 8x22B Instruct v0.1 model with Sparse Mixture of Experts, fine-tuned for instruction following. | accounts/fireworks/models/mixtral-8x22b-instruct | 65,536
Llama 3 70B Instruct (Serverless) | Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B and 70B sizes. The Llama 3 instruction-tuned models are optimized for dialogue use cases and outperform many of the available open-source chat models on common industry benchmarks. | accounts/fireworks/models/llama-v3-70b-instruct | 8,192
Bleat (Serverless) | Bleat enables function calling in LLaMA 2 in a similar fashion to OpenAI's implementation for ChatGPT. | accounts/fireworks/models/bleat-adapter | 4,096
Chinese Llama 2 LoRA 7B (Serverless) | The LoRA version of Chinese-Llama-2, based on Llama-2-7b-hf. | accounts/fireworks/models/chinese-llama-2-lora-7b | 4,096
DBRX Instruct (Serverless) | DBRX Instruct is a mixture-of-experts (MoE) large language model trained from scratch by Databricks, specializing in few-turn interactions. DBRX is hosted as an experimental model; Fireworks only guarantees that it will be hosted serverless through April 2024, and future serverless availability will depend on overall usage. | accounts/fireworks/models/dbrx-instruct | 32,768
Gemma 7B Instruct (Serverless) | Gemma 7B Instruct from Google. Gemma is provided under and subject to the Gemma Terms of Use found at ai.google.dev/gemma/terms | accounts/fireworks/models/gemma-7b-it | 8,192
Hermes 2 Pro Mistral 7B (Serverless) | The latest version of Nous Research's Hermes series of models, using an updated and cleaned version of the Hermes 2 dataset and now trained on a diverse and rich set of function-calling and JSON-mode samples. | accounts/fireworks/models/hermes-2-pro-mistral-7b | Unknown
Japanese StableLM Instruct Beta 70B (Serverless) | japanese-stablelm-instruct-beta-70b is a 70B-parameter decoder-only language model based on japanese-stablelm-base-beta-70b, further fine-tuned on Databricks Dolly-15k, Anthropic HH, and other public data. | accounts/stability/models/japanese-stablelm-instruct-beta-70b | Unknown
Japanese Stable LM Instruct Gamma 7B (Serverless) | A 7B-parameter decoder-only Japanese language model fine-tuned on instruction-following datasets, built on top of the base model Japanese Stable LM Base Gamma 7B. | accounts/stability/models/japanese-stablelm-instruct-gamma-7b | Unknown
Llama 2 13B French (Serverless) | meta-llama/Llama-2-13b-chat-hf fine-tuned to answer French questions in French. | accounts/fireworks/models/llama-2-13b-fp16-french | 4,096
Llama2 13B Guanaco QLoRA GGML (Serverless) | This chatbot model was built via parameter-efficient QLoRA fine-tuning of llama-2-13b on all 9.85k rows of timdettmers/openassistant-guanaco (a subset of OpenAssistant/oasst1 containing the highest-rated conversation paths). Fine-tuning was executed on a single A6000 (48 GB) for roughly 3.7 hours on the Lambda Labs platform. | accounts/fireworks/models/llama-2-13b-guanaco-peft | 4,096
Llama 7B Summarize (Serverless) | Summarizes articles and conversations. | accounts/fireworks/models/llama2-7b-summarize | 4,096
Llama Guard v2 8B (Serverless) | Meta Llama Guard 2 is an 8B-parameter Llama 3-based LLM safeguard model. Like Llama Guard, it can classify content in both LLM inputs (prompt classification) and LLM responses (response classification). It acts as an LLM: it generates text indicating whether a given prompt or response is safe or unsafe, and if unsafe, it also lists the content categories violated. | accounts/fireworks/models/llama-guard-2-8b | 8,192
Llama 2 13B (Serverless) | A 13B-parameter Llama 2 model, trained on 2 trillion tokens with a context length of 4,096. | accounts/fireworks/models/llama-v2-13b | 4,096
Llama 2 13B Chat (Serverless) | A fine-tuned version of Llama 2 13B, optimized for dialogue applications using Reinforcement Learning from Human Feedback (RLHF); it performs comparably to ChatGPT according to human evaluations. | accounts/fireworks/models/llama-v2-13b-chat | 4,096
Llama 2 13B Code (Serverless) | Code Llama is a family of large language models for code based on Llama 2, providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction-following ability for programming tasks. | accounts/fireworks/models/llama-v2-13b-code | 4,096
Llama 2 13B Code Instruct (Serverless) | The 13B-parameter Code Llama instruct model, fine-tuned for understanding natural language instructions. | accounts/fireworks/models/llama-v2-13b-code-instruct | 4,096
Llama 2 34B Code (Serverless) | Code Llama is a family of large language models for code based on Llama 2, providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction-following ability for programming tasks. | accounts/fireworks/models/llama-v2-34b-code | 4,096
Llama 2 34B Code Instruct (Serverless) | The 34B-parameter Code Llama instruct model, fine-tuned for understanding natural language instructions. | accounts/fireworks/models/llama-v2-34b-code-instruct | 4,096
Llama 2 70B Chat (Serverless) | A fine-tuned version of Llama 2 70B, optimized for dialogue applications using Reinforcement Learning from Human Feedback (RLHF); it performs comparably to ChatGPT according to human evaluations. | accounts/fireworks/models/llama-v2-70b-chat | 4,096
Llama 2 70B Code Llama Instruct (Serverless) | An instruction-tuned version of Code Llama 70B, optimized for code generation. | accounts/fireworks/models/llama-v2-70b-code-instruct | 4,096
Llama 2 7B (Serverless) | A 7B-parameter Llama 2 model, trained on 2 trillion tokens with a context length of 4,096. | accounts/fireworks/models/llama-v2-7b | 4,096
Llama 2 7B Chat (Serverless) | A fine-tuned version of Llama 2 7B, optimized for dialogue applications using Reinforcement Learning from Human Feedback (RLHF); it performs comparably to ChatGPT according to human evaluations. | accounts/fireworks/models/llama-v2-7b-chat | 4,096
Llama 3 70B Instruct (HF version) (Serverless) | Llama 3 70B Instruct (Hugging Face version). | accounts/fireworks/models/llama-v3-70b-instruct-hf | 8,192
Llama 3 8B Instruct (Serverless) | Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B and 70B sizes. The Llama 3 instruction-tuned models are optimized for dialogue use cases and outperform many of the available open-source chat models on common industry benchmarks. | accounts/fireworks/models/llama-v3-8b-instruct | 8,192
Llama 3 8B Instruct (HF version) (Serverless) | Llama 3 8B Instruct (Hugging Face version). | accounts/fireworks/models/llama-v3-8b-instruct-hf | 8,192
LLaVA V1.6 Yi 34B (Serverless) | Vision-language model LLaVA 1.6, accepting both image and text inputs. | accounts/fireworks/models/llava-yi-34b | 4,096
Mistral 7B (Serverless) | The Mistral-7B-v0.1 Large Language Model (LLM) is a pretrained generative text model with 7 billion parameters. | accounts/fireworks/models/mistral-7b | 32,768
Mistral 7B Instruct (Serverless) | The Mistral-7B-Instruct-v0.1 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.1 generative text model, trained using a variety of publicly available conversation datasets. | accounts/fireworks/models/mistral-7b-instruct-4k | 32,768
Mistral 7B Instruct v0.2 (Serverless) | The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an improved instruct fine-tuned version of Mistral-7B-Instruct-v0.1. | accounts/fireworks/models/mistral-7b-instruct-v0p2 | 32,768
Mixtral MoE 8x22B (Serverless) | Mixtral 8x22B base model that matches Hugging Face numerics. | accounts/fireworks/models/mixtral-8x22b-hf | 65,536
Mixtral MoE 8x22B Instruct (HF version) (Serverless) | Mixtral 8x22B Instruct that matches Hugging Face numerics. | accounts/fireworks/models/mixtral-8x22b-instruct-hf | 65,536
Mixtral MoE 8x7B (Serverless) | Mistral MoE model. Warning: unofficial implementation, as model code is not yet available. | accounts/fireworks/models/mixtral-8x7b | 32,768
Mixtral MoE 8x7B Instruct (HF version) (Serverless) | Mixtral MoE 8x7B Instruct, with numerics matching the Hugging Face implementation. | accounts/fireworks/models/mixtral-8x7b-instruct-hf | 32,768
MythoMax L2 13B (Serverless) | An improved, potentially even perfected variant of MythoMix. | accounts/fireworks/models/mythomax-l2-13b | 4,096
Nous Hermes 2 - Mixtral 8x7B - DPO (fp8) (Serverless) | Nous Hermes 2 Mixtral 8x7B DPO is the flagship Nous Research model trained over the Mixtral 8x7B MoE LLM. | accounts/fireworks/models/nous-hermes-2-mixtral-8x7b-dpo-fp8 | 32,768
Mistral 7B OpenOrca (Serverless) | Mistral 7B fine-tuned on the OpenOrca dataset, an attempt to reproduce the dataset generated for Microsoft Research's Orca paper, using OpenChat packing and trained with Axolotl. | accounts/fireworks/models/openorca-7b | 32,768
Qwen1.5 72B Chat (Serverless) | Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. | accounts/fireworks/models/qwen1p5-72b-chat | 32,768
StableLM 2 Zephyr 1.6B (Serverless) | Stable LM 2 Zephyr 1.6B is a 1.6-billion-parameter instruction-tuned language model inspired by HuggingFaceH4's Zephyr 7B training pipeline, trained on a mix of publicly available and synthetic datasets using Direct Preference Optimization (DPO). | accounts/stability/models/stablelm-2-zephyr-2b | 4,096
StableLM Zephyr 3B (Serverless) | StableLM Zephyr 3B is a 3-billion-parameter instruction-tuned model inspired by HuggingFaceH4's Zephyr 7B training pipeline. It was trained on a mix of publicly available and synthetic datasets using Direct Preference Optimization (DPO) and evaluated on MT-Bench and the Alpaca Benchmark. | accounts/stability/models/stablelm-zephyr-3b | 4,096
StarCoder 15.5B (Serverless) | A 15.5B-parameter model trained on 80+ programming languages from The Stack (v1.2), using Multi-Query Attention and the Fill-in-the-Middle objective. | accounts/fireworks/models/starcoder-16b | 8,192
StarCoder 7B (Serverless) | A 7B-parameter model trained on 80+ programming languages from The Stack (v1.2), using Multi-Query Attention and the Fill-in-the-Middle objective. | accounts/fireworks/models/starcoder-7b | 8,192
Traditional Chinese Llama2 (Serverless) | QLoRA fine-tuned Llama 2 model on a traditional Chinese Alpaca dataset. | accounts/fireworks/models/traditional-chinese-qlora-llama2 | 4,096
Capybara 34B (Serverless) | 34B chat model from NousResearch, based on Yi-34B-200k. | accounts/fireworks/models/yi-34b-200k-capybara | 200,000
Zephyr 7B Beta (Serverless) | Zephyr is a series of language models trained to act as helpful assistants. Zephyr-7B-β is the second model in the series, a fine-tuned version of mistralai/Mistral-7B-v0.1 trained on a mix of publicly available, synthetic datasets using Direct Preference Optimization (DPO). | accounts/fireworks/models/zephyr-7b-beta | 4,096
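The serverless language models above are billed per token and reachable through an OpenAI-style chat completions endpoint, passing the model ID from the table as the `model` field. The sketch below assumes the endpoint URL `https://api.fireworks.ai/inference/v1/chat/completions` and a `FIREWORKS_API_KEY` environment variable; check the Fireworks API documentation for the authoritative details.

```python
# Minimal sketch of querying a serverless model via an OpenAI-compatible
# chat completions endpoint. URL and payload shape are assumptions based
# on the OpenAI chat format; verify against the Fireworks API docs.
import json
import os
import urllib.request

API_URL = "https://api.fireworks.ai/inference/v1/chat/completions"

def build_payload(model: str, user_message: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model,  # a model ID from the table above
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": user_message}],
    }

if __name__ == "__main__":
    payload = build_payload(
        "accounts/fireworks/models/llama-v3-8b-instruct",
        "Name three uses of a mixture-of-experts model.",
    )
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['FIREWORKS_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # The assistant's reply sits at choices[0].message.content in this format.
    print(body["choices"][0]["message"]["content"])
```

Swapping in any other serverless model ID from the table (e.g. `accounts/fireworks/models/mixtral-8x7b-instruct`) changes only the `model` field; mind each model's context window from the Context column.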