Fireworks Blog

GLM 5.2 is live on Fireworks inference, day zero.

Partnering with Meta to bring Llama 3 to Fireworks’ inference and fine-tuning
Model Releases
4/18/2024

Getting Started with Stability’s API Powered by Fireworks
Developer Experience
4/17/2024

Optimizing Retrieval Augmented Generation (RAG) with MongoDB Atlas and Fireworks AI
Developer Experience
3/21/2024

Training-Inference Parity in MoE Models: Where Numerics Drift
Developer Experience
3/10/2024

Fireworks launches fine-tuning service - Rapidly iterate on quality and scale to production through Fireworks inference
Model Releases
3/8/2024

Fireworks Platform Spring 2024 Updates
Model Releases
3/1/2024

FireFunction V1 - Fireworks’ GPT-4-level function calling model - 4x faster than GPT-4 and open weights
Model Releases
2/20/2024

Why do all LLMs need structured output modes?
Model Releases
2/20/2024

FireLLaVA: the first commercially permissive OSS LLaVA model
Model Releases
1/18/2024

FireAttention — Serving Open Source Models 4x faster than vLLM by quantizing with ~no tradeoffs
Developer Experience
1/8/2024

Fireworks Raises the Quality Bar with Function Calling Model and API Release
Model Releases
12/20/2023

Mixtral 8x7B on Fireworks: faster, cheaper, even before the official release
Model Releases
12/14/2023

LLM Inference Performance Benchmarking (Part 1)
Developer Experience
11/3/2023

New in Fireworks: Image-to-Image and ControlNet support for SSD-1B and SDXL!
Model Releases
11/2/2023

Fireworks.ai Achieves SOC 2 Type II and HIPAA Compliance
Company News
10/27/2023

Accelerating Code Completion with Fireworks Fast LLM Inference
Model Releases
10/11/2023

Fireworks.ai Now Available on LangChain Prompt Playground
Model Releases
10/2/2023

Simplifying Code Infilling with Code Llama and Fireworks.ai
Developer Experience
9/12/2023

Speed, Python: Pick Two. How CUDA Graphs Enable Fast Python Code for Deep Learning
Developer Experience
8/29/2023

Fireworks.ai: Fast, Affordable, Customizable Gen AI Platform
Model Releases
8/17/2023

Multi-Query Attention is All You Need
Developer Experience
7/12/2023
