Fireworks Blog

GLM 5.2 is live on Fireworks inference, day zero.

Fireworks f1: A breakthrough in complex reasoning with Compound AI
Model Releases
11/15/2024

How Upwork and Fireworks deliver faster, smarter proposals for freelancers
Case Studies
11/11/2024

FLUX.1 on Fireworks: Fast, frugal, and flexible
Model Releases
10/22/2024

FireAttention V3: Enabling AMD as a viable alternative for GPU inference
Developer Experience
10/15/2024

Three projects, one platform: A developer's winning streak with Fireworks AI
Case Studies
10/14/2024

Partnering with Meta: Bringing Llama 3.2 to Fireworks for Fine-Tuning and Inference
Model Releases
9/25/2024

How Enterprises are using Multimodal Models in production with Fireworks
Case Studies
9/25/2024

Multi-LoRA: Personalize AI at scale and deliver the best experience for each customer and use case, with 100x cost-efficiency
Developer Experience
9/18/2024

FireOptimizer: Customizing latency and quality for your production inference workload
Model Releases
8/30/2024

Build Your Own Flight Recommendation System using FastAPI, SerpAPI, and Firefunction
Developer Experience
8/29/2024

Building a RAG with Astro, FastAPI, SurrealDB and Llama 3.1
Developer Experience
8/14/2024

How Fireworks evaluates quantization precisely and interpretably
Developer Experience
8/1/2024

Introducing Llama 3.1 inference endpoints in partnership with Meta
Model Releases
7/23/2024

Fireworks AI Raises $52M Series B to Lead Industry Shift to Compound AI Systems
Company News
7/11/2024

How Cursor built Fast Apply using the Speculative Decoding API
Developer Experience
6/23/2024

FireAttention V2: 12x faster to make Long Contexts practical for Online Inference
Model Releases
6/20/2024

Firefunction-v2: Function calling capability on par with GPT4o at 2.5x the speed and 10% of the cost
Model Releases
6/17/2024

Announcing custom models and on-demand H100s with 50%+ lower costs and latency than vLLM
Model Releases
6/3/2024

GPUs on-demand: Not serverless, not reserved, but some third thing
Developer Experience
6/3/2024

Code Generation with Large Language Models - Fireworks AI Take
Developer Experience
5/8/2024

Doomed to Code: How we Teamed Up with Fireworks AI at MistralAI Hackathon to Conquer the Shores of Hell
Developer Experience
5/6/2024
