Fireworks AI Raises $52M Series B to Lead Industry Shift to Compound AI Systems
By Lin Qiao | 7/11/2024
We’re thrilled to announce our $52M Series B funding round led by Sequoia Capital, raising our valuation to $552M. Other investors in this round include NVIDIA, AMD, and MongoDB Ventures. Previous investors include Benchmark, Databricks Ventures, former Snowflake CEO Frank Slootman, former Meta COO Sheryl Sandberg, Airtable CEO Howie Liu, Scale AI CEO Alexandr Wang, as well as executives from LinkedIn, Confluent, Meta, and 1Password.
This new funding round brings the total capital raised by Fireworks AI to $77M. This investment will help us drive the industry shift to compound AI systems, expand our team, and enhance our platform, enabling developers to quickly move AI applications from prototype to production.
Sequoia General Partner Sonya Huang shared with me: “Fireworks AI is perfectly positioned to lead this industry shift. Their team's expertise in building high-performance inference stacks and their innovative approach to enabling compound AI systems will empower developers with scalable AI solutions that were previously accessible only to tech giants.”
Since its inception, Fireworks AI has empowered developers with the fastest and most cost-effective inference for popular models. Today, we serve over 100 state-of-the-art models in text, image, audio, embedding, and multimodal formats, optimized for latency, throughput, and cost per token. We've reduced inference times by up to 12x compared to vLLM and 40x compared to GPT-4. We process 140 billion tokens daily on our platform with 99.99% API uptime.
Unlike proprietary mega models that are generic, non-private, and hard to customize, Fireworks AI provides smaller, production-grade models that can be deployed privately and securely. Using minimal human-curated data, our ultra-fast LoRA fine-tuning allows developers to quickly customize models to their specific needs, transitioning from dataset preparation to querying a fine-tuned model in minutes. These fine-tuned models are seamlessly deployed, maintaining the same performance and cost benefits as our base models.
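To make that last step concrete, here is a minimal sketch of querying a fine-tuned model through Fireworks' OpenAI-compatible chat completions endpoint. The API key, account, and model names are placeholders, and the dataset upload and LoRA tuning job themselves are not shown.

```python
# Minimal sketch: querying a fine-tuned model through Fireworks' OpenAI-compatible
# chat completions endpoint. The API key, account, and model names below are
# placeholders; the dataset upload and LoRA tuning job are not shown.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",  # OpenAI-compatible endpoint
    api_key="YOUR_FIREWORKS_API_KEY",
)

response = client.chat.completions.create(
    # Hypothetical fine-tuned model ID; substitute your own account and model name.
    model="accounts/your-account/models/your-fine-tuned-model",
    messages=[
        {"role": "system", "content": "You are a support assistant for Acme, Inc."},
        {"role": "user", "content": "How do I rotate my API key?"},
    ],
    max_tokens=256,
    temperature=0.2,
)

print(response.choices[0].message.content)
```

Because fine-tuned models sit behind the same endpoint as the base models, swapping between them is just a change to the model identifier.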
Developers at leading AI startups like Cresta, Cursor, and Liner, as well as digital-native giants like DoorDash, Quora, and Upwork, choose Fireworks AI for our smaller, specialized models. Cursor, for example, has used Fireworks AI's custom Llama 3 70B model to achieve 1,000 tokens/sec for code-generation use cases such as instant apply, smart rewrites, and cursor prediction, which boost developer productivity.
We continue to enhance our platform through deep collaboration with top providers across the AI stack, including partnerships with:
In the past three months, we've launched new features that drastically boost performance and cut costs, bridging the gap between prototyping and production. These include:
While leaderboards emphasize larger models, real-world AI results, especially in production, increasingly come from compound systems with multiple components. Compound AI systems tackle tasks using various interacting parts, such as multiple models, modalities, retrievers, external tools, data, and knowledge. Similar to microservices, agents in a compound AI system use LLMs to complete individual tasks and collectively solve complex problems. This modular approach allows developers to create multi-turn, multitask AI agent workflows with minimal coding. It reduces costs and complexity while enhancing reliability and speed for applications such as search and domain-expert copilots (e.g., for coding, math, or medicine). This approach was first proposed in a post by Matei Zaharia et al. from Berkeley Artificial Intelligence Research (BAIR).
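As a vendor-neutral illustration of the pattern, the sketch below composes a toy retriever, an external calculator tool, and a stubbed model call into one pipeline. Every component is deliberately simplified so that only the structure of a compound system, not any particular API, is shown.

```python
# Vendor-neutral sketch of a compound AI system: a retriever, an external tool, and a
# model call composed into one pipeline. Every component is a stub so that only the
# structure of the system, not any particular API, is illustrated.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Document:
    title: str
    text: str

def retrieve(query: str, corpus: list[Document], k: int = 2) -> list[Document]:
    """Toy retriever: rank documents by naive keyword overlap with the query."""
    words = query.lower().split()
    return sorted(corpus, key=lambda d: sum(w in d.text.lower() for w in words), reverse=True)[:k]

def calculator(expression: str) -> str:
    """Toy external tool for exact arithmetic (illustration only; never eval untrusted input)."""
    return str(eval(expression, {"__builtins__": {}}))

def llm(prompt: str) -> str:
    """Stub standing in for a hosted model call (e.g., an inference API)."""
    return f"[model answer grounded in a prompt of {len(prompt)} characters]"

def compound_answer(question: str, corpus: list[Document],
                    tools: dict[str, Callable[[str], str]]) -> str:
    # Step 1: a retrieval component gathers context.
    context = "\n".join(d.text for d in retrieve(question, corpus))
    # Step 2: an external tool handles the part LLMs are weakest at (exact arithmetic).
    growth = tools["calculator"]("(1200 - 800) / 800 * 100")
    # Step 3: the LLM composes the final answer from the context and the tool output.
    return llm(f"Context:\n{context}\n\nTool result: {growth}%\n\nQuestion: {question}")

corpus = [Document("Q1 report", "Revenue grew from 800 to 1200 units."),
          Document("Roadmap", "New features ship next quarter.")]
print(compound_answer("How much did revenue grow, in percent?", corpus,
                      {"calculator": calculator}))
```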
Fireworks AI recently introduced a fundamental building block for compound AI systems: FireFunction V2, an open-weight function-calling model. FireFunction serves as an orchestrator across multiple models and their multimodal capabilities, external data and knowledge sources, search, transcription, and other APIs, while preserving core LLM capabilities such as multi-turn chat.
Key features include:
Superhuman, an AI-powered email provider, used Fireworks to create Ask AI, a compound AI system that delivers rapid answers from your inbox. Customers simply ask questions without needing to remember senders, guess keywords, or search through messages. Ask AI uses function calling to interact with search and calendar tools, prompt LLMs, and generate rapid responses.
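A minimal sketch of that pattern, assuming the OpenAI-compatible tools format, might look like the following; the model ID and tool schemas are illustrative placeholders, not Superhuman's actual implementation.

```python
# Minimal sketch of function calling in an OpenAI-compatible tools format, with a search
# tool and a calendar tool in the spirit of Ask AI. The model ID and tool schemas are
# illustrative assumptions, not Superhuman's implementation.
import json
from openai import OpenAI

client = OpenAI(base_url="https://api.fireworks.ai/inference/v1",
                api_key="YOUR_FIREWORKS_API_KEY")

tools = [
    {"type": "function", "function": {
        "name": "search_inbox",
        "description": "Search the user's email for messages matching a query.",
        "parameters": {"type": "object",
                       "properties": {"query": {"type": "string"}},
                       "required": ["query"]}}},
    {"type": "function", "function": {
        "name": "get_calendar_events",
        "description": "List calendar events in a date range.",
        "parameters": {"type": "object",
                       "properties": {"start": {"type": "string"}, "end": {"type": "string"}},
                       "required": ["start", "end"]}}},
]

response = client.chat.completions.create(
    model="accounts/fireworks/models/firefunction-v2",  # assumed model ID; check the catalog
    messages=[{"role": "user", "content": "When is my flight next week, and who sent the itinerary?"}],
    tools=tools,
)

# The model decides which tool(s) to call; the application executes them and feeds the
# results back as "tool" messages so the model can produce a final natural-language answer.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```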
We are thrilled about this new chapter for Fireworks AI and the AI community, as it reduces the complexity and inefficiencies of productionizing AI applications. We started Fireworks to empower AI startups, digital-native companies, and Fortune 500 enterprises alike to disrupt the status quo with groundbreaking products, experiences, and increased productivity. We can’t wait to see what you disrupt.