Partnering with Meta to bring Llama 3 to Fireworks' inference and fine-tuning
By Chenyu Zhao|4/18/2024
We are pleased to announce the availability of the open-source Llama 3 8B and 70B models with 8K context, served from our blazing-fast inference stack. Llama 3 is pretrained on over 15 trillion tokens and uses a 128K-token vocabulary that encodes language much more efficiently.
Beyond the base models, over the next few days we will add support for fine-tuning Llama 3 models, serving LoRA adapters, and further increasing inference speeds. Our serverless inference stack can serve hundreds of LoRA adapters at no additional cost.
Key takeaways from the Llama 3 Model announcement:
State-of-the-art performance: The new 8B and 70B parameter Llama 3 models establish a new state-of-the-art for open-source language model benchmarks. Improvements in pretraining, instruction fine-tuning, and architecture have led to superior performance on industry benchmarks and real-world use cases compared to competing models.
Open and responsible development: Meta is taking an open approach by releasing Llama 3 models early to enable community innovation. This is combined with a strong focus on responsible development and deployment, providing new safety tools, an updated Responsible Use Guide, and a system-level approach to mitigating potential harms.
Extensible platform with more to come: The 8B and 70B models are just the beginning. In the coming months, Meta plans to release larger models, up to 400B parameters, along with new capabilities like multimodality, multilingual conversation, and longer context windows.
Bringing the best of Open-source AI to Enterprises
Our goal at Fireworks is to make Open-source AI accessible to developers and businesses by providing the best language and image models at lightning-fast speeds with the utmost reliability.
Our industry-leading inference speed and quality for image and text generation are utilized by companies like Quora, Sourcegraph, Upstage, Tome, and Anysphere for their production use cases.
Build with Llama 3 on Fireworks AI
To quickly get up and running with Llama 3 on Fireworks AI, visit fireworks.ai to sign up for an account. Then pick up your API key from Profile (top right) -> API Keys.
Install the Fireworks AI Python package
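The package can be installed from PyPI (package name as published by Fireworks at the time of writing):

```shell
pip install fireworks-ai
```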
Accessing Llama 3 on Serverless Inference API
The code snippet below instantiates the Fireworks client and uses the chat completions API to call the Llama 3 model listed at accounts/fireworks/models/llama-v3-70b-instruct.
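A minimal sketch of the call, assuming the `fireworks-ai` client's OpenAI-compatible chat interface; the prompt text and the `build_request` helper are illustrative, and the live call runs only when `FIREWORKS_API_KEY` is set:

```python
import os

MODEL = "accounts/fireworks/models/llama-v3-70b-instruct"

def build_request(prompt: str) -> dict:
    # OpenAI-compatible chat-completions payload: model id + messages list
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }

if __name__ == "__main__" and os.environ.get("FIREWORKS_API_KEY"):
    # Requires `pip install fireworks-ai`; import path per the Fireworks docs
    from fireworks.client import Fireworks

    client = Fireworks(api_key=os.environ["FIREWORKS_API_KEY"])
    resp = client.chat.completions.create(**build_request("Who are you?"))
    print(resp.choices[0].message.content)
```

Keeping the payload construction separate from the network call makes it easy to inspect or log the exact request being sent.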
The API request above returns a response like the one below.
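An illustrative response shape, following the OpenAI-compatible chat-completions schema; all values here are placeholders, not actual model output:

```json
{
  "id": "cmpl-...",
  "object": "chat.completion",
  "model": "accounts/fireworks/models/llama-v3-70b-instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I am Llama 3, a large language model..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 14,
    "completion_tokens": 27,
    "total_tokens": 41
  }
}
```

The generated text lives at `choices[0].message.content`, and `usage` reports token counts for billing and context-window tracking.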
For enterprises that need even faster speeds or higher throughput, we can serve Llama 3 on dedicated GPU infrastructure or in personalized enterprise configurations. If you have any questions or would like to learn more, please don't hesitate to contact us.