Llama 3 70B Instruct API & Playground

Llama 3 70B Instruct API Features

Fine-tuning Docs	Llama 3 70B Instruct can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model
On-demand Deployment Docs	On-demand deployments give you dedicated GPUs for Llama 3 70B Instruct using Fireworks' reliable, high-performance system with no rate limits.

Llama 3 70B Instruct FAQs

What is Llama 3 70B Instruct and who developed it?

Llama 3 70B Instruct is an instruction-tuned large language model developed by Meta, part of the Llama 3 family released in April 2024. It is optimized for assistant-style dialogue and natural language generation tasks.

What applications and use cases does Llama 3 70B Instruct excel at?

Llama 3 70B Instruct excels at:

Chat-based assistants
Code generation (HumanEval: 81.7%)
Reasoning tasks (MMLU: 82.0%, GSM8K: 93.0%)
QA and comprehension (SQuAD, DROP, BoolQ)

What is the maximum context length for Llama 3 70B Instruct?

The model supports a context length of 8.2k tokens.

What is the usable context window for Llama 3 70B Instruct?

Usable context is up to 8.2k tokens, which matches the maximum context length.

Does Llama 3 70B Instruct support quantized formats (4-bit/8-bit)?

The model supports 48 quantized variants, confirming availability in 4-bit and 8-bit formats.

What are known failure modes of Llama 3 70B Instruct?

Despite extensive safety evaluations, the model may:

Produce inaccurate or biased outputs
Exhibit residual refusal or safety edge cases
Show English-language bias (training focus was English)

Does Llama 3 70B Instruct support function-calling schemas?

No, function calling is not supported.

How many parameters does Llama 3 70B Instruct have?

The model has 70.6 billion parameters.

Is fine-tuning supported for Llama 3 70B Instruct?

Llama 3 70B Instruct supports LoRA fine-tuning, full fine-tuning, and serverless LoRA.

How are tokens counted (prompt vs completion)?

Tokens are billed per 1M input/output tokens.

What rate limits apply on the shared endpoint?

On-demand deployment is supported with no rate limits.

What license governs commercial use of Llama 3 70B Instruct?

The model is released under the Llama 3 Community License, which allows commercial use. License details are hosted at llama.meta.com/license.

Llama 3 70B Instruct

Llama 3 70B Instruct API Features

Fine-tuning

On-demand Deployment

Llama 3 70B Instruct FAQs

Metadata

Specification

Supported Functionality