Llama Guard 3 is a Llama-3.1-8B pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can classify content in both LLM inputs (prompt classification) and LLM responses (response classification). It acts as an LLM: it generates text indicating whether a given prompt or response is safe or unsafe and, if unsafe, lists the content categories violated.
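Because the verdict is plain generated text, downstream code has to parse it. A minimal sketch, assuming the output format described in Meta's model card ("safe", or "unsafe" followed by a comma-separated line of category codes); the helper name is illustrative:

```python
def parse_guard_output(text: str) -> dict:
    """Parse a Llama Guard 3 verdict string into a structured result.

    Expected formats (per Meta's model card):
      "safe"
      "unsafe\nS1,S10"
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    verdict = lines[0].lower()
    if verdict == "safe":
        return {"safe": True, "categories": []}
    # Second line, if present, lists the violated category codes.
    categories = lines[1].split(",") if len(lines) > 1 else []
    return {"safe": False, "categories": [c.strip() for c in categories]}
```

For example, `parse_guard_output("unsafe\nS1,S10")` yields `{"safe": False, "categories": ["S1", "S10"]}`.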
Fine-tuning: Llama Guard 3 8B can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model.

On-demand deployment: On-demand deployments give you dedicated GPUs for Llama Guard 3 8B using Fireworks' reliable, high-performance system with no rate limits.
Llama Guard 3 8B is a content safety classification model developed by Meta, based on the Llama 3.1 8B architecture. It is fine-tuned to classify prompts and responses as safe or unsafe across 14 hazard categories based on the MLCommons taxonomy. It supports multilingual content moderation and is designed to integrate with systems using Llama 3.1.
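The 14 hazard categories are reported as codes S1 through S14. A reference mapping is sketched below; the names follow Meta's Llama Guard 3 model card, and you should verify them against the official card before depending on them:

```python
# Hazard category codes used in Llama Guard 3 verdicts (MLCommons-aligned
# taxonomy, per Meta's model card -- verify against the official card).
HAZARD_CATEGORIES = {
    "S1": "Violent Crimes",
    "S2": "Non-Violent Crimes",
    "S3": "Sex-Related Crimes",
    "S4": "Child Sexual Exploitation",
    "S5": "Defamation",
    "S6": "Specialized Advice",
    "S7": "Privacy",
    "S8": "Intellectual Property",
    "S9": "Indiscriminate Weapons",
    "S10": "Hate",
    "S11": "Suicide & Self-Harm",
    "S12": "Sexual Content",
    "S13": "Elections",
    "S14": "Code Interpreter Abuse",
}
```

A moderation pipeline can use this table to turn codes like `S1,S10` into human-readable labels.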
The model supports a context length of 131,072 tokens (131.1K).
Fireworks supports the full 131.1K token context window on on-demand deployments.
Yes. Meta provides a quantized INT8 variant that reduces checkpoint size by roughly 40% while remaining comparable to the full-precision model on key benchmarks.
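As a rough sketch of where a ~40% reduction can come from, assume a 16-bit baseline checkpoint and INT8 weights for most, but not all, layers (the 80/20 split below is an illustrative assumption, not Meta's published recipe):

```python
def checkpoint_bytes(n_params: int, bits_per_param: int) -> float:
    """Approximate checkpoint size in bytes for a given weight precision."""
    return n_params * bits_per_param / 8

params = 8_000_000_000                      # 8B parameters
fp16_size = checkpoint_bytes(params, 16)    # ~16 GB baseline

# Illustrative assumption: ~80% of weights quantized to INT8, rest kept 16-bit.
mixed_size = (checkpoint_bytes(int(params * 0.8), 8)
              + checkpoint_bytes(int(params * 0.2), 16))

reduction = 1 - mixed_size / fp16_size      # fraction saved vs. the baseline
```

Under these assumptions the saving works out to about 40%, consistent with the figure above.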
The maximum output length is constrained by the shared 131.1K token context limit (prompt + output). In practice, the model generates short classification strings ("safe", or "unsafe" plus category codes).
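Since prompt and output share one window, a request should budget both. A minimal sketch of that check; the constant matches the context size stated above, and the function name is illustrative:

```python
CONTEXT_LIMIT = 131_072  # shared budget for prompt tokens + generated tokens

def fits_in_context(prompt_tokens: int, max_new_tokens: int) -> bool:
    """Return True if the prompt plus the requested output budget fit the window."""
    return prompt_tokens + max_new_tokens <= CONTEXT_LIMIT
```

Classification outputs are short, so a small output budget (a few dozen tokens) leaves almost the entire window for the prompt.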
The model contains 8 billion parameters.
Yes. Fireworks supports LoRA-based fine-tuning using its Reserved Fine-Tuning (RFT) infrastructure.
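LoRA keeps the base weights frozen and learns a small low-rank update, which is why a personalized model can be trained and served cheaply. A minimal sketch of the idea in pure Python (illustrative shapes and names, not Fireworks' implementation):

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Element-wise sum of two same-shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def lora_forward(x, w, a, b, scale):
    """y = x @ W + scale * (x @ A @ B): frozen W plus a low-rank trainable update."""
    base = matmul(x, w)
    update = matmul(matmul(x, a), b)  # rank bounded by A's inner dimension
    return add(base, [[scale * v for v in row] for row in update])

# Rank-1 example: W (2x2) is frozen; A (2x1) and B (1x2) are the adapter.
x = [[1.0, 2.0]]
w = [[1.0, 0.0], [0.0, 1.0]]
a = [[1.0], [0.0]]
b = [[0.0, 1.0]]
y = lora_forward(x, w, a, b, scale=1.0)
```

Only `A` and `B` are trained, so the adapter is tiny relative to the 8B base weights.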
Token usage is measured as prompt + output, and constrained by the 131.1K token context window.
Llama Guard 3 8B is released under Meta's Llama 3.1 Community License. The license restricts certain commercial use cases and requires accepting Meta's terms for access. Full terms are available at https://llama.meta.com/license.