Description
What is Llama?
Llama 4 Models (April 2025)
Scout (109B total params, 17B active)
- Context window: 10M tokens (industry leader)
- Architecture: 16 experts MoE
- Deployment: Fits in 1 H100 GPU (with int4 quantization)
- Training: ~40T multimodal tokens
- Best for: Long-context reasoning, summarization, visual understanding
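The single-GPU figure for Scout can be sanity-checked with rough arithmetic. The sketch below estimates weight storage only. It assumes the 80 GB H100 variant and ignores the KV cache, activations, and quantization metadata, so real headroom is smaller than the number suggests.

```python
def weight_footprint_gb(total_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return total_params * bits_per_param / 8 / 1e9


H100_MEMORY_GB = 80  # assumption: 80 GB H100 variant

scout_params = 109e9  # total parameters, from the listing above
int4_gb = weight_footprint_gb(scout_params, 4)
bf16_gb = weight_footprint_gb(scout_params, 16)

print(f"int4: {int4_gb:.1f} GB, fits on one H100: {int4_gb < H100_MEMORY_GB}")
print(f"bf16: {bf16_gb:.1f} GB, fits on one H100: {bf16_gb < H100_MEMORY_GB}")
```

At 4 bits per weight the estimate is about 54.5 GB, while 16-bit weights would need about 218 GB, which is why the single-GPU claim depends on int4 quantization.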
Maverick (400B total params, 17B active)
- Context window: 1M tokens
- Architecture: 128 experts MoE
- Deployment: 1 H100 DGX host
- Training: ~22T multimodal tokens
- Performance: Beats GPT-4o and Gemini 2.0 Flash; comparable to DeepSeek v3 (per Meta's reported benchmarks)
- Best for: Multimodal tasks, reasoning, coding
- Integration: Used in Meta AI (WhatsApp, Messenger, Instagram)
Behemoth (2T total params, 288B active) - In training
- Architecture: 16 experts MoE
- Performance: Outperforms GPT-4.5, Claude 3.7 Sonnet, and Gemini 2.0 Pro on several STEM benchmarks (per Meta's reported results)
- Benchmarks: Leads on MATH-500 and GPQA Diamond (STEM)
- Status: Not yet publicly released
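The "total vs active" parameter counts above describe how sparse each Mixture-of-Experts model is at inference time. As a rough illustration, the sketch below computes the share of weights used per token from the figures in this listing. It says nothing about how the router picks experts.

```python
# (total params, active params per token), taken from the listing above
MODELS = {
    "Scout": (109e9, 17e9),
    "Maverick": (400e9, 17e9),
    "Behemoth": (2e12, 288e9),
}


def active_fraction(total_params: float, active_params: float) -> float:
    """Share of the model's weights that participate in one token's forward pass."""
    return active_params / total_params


for name, (total, active) in MODELS.items():
    share = active_fraction(total, active)
    print(f"{name}: {share:.1%} of weights active per token")
```

Per-token compute scales with the active parameters, but all weights still have to be held in memory, which is why the larger models need multi-GPU hosts despite the same 17B active count.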
Previous Generations
Llama 3.2 (Sep 2024): First multimodal (vision) models
Llama 3.1 405B (Jul 2024): First frontier open-source model
Llama 3 (Apr 2024): 8B and 70B params, better reasoning
Llama 2 (Jul 2023): First version with open license
Llama 1 (Feb 2023): Initial release (limited access)
Key Features
Native multimodality
- Simultaneous text + image + video understanding
- Early fusion training (modalities integrated from the start, no separate encoders)
Context window
- Scout: 10M tokens (industry record)
- Maverick: 1M tokens
- Llama 3.1 and later 3.x: 128K tokens
Mixture-of-Experts (MoE) efficiency
- Only 17B active params per token (of 109B-400B total)
- Faster and cheaper inference than equivalent dense models
- Scout: Fits in 1 H100 GPU (with int4 quantization)
Multilingual support
- 12 languages: Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, Vietnamese
Open weights
- Downloadable and modifiable weights
- Full fine-tuning permitted
- On-premise or cloud deployment
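The context-window figures above translate into a simple check when choosing a model for a long document. The following is a minimal sketch using the advertised limits from this listing. The `reserved_output_tokens` default is a hypothetical value, and hosted deployments may enforce lower limits than the advertised maximum.

```python
# Advertised context windows in tokens, as stated in this listing
CONTEXT_WINDOWS = {
    "Llama 4 Scout": 10_000_000,
    "Llama 4 Maverick": 1_000_000,
    "Llama 3.x": 128_000,
}


def models_that_fit(prompt_tokens: int, reserved_output_tokens: int = 4_096) -> list[str]:
    """Return the models whose window holds the prompt plus room for the reply."""
    needed = prompt_tokens + reserved_output_tokens
    return [name for name, window in CONTEXT_WINDOWS.items() if needed <= window]


print(models_that_fit(100_000))    # a long report
print(models_that_fit(2_500_000))  # a large codebase or document set
```

Token counts depend on the tokenizer, so the prompt length should be measured with the target model's tokenizer rather than estimated from character counts.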
Pricing
Community license
- Free commercial use (< 700M MAU)
- Modification and fine-tuning permitted
- Research without restrictions
Hosted API pricing (varies by provider)
- Input: $0.1/1M tokens
- Output: $0.4/1M tokens
- 10-15x cheaper than GPT-4o/Claude 3.5
Restrictions
- Companies with 700M+ MAU: require a special license from Meta
- EU-domiciled individuals and EU-headquartered companies: not granted license rights to the multimodal models (end users of products built on them are exempt)
- Acceptable Use Policy: prohibits violence, criminal activity, etc.
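The per-token prices above can be turned into a quick budget estimate. This sketch assumes the listed rates of $0.1 per million input tokens and $0.4 per million output tokens. Actual prices vary by provider and model, so treat the constants as placeholders.

```python
INPUT_USD_PER_MILLION = 0.10   # from the listing; provider-dependent
OUTPUT_USD_PER_MILLION = 0.40  # from the listing; provider-dependent


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated API spend in USD for the given token volumes."""
    return (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    ) / 1_000_000


# Hypothetical workload: 50,000 requests, each with 2,000 input and 500 output tokens
requests = 50_000
total_cost = estimate_cost_usd(requests * 2_000, requests * 500)
print(f"Estimated monthly cost: ${total_cost:.2f}")
```

For this workload, input and output each contribute about $10, which shows how the 4x higher output rate makes short replies carry as much cost as much longer prompts.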
Safety Tools
Prompt Guard: Protection against prompt injection
Code Shield: Inference-time insecure code filtering
CyberSecEval: Cybersecurity risk assessment suite
LlamaFirewall: Security guardrails for AI systems
Where to Use Llama
Meta AI
- WhatsApp, Messenger, Instagram Direct
- Meta.ai website
- Available in 40 countries
Cloud platforms
- AWS Bedrock
- Azure AI
- Google Cloud
- Databricks
- Snowflake
Inference providers
- Hugging Face
- Together AI
- Fireworks AI
- Groq
- Cerebras
- Replicate
- Ollama (local)
Fine-tuning
- Unsloth, Axolotl, LLaMA-Factory
- AWS, Azure managed services
On-device
- Qualcomm Snapdragon integration
- Smartphones, PCs, VR/AR headsets
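For the local option listed above, a model served by Ollama can be called over its local HTTP API. The sketch below uses only the Python standard library and Ollama's `/api/generate` endpoint on its default port. The model tag `llama4:scout` is an assumption, so check `ollama list` for the tag actually installed on your machine.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "llama4:scout"  # assumption: replace with the tag you have pulled


def build_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = MODEL_TAG, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Summarize this document in one sentence.")` only works while the Ollama server is running and the model has been pulled. The hosted providers in the list expose their own APIs with different URLs and authentication.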
Use Cases
- Custom chatbots and assistants
- RAG pipelines with proprietary data
- Document analysis and summarization
- Multilingual translation
- Code generation and debugging
- Agentic coding workflows
- API integration
- Text generation
- Image understanding
- Video analysis
- Creative writing
- Base for model distillation
- Architecture benchmarking
- Academic research
Advantages
✅ Extreme context: 10M tokens (Scout)
✅ Native multimodal: text + image + video
✅ Efficient MoE: 17B active vs 400B total
✅ On-premise: Complete data control
✅ Fine-tuning: Full customization
✅ No vendor lock-in
✅ Meta ecosystem: 3B+ users
✅ Safety tools included
✅ Multilingual: 12 languages
Limitations
❌ EU restrictions: License rights to the multimodal models are not granted to EU-domiciled individuals or EU-headquartered companies
❌ 700M MAU limit: Successful startups must renegotiate
❌ Hardware requirements: Large models need expensive GPUs
❌ Inferior coding: 40% LiveCodeBench vs 85% GPT-5
❌ Hallucinations: Generates false info like other LLMs
❌ Data cutoff: August 2024
❌ Not a reasoning model: No dedicated step-by-step reasoning mode like o1/o3-mini
Key Features
Llama 4 Scout: 10M token context, MoE 16 experts, fits in 1 H100 GPU
Llama 4 Maverick: 400B params, 17B active, 1M context, beats GPT-4o
Llama 4 Behemoth: 2T params in training, outperforms GPT-4.5 and Claude 3.7
Native multimodal: text + image + video from start
Mixture-of-Experts: 17B active reduces costs vs dense models
Open-weight: download weights, full fine-tuning, on-premise deploy
Extreme context: up to 10M tokens (Scout) - industry leader
Free under the Community license (< 700M MAU)
Multilingual: 12 languages including Spanish
Safety tools: Llama Guard 3, Prompt Guard, Code Shield included
Meta AI integration: WhatsApp, Messenger, Instagram (3B+ users)
Cloud platforms: AWS, Azure, GCP, Databricks, Snowflake
Inference providers: Hugging Face, Together AI, Groq, Ollama
On-device: Qualcomm Snapdragon for smartphones and headsets
Early fusion multimodality: better than separate encoders
Cost-efficient: $0.1-0.4/1M tokens (10-15x cheaper than GPT-4o)
Fine-tuning frameworks: LoRA, QLoRA, PEFT-based
RAG integration: LangChain, LlamaIndex compatible
Llama 3.3 70B: 405B-level performance at fraction of cost
Training scale: 40T tokens (Scout), 22T tokens (Maverick)
Use Cases
Enterprise chatbots and assistants with proprietary data
RAG pipelines for document analysis
Code generation and debugging workflows
Multilingual content translation (12 languages)
Long-context document summarization (10M tokens)
Image understanding and visual Q&A
Video analysis and content moderation
On-premise AI deployment (data control)
Model distillation to create smaller models
Research and academic experimentation
Fine-tuning for domain-specific tasks
Customer support automation
Content generation for marketing
Legal document analysis
Medical research text processing
Financial data analysis
Social media content moderation
Educational tutoring systems
Synthetic data generation
Agentic workflows with tool calling
Information
Company: Meta AI
Website: llama.com