
Llama-3.2-3B

Run Llama-3.2-3B optimized for Intel NPUs with nexaSDK.

Quickstart

  1. Install nexaSDK and create a free account at sdk.nexa.ai

  2. Activate your device with your access token:

    nexa config set license '<access_token>'
    
  3. Run the model on Intel NPU in one line:

    nexa infer NexaAI/llama3.2-3B-intel-npu
    

Model Description

Llama-3.2-3B is a compact member of the Llama 3.2 family, designed to deliver strong general-purpose language modeling in a lightweight 3B-parameter footprint.
It balances efficiency with capability, making it well suited for edge devices, prototyping, and applications where latency and resource constraints are critical.

Features

  • Lightweight architecture: 3B parameters optimized for fast inference and low memory usage.
  • Instruction-following: Tuned for instruction prompts, Q&A, and step-by-step reasoning.
  • Multilingual capabilities: Covers a wide range of global languages at smaller scale.
  • Deployment flexibility: Runs efficiently on consumer hardware and server environments.

Use Cases

  • Conversational assistants and chatbots.
  • Educational tools and lightweight tutoring systems.
  • Prototyping and experimentation with large language models on limited resources.
  • Applications where cost or latency is a priority over sheer scale.

Inputs and Outputs

Input: Text prompts, such as questions, commands, or code snippets.
Output: Natural-language responses, including answers, explanations, or structured output.
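Under the hood, chat-style input is serialized into a single text prompt before the model sees it. `nexa infer` applies the model's template automatically, so this is purely illustrative: a minimal Python sketch of the Llama 3-style chat format (header and end-of-turn special tokens), assuming the standard Llama 3.x instruct template.

```python
# Illustrative only: how a chat exchange might be rendered as Llama 3-style
# prompt text. nexaSDK handles this serialization for you at inference time.

def format_llama3_prompt(messages):
    """Render a list of {"role", "content"} dicts as Llama 3 prompt text."""
    parts = ["<|begin_of_text|>"]
    for m in messages:
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    # Open the assistant header so the model generates the reply next.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

prompt = format_llama3_prompt([
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Explain NPUs in one sentence."},
])
print(prompt)
```

The model's completion is the text generated after the trailing assistant header, terminated by its own `<|eot_id|>` token.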

License

  • Licensed under Meta Llama 3.2 Community License
