--- thumbnail: https://assets-global.website-files.com/646b351987a8d8ce158d1940/64ec9e96b4334c0e1ac41504_Logo%20with%20white%20text.svg metrics: - memory_disk - memory_inference - inference_latency - inference_throughput - inference_CO2_emissions - inference_energy_consumption tags: - pruna-ai ---
[](https://github.com/PrunaAI/pruna) [](https://twitter.com/PrunaAI) [](https://www.linkedin.com/company/pruna-ai) [](https://discord.com/invite/JFQmtFKCjd) [](https://dashboard.pruna.ai/login?utm_source=huggingface&utm_medium=org_card&utm_campaign=hf_traffic) ## This repo contains GGUF versions of the McGill-NLP/Llama-3-8B-Web model. # Make AI models cheaper, smaller, faster, and greener! - Give a thumbs up if you like this model! - Read the documentations to know more [here](https://docs.pruna.ai/) - Join the Pruna AI community on Discord [here](https://discord.com/invite/JFQmtFKCjd) to share feedback/suggestions or get help. **Frequently Asked Questions** - ***How does the compression work?*** The model is compressed with GGUF. - ***How does the model quality change?*** The quality of the model output might vary compared to the base model. - ***What is the model format?*** We use GGUF format. - ***What calibration data has been used?*** If needed by the compression method, we used WikiText as the calibration data. - ***How to compress my own models?*** You can request premium access to more compression methods and tech support for your specific use-cases [here](https://z0halsaff74.typeform.com/pruna-access?typeform-source=www.pruna.ai). # Downloading and running the models You can download the individual files from the Files & versions section. Here is a list of the different versions we provide. For more info checkout [this chart](https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9) and [this guide](https://www.reddit.com/r/LocalLLaMA/comments/1ba55rj/overview_of_gguf_quantization_methods/): | Quant type | Description | |------------|--------------------------------------------------------------------------------------------| | Q5_K_M | High quality, recommended. | | Q5_K_S | High quality, recommended. | | Q4_K_M | Good quality, uses about 4.83 bits per weight, recommended. | | Q4_K_S | Slightly lower quality with more space savings, recommended. | | IQ4_NL | Decent quality, slightly smaller than Q4_K_S with similar performance, recommended. | | IQ4_XS | Decent quality, smaller than Q4_K_S with similar performance, recommended. | | Q3_K_L | Lower quality but usable, good for low RAM availability. | | Q3_K_M | Even lower quality. | | IQ3_M | Medium-low quality, new method with decent performance comparable to Q3_K_M. | | IQ3_S | Lower quality, new method with decent performance, recommended over Q3_K_S quant, same size with better performance. | | Q3_K_S | Low quality, not recommended. | | IQ3_XS | Lower quality, new method with decent performance, slightly better than Q3_K_S. | | Q2_K | Very low quality but surprisingly usable. | ## How to download GGUF files ? **Note for manual downloaders:** You almost never want to clone the entire repo! Multiple different quantisation formats are provided, and most users only want to pick and download a single file. The following clients/libraries will automatically download models for you, providing a list of available models to choose from: * LM Studio * LoLLMS Web UI * Faraday.dev - **Option A** - Downloading in `text-generation-webui`: - **Step 1**: Under Download Model, you can enter the model repo: PrunaAI/Llama-3-8B-Web-GGUF-smashed and below it, a specific filename to download, such as: phi-2.IQ3_M.gguf. - **Step 2**: Then click Download. - **Option B** - Downloading on the command line (including multiple files at once): - **Step 1**: We recommend using the `huggingface-hub` Python library: ```shell pip3 install huggingface-hub ``` - **Step 2**: Then you can download any individual model file to the current directory, at high speed, with a command like this: ```shell huggingface-cli download PrunaAI/Llama-3-8B-Web-GGUF-smashed Llama-3-8B-Web.IQ3_M.gguf --local-dir . --local-dir-use-symlinks False ```