Link: Meta debuts "quantized" versions of Llama 3.2 1B and 3B models, designed to run on low-powered devices and developed in collaboration with Qualcomm and MediaTek (Mike Wheatley/SiliconANGLE)
Meta Platforms Inc. has launched quantized versions of its Llama 3.2 1B and Llama 3.2 3B models, aiming to make them more accessible. These lightweight models are designed for low-powered devices, meeting the growing demand for on-device AI.
The quantized models significantly reduce memory usage and increase inference speed, even on devices with limited resources. They build on the Llama 3.2 1B and 3B models Meta first announced at its Connect 2024 event.
In an accompanying blog post, Meta's AI research team explained that it used Quantization-Aware Training with LoRA adaptors (QLoRA) to preserve model accuracy in low-precision environments. It also introduced an alternative, SpinQuant, a post-training method that favors portability over peak accuracy.
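Meta's post doesn't walk through the mechanics, but the core idea behind QAT with LoRA adaptors can be sketched: the frozen base weights are fake-quantized during the forward pass so training "sees" low-precision arithmetic, while only small low-rank adapter matrices receive gradient updates. Here's a minimal PyTorch sketch under those assumptions; the class and parameter names are mine, not Meta's, and this is an illustration of the technique rather than their implementation:

```python
# Minimal sketch of QAT + LoRA: the base weight is fake-quantized to int4 in
# the forward pass, while only the low-rank adapter (A, B) is trained.
import torch
import torch.nn as nn

class FakeQuantLoRALinear(nn.Module):  # hypothetical name, not Meta's API
    def __init__(self, in_features, out_features, rank=8, n_bits=4):
        super().__init__()
        # Frozen base weight: it is quantized, never updated.
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02,
                                   requires_grad=False)
        # Trainable low-rank LoRA adapter.
        self.lora_a = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, rank))
        self.n_bits = n_bits

    def fake_quant(self, w):
        # Symmetric per-tensor fake quantization with a straight-through
        # estimator: rounding applies in the forward pass, gradients (if any)
        # pass through unchanged.
        qmax = 2 ** (self.n_bits - 1) - 1
        scale = w.abs().max() / qmax
        w_q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
        return w + (w_q - w).detach()

    def forward(self, x):
        w_q = self.fake_quant(self.weight)
        return x @ w_q.t() + x @ self.lora_a.t() @ self.lora_b.t()

layer = FakeQuantLoRALinear(64, 64)
out = layer(torch.randn(2, 64))  # forward pass sees quantized weights
out.sum().backward()             # only lora_a / lora_b accumulate gradients
```

The point of training this way, rather than quantizing after the fact, is that the adapters learn to compensate for the rounding error, which is how accuracy survives the drop to low precision.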
Tests show that the quantized models run up to four times faster and reduce model size by 56% without sacrificing much performance. They held up well in trials, even on commercial hardware like the OnePlus 12 Android phone.
Meta collaborated with Qualcomm and MediaTek to tune these models for ARM-based system-on-chip hardware, improving how they run on mobile CPUs and enabling richer AI experiences directly on phones.
Starting today, the new quantized Llama models are available for download on Llama.com and Hugging Face, marking a significant step in Meta's push for more accessible, efficient AI solutions.
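If you want to grab the weights yourself, fetching them from Hugging Face looks roughly like the sketch below. The repo id is an assumption on my part (check the official model cards for the exact name), access is gated behind accepting Meta's license, and the quantized checkpoints target on-device runtimes rather than the usual server-side loading path:

```python
# Hypothetical download sketch using the huggingface_hub client.
# Requires `huggingface-cli login` first, since Meta's Llama repos are gated.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="meta-llama/Llama-3.2-1B-Instruct-QLORA_INT4_EO8",  # assumed name
)
print(f"Model files downloaded to: {local_dir}")
```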
--
Yoooo, this is a quick note on a link that made me go, WTF? Find all past links here.