2-Bit VPTQ: 6.5x Smaller LLMs While Preserving 95% Accuracy

Very accurate 2-bit quantization for running 70B LLMs on a 24 GB GPU

Continue reading on Towards Data Science »
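A quick back-of-envelope check makes the headline numbers plausible. The 6.5x figure and the exact per-weight overhead are taken from the article's claim, not measured here; this sketch only does the arithmetic a 70B-parameter model implies.

```python
# Memory estimate for a 70B-parameter model, per the stated 6.5x compression.
params = 70e9
fp16_gb = params * 2 / 1e9           # 2 bytes per weight at fp16 -> 140 GB
compressed_gb = fp16_gb / 6.5        # stated 6.5x reduction -> ~21.5 GB
raw_2bit_gb = params * 2 / 8 / 1e9   # 2 bits per weight alone -> 17.5 GB
# The gap between 17.5 and ~21.5 GB would be quantization overhead
# (codebooks, indices, unquantized layers) implied by the 6.5x ratio.
print(f"fp16: {fp16_gb:.1f} GB")
print(f"6.5x-compressed: {compressed_gb:.1f} GB")
print(f"raw 2-bit payload: {raw_2bit_gb:.1f} GB")
```

At roughly 21.5 GB, the compressed weights fit within a 24 GB GPU with a small margin left for activations and the KV cache.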

Jan 31, 2025 - 21:40