Tag: quantization

Small Language Models Achieve LLM-Level Accuracy in Bug Fixing, Significantly Reducing Resource Demands

New research shows that small language models (SLMs) can fix software bugs with accuracy comparable to large language models (LLMs) while consuming far fewer computational resources. The study also finds that int8 quantization further improves efficiency with minimal loss in accuracy, making automated program repair more accessible.
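The int8 technique mentioned above can be illustrated with a minimal sketch of dynamic quantization in PyTorch. The tiny two-layer network below is a stand-in for illustration only, not the study's actual SLM or evaluation setup.

```python
# Minimal sketch: int8 dynamic quantization with PyTorch.
# The toy model is a hypothetical stand-in, not the paper's SLM.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in "small model": two linear layers.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 8))
model.eval()

# Dynamic quantization: weights are stored as int8,
# activations are quantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 64)
with torch.no_grad():
    fp32_out = model(x)
    int8_out = quantized(x)

# Outputs stay close: quantization trades a little precision
# for a smaller memory footprint and faster CPU inference.
print(torch.max(torch.abs(fp32_out - int8_out)).item())
```

Dynamic quantization needs no calibration data, which is why it is a common first step when shrinking a model for deployment.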

Read More
Optimizing Transformer Models: A Deep Dive into Hugging Face Optimum, ONNX Runtime, and Quantization

This tutorial guides you through optimizing Transformer models using Hugging Face Optimum, ONNX Runtime, and quantization techniques. We demonstrate how to achieve faster inference speeds while maintaining model accuracy, providing a practical approach for production deployments.

Read More