Efficient LLM

Survey

  • A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness, arXiv, 2411.03350, arxiv, pdf, cication: -1

    Fali Wang, Zhiwei Zhang, Xianren Zhang, ..., Ming Huang, Suhang Wang · (mp.weixin.qq)

  • A Survey of Small Language Models, arXiv, 2410.20011, arxiv, pdf, cication: -1

    Chien Van Nguyen, Xuan Shen, Ryan Aponte, ..., Ryan A. Rossi, Thien Huu Nguyen

Efficient LLM

Finetune

  • Knowledge Composition using Task Vectors with Learned Anisotropic Scaling, arXiv, 2407.02880, arxiv, pdf, cication: -1

    Frederic Z. Zhang, Paul Albert, Cristian Rodriguez-Opazo, ..., Anton van den Hengel, Ehsan Abbasnejad · (atlas - fredzzhang) Star

  • Parameter-Efficient Fine-Tuning of Large Language Models for Unit Test Generation: An Empirical Study, arXiv, 2411.02462, arxiv, pdf, cication: -1

    André Storhaug, Jingyue Li · (peft-unit-test-generation-replication-package - andstor) Star

  • LoRA vs Full Fine-tuning: An Illusion of Equivalence, arXiv, 2410.21228, arxiv, pdf, cication: -1

    Reece Shuttleworth, Jacob Andreas, Antonio Torralba, ..., Pratyusha Sharma · (𝕏)
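
Several of the fine-tuning papers above (the PEFT study and the LoRA vs. full fine-tuning comparison) revolve around low-rank adapters. As a rough sketch of the core idea, here is a LoRA-style wrapper around a frozen linear layer in PyTorch; the module and hyperparameter names are illustrative, not taken from any of the papers.

```python
# A minimal LoRA-style adapter around a frozen linear layer (illustrative only;
# not the exact implementation from any of the papers listed above).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        # Freeze the pretrained weight; only the low-rank factors are trained.
        for p in self.base.parameters():
            p.requires_grad = False
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))  # starts at zero update
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus the trainable low-rank update B @ A, scaled by alpha / r.
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling

layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(2, 16, 768))   # (batch, seq, hidden)
print(out.shape)
```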

Quantization

  • PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs, arXiv, 2410.05265, arxiv, pdf, cication: -1

    Mengzhao Chen, Yi Liu, Jiahao Wang, ..., Wenqi Shao, Ping Luo · (PrefixQuant - ChenMnZ) Star · (arxiv)

  • 🌟 Scaling Laws for Precision, arXiv, 2411.04330, arxiv, pdf, cication: -1

    Tanishq Kumar, Zachary Ankner, Benjamin F. Spector, ..., Christopher Ré, Aditi Raghunathan · (𝕏) · (𝕏)

  • 🌟 BitNet a4.8: 4-bit Activations for 1-bit LLMs, arXiv, 2411.04965, arxiv, pdf, cication: -1

    Hongyu Wang, Shuming Ma, Furu Wei

  • "Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization, arXiv, 2411.02355, arxiv, pdf, cication: -1

    Eldar Kurtic, Alexandre Marques, Shubhra Pandit, ..., Mark Kurtz, Dan Alistarh

  • QTIP: Quantization with Trellises and Incoherence Processing, arXiv, 2406.11235, arxiv, pdf, cication: 1

    Albert Tseng, Qingyao Sun, David Hou, ..., Christopher De Sa · (qtip - Cornell-RelaxML) Star · (x) · (t)
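
As context for the quantization papers above, the baseline they typically improve on is simple round-to-nearest weight quantization with a per-channel scale. A hedged sketch of that baseline (bit-width and granularity are illustrative choices, not any paper's exact recipe):

```python
# Per-output-channel symmetric round-to-nearest weight quantization: the simplest
# baseline that the methods listed above improve on.
import torch

def quantize_weight(w: torch.Tensor, n_bits: int = 4):
    qmax = 2 ** (n_bits - 1) - 1                      # e.g. 7 for int4
    scale = w.abs().amax(dim=1, keepdim=True) / qmax  # one scale per output channel
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q.to(torch.int8), scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(4096, 4096)
q, scale = quantize_weight(w)
err = (dequantize(q, scale) - w).abs().mean()
print(f"mean abs quantization error: {err:.4f}")
```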

Distillation

  • Stronger Models are NOT Stronger Teachers for Instruction Tuning, arXiv, 2411.07133, arxiv, pdf, cication: -1

    Zhangchen Xu, Fengqing Jiang, Luyao Niu, ..., Bill Yuchen Lin, Radha Poovendran

  • Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling, arXiv, 2410.11325, arxiv, pdf, cication: -1

    Wenda Xu, Rujun Han, Zifeng Wang, ..., Chen-Yu Lee, Tomas Pfister
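
Both distillation papers above build on the standard temperature-scaled logit distillation objective; the sketch below shows only that generic loss (the teacher-student sampling strategies that distinguish these papers are not modeled here).

```python
# Standard temperature-scaled logit distillation loss: soft targets from the teacher
# mixed with ordinary cross-entropy on the ground-truth tokens.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # KL divergence between temperature-softened teacher and student distributions.
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

student = torch.randn(8, 32000)   # (batch, vocab)
teacher = torch.randn(8, 32000)
labels = torch.randint(0, 32000, (8,))
print(distillation_loss(student, teacher, labels))
```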

Pruning

  • The Super Weight in Large Language Models, arXiv, 2411.07191, arxiv, pdf, cication: -1

    Mengxia Yu, De Wang, Qi Shan, ..., Colorado Reed, Alvin Wan

  • Sparsing Law: Towards Large Language Models with Greater Activation Sparsity, arXiv, 2411.02335, arxiv, pdf, cication: -1

    Yuqi Luo, Chenyang Song, Xu Han, ..., Zhiyuan Liu, Maosong Sun

  • What Matters in Transformers? Not All Attention is Needed, arXiv, 2406.15786, arxiv, pdf, cication: 1

    Shwai He, Guoheng Sun, Zheyu Shen, ..., Ang Li
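
For reference alongside the pruning papers above, the textbook baseline is unstructured magnitude pruning; a minimal sketch follows (the papers study more targeted structures, e.g. activation sparsity, redundant attention blocks, and individual "super weights"):

```python
# Unstructured magnitude pruning: zero out the smallest-magnitude weights.
import torch

def magnitude_prune(w: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    k = int(w.numel() * sparsity)
    if k == 0:
        return w
    threshold = w.abs().flatten().kthvalue(k).values  # k-th smallest magnitude
    mask = (w.abs() > threshold).to(w.dtype)
    return w * mask

w = torch.randn(1024, 1024)
pruned = magnitude_prune(w, sparsity=0.5)
print(f"fraction of zeros: {(pruned == 0).float().mean():.2f}")
```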

Inference

Small Language Models

Transformer

  • 🌟 SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration, arXiv, 2411.10958, arxiv, pdf, cication: -1

    Jintao Zhang, Haofeng Huang, Pengle Zhang, ..., Jun Zhu, Jianfei Chen · (SageAttention - thu-ml) Star

  • ThunderKittens: Simple, Fast, and Adorable AI Kernels, arXiv, 2410.20399, arxiv, pdf, cication: -1

    Benjamin F. Spector, Simran Arora, Aaryan Singhal, ..., Daniel Y. Fu, Christopher Ré

  • 🎬 Differential Transformer: section-by-section explanation of the paper's principles (video)

  • SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs, arXiv, 2410.13276, arxiv, pdf, cication: -1

    Yizhao Gao, Zhichen Zeng, Dayou Du, ..., Fan Yang, Mao Yang

  • MoH: Multi-Head Attention as Mixture-of-Head Attention, arXiv, 2410.11842, arxiv, pdf, cication: -1

    Peng Jin, Bo Zhu, Li Yuan, ..., Shuicheng Yan · (arxiv) · (MoH - SkyworkAI) Star · (huggingface)
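
As a rough illustration of the "attention heads as experts" idea behind MoH, the sketch below weights each head's output with a per-token learned gate. It is a simplified dense-gate variant written for clarity, not the paper's top-k routing with shared heads.

```python
# Multi-head attention where each head's output is scaled by a per-token learned gate
# (a simplified, dense stand-in for mixture-of-head routing).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedMultiHeadAttention(nn.Module):
    def __init__(self, dim: int = 512, n_heads: int = 8):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, dim // n_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.gate = nn.Linear(dim, n_heads)   # per-token weight for each head
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape to (batch, heads, seq, head_dim) for attention.
        q, k, v = (z.view(b, t, self.n_heads, self.head_dim).transpose(1, 2) for z in (q, k, v))
        attn = F.scaled_dot_product_attention(q, k, v)        # (b, heads, t, head_dim)
        gates = F.softmax(self.gate(x), dim=-1)               # (b, t, heads)
        attn = attn * gates.transpose(1, 2).unsqueeze(-1)     # weight each head per token
        out = attn.transpose(1, 2).reshape(b, t, d)
        return self.proj(out)

mha = GatedMultiHeadAttention()
print(mha(torch.randn(2, 16, 512)).shape)   # torch.Size([2, 16, 512])
```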

Hardware

Tutorials

Projects

Products

Misc