Observational Scaling Laws and the Predictability of Language Model Performance, Yangjun Ruan+, arXiv'24 #1540

AkihikoWatanabe · 2024-11-22T03:04:57Z

URL

https://arxiv.org/abs/2405.10938

Authors

Yangjun Ruan
Chris J. Maddison
Tatsunori Hashimoto

Abstract

Understanding how language model performance varies with scale is critical to benchmark and algorithm development. Scaling laws are one approach to building this understanding, but the requirement of training models across many different scales has limited their use. We propose an alternative, observational approach that bypasses model training and instead builds scaling laws from ~100 publicly available models. Building a single scaling law from multiple model families is challenging due to large variations in their training compute efficiencies and capabilities. However, we show that these variations are consistent with a simple, generalized scaling law where language model performance is a function of a low-dimensional capability space, and model families only vary in their efficiency in converting training compute to capabilities. Using this approach, we show the surprising predictability of complex scaling phenomena: we show that several emergent phenomena follow a smooth, sigmoidal behavior and are predictable from small models; we show that the agent performance of models such as GPT-4 can be precisely predicted from simpler non-agentic benchmarks; and we show how to predict the impact of post-training interventions like Chain-of-Thought and Self-Consistency as language model capabilities continue to improve.
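
The structure described in the abstract can be written down compactly. The following is only a sketch: the notation ($E_m$, $S_m$, $C_m$, $\theta_f$, $\nu_f$, $\beta$, $\alpha$) is chosen here for illustration, and the log-linear dependence on compute is an assumption rather than something the abstract states.

$$
S_m \approx \theta_f \log C_m + \nu_f, \qquad E_m \approx \sigma\left(\beta^\top S_m + \alpha\right)
$$

Here $S_m$ is a low-dimensional capability vector for model $m$ in family $f$, $C_m$ is its training compute, $\theta_f$ and $\nu_f$ capture the family-specific efficiency of converting compute into capabilities, $\sigma$ is the logistic sigmoid, and $E_m$ is a downstream metric. Under this reading, only the first relation differs between families, while the mapping from capabilities to downstream performance is shared.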

Translation (by gpt-4o-mini)

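Understanding how language model performance changes with scale is important for developing benchmarks and algorithms. Scaling laws are one approach to building this understanding, but their use has been limited by the need to train models at many different scales. This work proposes an alternative, observational approach that avoids model training and instead builds scaling laws from roughly 100 publicly available models. Building a single scaling law from multiple model families is difficult because of large variations in training compute efficiency and capabilities. However, the work shows that these variations are consistent with a simple, generalized scaling law in which language model performance is a function of a low-dimensional capability space and model families differ only in how efficiently they convert training compute into capabilities. Using this approach, it demonstrates the surprising predictability of complex scaling phenomena. Specifically, it shows that several emergent phenomena follow smooth, sigmoidal behavior and can be predicted from small models; that the agent performance of models such as GPT-4 can be accurately predicted from simpler non-agentic benchmarks; and how to predict the impact of post-training interventions such as Chain-of-Thought and Self-Consistency as language model capabilities continue to improve.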

Summary (by gpt-4o-mini)

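To understand language model performance, the paper proposes a new observational approach that builds scaling laws from roughly 100 publicly available models. It accounts for capability variation across model families and shows that performance is a function of a low-dimensional capability space. With this, it demonstrates that complex scaling phenomena are predictable, shows that GPT-4's agent performance can be predicted from non-agentic benchmarks, and presents a way to predict the impact of Chain-of-Thought and Self-Consistency.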

AkihikoWatanabe · 2024-11-22T03:07:37Z

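The vertical axis is the change in the principal component of downstream task scores (the largest component, which explains about 80% of the variance), i.e. roughly LLM performance; the horizontal axis is training compute on a log scale.
Qwen is doing well too, but in terms of performance relative to the amount of training data (≈ data quality), is Phi, the pioneering work in this direction, still far ahead?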
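
The kind of plot described in this comment can be sketched in a few lines. This is a minimal illustration on synthetic data, not the paper's code or results: the two families, their efficiencies, the four benchmarks, and the noise level are all made up, and the PCA is done with a plain NumPy SVD.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT from the paper): two hypothetical model
# families, each with models at four compute budgets (log10 training FLOPs).
log_compute = np.array([20.0, 21.0, 22.0, 23.0, 20.0, 21.0, 22.0, 23.0])
family = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Assumed family-specific efficiency: family 1 gains capability faster per
# unit of log-compute than family 0.
efficiency = np.array([1.0, 1.5])
latent = efficiency[family] * (log_compute - 20.0)

# Four made-up benchmarks that all load on the same latent capability.
loadings = np.array([0.9, 0.7, 0.8, 0.6])
scores = latent[:, None] * loadings[None, :] + rng.normal(0.0, 0.05, size=(8, 4))

# PCA via SVD on the mean-centred benchmark matrix (models x benchmarks).
centered = scores - scores.mean(axis=0)
_, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
explained = singular_values**2 / np.sum(singular_values**2)
pc1 = centered @ vt[0]

# The sign of a principal component is arbitrary; orient PC1 so that it
# increases with the average benchmark score.
if np.corrcoef(pc1, scores.mean(axis=1))[0, 1] < 0:
    pc1 = -pc1

# Per-family straight-line fit of PC1 against log-compute.
slopes = {}
for f in np.unique(family):
    mask = family == f
    slope, _intercept = np.polyfit(log_compute[mask], pc1[mask], deg=1)
    slopes[int(f)] = float(slope)

print("variance explained by PC1:", round(float(explained[0]), 3))
print("per-family slopes:", slopes)
```

In this toy setup a steeper slope corresponds to a family that converts compute into the shared capability more efficiently, which is the comparison the comment makes between families. Plotting `pc1` against `log_compute` per family would give a figure with the same axes as the one described.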

AkihikoWatanabe · 2024-11-22T03:14:58Z

See also: Textbooks Are All You Need, Suriya Gunasekar+, N/A, arXiv'23 #766

AkihikoWatanabe added the Pocket label Nov 22, 2024

AkihikoWatanabe changed the title あ Observational Scaling Laws and the Predictability of Language Model Performance, Yangjun Ruan+, arXiv'24 Nov 22, 2024

AkihikoWatanabe added Analysis Efficiency/SpeedUp NLP LanguageModel labels Nov 22, 2024
