1-bit Adam: Communication Efficient Large-Scale Training with Adam’s
Convergence Speed
Hanlin Tang 1 2 Shaoduo Gan 3 Ammar Ahmad Awan 1 Samyam Rajbhandari 1 Conglong Li 1 Xiangru Lian 2
Ji Liu 2 Ce Zhang 3 Yuxiong He 1
Abstract
Scalable training of large models (like BERT and GPT-3) requires careful optimization rooted ...

1. Introduction
Modern advancement of machine learning is heavily driven by the advancement of comp ...









