SparseBERT: Rethinking the Importance Analysis in Self-attention
Han Shi 1 Jiahui Gao 2 Xiaozhe Ren 3 Hang Xu 3 Xiaodan Liang 4 Zhenguo Li 3 James T. Kwok 1
Abstract

Transformer-based models are popularly used in natural language processing (NLP). Their core compone...

... include BERT (Devlin et al., 2019), which achieves state-of-the-art performance on a wide range of NLP tasks, and GPT-3 (Brown et al., 2020), which applies the Transformer's ...









