You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling
Zhanpeng Zeng 1 Yunyang Xiong 1 Sathya N. Ravi 2 Shailesh Acharya 3 Glenn Fung 3 Vikas Singh 1
Abstract

Transformer-based models are widely used in natural language processing (NLP). Central to the ...

... language inference (Devlin et al., 2019) and paraphrasing (Raffel et al., 2020). Transformer-based models such as BERT (Devlin et al., 2019) are pret ...

