2 Context and related work
Our object of study is the BERT model introduced in [6]. To set context and terminology, we briefly
describe the model’s architecture. The input to BERT is based on a sequence of tokens (words or
pieces of words). The output is a sequence of vectors, one for each input token. We will often refer to
these vectors as context embeddings because they include information about a token’s context.
BERT’s internals consist of two parts. First, an initial embedding for each token is created by combining a pre-trained wordpiece embedding with position and segment information. Next, this initial sequence of embeddings is run through multiple transformer layers, producing a new sequence of context embeddings at each step.
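For concreteness, the snippet below sketches one way such per-token context embeddings can be extracted in practice. It uses the Hugging Face transformers library and the bert-base-uncased checkpoint; these are illustrative choices, not part of the description above.

```python
# Minimal sketch (assumes the Hugging Face `transformers` library and the
# `bert-base-uncased` checkpoint; both are illustrative assumptions).
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

sentence = "The bank of the river was muddy."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# `hidden_states` is a tuple: the initial (wordpiece + position + segment)
# embedding output, followed by one tensor per transformer layer. Each tensor
# has shape (batch, sequence_length, hidden_size); each row is one token's
# context embedding at that layer.
hidden_states = outputs.hidden_states
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, vec in zip(tokens, hidden_states[-1][0]):
    print(token, vec.shape)  # e.g. "bank" -> torch.Size([768])
```

Because a context embedding exists for every token at every layer, the same token receives different vectors in different sentences, which is what makes geometric study of these representations possible.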

