Attention Mechanism
The attention mechanism helps address problems found in the RNN-based encoder-decoder setup. As illustrated in Fig. 2.2, an attention mechanism acts like a memory bank: when queried, it produces an output based on stored keys and values (Bahdanau et al., 2014).
Attention Formulation
Let us consider a memory unit consisting of $n$ key-value pairs $(\mathbf{k}_1, \mathbf{v}_1), \ldots, (\mathbf{k}_n, \mathbf{v}_n)$, with $\mathbf{k}_i \in \mathbb{R}^{d_k}$ and $\mathbf{v}_i \in \mathbb{R}^{d_v}$. The attention layer receives an input as query $\mathbf{q} \in \mathbb{R}^{d_q}$ and returns an output $\mathbf{o} \in \mathbb{R}^{d_v}$ with the same shape as the values.
The attention layer measures the similarity between the query and each key using a score function $\alpha$, which returns scores $a_1, \ldots, a_n$ for keys $\mathbf{k}_1, \ldots, \mathbf{k}_n$, given by:

$$a_i = \alpha(\mathbf{q}, \mathbf{k}_i).$$
Attention weights are computed as a softmax function on the scores:

$$\mathbf{b} = \mathrm{softmax}(\mathbf{a}).$$
Each element of $\mathbf{b}$ is computed as follows:

$$b_i = \frac{\exp(a_i)}{\sum_{j=1}^{n} \exp(a_j)}.$$
The output is the weighted sum of the attention weights and the values:

$$\mathbf{o} = \sum_{i=1}^{n} b_i \mathbf{v}_i.$$
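As a concrete illustration, here is a minimal NumPy sketch of this formulation; the helper names `softmax`, `attention`, and `score_fn` are illustrative, not part of the original text.

```python
import numpy as np

def softmax(a):
    """Numerically stable softmax over the last axis."""
    a = a - a.max(axis=-1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=-1, keepdims=True)

def attention(q, keys, values, score_fn):
    """Score each key against the query, convert the scores into
    weights with softmax, and return the weighted sum of the values."""
    scores = np.array([score_fn(q, k) for k in keys])  # a_i = alpha(q, k_i)
    weights = softmax(scores)                          # b = softmax(a)
    return weights @ values                            # o = sum_i b_i * v_i

# Example with a dot-product score function and n = 3 key-value pairs.
rng = np.random.default_rng(0)
d = 4
q = rng.normal(size=d)
keys = rng.normal(size=(3, d))
values = rng.normal(size=(3, d))
o = attention(q, keys, values, score_fn=np.dot)
print(o.shape)  # (4,) -- same shape as each value vector
```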
Scaled Dot-Product Attention
The score function $\alpha$ exists in various forms, leading to multiple types of attention mechanisms. The dot-product score $\alpha(\mathbf{q}, \mathbf{k}) = \mathbf{q}^\top \mathbf{k}$ is the simplest, requiring no tunable parameters. A variation, the scaled dot product, normalizes this by $\sqrt{d_k}$ to mitigate the impact of increasing dimensions (Luong et al., 2015; Vaswani et al., 2017).
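The two scoring variants can be sketched as follows; the function names are illustrative.

```python
import numpy as np

def dot_product_score(q, k):
    """Plain dot-product score: no tunable parameters."""
    return np.dot(q, k)

def scaled_dot_product_score(q, k):
    """Scaled dot product: dividing by sqrt(d_k) keeps the score's
    magnitude from growing with the key dimension."""
    d_k = k.shape[-1]
    return np.dot(q, k) / np.sqrt(d_k)
```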
Self-Attention
In self-attention, each input vector $\mathbf{x}_i$ is projected onto three distinct vectors: query $\mathbf{q}_i$, key $\mathbf{k}_i$, and value $\mathbf{v}_i$. These projections are performed via learnable weight matrices $W^{Q}$, $W^{K}$, and $W^{V}$, resulting in:

$$\mathbf{q}_i = W^{Q}\mathbf{x}_i, \qquad \mathbf{k}_i = W^{K}\mathbf{x}_i, \qquad \mathbf{v}_i = W^{V}\mathbf{x}_i.$$
These weight matrices are initialized randomly and optimized during training.
The simplified matrix representation, with the queries, keys, and values stacked into matrices $Q$, $K$, and $V$ and computed in a single step, is given by:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V.$$
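The sketch below puts the pieces together: a single-head self-attention layer in matrix form, written in NumPy. The weight matrices here are randomly drawn stand-ins for parameters that would be learned during training, and all names are illustrative.

```python
import numpy as np

def softmax(a):
    """Numerically stable softmax over the last axis."""
    a = a - a.max(axis=-1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Self-attention over a sequence X of shape (n, d_model):
    project each input vector to a query, key, and value, then apply
    scaled dot-product attention as one matrix computation."""
    Q = X @ W_q                      # queries, (n, d_k)
    K = X @ W_k                      # keys,    (n, d_k)
    V = X @ W_v                      # values,  (n, d_v)
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (n, n) scaled dot-product scores
    weights = softmax(scores)        # row-wise attention weights
    return weights @ V               # (n, d_v) outputs

# Example: n = 5 tokens, model width 8, projected to d_k = d_v = 8.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) * 0.1 for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (5, 8)
```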