Summary v2
NN
Perceptron & Two Layer NN
hidden layer & activation function & softmax (sketch at the end of this section)
details
regularization & dropout
initialization
batch normalization
backpropagation
gradient vanishing & exploding
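A minimal NumPy sketch of the two-layer network listed above: one hidden layer with ReLU and a softmax output. The shapes and variable names are illustrative assumptions, not code from the notes.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softmax(logits):
    # subtract the row-wise max for numerical stability
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def two_layer_forward(X, W1, b1, W2, b2):
    """Forward pass: input -> hidden layer (ReLU) -> softmax class probabilities."""
    h = relu(X @ W1 + b1)        # hidden layer
    return softmax(h @ W2 + b2)  # output probabilities

# toy example: 4 samples, 5 features, 8 hidden units, 3 classes (assumed sizes)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 5))
W1 = rng.normal(scale=0.1, size=(5, 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.1, size=(8, 3)); b2 = np.zeros(3)
probs = two_layer_forward(X, W1, b1, W2, b2)
print(probs.sum(axis=1))  # each row sums to 1
```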
Basic CNN
convolution
different types of kernels
identity
smoothing
sharpening
edge
padding & stride (convolution/pooling sketch at the end of this section)
layers
convolution
max/average pooling
fully connected
Non-linearity and ReLU Layer
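A minimal NumPy sketch of the convolution, padding/stride, and max-pooling operations above, using a sharpening kernel as the example filter; the single-channel input and loop-based implementation are simplifying assumptions.

```python
import numpy as np

def conv2d(x, k, padding=0, stride=1):
    """2D cross-correlation of a single-channel image x with kernel k."""
    x = np.pad(x, padding)
    kh, kw = k.shape
    out_h = (x.shape[0] - kh) // stride + 1
    out_w = (x.shape[1] - kw) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = (patch * k).sum()
    return out

def max_pool2d(x, size=2, stride=2):
    """Max pooling; non-overlapping windows when size == stride."""
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = x[i*stride:i*stride+size, j*stride:j*stride+size].max()
    return out

img = np.arange(36, dtype=float).reshape(6, 6)
sharpen = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=float)
feat = conv2d(img, sharpen, padding=1, stride=1)  # padding=1 keeps the 6x6 size
print(max_pool2d(feat).shape)                     # (3, 3)
```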
Advanced CNN
LeNet
AlexNet
VGG
NiN
Multi-Branch Networks (GoogLeNet & Inception)
Residual Networks (ResNet) and ResNeXt (residual block sketch at the end of this section)
Densely Connected Networks (DenseNet)
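A minimal NumPy sketch of the ResNet idea: a block computes a residual F(x) and adds the identity skip connection, so the block only has to learn the correction to x. Using fully-connected layers instead of 3x3 convolutions (and omitting batch normalization) is a simplification for illustration.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W1, W2):
    """y = ReLU(F(x) + x); in a real ResNet, F is two 3x3 convs with batch norm."""
    f = relu(x @ W1) @ W2
    return relu(f + x)  # identity skip connection

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 16))
W1 = rng.normal(scale=0.1, size=(16, 16))
W2 = rng.normal(scale=0.1, size=(16, 16))
print(residual_block(x, W1, W2).shape)  # (2, 16), same shape as the input
```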
Basic RNN
Seq Model
AR (autoregressive)
Markov
Text to Seq
tokenization
vocabulary
Language Model
Markov Model & N-grams
Word Frequency
Laplace Smoothing
Perplexity
Partitioning Sequences (random sampling & sequential partitioning)
RNN
recurrence formula (sketch at the end of this section)
what is depth & time
one to one
one to many
many to one
many to many
seq to seq (many to one)
seq to seq (many to one + one to many)
without hidden state
with hidden state
backpropagation through time
truncated backpropagation through time
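A minimal NumPy sketch of the recurrence formula with a hidden state, h_t = tanh(x_t W_xh + h_{t-1} W_hh + b_h); the shapes and parameter names are assumptions.

```python
import numpy as np

def rnn_forward(X, W_xh, W_hh, b_h, h0):
    """Run a vanilla RNN over a sequence X of shape (T, input_dim)."""
    h = h0
    states = []
    for x_t in X:                                 # loop over time steps
        h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)  # recurrence formula
        states.append(h)
    return np.stack(states)                       # (T, hidden_dim)

rng = np.random.default_rng(0)
T, d_in, d_h = 5, 3, 4
X = rng.normal(size=(T, d_in))
W_xh = rng.normal(scale=0.1, size=(d_in, d_h))
W_hh = rng.normal(scale=0.1, size=(d_h, d_h))
states = rnn_forward(X, W_xh, W_hh, np.zeros(d_h), h0=np.zeros(d_h))
print(states.shape)  # (5, 4): one hidden state per time step
```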
Advanced RNN
Gated Recurrent Units (GRU) (cell sketch at the end of this section)
Long Short-Term Memory (LSTM)
Bidirectional Recurrent Neural Networks (BRNN)
Encoder-Decoder Architecture
Sequence to Sequence Learning (Seq2Seq)
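A minimal NumPy sketch of one GRU step: the reset gate, the update gate, the candidate state, and the convex combination that decides how much of the old state to keep. Biases are omitted and the parameter names are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h, Wxr, Whr, Wxz, Whz, Wxh, Whh):
    """One GRU step (biases omitted for brevity)."""
    r = sigmoid(x @ Wxr + h @ Whr)               # reset gate
    z = sigmoid(x @ Wxz + h @ Whz)               # update gate
    h_tilde = np.tanh(x @ Wxh + (r * h) @ Whh)   # candidate hidden state
    return z * h + (1.0 - z) * h_tilde           # new hidden state

rng = np.random.default_rng(0)
d_in, d_h = 3, 4
params = [rng.normal(scale=0.1, size=s)
          for s in [(d_in, d_h), (d_h, d_h)] * 3]  # Wxr, Whr, Wxz, Whz, Wxh, Whh
h = gru_cell(rng.normal(size=d_in), np.zeros(d_h), *params)
print(h.shape)  # (4,)
```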
Attention Mechanism and Transformers
Queries, Keys, and Values
Attention Is All You Need
Attention and Kernel
Attention Scoring Function (scaled dot-product sketch at the end of this section)
The Bahdanau Attention
Multi-Head Attention
Self-Attention
The Transformer Architecture
Transformers for NLP
Transformers for Vision & Multimodal
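A minimal NumPy sketch of the scaled dot-product scoring function used in self-attention, softmax(Q K^T / sqrt(d_k)) V, with queries, keys, and values projected from the same sequence. Masking, multiple heads, and positional encoding are omitted, and all names and shapes are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # attention scoring function
    weights = softmax(scores, axis=-1)   # one distribution per query
    return weights @ V, weights

# self-attention: queries, keys, and values all come from the same sequence
rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(scale=0.1, size=(d_model, d_model)) for _ in range(3))
out, attn = scaled_dot_product_attention(X @ Wq, X @ Wk, X @ Wv)
print(out.shape, attn.sum(axis=-1))  # (4, 8); attention rows sum to 1
```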
LLM Pretraining
Word Embedding (word2vec; skip-gram sketch at the end of this section)
Approximate Training
Word Embedding with Global Vectors (GloVe)
Encoder (BERT)
Decoder (GPT & XLNet & LLaMA)
Encoder-Decoder (BART & T5)
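A minimal NumPy sketch of the skip-gram formulation behind word2vec: a center word's input vector is scored against every output vector and normalized with a softmax over the vocabulary; this full softmax is exactly what approximate training (e.g. negative sampling or hierarchical softmax) replaces. The toy vocabulary size and dimensions are assumptions.

```python
import numpy as np

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
vocab_size, embed_dim = 10, 6
V_in = rng.normal(scale=0.1, size=(vocab_size, embed_dim))   # center-word vectors
V_out = rng.normal(scale=0.1, size=(vocab_size, embed_dim))  # context-word vectors

def skip_gram_probs(center_idx):
    """P(context word | center word) over the whole vocabulary."""
    logits = V_out @ V_in[center_idx]
    return softmax(logits)

probs = skip_gram_probs(center_idx=3)
print(probs.shape, probs.sum())  # (10,) 1.0
```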