Summary v2
NN
Perceptron & Two Layer NN
hidden layer & activation function & softmax (sketch at the end of this section)
details
regularization & dropout
initialization
batch normalization
backpropagation
gradient vanishing & exploding
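A minimal NumPy sketch of the two-layer network listed above: one hidden layer with ReLU and a softmax output. The shapes and variable names are illustrative assumptions, not code from the notes.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softmax(logits):
    # subtract the row-wise max for numerical stability
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def two_layer_forward(X, W1, b1, W2, b2):
    """Forward pass: input -> hidden layer (ReLU) -> softmax class probabilities."""
    h = relu(X @ W1 + b1)        # hidden layer
    return softmax(h @ W2 + b2)  # output probabilities

# toy example: 4 samples, 5 features, 8 hidden units, 3 classes (assumed sizes)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 5))
W1 = rng.normal(scale=0.1, size=(5, 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.1, size=(8, 3)); b2 = np.zeros(3)
probs = two_layer_forward(X, W1, b1, W2, b2)
print(probs.sum(axis=1))  # each row sums to 1
```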
Basic CNN
convolution
different types of kernels
identity
smoothing
sharpening
edge
padding & stride (convolution/pooling sketch at the end of this section)
layers
convolution
max/average pooling
fully connected
Non-linearity and ReLU Layer
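A minimal NumPy sketch of the convolution, padding/stride, and max-pooling operations above, using a sharpening kernel as the example filter; the single-channel input and loop-based implementation are simplifying assumptions.

```python
import numpy as np

def conv2d(x, k, padding=0, stride=1):
    """2D cross-correlation of a single-channel image x with kernel k."""
    x = np.pad(x, padding)
    kh, kw = k.shape
    out_h = (x.shape[0] - kh) // stride + 1
    out_w = (x.shape[1] - kw) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = (patch * k).sum()
    return out

def max_pool2d(x, size=2, stride=2):
    """Max pooling; non-overlapping windows when size == stride."""
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = x[i*stride:i*stride+size, j*stride:j*stride+size].max()
    return out

img = np.arange(36, dtype=float).reshape(6, 6)
sharpen = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=float)
feat = conv2d(img, sharpen, padding=1, stride=1)  # padding=1 keeps the 6x6 size
print(max_pool2d(feat).shape)                     # (3, 3)
```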
Advanced CNN
LeNet
AlexNet
VGG
NiN
Multi-Branch Networks (GoogLeNet & Inception)
Residual Networks (ResNet) and ResNeXt (residual block sketch at the end of this section)
Densely Connected Networks (DenseNet)
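A minimal NumPy sketch of the ResNet idea: a block computes a residual F(x) and adds the identity skip connection, so the block only has to learn the correction to x. Using fully-connected layers instead of 3x3 convolutions (and omitting batch normalization) is a simplification for illustration.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W1, W2):
    """y = ReLU(F(x) + x); in a real ResNet, F is two 3x3 convs with batch norm."""
    f = relu(x @ W1) @ W2
    return relu(f + x)  # identity skip connection

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 16))
W1 = rng.normal(scale=0.1, size=(16, 16))
W2 = rng.normal(scale=0.1, size=(16, 16))
print(residual_block(x, W1, W2).shape)  # (2, 16), same shape as the input
```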
Basic RNN
Seq Model
AR (autoregressive)
Markov
Text to Seq
tokenization
vocabulary
Language Model
Markov Model & N-grams
Word Frequency
Laplace Smoothing
Perplexity
Partitioning Sequences (random sampling & sequential partitioning)
RNN
recurrence formula (sketch at the end of this section)
what is depth & time
one to one
one to many
many to one
many to many
seq to seq (many to one)
seq to seq (many to one + one to many)
without hidden state
with hidden state
backpropagation through time
truncated backpropagation through time
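A minimal NumPy sketch of the recurrence formula with a hidden state, h_t = tanh(x_t W_xh + h_{t-1} W_hh + b_h); the shapes and parameter names are assumptions.

```python
import numpy as np

def rnn_forward(X, W_xh, W_hh, b_h, h0):
    """Run a vanilla RNN over a sequence X of shape (T, input_dim)."""
    h = h0
    states = []
    for x_t in X:                                 # loop over time steps
        h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)  # recurrence formula
        states.append(h)
    return np.stack(states)                       # (T, hidden_dim)

rng = np.random.default_rng(0)
T, d_in, d_h = 5, 3, 4
X = rng.normal(size=(T, d_in))
W_xh = rng.normal(scale=0.1, size=(d_in, d_h))
W_hh = rng.normal(scale=0.1, size=(d_h, d_h))
states = rnn_forward(X, W_xh, W_hh, np.zeros(d_h), h0=np.zeros(d_h))
print(states.shape)  # (5, 4): one hidden state per time step
```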
Advanced RNN
Gated Recurrent Units (GRU) (cell sketch at the end of this section)
Long Short-Term Memory (LSTM)
Bidirectional Recurrent Neural Networks (BRNN)
Encoder-Decoder Architecture
Sequence to Sequence Learning (Seq2Seq)
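A minimal NumPy sketch of one GRU step: the reset gate, the update gate, the candidate state, and the convex combination that decides how much of the old state to keep. Biases are omitted and the parameter names are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h, Wxr, Whr, Wxz, Whz, Wxh, Whh):
    """One GRU step (biases omitted for brevity)."""
    r = sigmoid(x @ Wxr + h @ Whr)               # reset gate
    z = sigmoid(x @ Wxz + h @ Whz)               # update gate
    h_tilde = np.tanh(x @ Wxh + (r * h) @ Whh)   # candidate hidden state
    return z * h + (1.0 - z) * h_tilde           # new hidden state

rng = np.random.default_rng(0)
d_in, d_h = 3, 4
params = [rng.normal(scale=0.1, size=s)
          for s in [(d_in, d_h), (d_h, d_h)] * 3]  # Wxr, Whr, Wxz, Whz, Wxh, Whh
h = gru_cell(rng.normal(size=d_in), np.zeros(d_h), *params)
print(h.shape)  # (4,)
```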
Attention Mechanism and Transformers
Queries, Keys, and Values
Attention Is All You Need
Attention and Kernel
Attention Scoring Function (scaled dot-product sketch at the end of this section)
The Bahdanau Attention
Multi-Head Attention
Self-Attention
The Transformer Architecture
Transformers for NLP
Transformers for Vision & Multimodal
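A minimal NumPy sketch of the scaled dot-product scoring function used in self-attention, softmax(Q K^T / sqrt(d_k)) V, with queries, keys, and values projected from the same sequence. Masking, multiple heads, and positional encoding are omitted, and all names and shapes are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # attention scoring function
    weights = softmax(scores, axis=-1)   # one distribution per query
    return weights @ V, weights

# self-attention: queries, keys, and values all come from the same sequence
rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(scale=0.1, size=(d_model, d_model)) for _ in range(3))
out, attn = scaled_dot_product_attention(X @ Wq, X @ Wk, X @ Wv)
print(out.shape, attn.sum(axis=-1))  # (4, 8); attention rows sum to 1
```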
LLM Pretraining
Word Embedding (word2vec; skip-gram sketch at the end of this section)
Approximate Training
Word Embedding with Global Vectors (GloVe)
Encoder (BERT)
Decoder (GPT & XLNet & LLaMA)
Encoder-Decoder (BART & T5)
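A minimal NumPy sketch of the skip-gram formulation behind word2vec: a center word's input vector is scored against every output vector and normalized with a softmax over the vocabulary; this full softmax is exactly what approximate training (e.g. negative sampling or hierarchical softmax) replaces. The toy vocabulary size and dimensions are assumptions.

```python
import numpy as np

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
vocab_size, embed_dim = 10, 6
V_in = rng.normal(scale=0.1, size=(vocab_size, embed_dim))   # center-word vectors
V_out = rng.normal(scale=0.1, size=(vocab_size, embed_dim))  # context-word vectors

def skip_gram_probs(center_idx):
    """P(context word | center word) over the whole vocabulary."""
    logits = V_out @ V_in[center_idx]
    return softmax(logits)

probs = skip_gram_probs(center_idx=3)
print(probs.shape, probs.sum())  # (10,) 1.0
```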