Summary v2

  • NN

    • Perceptron & Two Layer NN

    • hidden layer & activation function & softmax

    • details

      • regularization & dropout

      • initialization

      • batch normalization

    • backpropagation

    • vanishing & exploding gradients
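
A minimal sketch of the NN section above, assuming PyTorch: one hidden layer with a ReLU activation, dropout for regularization, and a softmax over the output logits. The class name, layer sizes, and dropout rate are all illustrative, not from the notes.

```python
# Illustrative two-layer network (assumes PyTorch); names and sizes are made up.
import torch
from torch import nn

class TwoLayerNet(nn.Module):
    def __init__(self, in_dim=784, hidden=256, out_dim=10, p_drop=0.5):
        super().__init__()
        self.hidden = nn.Linear(in_dim, hidden)  # hidden layer
        self.drop = nn.Dropout(p_drop)           # dropout regularization
        self.out = nn.Linear(hidden, out_dim)

    def forward(self, x):
        h = torch.relu(self.hidden(x))           # non-linear activation
        return self.out(self.drop(h))            # raw logits

x = torch.randn(4, 784)
probs = torch.softmax(TwoLayerNet()(x), dim=-1)  # softmax over classes
```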

  • Basic CNN

    • convolution

    • different types of kernel

      • identity

      • smoothing

      • sharpening

      • edge

    • padding & stride

    • layers

      • convolution

      • max/average pooling

      • fully connected

      • Non-linearity and ReLU Layer
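
To make the Basic CNN items concrete, a small sketch assuming PyTorch: a hand-written edge-detection kernel applied with explicit padding and stride, followed by a ReLU non-linearity and max pooling. The kernel values and tensor sizes are illustrative.

```python
# Illustrative convolution + ReLU + pooling (assumes PyTorch); values made up.
import torch
import torch.nn.functional as F

img = torch.randn(1, 1, 8, 8)                    # (batch, channels, H, W)
edge = torch.tensor([[ 0., -1.,  0.],            # a classic edge kernel
                     [-1.,  4., -1.],
                     [ 0., -1.,  0.]]).view(1, 1, 3, 3)
feat = F.conv2d(img, edge, padding=1, stride=1)  # padding=1 keeps the 8x8 size
act = torch.relu(feat)                           # non-linearity (ReLU layer)
pooled = F.max_pool2d(act, kernel_size=2)        # 2x2 max pooling -> 4x4
```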

  • Advanced CNN

    • LeNet

    • AlexNet

    • VGG

    • NiN

    • Multi-Branch Networks (GoogLeNet & Inception)

    • Residual Networks (ResNet) and ResNeXt

    • Densely Connected Networks (DenseNet)
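
The key idea separating ResNet from the plain architectures above is the skip connection, output = F(x) + x. A simplified sketch assuming PyTorch; the block layout here is a hypothetical minimal version, not the exact paper configuration.

```python
# Illustrative residual block (assumes PyTorch); layout is simplified.
import torch
from torch import nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        y = torch.relu(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))
        return torch.relu(y + x)                 # skip connection: F(x) + x

y = ResidualBlock(64)(torch.randn(2, 64, 16, 16))  # output shape equals input
```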

  • Basic RNN

    • Seq Model

      • AR

      • Markov

    • Text to Seq

      • tokenization

      • vocabulary

    • Language Model

      • Markov Model & N-grams

      • Word Frequency

      • Laplace Smoothing

      • Perplexity

      • Partitioning Sequences (random sampling & sequential partitioning)

    • RNN

      • recurrence formula

      • what are depth & time

        • one to one

        • one to many

        • many to one

        • many to many

        • seq to seq (many to one)

        • seq to seq (many to one + one to many)

      • without hidden state

      • with hidden state

    • backpropagation through time

      • truncated backpropagation through time
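
A minimal sketch of the RNN recurrence formula and of where truncated BPTT cuts the computation graph, assuming PyTorch; the weights and dimensions are made up.

```python
# Illustrative unrolled RNN (assumes PyTorch); dimensions are made up.
import torch

T, batch, in_dim, hid = 5, 2, 8, 16
W_xh = torch.randn(in_dim, hid) * 0.1            # input-to-hidden weights
W_hh = torch.randn(hid, hid) * 0.1               # hidden-to-hidden weights
b = torch.zeros(hid)

h = torch.zeros(batch, hid)                      # initial hidden state
for t in range(T):                               # unrolling over time steps
    x_t = torch.randn(batch, in_dim)             # input at step t
    h = torch.tanh(x_t @ W_xh + h @ W_hh + b)    # h_t = tanh(x_t W_xh + h_{t-1} W_hh + b)
# Truncated BPTT: detach h every k steps (h = h.detach()) to cut the graph.
```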

  • Advanced RNN

    • Gated Recurrent Units (GRU)

    • Long Short-Term Memory (LSTM)

    • Bidirectional Recurrent Neural Networks (BRNN)

    • Encoder-Decoder Architecture

    • Sequence to Sequence Learning (Seq2Seq)
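
A sketch of the encoder-decoder pattern behind Seq2Seq, assuming PyTorch's nn.GRU: the encoder's final hidden state summarizes the source and initializes the decoder. In a real model the decoder would consume target-token embeddings and emit logits; that part is omitted here, and all sizes are made up.

```python
# Illustrative encoder-decoder (assumes PyTorch); sizes are made up.
import torch
from torch import nn

enc = nn.GRU(input_size=32, hidden_size=64, batch_first=True)
dec = nn.GRU(input_size=32, hidden_size=64, batch_first=True)

src = torch.randn(2, 7, 32)                      # (batch, source length, features)
tgt = torch.randn(2, 5, 32)                      # (batch, target length, features)
_, context = enc(src)                            # final hidden state summarizes the source
out, _ = dec(tgt, context)                       # decoder initialized with the context
```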

  • Attention Mechanism and Transformers

    • Queries, Keys, and Values

    • Attention is all you need

      • Attention and Kernel

      • Attention Scoring Function

      • The Bahdanau Attention

    • Multi-Head Attention

    • Self-Attention

    • The Transformer Architecture

    • Transformers for NLP

    • Transformers for Vision & Multimodal
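
The core of the attention items above is scaled dot-product attention, softmax(QK^T / sqrt(d)) V. A minimal sketch assuming PyTorch, with made-up shapes; multi-head attention runs several such maps in parallel on projected q, k, v.

```python
# Illustrative scaled dot-product attention (assumes PyTorch); shapes made up.
import math
import torch

def attention(q, k, v):
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)  # attention scoring function
    return torch.softmax(scores, dim=-1) @ v         # weighted sum of values

q = torch.randn(2, 4, 8)                         # (batch, num queries, d)
k = torch.randn(2, 6, 8)                         # (batch, num keys, d)
v = torch.randn(2, 6, 8)                         # (batch, num keys, d_v)
out = attention(q, k, v)                         # (2, 4, 8); self-attention sets q = k = v
```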

  • LLM Pretraining

    • Word Embedding (word2vec)

    • Approximate Training

    • Word Embedding with Global Vectors (GloVe)

    • Encoder (BERT)

    • Decoder (GPT & XLNet & LLaMA)

    • Encoder-Decoder (BART & T5)
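
A sketch of word2vec's skip-gram objective with negative sampling (the approximate-training item above), assuming PyTorch: real (center, context) pairs are scored high and sampled fake pairs low, which sidesteps the full softmax. The vocabulary size, word ids, and negative count are made up.

```python
# Illustrative skip-gram with negative sampling (assumes PyTorch); ids made up.
import torch
from torch import nn

vocab, dim = 1000, 50
center_emb = nn.Embedding(vocab, dim)            # embeddings for center words
context_emb = nn.Embedding(vocab, dim)           # embeddings for context words

center = torch.tensor([3, 17])                   # center word ids (batch of 2)
pos = torch.tensor([8, 42])                      # observed context word ids
neg = torch.randint(0, vocab, (2, 5))            # 5 sampled negatives per pair

u = center_emb(center)                                       # (2, dim)
pos_logit = (u * context_emb(pos)).sum(-1)                   # score real pairs high
neg_logit = torch.bmm(context_emb(neg), u.unsqueeze(-1)).squeeze(-1)  # score fakes low
loss = -(torch.sigmoid(pos_logit).log()
         + torch.sigmoid(-neg_logit).log().sum(-1)).mean()
```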
