♎ Limited AI
  • Machine Learning
    • Linear Model Cheat Sheet
    • Nonlinear Model Cheat Sheet
    • General Linear Model 1
    • General Linear Model 2
    • General Linear Model 3
    • Tree Based Methods
    • Tree Based Methods Supplement
    • XGBoost, CatBoost, LightGBM Boosting
    • KNN & PCA
    • Model Performance
    • Model Evaluation
    • Code Practice
      • KNN
      • Decision Tree Python Code
    • Data and Feature Engineering
      • Handling Biased Data
      • Cold Start Problem
  • Deep Learning
    • Summary v2
    • Basic Neural Network
      • From Linear to Deep
      • Perceptron and Activation Function
      • Neural Network Details
      • Backpropagation Details
      • Gradient Vanishing vs Gradient Exploding
    • Basic CNN
      • Why CNN
      • Filter / Convolution Kernel and Its Operation
      • Padding & Stride
      • Layers
      • Extra: From Fully Connected Layers to Convolutions
      • Extra: Multiple Input and Multiple Output Channels
    • Advanced CNN
      • Convolutional Neural Networks (LeNet)
      • Deep Convolutional Neural Networks (AlexNet)
      • Networks Using Blocks (VGG)
      • Network in Network (NiN)
      • Multi-Branch Networks (GoogLeNet & ImageNet)
      • Residual Networks (ResNet) and ResNeXt
      • Densely Connected Networks (DenseNet)
      • Batch Normalization
    • Basic RNN
      • Seq Model
      • Raw Text to Seq
      • Language Models
      • Recurrent Neural Networks (RNN)
      • Backpropagation Through Time
    • Advanced RNN
      • Gated Recurrent Units (GRU)
      • Long Short-Term Memory (LSTM)
      • Bidirectional Recurrent Neural Networks (BRNN)
      • Encoder-Decoder Architecture
      • Sequence to Sequence Learning (Seq2Seq)
    • Attention Mechanisms and Transformers
      • Queries, Keys, and Values
      • Attention is all you need
        • Attention and Kernel
        • Attention Scoring Functions
        • The Bahdanau Attention Mechanism
        • Multi-Head Attention
        • Self-Attention
        • Attention Implementation
      • The Transformer Architecture
        • Extra Reading
        • Shortest Maximum Path Length
      • Large-Scale Pretraining with Transformers
        • BERT vs OpenAI GPT vs ELMo
        • Decoder Model Framework
        • BERT vs XLNet
        • T5 vs GPT vs BERT Comparison
        • Encoder-Decoder Architecture vs GPT Models
        • Encoder vs Decoder Reference
      • Transformers for Vision
      • Transformers for Multimodal
    • NLP Pretraining
      • Word Embedding (word2vec)
        • Extra Reading
      • Approximate Training
      • Word Embedding with Global Vectors (GloVe)
        • Extra Reading
        • Supplement
      • Encoder (BERT)
        • BERT
        • Extra Reading
      • Decoder (GPT & XLNet & Llama)
        • GPT
        • XLNet
          • XLNet Architecture
          • XLNet Features and Comparison with Other Models
      • Encoder-Decoder (BART & T5)
        • BART
        • T5
  • GenAI
    • Introduction
      • GenAI Paper Must Read
      • Six Stages of GenAI
    • Language Models Pre-training
      • Encoder-Decoder Architecture
      • Encoder Deep Dive
      • Decoder Deep Dive
      • Encoder vs Decoder
      • Attention Mechanism
      • Transformers
    • Example: Llama 3 8B Architecture
    • Fine-Tuning Generation Models
    • RAG and Advanced RAG
    • AI Agent
  • Statistics and Optimization
    • A/B testing
    • Sampling / A/B Testing / Gradient Methods
    • Gradient Descent Deep Dive
  • Machine Learning System Design
    • Extra Reading
    • Introduction
  • Responsible AI
    • AI Risk and Uncertainty
      • What is AI risk
      • General Intro for Uncertainty Quantification
      • Calibration
      • Conformal Prediction
        • Review of Linear Regression
        • Exchangeability
        • Split Conformal Prediction
        • Conformalized Quantile Regression
        • Beyond Marginal Coverage
        • Split Conformal Classification
        • Full Conformal Coverage
        • Cross-Validation +
        • Conformal Histogram Regression
    • xAI
      • SHAP value
  • Extra Research
    • Paper Reading
    • Reference

Summary v2

  • NN

    • Perceptron & Two Layer NN

    • hidden layer & activation function & softmax (see the sketch after this list)

    • details

      • regularization & dropout

      • initialization

      • batch normalization

    • backpropagation

    • gradient vanishing & exploding
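
A minimal NumPy sketch of the two-layer network outlined above: a hidden layer with a ReLU activation followed by a softmax output. The layer sizes, random weights, and toy input are illustrative assumptions.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                              # 4 samples, 8 input features (illustrative)
W1, b1 = 0.1 * rng.normal(size=(8, 16)), np.zeros(16)    # hidden-layer parameters
W2, b2 = 0.1 * rng.normal(size=(16, 3)), np.zeros(3)     # output layer with 3 classes

h = relu(x @ W1 + b1)            # hidden representation
p = softmax(h @ W2 + b2)         # class probabilities
print(p.sum(axis=1))             # each row sums to 1
```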

  • Basic CNN

    • convolution (see the sketch after this list)

    • different types of kernel

      • identity

      • smoothing

      • sharpening

      • edge

    • padding & stride

    • layers

      • convolution

      • max/average pooling

      • fully connected

      • Non-linearity and ReLU Layer
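
A minimal NumPy sketch of the convolution (strictly, cross-correlation) operation and the kernel types listed above. The 5x5 input and the specific kernel values are illustrative; padding would zero-pad the input before the loops, and a stride would step i and j by more than 1.

```python
import numpy as np

def conv2d(x, k):
    """Valid cross-correlation of 2D input x with kernel k (no padding, stride 1)."""
    h, w = k.shape
    out = np.zeros((x.shape[0] - h + 1, x.shape[1] - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (x[i:i + h, j:j + w] * k).sum()
    return out

identity   = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]])        # passes the input through unchanged
smoothing  = np.ones((3, 3)) / 9.0                              # box blur / local averaging
sharpening = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]])    # emphasizes the center pixel
edge       = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])     # Sobel-style horizontal gradient

x = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 "image"
print(conv2d(x, edge))                         # 3x3 feature map
```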

  • Advanced CNN

    • LeNet

    • AlexNet

    • VGG

    • NiN

    • Multi-Branch Networks (GoogLeNet & ImageNet)

    • Residual Networks (ResNet) and ResNeXt (see the sketch after this list)

    • Densely Connected Networks (DenseNet)
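
A minimal PyTorch sketch of the skip connection behind ResNet, assuming torch is installed; the channel count and input size are illustrative.

```python
import torch
from torch import nn

class ResidualBlock(nn.Module):
    """Two 3x3 conv + batch-norm layers with an identity skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        y = torch.relu(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))
        return torch.relu(y + x)   # adding x is the residual (skip) connection

x = torch.randn(1, 8, 32, 32)      # batch of 1, 8 channels, 32x32 feature map (illustrative)
print(ResidualBlock(8)(x).shape)   # torch.Size([1, 8, 32, 32])
```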

  • Basic RNN

    • Seq Model

      • AR

      • Markov

    • Text to Seq

      • tokenization

      • vocabulary

    • Language Model

      • Markov Model & N-grams

      • Word Frequency

      • Laplace Smoothing

      • Perplexity

      • Partition Sequence (random sampling & sequential partitioning)

    • RNN

      • recurrence formula (see the sketch after this list)

      • what is depth & time

        • one to one

        • one to many

        • many to one

        • many to many

        • seq to seq (many to one)

        • seq to seq (many to one + one to many)

      • without hidden state

      • with hidden state

    • backpropagation through time

      • truncated backpropagation through time
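
A minimal NumPy sketch of the hidden-state recurrence h_t = tanh(x_t @ Wxh + h_{t-1} @ Whh + bh) from the RNN items above; the vocabulary size, hidden size, and token ids are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, hidden = 10, 16                      # illustrative sizes
Wxh = 0.1 * rng.normal(size=(vocab, hidden))
Whh = 0.1 * rng.normal(size=(hidden, hidden))
bh = np.zeros(hidden)

def rnn_step(x_t, h_prev):
    """One step of h_t = tanh(x_t @ Wxh + h_prev @ Whh + bh)."""
    return np.tanh(x_t @ Wxh + h_prev @ Whh + bh)

tokens = [1, 4, 2, 7]                       # a toy token-id sequence
h = np.zeros(hidden)                        # initial hidden state
for t in tokens:
    x_t = np.eye(vocab)[t]                  # one-hot encoding of the token
    h = rnn_step(x_t, h)                    # the hidden state carries the sequence history
print(h.shape)                              # (16,)
```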

  • Advanced RNN

    • Gated Recurrent Units (GRU) (see the sketch after this list)

    • Long Short-Term Memory (LSTM)

    • Bidirectional Recurrent Neural Networks (BRNN)

    • Encoder-Decoder Architecture

    • Sequence to Sequence Learning (Seq2Seq)
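
A minimal NumPy sketch of a single GRU step, with an update gate, a reset gate, and a candidate state; all sizes and weights are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
d, h = 8, 16                                              # illustrative input / hidden sizes
Wxz, Whz, bz = 0.1 * rng.normal(size=(d, h)), 0.1 * rng.normal(size=(h, h)), np.zeros(h)
Wxr, Whr, br = 0.1 * rng.normal(size=(d, h)), 0.1 * rng.normal(size=(h, h)), np.zeros(h)
Wxh, Whh, bh = 0.1 * rng.normal(size=(d, h)), 0.1 * rng.normal(size=(h, h)), np.zeros(h)

def gru_step(x_t, h_prev):
    z = sigmoid(x_t @ Wxz + h_prev @ Whz + bz)               # update gate
    r = sigmoid(x_t @ Wxr + h_prev @ Whr + br)               # reset gate
    h_tilde = np.tanh(x_t @ Wxh + (r * h_prev) @ Whh + bh)   # candidate state
    return z * h_prev + (1.0 - z) * h_tilde                  # blend old state and candidate

x_t, h_prev = rng.normal(size=d), np.zeros(h)
print(gru_step(x_t, h_prev).shape)                           # (16,)
```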

  • Attention Mechanism and Transformers

    • Queries, Keys, and Values (see the sketch after this list)

    • Attention is all you need

      • Attention and Kernel

      • Attention Scoring Function

      • The Bahdanau Attention

    • Multi-Head Attention

    • Self-Attention

    • The Transformer Architecture

    • Transformers for NLP

    • Transformers for Vision & Multimodal
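
A minimal NumPy sketch of scaled dot-product attention, the scoring function behind the query/key/value view above: softmax(Q K^T / sqrt(d_k)) V. The shapes are illustrative; multi-head attention runs this same computation in parallel over several learned projections.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)       # similarity of each query to each key
    weights = softmax(scores)             # each row sums to 1
    return weights @ V                    # weighted sum of the values

rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))   # 2 queries of dimension 4 (illustrative)
K = rng.normal(size=(5, 4))   # 5 keys of the same dimension
V = rng.normal(size=(5, 3))   # matching values of dimension 3
print(scaled_dot_product_attention(Q, K, V).shape)   # (2, 3)
```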

  • LLM Pretraining

    • Word Embedding (word2vec) (see the sketch after this list)

    • Approximate Training

    • Word Embedding with Global Vectors (GloVe)

    • Encoder (BERT)

    • Decoder (GPT & XLNet & Llama)

    • Encoder-Decoder (BART & T5)
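
A minimal NumPy sketch of what word2vec's skip-gram model trains on, plus the negative-sampling objective that keeps training tractable (the approximate-training item above); the toy sentence, vector dimension, and number of negative samples are illustrative.

```python
import numpy as np

def skip_gram_pairs(tokens, window=2):
    """(center, context) pairs that the skip-gram model is trained to score highly."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

sentence = ["the", "cat", "sat", "on", "the", "mat"]     # toy corpus
print(skip_gram_pairs(sentence)[:5])

# Negative sampling scores a true (center, context) pair against k sampled "noise" words,
# so the full softmax over the vocabulary never has to be computed.
rng = np.random.default_rng(0)
dim = 8
center_vec = rng.normal(size=dim)
context_vec = rng.normal(size=dim)
noise_vecs = rng.normal(size=(3, dim))                   # k = 3 negative samples

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
loss = -np.log(sigmoid(center_vec @ context_vec)) - np.log(sigmoid(-noise_vecs @ center_vec)).sum()
print(loss)                                              # one pair's negative-sampling loss
```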

