
From Linear to Deep

1. Basic Elements of Linear Regression

Linear Model
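
Following the d2l chapter this page summarizes, the linear model is an affine transformation of the input features:

```latex
\hat{y} = w_1 x_1 + \cdots + w_d x_d + b = \mathbf{w}^\top \mathbf{x} + b
```

For a design matrix X of n examples and d features, this is computed in one shot as ŷ = Xw + b, with the bias broadcast over all rows.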

Loss Function
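
The loss for linear regression is the squared error, per example and averaged over all n examples:

```latex
l^{(i)}(\mathbf{w}, b) = \frac{1}{2}\left(\hat{y}^{(i)} - y^{(i)}\right)^2,
\qquad
L(\mathbf{w}, b) = \frac{1}{n}\sum_{i=1}^{n} l^{(i)}(\mathbf{w}, b)
```

The factor 1/2 is cosmetic; it cancels when the loss is differentiated.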

Analytic Solution & Stochastic Gradient Descent
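
Two standard results sit under this heading: the squared loss over a linear model has a closed-form minimizer (the bias b is absorbed into w via a constant feature), and minibatch stochastic gradient descent, with learning rate η over a random minibatch B, is the iterative alternative that carries over to deep networks:

```latex
\mathbf{w}^{*} = (\mathbf{X}^\top \mathbf{X})^{-1}\mathbf{X}^\top \mathbf{y},
\qquad
(\mathbf{w}, b) \leftarrow (\mathbf{w}, b) - \frac{\eta}{|\mathcal{B}|}\sum_{i \in \mathcal{B}} \partial_{(\mathbf{w}, b)}\, l^{(i)}(\mathbf{w}, b)
```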

2. Vectorization
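
The point of vectorization is that whole-array operations run orders of magnitude faster than element-by-element Python loops. A minimal NumPy timing sketch (the array size is an illustrative choice):

```python
import time
import numpy as np

n = 100_000
a = np.ones(n)
b = np.ones(n)

# Loop-based addition: one interpreted Python operation per element.
start = time.time()
c = np.empty(n)
for i in range(n):
    c[i] = a[i] + b[i]
print(f"loop:       {time.time() - start:.4f} s")

# Vectorized addition: a single call dispatched to optimized native code.
start = time.time()
d = a + b
print(f"vectorized: {time.time() - start:.4f} s")
```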

3. Normal Distribution and Squared Loss
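
The connection drawn here is the standard one: assume y = wᵀx + b + ε with Gaussian noise ε ~ N(0, σ²). The negative log-likelihood of the data is then

```latex
-\log P(\mathbf{y} \mid \mathbf{X})
= \sum_{i=1}^{n}\left[\frac{1}{2}\log(2\pi\sigma^2)
+ \frac{1}{2\sigma^2}\left(y^{(i)} - \mathbf{w}^\top \mathbf{x}^{(i)} - b\right)^2\right]
```

The first term and the factor 1/σ² do not depend on w or b, so maximizing the likelihood is exactly minimizing the squared loss.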

4. From Linear Regression to Deep Networks

Neural Network Diagram

  • Note that the input values are all given, and there is only a single computing neuron. Since the model's focus is where computation takes place, we usually do not count the input layer when counting layers.

  • We can regard the linear regression model as a neural network made up of just one artificial neuron, i.e., a single-layer neural network.

  • For linear regression, every input is connected to every output (here there is only one output); we call this transformation a fully-connected layer, also known as a dense layer. Networks composed of such layers are discussed in detail in the next chapter; a minimal sketch follows below.
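
A minimal PyTorch sketch of the single-neuron view above (the feature count and batch size are illustrative assumptions):

```python
import torch
from torch import nn

# Linear regression as a single-layer network: one fully-connected
# (dense) layer mapping d input features to 1 output, no hidden layers.
d = 4                                # number of input features (illustrative)
net = nn.Sequential(nn.Linear(d, 1))

x = torch.randn(8, d)                # a minibatch of 8 examples
y_hat = net(x)                       # shape (8, 1): one output per example
print(y_hat.shape)
```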

Biology

5. Softmax Regression

Classification Problem
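
Class labels are represented with one-hot encoding: a vector with as many components as categories, where the component for the observed category is 1 and all others are 0. With three categories, for example:

```latex
\mathbf{y} \in \{(1, 0, 0),\ (0, 1, 0),\ (0, 0, 1)\}
```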

Network Architecture

Parameterization Cost of Fully-Connected Layers
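
The cost in question: a fully-connected layer mapping d inputs to q outputs stores one weight per input-output pair plus one bias per output,

```latex
\underbrace{d \cdot q}_{\text{weights}} + \underbrace{q}_{\text{biases}} = \mathcal{O}(dq) \text{ parameters,}
```

which is why fully-connected layers become expensive for high-dimensional inputs.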

The Softmax Operation
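
Softmax turns a vector of logits o into nonnegative values that sum to 1, so the outputs can be read as class probabilities:

```latex
\hat{y}_j = \mathrm{softmax}(\mathbf{o})_j = \frac{\exp(o_j)}{\sum_{k}\exp(o_k)}
```

A NumPy sketch; subtracting the row-wise maximum before exponentiating is the standard overflow guard, not something stated on this page:

```python
import numpy as np

def softmax(O):
    # Shift each row by its max for numerical stability; the result
    # is unchanged because softmax is invariant to constant shifts.
    O = O - O.max(axis=1, keepdims=True)
    exp_O = np.exp(O)
    return exp_O / exp_O.sum(axis=1, keepdims=True)

logits = np.array([[1.0, 2.0, 3.0], [100.0, 100.0, 100.0]])
print(softmax(logits))  # each row is a probability distribution
```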

Vectorization for Minibatches
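
For a minibatch X of n examples with d features, weights W of shape d × q, and a bias row b of length q, the whole batch is handled by one matrix product, with softmax applied row-wise:

```latex
\mathbf{O} = \mathbf{X}\mathbf{W} + \mathbf{b},
\qquad
\hat{\mathbf{Y}} = \mathrm{softmax}(\mathbf{O})
```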

Loss Function

Log-Likelihood
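
Maximizing the likelihood of the observed labels is equivalent to minimizing the summed negative log-likelihood, which for one-hot labels y is exactly the cross-entropy loss introduced below:

```latex
-\log P(\mathbf{Y} \mid \mathbf{X})
= \sum_{i=1}^{n} -\log P(\mathbf{y}^{(i)} \mid \mathbf{x}^{(i)})
= \sum_{i=1}^{n} l\left(\mathbf{y}^{(i)}, \hat{\mathbf{y}}^{(i)}\right),
\qquad
l(\mathbf{y}, \hat{\mathbf{y}}) = -\sum_{j=1}^{q} y_j \log \hat{y}_j
```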

Softmax and Its Derivatives
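
Substituting the softmax definition into the loss and differentiating with respect to the logits gives a notably clean gradient:

```latex
l(\mathbf{y}, \hat{\mathbf{y}})
= \log\sum_{k=1}^{q}\exp(o_k) - \sum_{j=1}^{q} y_j o_j,
\qquad
\partial_{o_j}\, l(\mathbf{y}, \hat{\mathbf{y}}) = \mathrm{softmax}(\mathbf{o})_j - y_j
```

The gradient is simply the gap between the predicted probability and the observed (one-hot) label.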

Cross-Entropy Loss
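
In general, for a true distribution P and a predicted distribution Q, the cross-entropy is

```latex
H(P, Q) = \sum_{x} -P(x)\log Q(x)
```

so the classification loss above is precisely the cross-entropy between the label distribution and the model's predicted distribution.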

Information Theory Basics

Entropy
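
Entropy quantifies the information content of a distribution P:

```latex
H(P) = \sum_{x} -P(x)\log P(x)
```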

Surprisal
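
The surprisal of observing an event x that was assigned probability P(x) is −log P(x); entropy is then the expected surprisal:

```latex
H(P) = \mathbb{E}_{x \sim P}\left[-\log P(x)\right]
```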

Model Prediction and Evaluation
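
Prediction takes the most likely class (the argmax over predicted probabilities), and accuracy compares it with the true label. A small NumPy sketch (the function name and example values are illustrative):

```python
import numpy as np

def accuracy(y_hat, y):
    # Predicted class = index of the largest predicted probability.
    preds = y_hat.argmax(axis=1)
    return float((preds == y).mean())

y_hat = np.array([[0.1, 0.3, 0.6],
                  [0.3, 0.2, 0.5]])
y = np.array([2, 0])
print(accuracy(y_hat, y))  # 0.5: first example correct, second not
```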

Reference: d2l (Dive into Deep Learning, https://d2l.ai)
