
Model Evaluation

Overfitting

What is overfitting?

  • If a model performs very well on the training data but noticeably worse on unseen data, it is overfitting (the sketch below connects training and test error).
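
A minimal sketch of this, assuming scikit-learn and a synthetic sine-curve dataset (neither comes from these notes): as the polynomial degree grows, training error keeps falling while test error eventually rises, which is the overfitting signature.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)   # noisy ground truth
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for degree in (1, 3, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    print(degree,
          mean_squared_error(y_tr, model.predict(X_tr)),   # training error keeps dropping...
          mean_squared_error(y_te, model.predict(X_te)))   # ...while test error rises: overfitting
```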

When does overfitting occur?

  • Q1: Is a more complicated model always better?

  • Q2: In classification problems, are more features always better? (What if you have more features than data points?)

  • Fundamental cause of overfitting: the model's complexity is too high relative to the amount of training data, so it fits noise in the training set rather than the underlying signal.

Bias / Variance / Error / Model Error

  • Analogy: a student's exam score = their average ability (independent of this particular exam) + how typical this particular performance was for them.

  • Bias (the model's accuracy):

    • First-level understanding: the gap between one model's output and the true value. Wrong!!!

    • Second-level understanding: the model's average output over variations of the training data, compared with the true value, i.e., the model's average accuracy.

  • Variance (the model's stability):

    • First-level understanding: how stable the model's outputs are.

    • Second-level understanding: the expectation of the squared gap between the output of a model trained on one particular dataset and the model's average output across training sets.

  • Q: How do you go from the bias and variance observed on one training set to the bias and variance of the model itself?

    • Trade-off: expected model error decomposes as bias² + variance + irreducible noise, so decreasing one of bias/variance typically increases the other (see the simulation sketch below).
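
A minimal sketch of the "second-level understanding" above, assuming scikit-learn and a synthetic setup (not from these notes): retrain on many resampled training sets and estimate bias² and variance at a single test point.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
f = lambda x: np.sin(x)                # true function
x0, n_runs = np.array([[1.0]]), 500
preds = np.empty(n_runs)

for i in range(n_runs):
    X = rng.uniform(-3, 3, size=(50, 1))            # a fresh training set each run
    y = f(X).ravel() + rng.normal(scale=0.3, size=50)
    preds[i] = DecisionTreeRegressor().fit(X, y).predict(x0)[0]

bias_sq = (preds.mean() - f(x0)[0, 0]) ** 2         # (average prediction - truth)^2
variance = preds.var()                              # spread of predictions around their own mean
print(f"bias^2 = {bias_sq:.4f}, variance = {variance:.4f}")
```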

Model Error

What is model error?

  • It is measured as test error, on held-out data rather than the training set.

  • The loss function is used to fit (construct) the model; model error is used to check how well the fitted model performs (see the sketch below).
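
A minimal sketch of that distinction, assuming scikit-learn and synthetic data: the loss function (log loss here) drives the fitting, while model error (the held-out misclassification rate) evaluates the result.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_tr, y_tr)   # fitting internally minimizes log loss
test_error = 1 - clf.score(X_te, y_te)       # model error: error rate on held-out data
print(f"test error = {test_error:.3f}")
```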

How to solve the overfitting problem?

  • Increase training data size

  • Avoid over-training on your dataset:

    • Filter out features, i.e., feature reduction

      • Principal component analysis (PCA), sketched after this list

    • Regularization

      • Ridge regression, Least Absolute Shrinkage and Selection Operator (LASSO)

      • Logistic regression with an L2 or L1 penalty

    • Ensemble learning
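
A minimal sketch of feature reduction with PCA, assuming scikit-learn and synthetic data (not from these notes): project many features onto a few principal components before fitting, shrinking the hypothesis space.

```python
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

X, y = make_regression(n_samples=100, n_features=50, n_informative=5,
                       noise=10.0, random_state=0)
model = make_pipeline(PCA(n_components=5), LinearRegression())  # 50 features -> 5 components
model.fit(X, y)
print(model.score(X, y))   # R^2 after fitting in the reduced feature space
```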

Regularization


Ridge Regression: L2 penalty in loss function

  • The λ in this penalty term (Ridge adds λ·Σⱼβⱼ² to the squared-error loss) is the hyperparameter:

    • It lets variance and bias reach a balance: as λ increases, model variance decreases and bias increases (see the sketch below).
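
A minimal sketch, assuming scikit-learn and synthetic data: increasing the Ridge penalty (scikit-learn calls λ "alpha") shrinks coefficients toward zero, trading variance for bias.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=50, n_features=10, noise=5.0, random_state=0)
for alpha in (0.01, 1.0, 100.0):                 # alpha plays the role of λ
    coef = Ridge(alpha=alpha).fit(X, y).coef_
    print(alpha, np.abs(coef).mean())            # average |coefficient| shrinks as λ grows
```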

LASSO: L1 penalty in loss function
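
A minimal sketch, assuming scikit-learn and synthetic data: the L1 penalty can drive coefficients exactly to zero, which is why LASSO doubles as feature selection.

```python
from sklearn.linear_model import Lasso
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=20, n_informative=3,
                       noise=5.0, random_state=0)
coef = Lasso(alpha=1.0).fit(X, y).coef_
print((coef != 0).sum(), "of", coef.size, "coefficients kept")  # L1 zeroes out the rest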

Hyperparameter Optimization

  • The λ in this penalty term is the hyperparameter (a grid-search sketch follows at the end of this section):

    • λ lets variance and bias reach a balance: as λ increases, model variance decreases and bias increases.

    • λ is effectively what you use to control the size of the parameters (e.g., a and b in a linear model).

  • Summary:

    • L1: feature-selection regularization (it can drive coefficients exactly to zero).

    • L2: handles correlated features better (it shrinks them together rather than arbitrarily dropping one).

    • Model error is not measured on the training set; test MSE is what tells you whether the model is good. That does not mean regularization is unused: regularization acts on the training loss, while model error evaluates the result.

    • Nowadays L1 and L2 are mostly applied to linear and logistic regression, but the concept of regularization appears in many other settings.
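
A minimal sketch of tuning λ, assuming scikit-learn and synthetic data: choose the penalty strength by cross-validated grid search rather than by hand.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)
search = GridSearchCV(Ridge(), {"alpha": [0.01, 0.1, 1, 10, 100]}, cv=5)
search.fit(X, y)
print(search.best_params_)   # the λ (alpha) with the best cross-validated score
```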

Some reflections on the hyperparameter-tuning process
