Model Performance

1 Why evaluate a model?

What is Cross Validation?
- It assesses how well your model's results will generalize to an independent data set.
- Training and testing on the same data is a methodological mistake.
- There are several cross-validation techniques; the most popular is k-fold cross-validation.
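As a minimal sketch of the k-fold idea, the snippet below splits sample indices into k folds so that each fold serves as the test set exactly once. The helper name `k_fold_indices` is hypothetical and not from any library. It also assumes the data has already been shuffled, which you would normally do first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Hypothetical helper for illustration; assumes the data is already shuffled.
    """
    # Spread samples as evenly as possible: the first n_samples % k folds get one extra.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        # Everything outside the current test fold is used for training.
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(n_samples=10, k=5))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

Every sample is used for testing exactly once and for training k-1 times, so no model is ever scored on data it was trained on.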

TP/FN/FP/TN; True/False; Positive/Negative
- TP: true positive; FN: false negative (type II error)
- FP: false positive (type I error); TN: true negative
- True/False says whether your prediction was correct or wrong.
- Positive/Negative says which class you predicted.
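The four outcomes above can be counted directly from labels and predictions. This is a small sketch: the function name `confusion_counts` and the toy labels are made up for illustration, and it assumes labels are encoded as 1 (positive) and 0 (negative).

```python
def confusion_counts(y_true, y_pred):
    """Count (TP, FN, FP, TN) for binary labels encoded as 1 = positive, 0 = negative."""
    tp = fn = fp = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1  # predicted positive, and that was correct
        elif predicted == 0 and actual == 1:
            fn += 1  # predicted negative, but it was actually positive (type II)
        elif predicted == 1 and actual == 0:
            fp += 1  # predicted positive, but it was actually negative (type I)
        else:
            tn += 1  # predicted negative, and that was correct
    return tp, fn, fp, tn


# Toy data, invented for this example.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
tp, fn, fp, tn = confusion_counts(y_true, y_pred)
print(f"TP={tp} FN={fn} FP={fp} TN={tn}")  # TP=3 FN=1 FP=1 TN=3
```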
Accuracy
- $\frac{TP+TN}{TP+TN+FP+FN}$
- The fraction of all predictions that you got right.
Precision
- $P=\frac{TP}{TP+FP}$
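- Precision: how far your positive predictions can be trusted.
- Of all the data you predicted as positive, how much is truly positive?
- Example: spam filtering needs high precision (whatever gets blocked must really be spam).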
Recall (Sensitivity)
- $R=\frac{TP}{TP+FN}$
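- Recall: were all the actual positives found?
- Of all the data that is truly positive, how much did you correctly identify as positive?
- Example: disease screening and cybersecurity need high recall (better a false alarm than a miss).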
- If false negatives are the costly mistake, watch recall; if false positives are the costly mistake, watch precision.
F1
- $F1=\frac{2TP}{2TP+FP+FN}=\frac{2}{\frac{1}{recall}+\frac{1}{precision}}$
- Combines recall and precision into a single score (their harmonic mean).
- The higher, the better.
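To make the formulas concrete, here is a short sketch that computes accuracy, precision, recall, and F1 from the four counts. The function name and the example counts are invented for illustration. Returning 0.0 when a denominator is zero is an assumption of this sketch, not something the notes specify.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts.

    Assumption of this sketch: a metric with a zero denominator is reported as 0.0.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Invented counts: 10 actual positives among 100 samples.
metrics = classification_metrics(tp=8, fn=2, fp=4, tn=86)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")

# The harmonic-mean form of F1 gives the same value as the count-based form.
harmonic = 2 / (1 / metrics["recall"] + 1 / metrics["precision"])
print(f"harmonic mean: {harmonic:.4f}")
```

With these counts accuracy is 0.94 even though a third of the positive predictions are wrong (precision is about 0.67). This is why accuracy alone can mislead on imbalanced data.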

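Receiver operating characteristic (ROC) curve
The curve is determined by your threshold, which decides whether a score is classified as positive or negative. In logistic regression, for example, you might classify scores above 0.8 as positive, though you could just as well choose 0.6.
Take the false positive rate (FPR) as the x-axis and the true positive rate (TPR) as the y-axis.
The receiver operating characteristic curve, also denoted ROC, is the plot of TPR versus FPR as the threshold varies.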
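Special points in ROC space
- Best case: (0,1); worst case: (1,0)
- Points on the diagonal:
  - When the threshold is set to its highest value, every sample is predicted negative, which gives the point (0,0).
  - When the threshold is set to its lowest value, every sample is predicted positive, which gives the point (1,1).
- Why does the equal error rate (EER) mean FPR = FNR?
  - Known identity: FNR = 1 - TP / (number of actual positives) = 1 - TPR
  - The EER point is where the ROC curve crosses the line from (0,1) to (1,0), so there FPR (x) = 1 - TPR (y)
  - Therefore FPR = FNR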
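The threshold sweep and the equal-error-rate point can be sketched as follows. The function name `roc_points` and the toy scores are made up for illustration, and a sample is treated as positive when its score is greater than or equal to the threshold, which is a convention assumed here.

```python
def roc_points(y_true, scores):
    """Return (threshold, FPR, TPR) triples, from the highest threshold to the lowest.

    Convention assumed here: predict positive when score >= threshold.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    # Start above every score (nothing predicted positive), then try each distinct score.
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for t in thresholds:
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((t, fp / n_neg, tp / n_pos))
    return points


# Toy labels and classifier scores, invented for this example.
y_true = [1, 1, 0, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]

points = roc_points(y_true, scores)
for threshold, fpr, tpr in points:
    print(f"threshold={threshold:<5} FPR={fpr:.2f} TPR={tpr:.2f}")

# Equal error rate: the point where FPR is closest to FNR = 1 - TPR.
eer_threshold, eer_fpr, eer_tpr = min(points, key=lambda p: abs(p[1] - (1 - p[2])))
print(f"EER point: threshold={eer_threshold}, FPR={eer_fpr:.2f}, FNR={1 - eer_tpr:.2f}")
```

The first point is (0,0) and the last is (1,1), matching the two extreme thresholds. On this toy data the equal-error point falls at threshold 0.6, where FPR and FNR are both 0.25.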

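How do you summarize the whole ROC curve above in a single number?
- Area under the ROC curve (AUC)
- AUC takes values in [0,1].
- The larger the value, the better your classifier performs.
- The AUC value can be read as a probability.
Interview question: in machine learning we like every value between 0 and 1 to have a probabilistic meaning. How can AUC be explained as a probability?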
- ROC AUC is the probability that a randomly-chosen positive example is ranked more highly than a randomly-chosen negative example.
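This probabilistic reading can be checked numerically. The sketch below computes AUC in two ways: as the fraction of (positive, negative) pairs in which the positive gets the higher score, and as the trapezoidal area under the ROC curve. Function names and toy data are hypothetical, and counting tied pairs as one half is the usual convention, assumed here.

```python
def auc_by_ranking(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_by_trapezoid(y_true, scores):
    """AUC as the trapezoidal area under the ROC curve (predict positive when score >= t)."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    area, prev_fpr, prev_tpr = 0.0, 0.0, 0.0
    # Lowering the threshold step by step moves along the curve from (0,0) to (1,1).
    for t in sorted(set(scores), reverse=True):
        tpr = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1) / n_pos
        fpr = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0) / n_neg
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2
        prev_fpr, prev_tpr = fpr, tpr
    return area


# Toy labels and classifier scores, invented for this example.
y_true = [1, 1, 0, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]

ranking_auc = auc_by_ranking(y_true, scores)
trapezoid_auc = auc_by_trapezoid(y_true, scores)
print(ranking_auc, trapezoid_auc)  # both 0.8125 on this data
```

Here 13 of the 16 positive-negative pairs are ordered correctly, so both computations give 13/16.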

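Different strategies for reducing failures (sort your failures into categories):
- Can be fixed by changing the model
- Can be fixed by tuning hyperparameters
- Can be fixed with data (you may be missing part of the data)
- Cannot be fixed
The purpose of failure analysis is to further improve the model's performance through iterative development and retraining.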
Summary
- cross validation: find the skeleton (model selection / model architecture)
- merge the validation data back into the training data and retrain: add the flesh, for example the a and b in y=ax+b
- model evaluation: ROC, AUC, Precision, Recall
- failure analysis: pin down where the problems are, then revisit steps 1-3
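The first two steps of the summary can be sketched on a toy regression problem: cross-validation picks the model family (the skeleton), then the winner is refit on all the data to get the final a and b (the flesh). Everything here is illustrative: the two candidate models, the helper names, and the toy data are assumptions of this sketch, with mean squared error as the score.

```python
def fit_constant(xs, ys):
    """Candidate 1: y = b, a line with slope fixed to 0. Returns (a, b)."""
    return 0.0, sum(ys) / len(ys)


def fit_line(xs, ys):
    """Candidate 2: y = a*x + b, fit by ordinary least squares. Returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def cv_mse(fit, xs, ys, k):
    """Mean squared error over k folds; sample i belongs to fold i % k."""
    n = len(xs)
    squared_error = 0.0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        a, b = fit([xs[i] for i in train], [ys[i] for i in train])
        squared_error += sum((a * xs[i] + b - ys[i]) ** 2 for i in test)
    return squared_error / n


# Toy data, invented for this example: roughly y = 2x + 1 with a small alternating offset.
xs = list(range(10))
ys = [2 * x + 1 + (0.5 if x % 2 else -0.5) for x in xs]

# Step 1: cross-validation chooses the model family.
candidates = {"constant": fit_constant, "line": fit_line}
cv_scores = {name: cv_mse(fit, xs, ys, k=5) for name, fit in candidates.items()}
best_name = min(cv_scores, key=cv_scores.get)

# Step 2: refit the chosen family on all the data to get the final parameters.
a, b = candidates[best_name](xs, ys)
print(best_name, round(a, 3), round(b, 3))
```

Steps 3 and 4 would then score this final model on data held out from both steps and inspect the cases it still gets wrong.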
