
Seq Model

  • Here, let $x_t$ denote the price, i.e., the price observed at time step $t \in \mathbb{Z}^{+}$. Note that for the sequences in this text, $t$ is typically discrete and ranges over the integers or a subset of them. Suppose a trader wants to do well in the stock market on day $t$ and therefore predicts $x_t$ via:

  • $x_t \sim P(x_t \mid x_{t-1}, \cdots, x_1)$
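
To make the setup concrete, here is a minimal sketch (my own illustration, not part of the original notes) in which a noisy sine wave stands in for the price $x_t$. It only frames the prediction problem: the conditioning history grows with $t$, which is exactly the issue the two strategies in the next section address.

```python
import numpy as np

# Synthetic stand-in for the "price" x_t: a noisy sine wave.
T = 1000
time = np.arange(1, T + 1, dtype=np.float32)
x = np.sin(0.01 * time) + np.random.normal(0.0, 0.2, size=T)

# At step t the target is x[t] and the available information is the whole
# history x[0], ..., x[t-1]; its length grows with t.
t = 500
history, target = x[:t], x[t]
print(history.shape, target)   # (500,) and a scalar
```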

Autoregressive Model (AR)

  • The first strategy assumes that, in practice, a rather long sequence $x_{t-1}, \cdots, x_1$ may be unnecessary, so it suffices to use a window of some length $\tau$, i.e., the observations $x_{t-1}, \cdots, x_{t-\tau}$. The immediate benefit is that the number of parameters stays constant, at least for $t > \tau$, which lets us train a deep network of the kind mentioned above. Such models are called autoregressive models, because they perform regression on themselves (see the sketch after this list).

  • The second strategy keeps a summary $h_t$ of the past observations and updates the prediction $\hat{x}_t$ and the summary $h_t$ together. This yields models that estimate $x_t$ via $\hat{x}_t = P(x_t \mid h_t)$ and update the summary via $h_t = g(h_{t-1}, x_{t-1})$. Since $h_t$ is never observed, these models are also called latent autoregressive models.

  • Both cases raise an obvious question: how do we generate training data? A classic approach is to use historical observations to predict the next observation. Obviously, we do not expect time to stand still.

  • However, a common assumption is that while the specific values of $x_t$ may change, the dynamics of the sequence itself do not. This assumption is reasonable: new dynamics would have to be driven by new data, and we cannot predict new dynamics from the data we have so far. Statisticians call dynamics that do not change stationary. The estimate for the entire sequence is then obtained via:

    $P(x_1, \cdots, x_T) = \prod_{t=1}^{T} P(x_t \mid x_{t-1}, \cdots, x_1)$
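
As a concrete illustration of the first strategy, the sketch below (my own, reusing the noisy-sine stand-in for $x_t$ and a hypothetical window length $\tau = 4$) converts the sequence into fixed-size (feature, label) pairs and fits a linear autoregressive model by ordinary least squares.

```python
import numpy as np

# Synthetic series (same construction as the earlier sketch, repeated so this is self-contained).
T = 1000
time = np.arange(1, T + 1, dtype=np.float32)
x = np.sin(0.01 * time) + np.random.normal(0.0, 0.2, size=T)

tau = 4  # window length: x_t is predicted from x_{t-1}, ..., x_{t-tau}

# Row j of `features` holds (x_j, ..., x_{j+tau-1}); the label is x_{j+tau}.
features = np.stack([x[i: T - tau + i] for i in range(tau)], axis=1)  # (T - tau, tau)
labels = x[tau:]                                                      # (T - tau,)

# Train on the first 600 pairs, evaluate one-step-ahead predictions on the rest.
n_train = 600
X_train, y_train = features[:n_train], labels[:n_train]
X_test, y_test = features[n_train:], labels[n_train:]

# Linear AR(tau) model fitted by least squares (with a bias column appended).
w, *_ = np.linalg.lstsq(np.hstack([X_train, np.ones((n_train, 1))]), y_train, rcond=None)
y_hat = np.hstack([X_test, np.ones((len(X_test), 1))]) @ w
print("one-step test MSE:", np.mean((y_hat - y_test) ** 2))
```

Because of the stationarity assumption, a single weight vector is reused at every time step, which is what keeps the parameter count constant for $t > \tau$.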

Markov Model

  • We use $x_{t-1}, \cdots, x_{t-\tau}$ rather than $x_{t-1}, \cdots, x_1$ to estimate $x_t$. Whenever this approximation is exact, we say that the sequence satisfies the Markov condition. In particular, if $\tau = 1$, we obtain a first-order Markov model.

  • Using the Markov condition and the law of total probability, we can compute $P(x_{t+1} \mid x_{t-1}) = \sum_{x_t} P(x_{t+1} \mid x_t)\, P(x_t \mid x_{t-1})$ (see the sketch below).
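
For a discrete first-order Markov chain this sum is just a matrix product, as the following sketch shows (my own illustration with a hypothetical 3-state transition matrix).

```python
import numpy as np

# Hypothetical first-order Markov chain over 3 states.
# P[i, j] = P(x_{t+1} = j | x_t = i); each row sums to 1.
P = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])

# P(x_{t+1} | x_{t-1}) = sum_{x_t} P(x_{t+1} | x_t) P(x_t | x_{t-1})
# is exactly the two-step transition matrix P @ P.
P2 = P @ P
print(P2)
print(P2.sum(axis=1))  # each row still sums to 1 (sanity check)

# Explicit check of the sum for one pair of states (x_{t-1} = 0, x_{t+1} = 2):
lhs = P2[0, 2]
rhs = sum(P[0, k] * P[k, 2] for k in range(3))
print(np.isclose(lhs, rhs))  # True
```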

Causality

  • You cannot reverse time: future events cannot influence the past, so prediction of $x_t$ conditions on past observations only.
