Sampling/ABtesting/GradientMethod

Google Coding Summary

import numpy as np 
import scipy.stats as st
import seaborn as sns
import matplotlib.pyplot as plt

Linear Regression

Linear regression key lines (the normal equation, vs. sklearn below): X_b = np.c_[np.ones((100, 1)), X] and theta_best = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)

https://colab.research.google.com/drive/1MnNENQS5j2otC7quOGuqWGHg7DqCXG7g

####################linear regression sklearn#########################
from sklearn.linear_model import LinearRegression

lin_reg = LinearRegression()
lin_reg.fit(X, y)
lin_reg.intercept_, lin_reg.coef_

theta_best_svd, residuals, rank, s = np.linalg.lstsq(X_b, y, rcond=1e-6)
theta_best_svd

np.linalg.pinv(X_b).dot(y)

###################Least Square ##########################
import numpy as np

X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
plt.plot(X, y, "b.")
plt.xlabel("$x_1$", fontsize=18)
plt.ylabel("$y$", rotation=0, fontsize=18)
plt.axis([0, 2, 0, 15])
plt.savefig("generated_data_plot.png")  # save_fig() was a notebook helper; plain savefig works here
plt.show()

#least square
X_b = np.c_[np.ones((100, 1)), X]  # add x0 = 1 to each instance 
# np.r_ concatenates along the first axis (stacks matrices vertically); column counts must match.
# np.c_ concatenates along the second axis (stacks matrices side by side); row counts must match.
theta_best = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)

#prediction
X_new = np.array([[0], [2]])
X_new_b = np.c_[np.ones((2, 1)), X_new]  # add x0 = 1 to each instance
y_predict = X_new_b.dot(theta_best)
y_predict

#plot
plt.plot(X_new, y_predict, "r-")
plt.plot(X, y, "b.")
plt.axis([0, 2, 0, 15])
plt.show()

import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt
##################Statstics Summary###########################

model = smf.ols('mpg ~ cylinders + displacement + horsepower + weight + acceleration + year + origin', df)
model_fit = model.fit()
model_fit.summary()

model_fit.params  # fitted coefficients (statsmodels results use .params, not .coef_)
# fitted values (need a constant term for intercept)
model_fitted_y = model_fit.fittedvalues

# model residuals
model_residuals = model_fit.resid

# normalized residuals
model_norm_residuals = model_fit.get_influence().resid_studentized_internal

# absolute squared normalized residuals
model_norm_residuals_abs_sqrt = np.sqrt(np.abs(model_norm_residuals))

# absolute residuals
model_abs_resid = np.abs(model_residuals)

# leverage, from statsmodels internals
model_leverage = model_fit.get_influence().hat_matrix_diag

# cook's distance, from statsmodels internals
model_cooks = model_fit.get_influence().cooks_distance[0]

# (separate snippet: predictions over a grid for a single-feature sklearn-style model)
xfit = np.linspace(df.x.min(), df.x.max(), 100)
yfit = model.predict(xfit[:, np.newaxis])
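
The quantities above feed the classic regression diagnostics; a minimal sketch (assuming the model_fit results object from above) of the residuals-vs-fitted plot:

#################### residuals vs. fitted (sketch) ####################
plt.scatter(model_fitted_y, model_residuals, alpha=0.5)  # one point per observation
plt.axhline(0, color="red", linestyle="--")              # residuals should scatter around 0
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.title("Residuals vs Fitted")
plt.show()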

Regularization model

https://colab.research.google.com/drive/1MnNENQS5j2otC7quOGuqWGHg7DqCXG7g

https://colab.research.google.com/drive/1gxFuyhHKM-HXF0uiqVXfKrwGCgRDmJLN

Sampling/Bootstrap

https://www.notion.so/yangnyc/Google-interview-solution-fe08edcd81d94e78804b5e6518ddd291

Sampling from a normal: np.random.normal and plt.hist
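
A quick sketch of those two calls (mean 0, std 1 are arbitrary choices):

#################### sample from a normal ####################
samples = np.random.normal(loc=0, scale=1, size=10000)  # draw 10,000 N(0, 1) samples
plt.hist(samples, bins=50, density=True)                # empirical density
plt.show()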

Bootstrap/Median/Standard Error/CI
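
A minimal bootstrap sketch for the median, its standard error, and a 95% percentile CI; the data array is a made-up placeholder:

#################### bootstrap median / SE / CI ####################
data = np.random.normal(0, 1, size=200)  # placeholder sample
boot_medians = np.array([
    np.median(np.random.choice(data, size=len(data), replace=True))
    for _ in range(10000)
])
se = boot_medians.std()                         # bootstrap standard error
ci = np.percentile(boot_medians, [2.5, 97.5])   # 95% percentile CI
print(se, ci)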

Importance/Rejection Sampling: step 1, define the target p(x) and proposal q(x) with st.norm.pdf (the weight p(x)/q(x) is the importance-sampling piece); step 2, pick k with k*q(x) ≥ p(x); step 3, draw z from the proposal q; step 4, draw u ~ uniform(0, k*q(z)); step 5, accept z when u ≤ p(z) (steps 2-5 are rejection sampling)
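
A sketch of those acceptance steps as rejection sampling; the target N(1, 0.7), proposal N(0, 2), and envelope constant k = 4 are made-up choices that satisfy k*q(x) ≥ p(x):

#################### rejection sampling sketch ####################
p = lambda x: st.norm.pdf(x, loc=1, scale=0.7)  # target density (assumed)
q = lambda x: st.norm.pdf(x, loc=0, scale=2)    # proposal density (assumed)
k = 4.0                                         # chosen so k*q(x) >= p(x)

samples = []
while len(samples) < 1000:
    z = np.random.normal(0, 2)            # step 3: draw z ~ q
    u = np.random.uniform(0, k * q(z))    # step 4: draw u ~ U(0, k*q(z))
    if u <= p(z):                         # step 5: accept z if u <= p(z)
        samples.append(z)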

Inverse Sampling
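
Inverse transform sampling sketch: draw u ~ Uniform(0, 1) and apply the inverse CDF; shown for an Exponential(lam) target, where $F^{-1}(u) = -\ln(1-u)/\lambda$ (the rate 2.0 is arbitrary):

#################### inverse transform sampling ####################
u = np.random.uniform(0, 1, size=10000)  # u ~ Uniform(0, 1)
lam = 2.0                                # rate of the target Exponential
samples = -np.log(1 - u) / lam           # F^{-1}(u) for Exponential(lam)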

Shuffling algorithm (Fisher-Yates: iterate from the back; draw j = random.randint(0, i) and swap array[i] ↔ array[j])
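
A sketch of that Fisher-Yates shuffle:

#################### Fisher-Yates shuffle ####################
import random

def shuffle(array):
    # iterate from the back; swap each position with a uniformly random
    # earlier (or same) position
    for i in range(len(array) - 1, 0, -1):
        j = random.randint(0, i)  # inclusive on both ends
        array[i], array[j] = array[j], array[i]
    return array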

Reservoir Sampling (for each incoming element, draw random.randint(0, self.count - 1) and keep the element when the draw is 0, so every element survives with probability 1/n)
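
A sketch for picking one element uniformly from a stream of unknown length; the i-th element replaces the current pick with probability 1/i:

#################### reservoir sampling (k = 1) ####################
import random

def reservoir_sample(stream):
    result, count = None, 0
    for item in stream:
        count += 1
        # replace the current pick with probability 1/count
        if random.randint(0, count - 1) == 0:
            result = item
    return result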

Randomization (rand7 from rand5: 5*(random5()-1) + random5() is uniform on 1..25; if the result exceeds 21, redraw)
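
A sketch of that construction; random5 is the assumed given primitive:

#################### rand7 from rand5 ####################
import random

def random5():
    return random.randint(1, 5)  # assumed given: uniform on 1..5

def random7():
    while True:
        val = 5 * (random5() - 1) + random5()  # uniform on 1..25
        if val <= 21:                          # reject 22..25 and retry
            return (val - 1) % 7 + 1           # fold 1..21 into uniform 1..7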

Gradient Method

Plainly put: you have n samples and can spend them across several updates; every update still steps downhill from the current point. The underlying gradient is unchanged; only each realized estimate of it varies.

$x_{new} = x_{old} - \eta_i \nabla f(x_{old})$

https://www.notion.so/yangnyc/HandsOnML-Chapter4-e5d0e68590e149b9af58dbfc0e4cfaaf

Batch Gradient Method

Stochastic Gradient Method

MiniBatch Gradient Method
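
All three variants realize the same update rule above with different sample subsets. A sketch assuming the X_b, y arrays from the least-squares section; step sizes and iteration counts are arbitrary:

#################### batch / stochastic / mini-batch GD ####################
m = len(X_b)
theta = np.random.randn(2, 1)

# batch GD: each step uses the gradient over all m samples
for _ in range(1000):
    gradients = 2 / m * X_b.T.dot(X_b.dot(theta) - y)
    theta = theta - 0.1 * gradients

# stochastic GD: each step uses one randomly chosen sample
for _ in range(1000):
    i = np.random.randint(m)
    xi, yi = X_b[i:i + 1], y[i:i + 1]
    gradients = 2 * xi.T.dot(xi.dot(theta) - yi)
    theta = theta - 0.01 * gradients

# mini-batch GD: each step uses a small random subset
for _ in range(1000):
    idx = np.random.randint(m, size=20)
    gradients = 2 / 20 * X_b[idx].T.dot(X_b[idx].dot(theta) - y[idx])
    theta = theta - 0.01 * gradients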

A/B Testing

https://colab.research.google.com/drive/1Fza8JhK1QbWEpfD0nOcmrPEZJ_Dn6TXI?authuser=1#scrollTo=uzyThp5zyP7M

Generate 100,000 dice-roll results

CLT Central Limit Theorem
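
A sketch covering both lines above: simulate 100,000 fair-die rolls, then illustrate the CLT by histogramming means of repeated size-50 samples:

#################### dice rolls and the CLT ####################
rolls = np.random.randint(1, 7, size=100000)  # 100,000 fair-die rolls
plt.hist(rolls, bins=np.arange(0.5, 7.5, 1), density=True)
plt.show()

# CLT: means of 2,000 samples of size 50 look approximately normal
sample_means = rolls.reshape(2000, 50).mean(axis=1)
plt.hist(sample_means, bins=40, density=True)
plt.show()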

How to compute a hypothesis test as an algorithm (norm.cdf)
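
A sketch of a two-proportion z-test computed by hand with norm.cdf; the conversion counts are made-up numbers:

#################### two-proportion z-test via norm.cdf ####################
from scipy.stats import norm

x_a, n_a = 200, 2000  # control: conversions / sample size (made up)
x_b, n_b = 250, 2000  # treatment: conversions / sample size (made up)
p_a, p_b = x_a / n_a, x_b / n_b
p_pool = (x_a + x_b) / (n_a + n_b)                         # pooled rate under H0
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))  # standard error
z = (p_b - p_a) / se
p_value = 2 * (1 - norm.cdf(abs(z)))                       # two-sided p-value
print(z, p_value)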

Calculate the sample size(zt_ind_solve_power, proportion_effectsize)
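
A sketch of the two statsmodels calls named above; the baseline rate 0.10 and target 0.12 are made-up numbers:

#################### sample size per group ####################
from statsmodels.stats.power import zt_ind_solve_power
from statsmodels.stats.proportion import proportion_effectsize

effect_size = proportion_effectsize(0.12, 0.10)  # 10% -> 12% lift (made up)
n = zt_ind_solve_power(effect_size=effect_size, alpha=0.05, power=0.8,
                       alternative='two-sided')
print(n)  # required sample size per group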

Test for Independence(U, p, dof, expected = chi2_contingency(table, correction = False))

print(U) # chi-square statistic
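
A sketch of that call on a made-up 2x2 table of conversions by variant:

#################### chi-square test for independence ####################
from scipy.stats import chi2_contingency

table = np.array([[200, 1800],   # variant A: converted / not converted (made up)
                  [250, 1750]])  # variant B
U, p, dof, expected = chi2_contingency(table, correction=False)
print(U, p)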

  1. Two discrete variables: check for Simpson's paradox

  2. Two continuous variables: look at their correlation

  3. One continuous variable and one discrete: look at the KS test statistic $D=\sup_x |\hat{F}_1(x)-\hat{F}_2(x)|$ (see the sketch below)
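
A sketch of case 3 with scipy's two-sample KS test, which computes exactly that statistic; the group data are made up:

#################### two-sample KS test ####################
from scipy.stats import ks_2samp

group1 = np.random.normal(0.0, 1, 500)  # continuous variable in group 1 (made up)
group2 = np.random.normal(0.3, 1, 500)  # continuous variable in group 2 (made up)
D, p_value = ks_2samp(group1, group2)   # D = sup_x |F1_hat(x) - F2_hat(x)|
print(D, p_value)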

p-value:

  • a single number: the smaller it is, the more extreme the observed outcome is under the null hypothesis, given the observations

    • courtroom analogy: the defendant is innocent (the null hypothesis); a knife at the crime scene is the evidence against it

statistical significance:

  • a decision concerning the value stated in the null hypothesis

hypothesis:

  • a claim about a population parameter

  • $H_0$: the default claim, e.g., no relation / no difference

  • $H_1$: the claim that $H_0$ is false

significance level $\alpha$:

  • the probability of rejecting $H_0$ when $H_0$ is true

type I error (a wrongful conviction):

  • occurs when $H_0$ is true but gets rejected

  • {p-value | $H_0$ is true} ~ Uniform(0, 1)

  • type I error rate = Pr(reject $H_0$ | $H_0$ is true) = Pr(p-value < significance level | $H_0$ is true) = significance level (see the simulation sketch below)
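
A quick simulation sketch of the two bullets above: under $H_0$ the p-value is Uniform(0, 1), so rejecting at 0.05 happens about 5% of the time (a z-test with known sigma = 1, sample size 30; both choices arbitrary):

#################### p-values are uniform under H0 ####################
pvals = []
for _ in range(5000):
    x = np.random.normal(0, 1, 30)          # data generated under H0: mu = 0
    z = x.mean() * np.sqrt(30)              # z-statistic with known sigma = 1
    pvals.append(2 * (1 - st.norm.cdf(abs(z))))
pvals = np.array(pvals)
print((pvals < 0.05).mean())                # ~0.05 = the significance level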

type II error (letting the guilty slip through):

  • occurs when $H_1$ is true but $H_0$ is not rejected

  • $\beta = \Pr(\text{not reject } H_0 \mid H_1 \text{ is true}) = 1 - \Pr(\text{reject } H_0 \mid H_1 \text{ is true}) = 1 - \text{power}$

power (a sharp eye):

  • the probability of rejecting $H_0$ when $H_1$ is true

  • $power = 1-\beta$
