Gradient Descent and Stochastic Gradient Descent in Machine Learning
Gradient Descent
Given $n$ sample points $(x_i,y_i)$ and a model $h(x,w)$ to be fitted, where $w$ denotes the model parameters, define the error function $J(w)=\frac{1}{2}\sum_i (y_i-h(x_i,w))^2$. The task is to find the parameters $w$ that minimize this error, which we do by repeatedly moving $w$ along the negative gradient of the error function:
$$\begin{aligned}
w&:=w-\eta\nabla J\\
&=w+\eta\sum_i(y_i-h(x_i,w))\nabla h(x_i,w)
\end{aligned}$$
where $\eta$ is the learning rate.
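The second line follows from differentiating $J$ term by term with the chain rule:

$$\nabla J(w)=\frac{1}{2}\sum_i\nabla\,(y_i-h(x_i,w))^2=-\sum_i(y_i-h(x_i,w))\nabla h(x_i,w),$$

so subtracting $\eta\nabla J$ adds $\eta\sum_i(y_i-h(x_i,w))\nabla h(x_i,w)$ to $w$. The script below fits a straight line $w_1+w_2x$ to noisy data using this full-batch update: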
```python
from pylab import *

def generate(w1, w2, _fun, p):
    # sample 100 points on [-10, 10] and add uniform noise in [-1, 1]
    x = array(linspace(-10, 10, 100))
    r = array((random_sample(len(x)) - 0.5) * 2)
    y = _fun(x, p) + r
    return x, y

def fun(x, p):
    # linear model h(x, w) = w1 + w2*x
    w1, w2 = p
    return w1 + w2 * x

def jcb(x, p):
    # Jacobian of the model w.r.t. the parameters: dh/dw1 = 1, dh/dw2 = x
    w1, w2 = p
    out = zeros([len(x), len(p)])
    out[:, 0] = 1
    out[:, 1] = x
    return out

def gd(x, y, _fun, _jcb, eta):
    # full-batch gradient descent:
    # w := w + eta * sum_i (y_i - h(x_i, w)) * grad h(x_i, w)
    p = array([0.0, 0.0])
    pp = []
    for step in range(200):
        p = p + eta * dot(y - _fun(x, p), _jcb(x, p))
        pp.append(p)  # record the trajectory of the parameters
    return p, array(pp)

w1 = 3
w2 = 1.5
x, y = generate(w1, w2, fun, [w1, w2])
p, pp = gd(x, y, fun, jcb, 0.0003)
print(p)

subplot(121)
plot(x, y, '.')
plot(x, fun(x, p))
subplot(122)
plot(pp[:, 0], pp[:, 1], 'o')
plot(w1, w2, 'rx', ms=12)
show()
```
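Since the model is linear in its parameters, the gradient-descent result can be sanity-checked against the closed-form least-squares fit. A minimal sketch (it regenerates the data rather than reusing the variables above):

```python
import numpy as np

# regenerate noisy samples of y = 3 + 1.5*x, mirroring generate() above
x = np.linspace(-10, 10, 100)
y = 3 + 1.5 * x + (np.random.random_sample(len(x)) - 0.5) * 2

# np.polyfit returns coefficients highest degree first: [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)
print(intercept, slope)  # should be close to (3, 1.5) and to the GD result p
```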
Stochastic Gradient Descent
In the iteration above, instead of summing over all samples, we sum over a randomly chosen subset $A$ (a mini-batch):
$$\begin{aligned}
w&:=w-\eta\nabla J\\
&=w+\eta\sum_{i\in A}(y_i-h(x_i,w))\nabla h(x_i,w)
\end{aligned}$$
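Sampling a uniform mini-batch works because the mini-batch sum is, in expectation, proportional to the full sum: with $|A|$ indices drawn uniformly from $\{1,\dots,n\}$,

$$\mathbb{E}_A\left[\sum_{i\in A}(y_i-h(x_i,w))\nabla h(x_i,w)\right]=\frac{|A|}{n}\sum_{i=1}^n(y_i-h(x_i,w))\nabla h(x_i,w),$$

so each step follows the full-batch gradient on average, at a fraction of the per-step cost. The script below fits a quadratic $w_1x+w_2x^2$ using mini-batches of 20 samples: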
```python
from pylab import *
import random

def generate(w1, w2, _fun, p):
    # sample 100 points on [-10, 10] and add uniform noise in [-0.1, 0.1]
    x = array(linspace(-10, 10, 100))
    r = array((random_sample(len(x)) - 0.5) * 0.2)
    y = _fun(x, p) + r
    return x, y

def fun(x, p):
    # quadratic model h(x, w) = w1*x + w2*x**2
    w1, w2 = p
    return w1 * x + w2 * x ** 2

def jcb(x, p):
    # Jacobian of the model w.r.t. the parameters: dh/dw1 = x, dh/dw2 = x**2
    w1, w2 = p
    out = zeros([len(x), len(p)])
    out[:, 0] = x
    out[:, 1] = x ** 2
    return out

def gd(x, y, _fun, _jcb, eta):
    # stochastic (mini-batch) gradient descent: each step uses a random
    # subset of the samples instead of the full sum
    p = array([1.0, 1.0])
    pp = []
    mini_batch = 20
    for step in range(500):
        samp = random.sample(range(len(x)), mini_batch)
        p = p + eta * dot((y - _fun(x, p))[samp], (_jcb(x, p))[samp])
        pp.append(p)  # record the trajectory of the parameters
    return p, array(pp)

w1 = 3
w2 = 1.5
x, y = generate(w1, w2, fun, [w1, w2])
p, pp = gd(x, y, fun, jcb, 0.00003)
print(p)

subplot(121)
plot(x, y, '.')
plot(x, fun(x, p))
subplot(122)
plot(pp[:, 0], pp[:, 1], 'o')
plot(w1, w2, 'rx', ms=12)
show()
```
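The quadratic model is also linear in its parameters ($x$ and $x^2$ are just fixed basis functions), so the SGD result can likewise be checked against ordinary least squares. A minimal sketch using numpy.linalg.lstsq:

```python
import numpy as np

# regenerate noisy samples of y = 3*x + 1.5*x**2, mirroring generate() above
x = np.linspace(-10, 10, 100)
y = 3 * x + 1.5 * x ** 2 + (np.random.random_sample(len(x)) - 0.5) * 0.2

# design matrix with columns [x, x**2], matching the Jacobian jcb() above
A = np.column_stack([x, x ** 2])
p_ls, *_ = np.linalg.lstsq(A, y, rcond=None)
print(p_ls)  # should be close to [3, 1.5] and to the SGD result
```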