Gradient Descent and Stochastic Gradient Descent in Machine Learning

Gradient Descent

Given $n$ sample points $(x_i,y_i)$ and a model function $h(x,w)$ with parameters $w$, define the error function $J(w)=\frac{1}{2}\sum_i (y_i-h(x_i,w))^2$. The task is to find the parameters $w$ that minimize this error. We do so by repeatedly moving $w$ along the negative gradient of the error function:
$$\begin{aligned}
w&:=w-\eta\nabla J\\
&:=w+\eta\sum_i(y_i-h(x_i,w))\nabla h(x_i,w)
\end{aligned}$$
where $\eta$ is the learning rate.
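For the concrete linear model used in the code below, $h(x,w)=w_1+w_2x$, the gradient of $h$ is easy to write out, and the update rule above splits into two componentwise updates (a worked step consistent with the formula above):
$$\begin{aligned}
\nabla h(x_i,w)&=\left(\frac{\partial h}{\partial w_1},\frac{\partial h}{\partial w_2}\right)=(1,\;x_i)\\
w_1&:=w_1+\eta\sum_i(y_i-h(x_i,w))\\
w_2&:=w_2+\eta\sum_i(y_i-h(x_i,w))\,x_i
\end{aligned}$$
The matrix whose rows are $\nabla h(x_i,w)$ is exactly the Jacobian computed by `jcb` in the code.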

from pylab import *

def generate(_fun, p):
    # Sample 100 points from the model and add uniform noise in [-1, 1].
    x = linspace(-10, 10, 100)
    r = (random_sample(len(x)) - 0.5) * 2
    return x, _fun(x, p) + r

def fun(x, p):
    w1, w2 = p
    return w1 + w2 * x

def jcb(x, p):
    # Jacobian of fun with respect to the parameters: rows are grad h(x_i, w).
    out = zeros([len(x), len(p)])
    out[:, 0] = 1
    out[:, 1] = x
    return out

def gd(x, y, _fun, _jcb, eta):
    # Full-batch gradient descent; record the parameter trajectory in pp.
    p = zeros(2)
    pp = []
    for step in range(200):
        p = p + eta * dot(y - _fun(x, p), _jcb(x, p))
        pp.append(p)
    return p, array(pp)

w1 = 3
w2 = 1.5
x, y = generate(fun, [w1, w2])
p, pp = gd(x, y, fun, jcb, 0.0003)
print(p)
subplot(121)
plot(x, y, '.')
plot(x, fun(x, p))
subplot(122)
plot(pp[:, 0], pp[:, 1], 'o')
plot(w1, w2, 'rx', ms=12)
show()
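A fixed step count like the 200 iterations above works for this example, but a stopping criterion is often more natural. The sketch below (with hypothetical names; only numpy, no plotting) stops once the update becomes tiny, using the same linear model:

```python
import numpy as np

def gd_until_converged(x, y, fun, jcb, eta, tol=1e-8, max_steps=10000):
    """Batch gradient descent that stops once the parameter update is tiny."""
    p = np.zeros(2)
    for _ in range(max_steps):
        step = eta * (y - fun(x, p)) @ jcb(x, p)  # -eta * grad J
        p = p + step
        if np.max(np.abs(step)) < tol:
            break
    return p

# Same linear model as in the code above: h(x, w) = w1 + w2*x.
fun = lambda x, p: p[0] + p[1] * x
jcb = lambda x, p: np.column_stack([np.ones(len(x)), x])

x = np.linspace(-10, 10, 100)
y = 3 + 1.5 * x  # noise-free data, so gd should recover (3, 1.5)
p = gd_until_converged(x, y, fun, jcb, 0.0003)
```

With noise-free data the iteration converges to the true parameters up to the tolerance.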

Stochastic Gradient Descent

In the iteration above, instead of summing over all samples, we sum over a random subset $A$ (a mini-batch):
$$\begin{aligned}
w&:=w-\eta\nabla J\\
&:=w+\eta\sum_{i\in A}(y_i-h(x_i,w))\nabla h(x_i,w)
\end{aligned}$$
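A single stochastic step just restricts the residual and Jacobian to the sampled indices. The sketch below (hypothetical names, numpy only) draws the mini-batch with `numpy.random.Generator.choice` rather than the stdlib `random.sample` used in the full program that follows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Noise-free toy data for the linear model h(x, w) = w1 + w2*x, true w = (3, 1.5).
x = np.linspace(-10, 10, 100)
y = 3 + 1.5 * x

def sgd_step(p, x, y, eta, batch=20):
    """One stochastic update: sum the residual-weighted Jacobian rows
    over a random mini-batch A instead of over all samples."""
    idx = rng.choice(len(x), size=batch, replace=False)  # random subset A
    jac = np.column_stack([np.ones(batch), x[idx]])      # grad h on the batch
    resid = y[idx] - (p[0] + p[1] * x[idx])
    return p + eta * resid @ jac

p = np.zeros(2)
for _ in range(2000):
    p = sgd_step(p, x, y, 0.0005)
```

Each step costs only `batch` residual evaluations, which is the point of the method when $n$ is large.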

from pylab import *
import random

def generate(_fun, p):
    # Sample 100 points from the model and add uniform noise in [-0.1, 0.1].
    x = linspace(-10, 10, 100)
    r = (random_sample(len(x)) - 0.5) * 0.2
    return x, _fun(x, p) + r

def gd(x, y, _fun, _jcb, eta):
    # Stochastic gradient descent: each step uses a random mini-batch.
    p = array([1.0, 1.0])
    pp = []
    mini_batch = 20
    for step in range(500):
        samp = random.sample(range(len(x)), mini_batch)
        p = p + eta * dot((y - _fun(x, p))[samp], _jcb(x, p)[samp])
        pp.append(p)
    return p, array(pp)

def fun(x, p):
    w1, w2 = p
    return w1 * x + w2 * x ** 2

def jcb(x, p):
    # Jacobian of fun with respect to the parameters.
    out = zeros([len(x), len(p)])
    out[:, 0] = x
    out[:, 1] = x ** 2
    return out

# Other models work the same way, e.g. fun(x, p) = w1 * sin(w2 * x)
# with Jacobian columns sin(w2 * x) and w1 * x * cos(w2 * x).

w1 = 3
w2 = 1.5
x, y = generate(fun, [w1, w2])
p, pp = gd(x, y, fun, jcb, 0.00003)
print(p)
subplot(121)
plot(x, y, '.')
plot(x, fun(x, p))
subplot(122)
plot(pp[:, 0], pp[:, 1], 'o')
plot(w1, w2, 'rx', ms=12)
show()