Solving Regression Problems with Keras, a Neural Network Library

Can machines think? –AM Turing

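Foreword: I have recently been studying machine learning. Given the quality of its documentation and how quickly it is updated, I chose Keras, a currently popular Python-based deep learning framework. My university is running a project on controlling an underwater glider with deep learning and reinforcement learning, so I learned some Keras and did a few exercises. This post uses the underwater glider as the motivating problem to introduce how to handle regression with Keras and the Sequential model.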

Problem Background

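An underwater glider changes its roll angle and heading angle by shifting movable masses along two axes; these angles also depend on the glider's depth and oil volume. We want to predict the glider's attitude from known movable-mass, depth, and oil-volume data. One way to do this is to build a three-input, two-output Sequential model and train it on the existing data. The examples below show how to do regression reasonably well with Keras.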
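Before turning to the examples, one practical point about the data layout: the networks in this post end in a tanh layer, whose output is confined to (-1, 1), so angle targets would need rescaling into that range first. The following is a minimal sketch under stated assumptions, not the project's actual pipeline: the column meanings, the value ranges, and the helpers `scale_to_unit` and `unscale` are all hypothetical, and the random numbers are placeholders rather than glider measurements.

```python
import numpy as np

def scale_to_unit(a):
    """Linearly map each column of `a` to [-1, 1].

    Returns the scaled array and the per-column (min, max) needed to undo it.
    Assumes no column is constant (max > min).
    """
    lo = a.min(axis=0)
    hi = a.max(axis=0)
    scaled = 2.0 * (a - lo) / (hi - lo) - 1.0
    return scaled, (lo, hi)

def unscale(scaled, bounds):
    """Invert scale_to_unit using the stored per-column bounds."""
    lo, hi = bounds
    return (scaled + 1.0) / 2.0 * (hi - lo) + lo

rng = np.random.default_rng(0)
n = 500
# Placeholder inputs with arbitrary ranges (hypothetical, not real data)
mass_position = rng.uniform(-1.0, 1.0, n)
depth = rng.uniform(0.0, 100.0, n)
oil_volume = rng.uniform(0.0, 1.0, n)
X = np.column_stack([mass_position, depth, oil_volume])  # shape (n, 3)
# Placeholder targets: roll and heading angles, shape (n, 2)
angles = rng.uniform(-30.0, 30.0, (n, 2))

X_scaled, x_bounds = scale_to_unit(X)
Y_scaled, y_bounds = scale_to_unit(angles)
print(X_scaled.shape, Y_scaled.shape)
```

With arrays laid out this way, the first Dense layer would take `input_dim=3` and the last Dense layer would have 2 units; predictions would be passed through `unscale` to get angles back.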

Regression Examples

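The sections below use Keras to fit a single-input single-output function, a two-input single-output function, and a single-input two-output function.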

Fitting a Single-Input, Single-Output Function

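The function used here is:
\[y=\cos(\arctan(x))\]
A Sequential model is a linear stack of network layers. In this problem each input sample is a first-order tensor with a single component, so the input dimension can be specified with input_dim. We can then build the simplest kind of fully connected network, whose final Dense layer also has dimension 1. Because the target values all lie within [-1, 1], the output layer uses tanh as its activation. The full code is as follows: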

import numpy as np
import matplotlib.pyplot as plt
from keras.layers import Dense, Activation
from keras.models import Sequential
# Generate the training data: y = cos(arctan(x)), one feature per sample
X_in = np.linspace(-10., 10., 1000).reshape(-1, 1)
print(X_in)
Y_in = np.cos(np.arctan(X_in))
# Verification data (same grid as the training data)
X_verify = np.linspace(-10., 10., 1000).reshape(-1, 1)
Y_verify = np.cos(np.arctan(X_verify))
# Fully connected network: 1 input -> 128 -> 64 -> 1 output
m = Sequential()
m.add(Dense(128, kernel_initializer='random_uniform', input_dim=1))
m.add(Activation('relu'))
m.add(Dense(64, activation='relu'))
m.add(Dense(1, activation='tanh'))
m.summary()
# MAE is the loss; MSE is tracked as an extra regression metric
m.compile(optimizer='sgd', loss='mean_absolute_error', metrics=['mean_squared_error'])
m.fit(X_in, Y_in, epochs=1000, batch_size=64)
# Plot the predicted values against the true values
y_verify = m.predict(X_verify)
x = np.linspace(1, len(y_verify), len(y_verify))  # sample index on the horizontal axis
line_predict, = plt.plot(x, y_verify, label="predicted value")
line_theoretical, = plt.plot(x, Y_verify, label="theoretical value")
plt.legend(handles=[line_predict, line_theoretical])
plt.show()

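The code has roughly three parts. First, np.linspace generates the data. Next, the generated x and the computed y are fed into the model and fitted. Finally, matplotlib plots the predicted values against the true values. The resulting plot is shown below: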

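Tips: I ran into a few small issues while doing this:

batch_size should not be too small; otherwise training does not converge on this dataset.
There must be enough training data; otherwise the model tends to underfit and the error is very large.
Strictly speaking, the data should be split into a training set and a test set to check how well the model generalizes. I skipped that out of laziness.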
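The train/test split mentioned in the last tip can be sketched with NumPy alone. This is a minimal illustration, assuming the same kind of data as in the example above; `split_train_test` and `mean_absolute_error` are hypothetical helpers written for this sketch, and the 80/20 ratio is an arbitrary choice.

```python
import numpy as np

def split_train_test(x, y, test_fraction=0.2, seed=0):
    """Shuffle the sample indices and hold out a fraction of them for testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    n_test = int(len(x) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return x[train_idx], y[train_idx], x[test_idx], y[test_idx]

def mean_absolute_error(y_true, y_pred):
    """Same quantity the model is trained on, computed on held-out data."""
    return float(np.mean(np.abs(y_true - y_pred)))

x = np.linspace(-10., 10., 1000).reshape(-1, 1)
y = np.cos(np.arctan(x))
x_train, y_train, x_test, y_test = split_train_test(x, y)
print(x_train.shape, x_test.shape)  # (800, 1) (200, 1)

# Baseline to beat: always predict the mean of the training targets
baseline = mean_absolute_error(y_test, np.full_like(y_test, y_train.mean()))
print(baseline)
```

A trained model would then be fitted on `x_train, y_train` only and scored on `x_test, y_test`, for example with the model's `evaluate` method. Keras's `fit` also accepts a `validation_split` argument that holds out part of the training data.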

Fitting a Two-Input, Single-Output Function

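The function used here is:
\[z=\sin(\sqrt{x^2+y^2})\]
To solve this problem we need a merge layer that combines several input branches into a single output. The structure is shown in the figure below (image from the Keras Chinese documentation). Here we want to feed two one-dimensional tensors into the model, for which I added an LSTM layer. However, I found that the LSTM layer does not accept a one-dimensional tensor: the data must have at least two dimensions per sample. At the time I was unsure whether I was using it wrong; the reason is that recurrent layers expect each sample to have shape (timesteps, features). To get around this, I reshaped the two one-dimensional inputs. The code is below: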

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection on older Matplotlib
from keras.layers import Input, Dense, LSTM, concatenate
from keras.models import Model
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# Generate the data; x and y are the same array, so every sample lies on the line x = y
x = np.arange(-10., 10., 0.01)
y = np.arange(-10., 10., 0.01)
z = np.sin(np.sqrt(x**2 + y**2))
# LSTM expects input of shape (samples, timesteps, features)
x = x.reshape(-1, 1, 1)
y = y.reshape(-1, 1, 1)
x_train_a = Input(shape=(1, 1))
x_train_b = Input(shape=(1, 1))
# A single LSTM layer shared by both inputs
shared_lstm = LSTM(64)
encoded_a = shared_lstm(x_train_a)
encoded_b = shared_lstm(x_train_b)
# Concatenate the two encodings along the feature axis
merged_vector = concatenate([encoded_a, encoded_b], axis=-1)
model = Dense(128, activation='relu')(merged_vector)
model = Dense(64, activation='relu')(model)
predictions = Dense(1, activation='tanh')(model)
m = Model(inputs=[x_train_a, x_train_b], outputs=predictions)
# MAE is the loss; MSE is tracked as an extra regression metric
m.compile(optimizer='sgd', loss='mean_absolute_error', metrics=['mean_squared_error'])
m.summary()
m.fit([x, y], z, epochs=1600, batch_size=256)
# Predict on the training inputs and flatten everything for plotting
z_verify = m.predict([x, y])
z_verify = z_verify.reshape(-1)
x = x.reshape(-1)
y = y.reshape(-1)
line_predict, = ax.plot(x, y, z_verify, label='predicted value')
line_theoretical, = ax.plot(x, y, z, label='theoretical value')
plt.legend(handles=[line_predict, line_theoretical])
plt.show()
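One caveat about the data in the listing above: x and y come from identical np.arange calls, so every training sample satisfies x = y and the network only ever sees the diagonal of the plane. To cover the whole square, the inputs could be sampled on a grid instead. The sketch below shows only the data generation, assuming a coarser step of 0.5 (an arbitrary choice, to keep the sample count manageable).

```python
import numpy as np

# Coarser step than the original 0.01, since the grid has len(axis)**2 points
axis = np.arange(-10., 10., 0.5)
xx, yy = np.meshgrid(axis, axis)  # every (x, y) combination on the grid

# Same (samples, timesteps, features) layout the LSTM inputs expect
x = xx.reshape(-1, 1, 1)
y = yy.reshape(-1, 1, 1)
z = np.sin(np.sqrt(xx**2 + yy**2)).reshape(-1)

print(x.shape, y.shape, z.shape)  # (1600, 1, 1) (1600, 1, 1) (1600,)
```

These arrays can be passed to the two-input model in the same way as before, as `[x, y]` and `z`. For plotting a grid like this, a surface plot would likely suit better than a line plot.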

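The model follows the same structure described above, and the results are largely satisfactory. The program is organized like the previous one, in three parts: generating the data, training, and verification. The plot of the results, drawn with Matplotlib, is shown below: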

Fitting a Single-Input, Two-Output Function

Two functions are used here:
\[y_1=\cos(\arctan(x))\] \[y_2=\sin(x)\]
For this problem we need to add a second output layer. The code is as follows:

import numpy as np
import matplotlib.pyplot as plt
from keras.layers import Input, Dense
from keras.models import Model
from keras.callbacks import EarlyStopping
# Generate the data: two targets computed from the same input
X_in = np.linspace(-10., 10., 2000)
Y_inA = np.cos(np.arctan(X_in))
Y_inB = np.sin(X_in)
Y_verifyA = np.cos(np.arctan(X_in))
Y_verifyB = np.sin(X_in)
X_in = X_in.reshape(-1, 1, 1)
Y_inA = Y_inA.reshape(-1, 1, 1)
Y_inB = Y_inB.reshape(-1, 1, 1)
# Shared hidden layers followed by two separate output layers
inputs = Input(shape=(1, 1))
m = Dense(64, activation='relu')(inputs)
m = Dense(64, activation='relu')(m)
outputA = Dense(1, activation='tanh')(m)
outputB = Dense(1, activation='tanh')(m)
m = Model(inputs=[inputs], outputs=[outputA, outputB])
# MAE is the loss; MSE is tracked as an extra regression metric
m.compile(optimizer='adam', loss='mean_absolute_error', metrics=['mean_squared_error'])
m.summary()
# Stop once the training loss has not improved for 15 consecutive epochs
early_stopping = EarlyStopping(monitor='loss', patience=15)
m.fit(X_in, [Y_inA, Y_inB], epochs=1000, batch_size=128, callbacks=[early_stopping])
[y_verifyA, y_verifyB] = m.predict(X_in)
y_verifyA = y_verifyA.reshape(-1)
y_verifyB = y_verifyB.reshape(-1)
# One subplot per output, predictions against true values
x = np.linspace(1, len(y_verifyA), len(y_verifyA))
p1 = plt.subplot(211)
p2 = plt.subplot(212)
line_predictA, = p1.plot(x, y_verifyA, label="predicted value A")
line_theoreticalA, = p1.plot(x, Y_verifyA, label="theoretical value A")
p1.legend(handles=[line_predictA, line_theoreticalA])
line_predictB, = p2.plot(x, y_verifyB, label="predicted value B")
line_theoreticalB, = p2.plot(x, Y_verifyB, label="theoretical value B")
p2.legend(handles=[line_predictB, line_theoreticalB])
plt.show()

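None of this is particularly difficult. What is worth mentioning is that while solving this problem I also learned how to use early stopping. Two parameters are used here. One is monitor, and watching loss is enough. The other is patience, an integer x: if the monitored value fails to improve for x consecutive epochs, training ends early. This means epochs can be set generously high.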
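The patience rule can be illustrated in plain Python. This is a simplified sketch of the idea, not Keras's actual implementation; `stopping_epoch` is a hypothetical helper and the loss values are made up for the illustration.

```python
def stopping_epoch(losses, patience, min_delta=0.0):
    """Return the 0-based epoch at which training would stop early,
    or None if the patience budget is never exhausted."""
    best = float("inf")
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(losses):
        if loss < best - min_delta:
            best = loss
            wait = 0  # an improvement resets the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Made-up loss curve: improves, stalls for three epochs, then improves again
losses = [1.0, 0.9, 0.8, 0.85, 0.82, 0.81, 0.7]
print(stopping_epoch(losses, patience=3))  # 5
print(stopping_epoch(losses, patience=4))  # None
```

With patience=3 the run stops during the stall and never reaches the later improvement to 0.7, which is why a patience that is too small can end training prematurely.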
The program's output is shown in the figure below:

Summary

Well, there is not much to summarize, so that's it. Happy April Fools' Day (/ω＼)