In [1]:
%pylab inline
Populating the interactive namespace from numpy and matplotlib

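1. Predicting stock price movements with an LSTM model

  1. I have only just understood the LSTM forward pass and have not yet worked through the backward pass, so the model is treated as a black box for a first evaluation
  2. The framework is TensorFlow (Python)
  3. The data is A-share (Chinese stock market) data from tushare, for an arbitrarily chosen stock code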
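
Since the model is used as a black box here, a minimal NumPy sketch of the LSTM forward pass may help make it less opaque. It unrolls one cell over a sequence and keeps only the last hidden state, the same way the model below uses `output[-1]`. The function names, the gate ordering (input, forget, output, candidate) and the random weights are assumptions for illustration only; TensorFlow's `BasicLSTMCell` lays out its gate weights differently and also adds a forget-gate bias (1.0 by default).

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_cell_forward(x_t, h_prev, c_prev, W, b):
    """One LSTM time step (illustrative gate layout, not TensorFlow's).

    x_t:    (batch, n_in) input at time t
    h_prev: (batch, n_hidden) previous hidden state
    c_prev: (batch, n_hidden) previous cell state
    W:      (n_in + n_hidden, 4 * n_hidden) stacked gate weights
    b:      (4 * n_hidden,) stacked gate biases
    """
    n_hidden = h_prev.shape[1]
    z = np.concatenate([x_t, h_prev], axis=1) @ W + b
    i = sigmoid(z[:, :n_hidden])                  # input gate
    f = sigmoid(z[:, n_hidden:2 * n_hidden])      # forget gate
    o = sigmoid(z[:, 2 * n_hidden:3 * n_hidden])  # output gate
    g = np.tanh(z[:, 3 * n_hidden:])              # candidate cell values
    c_t = f * c_prev + i * g                      # keep part of the old memory, add new
    h_t = o * np.tanh(c_t)                        # exposed hidden state
    return h_t, c_t


def lstm_forward(xs, W, b, n_hidden):
    """Run the cell over xs of shape (batch, steps, n_in); return the last hidden state."""
    batch = xs.shape[0]
    h = np.zeros((batch, n_hidden))
    c = np.zeros((batch, n_hidden))
    for t in range(xs.shape[1]):
        h, c = lstm_cell_forward(xs[:, t, :], h, c, W, b)
    return h


# Shapes mirroring the notebook: 13 steps, 1 input per step, 32 hidden units
rng = np.random.default_rng(0)
n_in, n_hidden, steps = 1, 32, 13
W = rng.normal(scale=0.1, size=(n_in + n_hidden, 4 * n_hidden))
b = np.zeros(4 * n_hidden)
xs = rng.normal(size=(20, steps, n_in))
h_last = lstm_forward(xs, W, b, n_hidden)
print(h_last.shape)  # (20, 32)
```

Training would then backpropagate the regression loss through this unrolled loop, which is the part TensorFlow handles automatically.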

2. The code below is the TensorFlow LSTM implementation

  1. It is almost entirely stock TensorFlow code, so it is not explained in detail
In [6]:
# %load /ds/github/stoneLearn/src/lstm/tensorflow/lstm.py
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.python.framework import dtypes
from tensorflow.contrib import learn as tflearn
from tensorflow.contrib import rnn 
from tensorflow.contrib import layers as tflayers

import warnings
warnings.filterwarnings("ignore")

def lstm_model(num_units, rnn_layers, dense_layers=None, learning_rate=0.1, optimizer='Adagrad'):
    """
    Creates a deep model based on:
        * stacked LSTM cells
        * optional dense layers
    :param num_units: the number of time steps; X is split along axis 1 into this many inputs.
    :param rnn_layers: list of int or list of dict
                         * list of int: the number of units of each `BasicLSTMCell`
                         * list of dict: [{'num_units': int, 'keep_prob': float}, ...]
    :param dense_layers: list of node counts, one per dense layer, or a dict
                         {'layers': [...], 'activation': fn}
    :param learning_rate: learning rate passed to `optimize_loss`
    :param optimizer: optimizer name for `optimize_loss`, e.g. 'Adagrad'
    :return: a model function to pass to `learn.Estimator(model_fn=...)`
    """

    def lstm_cells(layers):
        # One BasicLSTMCell per layer, wrapped in dropout when 'keep_prob' is given
        if isinstance(layers[0], dict):
            return [rnn.DropoutWrapper(rnn.BasicLSTMCell(layer['num_units'], state_is_tuple=True),
                                       layer['keep_prob'])
                    if layer.get('keep_prob')
                    else rnn.BasicLSTMCell(layer['num_units'], state_is_tuple=True)
                    for layer in layers]
        return [rnn.BasicLSTMCell(units, state_is_tuple=True) for units in layers]

    def dnn_layers(input_layers, layers):
        # Optional fully connected layers on top of the last LSTM output
        if layers and isinstance(layers, dict):
            # fully_connected takes `activation_fn`; it has no `activation` or
            # `dropout` argument, so only the activation is forwarded
            return tflayers.stack(input_layers, tflayers.fully_connected,
                                  layers['layers'],
                                  activation_fn=layers.get('activation', tf.nn.relu))
        elif layers:
            return tflayers.stack(input_layers, tflayers.fully_connected, layers)
        else:
            return input_layers

    def _lstm_model(X, y):
        stacked_lstm = rnn.MultiRNNCell(lstm_cells(rnn_layers), state_is_tuple=True)
        # (batch, num_units, 1) -> list of num_units tensors of shape (batch, 1)
        x_ = tf.unstack(X, axis=1, num=num_units)
        output, state = rnn.static_rnn(stacked_lstm, x_, dtype=dtypes.float32)
        # Regress on the output of the last time step only
        output = dnn_layers(output[-1], dense_layers)
        prediction, loss = tflearn.models.linear_regression(output, y)
        train_op = tf.contrib.layers.optimize_loss(
            loss, tf.contrib.framework.get_global_step(), optimizer=optimizer,
            learning_rate=learning_rate)
        return prediction, loss, train_op

    return _lstm_model
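
`rnn.static_rnn` expects a Python list with one tensor per time step, which is what `tf.unstack(X, axis=1, num=num_units)` produces. The NumPy analogue below uses made-up data with shapes chosen to match this notebook (batch 20, 13 steps, 1 feature per step). It also shows what a "time step" means here: each step is one column of a single day's feature row, so the LSTM reads the 13 indicators of one day in sequence rather than a window of consecutive days.

```python
import numpy as np

batch, timesteps, features = 20, 13, 1
# Toy input with the same layout as xdata['train']: (samples, TIMESTEPS, 1)
X = np.arange(batch * timesteps * features, dtype=np.float32).reshape(batch, timesteps, features)

# NumPy analogue of tf.unstack(X, axis=1, num=timesteps):
# a Python list holding one (batch, features) array per time step
x_ = [X[:, t, :] for t in range(timesteps)]

print(len(x_), x_[0].shape)  # 13 (20, 1)
```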

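3. The code below trains and tests the model

  1. Fetch data for an arbitrarily chosen stock
  2. Feed the data into the model for training
  3. After training, evaluate the results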
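
The data preparation in the next cell boils down to two steps: pairing each day's feature row with the close `nday` rows later, and cutting the rows into consecutive 6/3/1 blocks for training, testing and validation. Below is a minimal NumPy sketch of both on toy arrays instead of tushare data; `make_xy` and `split_6_3_1` are hypothetical helper names. The pairing only looks forward in time if the rows are sorted oldest first.

```python
import numpy as np


def make_xy(features, close, nday):
    """Pair each day's feature row with the close price nday rows later.

    Assumes rows are in chronological order (oldest first).
    features: (n_days, n_features); close: (n_days,)
    """
    x = features[:-nday]  # drop the last nday rows: they have no future label
    y = close[nday:]      # drop the first nday closes: no features precede them in x
    return x, y


def split_6_3_1(n):
    """Boundaries of a chronological 6/3/1 split, in units of n // 10 rows."""
    n_train = (n // 10) * 6
    n_test = (n // 10) * 3
    return n_train, n_train + n_test


features = np.arange(20, dtype=float).reshape(10, 2)  # 10 days, 2 toy features
close = np.arange(10, dtype=float) * 10               # close on day t is 10 * t
x, y = make_xy(features, close, nday=2)
print(x.shape, y.shape)  # (8, 2) (8,)
print(y[0])              # 20.0: the close of day 2, paired with the features of day 0
```

Because the split keeps the rows in order, the test and validation blocks always come after the training block in time, so the model is evaluated on a period it has not seen.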
In [7]:
# %load /ds/github/stoneLearn/src/lstm/tensorflow/htu.py

from tensorflow.contrib import learn
from sklearn.metrics import mean_squared_error

import logging
logging.basicConfig(level=logging.DEBUG)

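# Number of input columns (features) per day; each is fed to the LSTM as one time step
TIMESTEPS = 13
# One LSTM layer with 32 units
RNN_LAYERS = [{'num_units': 32}]
# Optional dense layers on top of the LSTM (None = no dense layers)
DENSE_LAYERS = None
# Number of training iterations
TRAINING_STEPS = 80000
# Validate and log every PRINT_STEPS steps, i.e. 100 times over the whole run
# (integer division keeps it an int under Python 3 as well)
PRINT_STEPS = TRAINING_STEPS // 100
# Mini-batch size for gradient descent
BATCH_SIZE = 20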


regressor = learn.Estimator(model_fn=lstm_model(TIMESTEPS, RNN_LAYERS, DENSE_LAYERS))


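'''
tushare data, fetched at daily frequency
'''
import tushare as ts
tdata = ts.get_hist_data('600848', ktype='d')
# Make sure rows run oldest -> newest so that "nday later" really means the future;
# get_hist_data may return the most recent day first
tdata = tdata.sort_index()

'''
nday is the offset in days between features and label, i.e. today's data is used
to predict the value nday days later. Here the day's open, close, high, low,
volume, etc. are used to predict the close two days later.
'''
nday = 2
fsdata = tdata.values[:-nday]

# Split the data into 10 parts: 6 for training, 3 for testing, 1 for validation
# (// keeps the indices integers under Python 3)
xtrain = (tdata.values.shape[0] // 10) * 6
xtest = (tdata.values.shape[0] // 10) * 3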

xdata = {}
xdata['train'] = fsdata[:xtrain]
xdata['test'] = fsdata[xtrain:xtrain+xtest]
xdata['val'] = fsdata[xtrain+xtest:]
    
ydata = {}
# Using the close as the label means predicting the close; the open etc. could be predicted the same way
fclose = tdata['close'].values[nday:]

ydata['train'] = fclose[:xtrain]
ydata['test'] = fclose[xtrain :xtrain + xtest]
ydata['val'] = fclose[xtrain + xtest:]

# Reshape and cast the data for the model: X -> (samples, TIMESTEPS, 1), y -> (samples, 1)
for key in ('train', 'test', 'val'):
    xdata[key] = xdata[key].reshape(xdata[key].shape[0], xdata[key].shape[1], 1).astype(np.float32)
    ydata[key] = ydata[key].reshape(ydata[key].shape[0], 1).astype(np.float32)


# Validation monitor: evaluates on the validation set every PRINT_STEPS steps
validation_monitor = learn.monitors.ValidationMonitor(xdata['val'], ydata['val'],
                                                      every_n_steps=PRINT_STEPS)


# Start training. Training is slow: on my GTX 970, 40,000 iterations take roughly 20-30 minutes
regressor.fit(xdata['train'], ydata['train'],
              monitors=[validation_monitor],
              batch_size=BATCH_SIZE,
              steps=TRAINING_STEPS)

# After training, predict on the test set to evaluate the model and guide further tuning
predicted = regressor.predict(xdata['test'])

# predict() may return a generator, so materialize it as an array of shape (samples, 1)
pp = np.array(list(predicted))
rmse = np.sqrt(((pp - ydata['test']) ** 2).mean(axis=0))
score = mean_squared_error(ydata['test'], pp)
# Print the overall score
print("MSE: %f" % score)
WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmp36rnMq
INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_save_checkpoints_secs': 600, '_num_ps_replicas': 0, '_keep_checkpoint_max': 5, '_tf_random_seed': None, '_task_type': None, '_environment': 'local', '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fb433b15190>, '_tf_config': gpu_options {
  per_process_gpu_memory_fraction: 1.0
}
, '_task_id': 0, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_evaluation_master': '', '_keep_checkpoint_every_n_hours': 10000, '_master': ''}
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/monitors.py:267: __init__ (from tensorflow.contrib.learn.python.learn.monitors) is deprecated and will be removed after 2016-12-05.
Instructions for updating:
Monitors are deprecated. Please use tf.train.SessionRunHook.
WARNING:tensorflow:From <ipython-input-7-cdcf84f80411>:84: calling fit (from tensorflow.contrib.learn.python.learn.estimators.estimator) with y is deprecated and will be removed after 2016-12-01.
Instructions for updating:
Estimator is decoupled from Scikit Learn interface by moving into
separate class SKCompat. Arguments x, y and batch_size are only
available in the SKCompat class, Estimator will only accept input_fn.
Example conversion:
  est = Estimator(...) -> est = SKCompat(Estimator(...))
INFO:tensorflow:global_step/sec: 262.774
INFO:tensorflow:global_step/sec: 262.047
INFO:tensorflow:loss = 0.0751316, step = 79901
INFO:tensorflow:Saving checkpoints for 80000 into /tmp/tmp36rnMq/model.ckpt.
INFO:tensorflow:Loss for final step: 0.0887386.
WARNING:tensorflow:From <ipython-input-7-cdcf84f80411>:87: calling predict (from tensorflow.contrib.learn.python.learn.estimators.estimator) with x is deprecated and will be removed after 2016-12-01.
Instructions for updating:
Estimator is decoupled from Scikit Learn interface by moving into
separate class SKCompat. Arguments x, y and batch_size are only
available in the SKCompat class, Estimator will only accept input_fn.
Example conversion:
  est = Estimator(...) -> est = SKCompat(Estimator(...))
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/losses/python/losses/loss_ops.py:151: add_loss (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30.
Instructions for updating:
Use tf.losses.add_loss instead.
MSE: 3.696865

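After 80,000 iterations the training loss is down to about 0.075 (0.0751316 at step 79901), while the MSE on the test set is 3.7. That large gap points to fairly severe overfitting, keeping in mind that the training figure is a mini-batch loss, so the comparison is rough. No matter; we will improve it step by step.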
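
Whether a test MSE of 3.7 is good or bad is hard to judge in isolation. One common reference point is a naive "persistence" forecast that predicts the close `nday` days ahead to equal today's close; if the LSTM cannot beat it, it has learned little about the future. A minimal sketch on a made-up price series; `mse` and `persistence_mse` are hypothetical helpers, and the prices must be in chronological order.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error of two equally long sequences."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.mean((y_true - y_pred) ** 2))


def persistence_mse(close, nday):
    """MSE of the naive forecast close[t + nday] ~ close[t].

    `close` must be in chronological order (oldest first).
    """
    close = np.asarray(close, dtype=float)
    return mse(close[nday:], close[:-nday])


close = np.array([10.0, 10.5, 10.2, 10.8, 11.0, 10.9])  # toy prices, not real data
print(persistence_mse(close, nday=2))
```

Applied to the closes of the test period, this gives a number directly comparable to the `MSE` printed above.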

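4. Training is done, so the model can now be evaluated

  1. The two curves in the plot below are the actual series and the predicted series
  2. The model is fairly crude, but it can serve as a reference for judging whether it gets the direction of movement right
  3. Orange is the actual trend and blue the predicted trend: the general trend is predicted reasonably well, but the peaks are not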
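
"The general trend is predicted reasonably well" can be put into a number by counting how often the predicted series moves in the same direction as the actual one. A minimal sketch with a hypothetical helper `direction_accuracy` that compares the signs of consecutive differences; in the notebook it could be called as `direction_accuracy(ydata['test'], pp)`. The toy series here are made up.

```python
import numpy as np


def direction_accuracy(y_true, y_pred):
    """Fraction of steps where both series move up, down, or stay flat together."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    true_move = np.sign(np.diff(y_true))  # +1 up, -1 down, 0 flat
    pred_move = np.sign(np.diff(y_pred))
    return float(np.mean(true_move == pred_move))


actual = [10.0, 10.5, 10.2, 10.8, 11.0]
predicted = [10.1, 10.3, 10.4, 10.6, 10.9]
print(direction_accuracy(actual, predicted))  # 0.75
```

A value near 0.5 would mean the direction is no better than a coin flip, so this complements the MSE, which mostly reflects the price level.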
In [8]:
pylab.plot(pp,label='predict')
pylab.plot(ydata['test'], label='test')
pylab.legend()
Out[8]:
<matplotlib.legend.Legend at 0x7fb4131ba350>