Write the Python file make_file.py.
import os, sys, MySQLdb
import numpy as np

db = MySQLdb.connect(host='localhost', user='root', passwd='111111', db='FOREX')
cursor = db.cursor()
cursor.execute('USE FOREX;')
sql = 'SELECT * FROM EURUSD_TRAINING;'
cursor.execute(sql)
result = cursor.fetchall()
for i in range(cursor.rowcount):
    print str(result[i][0]) + ',' + str(result[i][1]) + ',' + str(result[i][2]) + ',' + str(result[i][3]) + ',' + str(result[i][4]) + ',' + str(result[i][5]) + ',' + str(result[i][6]) + ',' + str(result[i][7]) + ',' + str(result[i][8])
cursor.close()
db.close()
Call the file and redirect its output into record.txt.
python make_file.py >> record.txt
(1) First attempt
In this file we designed a vector with 8 dimensions, but not all 8 dimensions are actually used in training.
Write the training file train.py.
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import SGD
import os, sys, time
import numpy as np

print "Start to generate network"
model = Sequential()
model.add(Dense(120, input_dim=240))
model.add(Activation('linear'))
model.add(Dense(60, input_dim=120))
model.add(Activation('linear'))
model.add(Dense(30, input_dim=60))
model.add(Activation('linear'))
model.add(Dense(15, input_dim=30))
model.add(Activation('linear'))
model.add(Dense(7, input_dim=15))
model.add(Activation('softmax'))
sgd = SGD(lr=0.02)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

print "start to load data"
records = open('./record_3500000.txt', 'r')
X_train = []
y_train = []
line_pointer = -1
for line in records.readlines():
    line_pointer = line_pointer + 1
    X_train.append([])
    y_train.append([])
    values = line.split(',')
    if line_pointer <= 59:
        line_length = line_pointer
    else:
        line_length = 59
    the_time = time.strptime(str(values[0]), "%Y-%m-%d %H:%M:%S")
    X_train[line_pointer].append(float(values[1]))
    X_train[line_pointer].append(float(values[2]))
    X_train[line_pointer].append(float(values[3]))
    X_train[line_pointer].append(float(values[4]))
    # print len(X_train[line_pointer-1])
    for j in range(line_length):
        X_train[line_pointer].append(X_train[line_pointer-1][j*4+4])
        X_train[line_pointer].append(X_train[line_pointer-1][j*4+5])
        X_train[line_pointer].append(X_train[line_pointer-1][j*4+6])
        X_train[line_pointer].append(X_train[line_pointer-1][j*4+7])
    for i in range(240 - len(X_train[line_pointer])):
        X_train[line_pointer].append(0)
    for k in range(7):
        y_train[line_pointer].append(0)
    y_train[line_pointer][int(values[8])] = 1
    # print y_train
    if line_pointer % 1000 == 0:
        print line_pointer

print "start training"
model.fit(X_train, y_train, nb_epoch=100, batch_size=2000, validation_split=0.2)
json_string = model.to_json()
open('./my_model_architecture.json', 'w').write(json_string)
model.save_weights('./my_model_weights.h5')
In this file, X_train is the input vector and y_train is the output vector. For each minute, X_train looks back 60 minutes into the past; the OPEN, HIGH, LOW and CLOSE values of those 60 minutes together form a 240-dimensional vector, over roughly 5.44 million minutes of data. This construction yields a 5.44-million-by-240 two-dimensional array as the training data; the "holes" at the beginning of the file are padded with zeros. Since these padded rows make up a tiny fraction of the data, they do not affect the training result. y_train is the class-label vector: with 7 classes, it is a 5.44-million-by-7 array. In this setting, cross-entropy is the more suitable loss function.
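The sliding-window layout and the one-hot labels described above can be sketched on a toy series (a minimal sketch with made-up numbers, not the actual EURUSD data; a 3-minute window stands in for the 60-minute one):

```python
import numpy as np

# Toy stand-in for the price table: one (OPEN, HIGH, LOW, CLOSE) row per minute.
bars = np.array([[1.10, 1.11, 1.09, 1.10],
                 [1.10, 1.12, 1.10, 1.11],
                 [1.11, 1.13, 1.11, 1.12],
                 [1.12, 1.12, 1.10, 1.11]])

WINDOW = 3  # the text uses 60 minutes; 3 keeps the demo readable
X = np.zeros((len(bars), WINDOW * 4))  # the "holes" at the start stay 0
for i in range(len(bars)):
    window = bars[max(0, i - WINDOW + 1):i + 1][::-1]  # newest bar first
    X[i, :window.size] = window.ravel()

labels = np.array([0, 3, 6, 3])  # made-up class index (0..6) per minute
Y = np.eye(7)[labels]            # one-hot rows, matching categorical_crossentropy

print(X.shape, Y.shape)  # (4, 12) (4, 7)
```

Each row of X holds the current bar followed by the previous bars, and the early rows keep trailing zeros exactly as the text describes.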
Because my PC has limited memory, loading all 5.44 million rows for training causes an out-of-memory error, so in the end I used only the most recent 3.5 million rows (almost 10 years of data), prepared as follows.
head -3500000 record.txt >> record_3500000.txt
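Note that `head` keeps the first lines of a file, so this matches "the most recent rows" only if record.txt is sorted newest-first; for an oldest-first file, `tail` selects the latest rows instead. A quick demo on a toy file (the file name is made up):

```shell
seq 1 10 > record_demo.txt   # toy stand-in: 10 "rows" in ascending order
tail -n 3 record_demo.txt    # keeps the LAST 3 lines, i.e. the newest rows
rm record_demo.txt
```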
Call the file.
THEANO_FLAGS=device=gpu,floatX=float32 python train.py
Since the file prints progress information as it runs, output like the following appears.
……
3493000
3494000
3495000
3496000
3497000
3498000
3499000
start training
Train on 2800000 samples, validate on 700000 samples
Epoch 1/100
2800000/2800000 [==============================] - 12s - loss: 1.5626 - acc: 0.4333 - val_loss: 1.4892 - val_acc: 0.4720
Epoch 2/100
2800000/2800000 [==============================] - 11s - loss: 1.5589 - acc: 0.4337 - val_loss: 1.4849 - val_acc: 0.4720
Epoch 3/100
2800000/2800000 [==============================] - 11s - loss: 1.5581 - acc: 0.4337 - val_loss: 1.4810 - val_acc: 0.4720
Epoch 4/100
2800000/2800000 [==============================] - 11s - loss: 1.5576 - acc: 0.4337 - val_loss: 1.4811 - val_acc: 0.4720
Epoch 5/100
2800000/2800000 [==============================] - 11s - loss: 1.5575 - acc: 0.4337 - val_loss: 1.4859 - val_acc: 0.4720
Epoch 6/100
2800000/2800000 [==============================] - 11s - loss: 1.5574 - acc: 0.4337 - val_loss: 1.4843 - val_acc: 0.4720
Epoch 7/100
2800000/2800000 [==============================] - 11s - loss: 1.5573 - acc: 0.4337 - val_loss: 1.4855 - val_acc: 0.4720
Epoch 8/100
2800000/2800000 [==============================] - 11s - loss: 1.5573 - acc: 0.4337 - val_loss: 1.4847 - val_acc: 0.4720
Epoch 9/100
2800000/2800000 [==============================] - 11s - loss: 1.5573 - acc: 0.4337 - val_loss: 1.4871 - val_acc: 0.4720
Epoch 10/100
2800000/2800000 [==============================] - 11s - loss: 1.5572 - acc: 0.4337 - val_loss: 1.4852 - val_acc: 0.4720
……
Epoch 91/100
2800000/2800000 [==============================] - 11s - loss: 1.5568 - acc: 0.4337 - val_loss: 1.4898 - val_acc: 0.4720
Epoch 92/100
2800000/2800000 [==============================] - 11s - loss: 1.5568 - acc: 0.4337 - val_loss: 1.4910 - val_acc: 0.4720
Epoch 93/100
2800000/2800000 [==============================] - 11s - loss: 1.5568 - acc: 0.4337 - val_loss: 1.4910 - val_acc: 0.4720
Epoch 94/100
2800000/2800000 [==============================] - 11s - loss: 1.5568 - acc: 0.4337 - val_loss: 1.4914 - val_acc: 0.4720
Epoch 95/100
2800000/2800000 [==============================] - 11s - loss: 1.5568 - acc: 0.4337 - val_loss: 1.4897 - val_acc: 0.4720
Epoch 96/100
2800000/2800000 [==============================] - 11s - loss: 1.5568 - acc: 0.4337 - val_loss: 1.4897 - val_acc: 0.4720
Epoch 97/100
2800000/2800000 [==============================] - 11s - loss: 1.5568 - acc: 0.4337 - val_loss: 1.4895 - val_acc: 0.4720
Epoch 98/100
2800000/2800000 [==============================] - 11s - loss: 1.5568 - acc: 0.4337 - val_loss: 1.4889 - val_acc: 0.4720
Epoch 99/100
2800000/2800000 [==============================] - 11s - loss: 1.5568 - acc: 0.4337 - val_loss: 1.4886 - val_acc: 0.4720
Epoch 100/100
2800000/2800000 [==============================] - 11s - loss: 1.5568 - acc: 0.4337 - val_loss: 1.4901 - val_acc: 0.4720
Judging from the run, the loss on the validation set settles around 1.4900, and the validation accuracy stalls at about 47.20% and stops improving.
When the loss stops decreasing like this, there are generally two standard remedies: adjust the network parameters, or add more input dimensions. Adjusting network parameters mainly means changing the network structure, the activation function and the loss function. Adding dimensions means enriching the information in the input vector. So far the input contains only 60 minutes of raw candlestick (K-line) data and nothing else. Based on the hypothesis that currency pairs fluctuate with periodic patterns, in the next attempt we can add three dimensions: month, weekday and hour.
(2) Second attempt
Redesign the network.
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import SGD
import os, sys, time
import numpy as np

print "Start to generate network"
model = Sequential()
model.add(Dense(120, input_dim=243))
model.add(Activation('sigmoid'))
model.add(Dense(60, input_dim=120))
model.add(Activation('sigmoid'))
model.add(Dense(30, input_dim=60))
model.add(Activation('sigmoid'))
model.add(Dense(15, input_dim=30))
model.add(Activation('sigmoid'))
model.add(Dense(7, input_dim=15))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

print "start to load data"
records = open('./record_3500000.txt', 'r')
X_train = []
y_train = []
line_pointer = -1
for line in records.readlines():
    line_pointer = line_pointer + 1
    X_train.append([])
    y_train.append([])
    values = line.split(',')
    if line_pointer <= 59:
        line_length = line_pointer
    else:
        line_length = 59
    the_time = time.strptime(str(values[0]), "%Y-%m-%d %H:%M:%S")
    X_train[line_pointer].append(float(time.strftime("%H", the_time)))
    X_train[line_pointer].append(float(time.strftime("%m", the_time)))
    X_train[line_pointer].append(float(time.strftime("%w", the_time)))
    X_train[line_pointer].append(float(values[1]))
    X_train[line_pointer].append(float(values[2]))
    X_train[line_pointer].append(float(values[3]))
    X_train[line_pointer].append(float(values[4]))
    for j in range(line_length):
        X_train[line_pointer].append(X_train[line_pointer-1][j*4+4])
        X_train[line_pointer].append(X_train[line_pointer-1][j*4+5])
        X_train[line_pointer].append(X_train[line_pointer-1][j*4+6])
        X_train[line_pointer].append(X_train[line_pointer-1][j*4+7])
    for i in range(243 - len(X_train[line_pointer])):
        X_train[line_pointer].append(0)
    for k in range(7):
        y_train[line_pointer].append(0)
    y_train[line_pointer][int(values[8])] = 1
    if line_pointer % 1000 == 0:
        print line_pointer

print "start training"
model.fit(X_train, y_train, nb_epoch=100, batch_size=2000, validation_split=0.15)
json_string = model.to_json()
open('./my_model_architecture.json', 'w').write(json_string)
model.save_weights('./my_model_weights.h5')
This revision changes the activation function to the sigmoid function, in order to introduce more nonlinearity into the model.
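The motivation behind this change can be checked numerically: a stack of Dense layers with linear activations collapses into a single linear map, so the first network had essentially no extra expressive power from its depth. A minimal sketch with random matrices (the shapes mirror the first two layers of the network):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((240, 120))  # stands in for the 240->120 Dense layer
W2 = rng.standard_normal((120, 60))   # stands in for the 120->60 Dense layer
x = rng.standard_normal(240)

# Two linear layers applied in sequence...
h = (x @ W1) @ W2
# ...equal one linear layer whose weights are the product of the two:
combined = W1 @ W2
print(np.allclose(h, x @ combined))  # True
```

Inserting a nonlinearity such as sigmoid between the layers breaks this equivalence, which is why deeper stacks only help when the activations are nonlinear.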
the_time = time.strptime(str(values[0]), "%Y-%m-%d %H:%M:%S")
X_train[line_pointer].append(float(time.strftime("%H", the_time)))
X_train[line_pointer].append(float(time.strftime("%m", the_time)))
X_train[line_pointer].append(float(time.strftime("%w", the_time)))
This is the part that adds the three dimensions, month, weekday and hour, before training.
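For concreteness, the values those three format strings produce for one sample timestamp can be seen in isolation (a minimal sketch; the helper name and the timestamp are made up, but the format strings match the training script):

```python
import time

def calendar_features(ts):
    # ts is a "%Y-%m-%d %H:%M:%S" string, as in column 0 of record.txt
    t = time.strptime(ts, "%Y-%m-%d %H:%M:%S")
    return [float(time.strftime("%H", t)),  # hour of day, 00-23
            float(time.strftime("%m", t)),  # month, 01-12
            float(time.strftime("%w", t))]  # weekday, 0 (Sunday) to 6

print(calendar_features("2016-03-07 14:30:00"))  # [14.0, 3.0, 1.0] (a Monday)
```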
……
Train on 2975000 samples, validate on 525000 samples
Epoch 1/100
2975000/2975000 [==============================] - 15s - loss: 1.5305 - acc: 0.4356 - val_loss: 1.4402 - val_acc: 0.4623
Epoch 2/100
2975000/2975000 [==============================] - 13s - loss: 1.5071 - acc: 0.4377 - val_loss: 1.4264 - val_acc: 0.4623
Epoch 3/100
2975000/2975000 [==============================] - 13s - loss: 1.5047 - acc: 0.4377 - val_loss: 1.4297 - val_acc: 0.4623
Epoch 4/100
2975000/2975000 [==============================] - 13s - loss: 1.5030 - acc: 0.4377 - val_loss: 1.4218 - val_acc: 0.4623
Epoch 5/100
2975000/2975000 [==============================] - 13s - loss: 1.5014 - acc: 0.4377 - val_loss: 1.4288 - val_acc: 0.4623
Epoch 6/100
2975000/2975000 [==============================] - 13s - loss: 1.5001 - acc: 0.4377 - val_loss: 1.4178 - val_acc: 0.4623
Epoch 7/100
2975000/2975000 [==============================] - 13s - loss: 1.4991 - acc: 0.4377 - val_loss: 1.4165 - val_acc: 0.4623
Epoch 8/100
2975000/2975000 [==============================] - 13s - loss: 1.4983 - acc: 0.4377 - val_loss: 1.4335 - val_acc: 0.4623
Epoch 9/100
2975000/2975000 [==============================] - 13s - loss: 1.4976 - acc: 0.4377 - val_loss: 1.4125 - val_acc: 0.4623
Epoch 10/100
2975000/2975000 [==============================] - 13s - loss: 1.4969 - acc: 0.4377 - val_loss: 1.4089 - val_acc: 0.4623
……
Epoch 91/100
2975000/2975000 [==============================] - 13s - loss: 1.4645 - acc: 0.4380 - val_loss: 1.4858 - val_acc: 0.4405
Epoch 92/100
2975000/2975000 [==============================] - 13s - loss: 1.4642 - acc: 0.4379 - val_loss: 1.4997 - val_acc: 0.4252
Epoch 93/100
2975000/2975000 [==============================] - 13s - loss: 1.4640 - acc: 0.4380 - val_loss: 1.4827 - val_acc: 0.4447
Epoch 94/100
2975000/2975000 [==============================] - 13s - loss: 1.4638 - acc: 0.4379 - val_loss: 1.4780 - val_acc: 0.4330
Epoch 95/100
2975000/2975000 [==============================] - 13s - loss: 1.4635 - acc: 0.4380 - val_loss: 1.4947 - val_acc: 0.4308
Epoch 96/100
2975000/2975000 [==============================] - 13s - loss: 1.4633 - acc: 0.4380 - val_loss: 1.5014 - val_acc: 0.4367
Epoch 97/100
2975000/2975000 [==============================] - 13s - loss: 1.4631 - acc: 0.4380 - val_loss: 1.4762 - val_acc: 0.4432
Epoch 98/100
2975000/2975000 [==============================] - 13s - loss: 1.4629 - acc: 0.4379 - val_loss: 1.4838 - val_acc: 0.4398
Epoch 99/100
2975000/2975000 [==============================] - 13s - loss: 1.4629 - acc: 0.4379 - val_loss: 1.4888 - val_acc: 0.4400
Epoch 100/100
The loss is smaller than in the first design, but the accuracy does not improve. This suggests that adding the three reference values, month, weekday and hour, does not help the model's accuracy.