from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import SGD
import os, sys, time
import numpy as np

print "Start to generate network"
# five fully connected layers funnelling 240 inputs down to 7 classes
model = Sequential()
model.add(Dense(120, input_dim=240))
model.add(Activation('linear'))
model.add(Dense(60, input_dim=120))
model.add(Activation('linear'))
model.add(Dense(30, input_dim=60))
model.add(Activation('linear'))
model.add(Dense(15, input_dim=30))
model.add(Activation('linear'))
model.add(Dense(7, input_dim=15))
model.add(Activation('softmax'))
sgd = SGD(lr=0.02)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

print "start to load data"
records = open('./record_3500000.txt', 'r')
X_train = []
y_train = []
line_pointer = -1
for line in records.readlines():
    line_pointer = line_pointer + 1
    X_train.append([])
    y_train.append([])
    values = line.split(',')
    # how many earlier minutes are available (at most 59)
    if line_pointer <= 59:
        line_length = line_pointer
    else:
        line_length = 59
    the_time = time.strptime(str(values[0]), "%Y-%m-%d %H:%M:%S")
    # the current minute's OPEN, HIGH, LOW, CLOSE come first
    X_train[line_pointer].append(float(values[1]))
    X_train[line_pointer].append(float(values[2]))
    X_train[line_pointer].append(float(values[3]))
    X_train[line_pointer].append(float(values[4]))
    # then the previous row's window, shifted back by one minute
    for j in range(line_length):
        X_train[line_pointer].append(X_train[line_pointer-1][j*4])
        X_train[line_pointer].append(X_train[line_pointer-1][j*4+1])
        X_train[line_pointer].append(X_train[line_pointer-1][j*4+2])
        X_train[line_pointer].append(X_train[line_pointer-1][j*4+3])
    # zero-pad rows near the start of the file to a full 240 values
    for i in range(240-len(X_train[line_pointer])):
        X_train[line_pointer].append(0)
    # one-hot label over the 7 classes, class id in the 9th field
    for k in range(7):
        y_train[line_pointer].append(0)
    y_train[line_pointer][int(values[8])] = 1
    if line_pointer % 1000 == 0:
        print line_pointer

print "start training"
model.fit(X_train, y_train, nb_epoch=100, batch_size=2000, validation_split=0.2)
json_string = model.to_json()
open('./my_model_architecture.json', 'w').write(json_string)
model.save_weights('./my_model_weights.h5')
In this file, the constructed X_train is the input vector and y_train is the output vector. For each minute, X_train looks back 60 minutes; the OPEN, HIGH, LOW, and CLOSE of those 60 minutes together form a 240-dimensional vector, over roughly 5.44 million minutes of data. This construction yields a two-dimensional array of about 5.44 million × 240 as the training data, with the "hole" at the beginning of the file padded with zeros. Because such rows make up a tiny fraction of the data, they do not affect the training result. y_train holds the class labels: since we defined 7 classes, it is an array of about 5.44 million × 7. In this setting, cross-entropy is the more appropriate loss function.
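To make the row layout concrete, here is a minimal sketch (my own illustration, not code from the book) that assembles one training row from two made-up OHLC records; the 240-value zero padding and the 7-way one-hot label mirror the script above.

# Toy illustration of one X_train row and its label; all prices are made up.
import numpy as np

window = 60                    # look back 60 minutes
dims = window * 4              # OPEN, HIGH, LOW, CLOSE per minute -> 240

ohlc = [[1.1011, 1.1014, 1.1008, 1.1010],   # minute t (newest first)
        [1.1012, 1.1015, 1.1009, 1.1011]]   # minute t-1

row = []
for minute in ohlc:
    row.extend(minute)                       # 4 values per available minute
row.extend([0.0] * (dims - len(row)))        # zero-pad the early "hole"

label = [0] * 7                              # 7 classes, one-hot encoded
label[3] = 1                                 # class id from the record's 9th field

print(len(row))                              # 240
print(label)                                 # [0, 0, 0, 1, 0, 0, 0]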
Because my PC's memory is limited, feeding all 5.44 million records into training would run out of memory, so in the end I used only the most recent 3.5 million records (almost 10 years of data), prepared as follows.
head -n 3500000 record.txt >> record_3500000.txt
Run the script.
THEANO_FLAGS=device=gpu,floatX=float32 python train.py
Because the script prints its progress, output like the following appears.
……
3493000
3494000
3495000
3496000
3497000
3498000
3499000
start training
Train on 2800000 samples, validate on 700000 samples
Epoch 1/100
2800000/2800000 [==============================] - 12s - loss: 1.5626 - acc: 0.4333 - val_loss: 1.4892 - val_acc: 0.4720
Epoch 2/100
2800000/2800000 [==============================] - 11s - loss: 1.5589 - acc: 0.4337 - val_loss: 1.4849 - val_acc: 0.4720
Epoch 3/100
2800000/2800000 [==============================] - 11s - loss: 1.5581 - acc: 0.4337 - val_loss: 1.4810 - val_acc: 0.4720
Epoch 4/100
2800000/2800000 [==============================] - 11s - loss: 1.5576 - acc: 0.4337 - val_loss: 1.4811 - val_acc: 0.4720
Epoch 5/100
2800000/2800000 [==============================] - 11s - loss: 1.5575 - acc: 0.4337 - val_loss: 1.4859 - val_acc: 0.4720
Epoch 6/100
2800000/2800000 [==============================] - 11s - loss: 1.5574 - acc: 0.4337 - val_loss: 1.4843 - val_acc: 0.4720
Epoch 7/100
2800000/2800000 [==============================] - 11s - loss: 1.5573 - acc: 0.4337 - val_loss: 1.4855 - val_acc: 0.4720
Epoch 8/100
2800000/2800000 [==============================] - 11s - loss: 1.5573 - acc: 0.4337 - val_loss: 1.4847 - val_acc: 0.4720
Epoch 9/100
2800000/2800000 [==============================] - 11s - loss: 1.5573 - acc: 0.4337 - val_loss: 1.4871 - val_acc: 0.4720
Epoch 10/100
2800000/2800000 [==============================] - 11s - loss: 1.5572 - acc: 0.4337 - val_loss: 1.4852 - val_acc: 0.4720
……
Epoch 91/100
2800000/2800000 [==============================] - 11s - loss: 1.5568 - acc: 0.4337 - val_loss: 1.4898 - val_acc: 0.4720
Epoch 92/100
2800000/2800000 [==============================] - 11s - loss: 1.5568 - acc: 0.4337 - val_loss: 1.4910 - val_acc: 0.4720
Epoch 93/100
2800000/2800000 [==============================] - 11s - loss: 1.5568 - acc: 0.4337 - val_loss: 1.4910 - val_acc: 0.4720
Epoch 94/100
2800000/2800000 [==============================] - 11s - loss: 1.5568 - acc: 0.4337 - val_loss: 1.4914 - val_acc: 0.4720
Epoch 95/100
2800000/2800000 [==============================] - 11s - loss: 1.5568 - acc: 0.4337 - val_loss: 1.4897 - val_acc: 0.4720
Epoch 96/100
2800000/2800000 [==============================] - 11s - loss: 1.5568 - acc: 0.4337 - val_loss: 1.4897 - val_acc: 0.4720
Epoch 97/100
2800000/2800000 [==============================] - 11s - loss: 1.5568 - acc: 0.4337 - val_loss: 1.4895 - val_acc: 0.4720
Epoch 98/100
2800000/2800000 [==============================] - 11s - loss: 1.5568 - acc: 0.4337 - val_loss: 1.4889 - val_acc: 0.4720
Epoch 99/100
2800000/2800000 [==============================] - 11s - loss: 1.5568 - acc: 0.4337 - val_loss: 1.4886 - val_acc: 0.4720
Epoch 100/100
2800000/2800000 [==============================] - 11s - loss: 1.5568 - acc: 0.4337 - val_loss: 1.4901 - val_acc: 0.4720
Judging from this run, the validation loss falls to around 1.4900 and the validation accuracy settles at about 47.20%, after which neither improves.
When the loss stops falling like this, there are fairly standard remedies: adjust the network's parameters, or add more input dimensions. Adjusting parameters mainly means changing the network structure, the activation function, and the loss function. Adding dimensions means enriching the input vector. So far the input contains only 60 minutes of raw candlestick data and nothing else. Based on the hypothesis that currency pairs fluctuate with periodic patterns, in the next attempt we can add three more dimensions: month, day of week, and hour.
(2) The second attempt
Redesign the network as follows.
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import SGD
import os, sys, time
import numpy as np

print "Start to generate network"
# same funnel shape, but with sigmoid activations and three extra inputs
model = Sequential()
model.add(Dense(120, input_dim=243))
model.add(Activation('sigmoid'))
model.add(Dense(60, input_dim=120))
model.add(Activation('sigmoid'))
model.add(Dense(30, input_dim=60))
model.add(Activation('sigmoid'))
model.add(Dense(15, input_dim=30))
model.add(Activation('sigmoid'))
model.add(Dense(7, input_dim=15))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

print "start to load data"
records = open('./record_3500000.txt', 'r')
X_train = []
y_train = []
line_pointer = -1
for line in records.readlines():
    line_pointer = line_pointer + 1
    X_train.append([])
    y_train.append([])
    values = line.split(',')
    if line_pointer <= 59:
        line_length = line_pointer
    else:
        line_length = 59
    the_time = time.strptime(str(values[0]), "%Y-%m-%d %H:%M:%S")
    # three new dimensions: hour, month, and day of week
    X_train[line_pointer].append(float(time.strftime("%H", the_time)))
    X_train[line_pointer].append(float(time.strftime("%m", the_time)))
    X_train[line_pointer].append(float(time.strftime("%w", the_time)))
    X_train[line_pointer].append(float(values[1]))
    X_train[line_pointer].append(float(values[2]))
    X_train[line_pointer].append(float(values[3]))
    X_train[line_pointer].append(float(values[4]))
    # the previous row's OHLC window starts after its three time features
    for j in range(line_length):
        X_train[line_pointer].append(X_train[line_pointer-1][j*4+3])
        X_train[line_pointer].append(X_train[line_pointer-1][j*4+4])
        X_train[line_pointer].append(X_train[line_pointer-1][j*4+5])
        X_train[line_pointer].append(X_train[line_pointer-1][j*4+6])
    for i in range(243-len(X_train[line_pointer])):
        X_train[line_pointer].append(0)
    for k in range(7):
        y_train[line_pointer].append(0)
    y_train[line_pointer][int(values[8])] = 1
    if line_pointer % 1000 == 0:
        print line_pointer

print "start training"
model.fit(X_train, y_train, nb_epoch=100, batch_size=2000, validation_split=0.15)
json_string = model.to_json()
open('./my_model_architecture.json', 'w').write(json_string)
model.save_weights('./my_model_weights.h5')
This time the activation function is changed to sigmoid in order to introduce nonlinearity: a stack of Dense layers with linear activations collapses into a single linear transformation, so the first network was effectively just a linear model feeding a softmax. The optimizer is also switched from SGD to RMSprop.
the_time = time.strptime(str(values[0]), "%Y-%m-%d %H:%M:%S")
X_train[line_pointer].append(float(time.strftime("%H", the_time)))
X_train[line_pointer].append(float(time.strftime("%m", the_time)))
X_train[line_pointer].append(float(time.strftime("%w", the_time)))
This is the part that adds the three new dimensions (month, day of week, and hour) before training.
……
Train on 2975000 samples, validate on 525000 samples
Epoch 1/100
2975000/2975000 [==============================] - 15s - loss: 1.5305 - acc: 0.4356 - val_loss: 1.4402 - val_acc: 0.4623
Epoch 2/100
2975000/2975000 [==============================] - 13s - loss: 1.5071 - acc: 0.4377 - val_loss: 1.4264 - val_acc: 0.4623
Epoch 3/100
2975000/2975000 [==============================] - 13s - loss: 1.5047 - acc: 0.4377 - val_loss: 1.4297 - val_acc: 0.4623
Epoch 4/100
2975000/2975000 [==============================] - 13s - loss: 1.5030 - acc: 0.4377 - val_loss: 1.4218 - val_acc: 0.4623
Epoch 5/100
2975000/2975000 [==============================] - 13s - loss: 1.5014 - acc: 0.4377 - val_loss: 1.4288 - val_acc: 0.4623
Epoch 6/100
2975000/2975000 [==============================] - 13s - loss: 1.5001 - acc: 0.4377 - val_loss: 1.4178 - val_acc: 0.4623
Epoch 7/100
2975000/2975000 [==============================] - 13s - loss: 1.4991 - acc: 0.4377 - val_loss: 1.4165 - val_acc: 0.4623
Epoch 8/100
2975000/2975000 [==============================] - 13s - loss: 1.4983 - acc: 0.4377 - val_loss: 1.4335 - val_acc: 0.4623
Epoch 9/100
2975000/2975000 [==============================] - 13s - loss: 1.4976 - acc: 0.4377 - val_loss: 1.4125 - val_acc: 0.4623
Epoch 10/100
2975000/2975000 [==============================] - 13s - loss: 1.4969 - acc: 0.4377 - val_loss: 1.4089 - val_acc: 0.4623
……
Epoch 91/100
2975000/2975000 [==============================] - 13s - loss: 1.4645 - acc: 0.4380 - val_loss: 1.4858 - val_acc: 0.4405
Epoch 92/100
2975000/2975000 [==============================] - 13s - loss: 1.4642 - acc: 0.4379 - val_loss: 1.4997 - val_acc: 0.4252
Epoch 93/100
2975000/2975000 [==============================] - 13s - loss: 1.4640 - acc: 0.4380 - val_loss: 1.4827 - val_acc: 0.4447
Epoch 94/100
2975000/2975000 [==============================] - 13s - loss: 1.4638 - acc: 0.4379 - val_loss: 1.4780 - val_acc: 0.4330
Epoch 95/100
2975000/2975000 [==============================] - 13s - loss: 1.4635 - acc: 0.4380 - val_loss: 1.4947 - val_acc: 0.4308
Epoch 96/100
2975000/2975000 [==============================] - 13s - loss: 1.4633 - acc: 0.4380 - val_loss: 1.5014 - val_acc: 0.4367
Epoch 97/100
2975000/2975000 [==============================] - 13s - loss: 1.4631 - acc: 0.4380 - val_loss: 1.4762 - val_acc: 0.4432
Epoch 98/100
2975000/2975000 [==============================] - 13s - loss: 1.4629 - acc: 0.4379 - val_loss: 1.4838 - val_acc: 0.4398
Epoch 99/100
2975000/2975000 [==============================] - 13s - loss: 1.4629 - acc: 0.4379 - val_loss: 1.4888 - val_acc: 0.4400
Epoch 100/100
The loss is smaller than in the first design, but the accuracy does not improve. This suggests that adding the three reference values (month, day of week, and hour) does not help the model's accuracy.
(3) The third attempt
The first two attempts show that this classification process does little to reduce entropy. Why say so? If you backtest with the trained model, you find that the vast majority of samples land in the class with the smallest amplitude, a point the finer quantitative analysis below will make visible. So if we want a more pronounced reduction in entropy, we need to introduce more dimensions, and we should avoid making the classes too "skewed", that is, avoid letting a few classes account for too large a share. A quick way to check this is sketched below.
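As a minimal backtest-style check (my own sketch, assuming the model files saved by train.py are present; X_test is a placeholder that would be built exactly like X_train above), one can reload the saved network and tally which classes it actually predicts:

# Reload the saved network and tally its predicted classes.
# If one class dominates the tally, the classifier carries little information.
import numpy as np
from keras.models import model_from_json

model = model_from_json(open('./my_model_architecture.json').read())
model.load_weights('./my_model_weights.h5')

X_test = np.zeros((1000, 240))               # placeholder; build like X_train
predicted = model.predict_classes(X_test, batch_size=1000)
print(np.bincount(predicted, minlength=7))   # sample count per predicted class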
Let's think about it: if predicting a full hour is complicated, can we shorten the horizon? The longer the horizon, the more visible the effects of chaos; and the larger the entropy, the harder the prediction. This time, let's analyze the amplitude within 15 minutes.
Create the table SLIDEWINDOW_15M.
CREATE TABLE SLIDEWINDOW_15M
(
    DT DATETIME,
    RISE_PIP DECIMAL(10,4),
    FALL_PIP DECIMAL(10,4),
    PROPORTION DECIMAL(10,4)
);
Write the Python script slidewindow_15m_insert.py to insert the analysis results into this table.
#!/usr/bin/python
import os, sys, MySQLdb

try:
    db = MySQLdb.connect(host='localhost', user='root', passwd='111111', db='FOREX')
    cursor = db.cursor()
    counter = 0
    cursor.execute('USE FOREX;')
    sql = 'SELECT * FROM EURUSD_1M'
    cursor.execute(sql)
    result = cursor.fetchall()
    for i in range(0, cursor.rowcount):
        startdt = str(result[i][0])
        startpip = str(result[i][4])
        cursor1 = db.cursor()
        cursor1.execute('USE FOREX;')
        # for each minute, look 15 minutes ahead and record the maximum rise,
        # the maximum fall, and their ratio (guarding against division by zero)
        sql1 = ('INSERT INTO SLIDEWINDOW_15M '
                'SELECT DT, MAX(HIGH)-' + startpip + ' AS RISE_PIP, '
                + startpip + '-MIN(LOW) AS FALL_PIP, '
                'CASE WHEN ' + startpip + '<>MIN(LOW) '
                'THEN (MAX(HIGH)-' + startpip + ')/(' + startpip + '-MIN(LOW)) '
                'ELSE (MAX(HIGH)-' + startpip + ')/0.0001 END '
                'FROM EURUSD_1M WHERE DT BETWEEN "' + startdt + '" '
                'AND DATE_ADD("' + startdt + '", INTERVAL 15 MINUTE)')
        cursor1.execute(sql1)
        if i % 1000 == 0:
            db.commit()
    db.commit()
except MySQLdb.Error, e:
    print "Error %s" % (str(e.args[0]) + ':' + str(e.args[1]))
    exit(1)
cursor1.close()
cursor.close()
db.close()
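Once the table is populated, the distribution of 15-minute amplitudes can be tallied with a small query. The sketch below is my own, and the 15-pip threshold is purely illustrative, not a value from the book.

# Count how many 15-minute windows rose or fell by at least 0.0015 (15 pips).
import MySQLdb

db = MySQLdb.connect(host='localhost', user='root', passwd='111111', db='FOREX')
cursor = db.cursor()
cursor.execute('SELECT SUM(RISE_PIP>=0.0015), SUM(FALL_PIP>=0.0015), COUNT(*) '
               'FROM SLIDEWINDOW_15M')
print(cursor.fetchone())
db.close()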