from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import SGD
import os, sys, time
import numpy as np

print "Start to generate network"
# 240 input dimensions -> 120 -> 60 -> 30 -> 15 -> 7 output classes
model = Sequential()
model.add(Dense(120, input_dim=240))
model.add(Activation('linear'))
model.add(Dense(60, input_dim=120))
model.add(Activation('linear'))
model.add(Dense(30, input_dim=60))
model.add(Activation('linear'))
model.add(Dense(15, input_dim=30))
model.add(Activation('linear'))
model.add(Dense(7, input_dim=15))
model.add(Activation('softmax'))
sgd = SGD(lr=0.02)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

print "start to load data"
records = open('./record_3500000.txt', 'r')
X_train = []
y_train = []
line_pointer = -1
for line in records.readlines():
    line_pointer = line_pointer + 1
    X_train.append([])
    y_train.append([])
    values = line.split(',')
    # the first 59 lines do not yet have a full 60-minute history
    if (line_pointer <= 59):
        line_length = line_pointer
    else:
        line_length = 59
    the_time = time.strptime(str(values[0]), "%Y-%m-%d %H:%M:%S")
    # current minute's OPEN, HIGH, LOW, CLOSE
    X_train[line_pointer].append(float(values[1]))
    X_train[line_pointer].append(float(values[2]))
    X_train[line_pointer].append(float(values[3]))
    X_train[line_pointer].append(float(values[4]))
    #print len(X_train[line_pointer-1])
    # copy the previous row's history to extend this row's 60-minute window
    for j in range(line_length):
        X_train[line_pointer].append(X_train[line_pointer-1][j*4+4])
        X_train[line_pointer].append(X_train[line_pointer-1][j*4+5])
        X_train[line_pointer].append(X_train[line_pointer-1][j*4+6])
        X_train[line_pointer].append(X_train[line_pointer-1][j*4+7])
    # zero-pad to a fixed 240 dimensions
    for i in range(240-len(X_train[line_pointer])):
        X_train[line_pointer].append(0)
    # one-hot label over the 7 classes
    for k in range(7):
        y_train[line_pointer].append(0)
    y_train[line_pointer][int(values[8])] = 1
    #print y_train
    if line_pointer % 1000 == 0:
        print line_pointer

print "start training"
model.fit(X_train, y_train, nb_epoch=100, batch_size=2000, validation_split=0.2)
json_string = model.to_json()
open('./my_model_architecture.json', 'w').write(json_string)
model.save_weights('./my_model_weights.h5')
In this file, the X_train we construct is the input vector and y_train is the output vector. For each minute, X_train looks back 60 minutes into the past; the OPEN, HIGH, LOW, and CLOSE values of those 60 minutes make up a 240-dimensional vector, over roughly 5.44 million minutes of data in total. This construction yields a two-dimensional array of about 5.44 million rows by 240 dimensions as the training data; the "holes" at the beginning of the file, where a full 60-minute history is not yet available, are padded with zeros. Because this padded portion is a very small fraction of the data, it does not affect the training result. y_train is the vector of classification labels. Since we use 7 classes, it is an array of about 5.44 million by 7. In this setting, cross-entropy is the more suitable loss function.
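To make the window construction concrete, here is a minimal sketch (not the book's code; the ohlc_rows list, the build_row name and the sample values are hypothetical) that assembles one 240-dimensional row from up to 60 minutes of OHLC values and zero-pads the rest:

# Hypothetical illustration of the 60-minute sliding-window construction.
# "ohlc_rows" is a list of (open, high, low, close) tuples, oldest first, newest last.
def build_row(ohlc_rows, window=60, dims=240):
    row = []
    # walk backwards from the current minute, newest values first
    for o, h, l, c in reversed(ohlc_rows[-window:]):
        row.extend([o, h, l, c])
    # zero-pad the "holes" at the start of the file
    row.extend([0] * (dims - len(row)))
    return row

# usage: with only 2 minutes of history, the remaining 232 slots stay zero
sample = build_row([(1.07, 1.08, 1.06, 1.075), (1.075, 1.09, 1.07, 1.085)])
print(len(sample))  # 240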
Because my PC has limited memory, putting all 5.44 million records into training causes it to run out of memory, so in the end I used only the most recent 3.5 million records (close to 10 years of data), prepared as follows.
head -3500000 record.txt >> record_3500000.txt
Then run the training script.
THEANO_FLAGS=device=gpu,floatX=float32 python train.py
Because the script prints its progress while loading the data, output like the following appears.
……
3493000
3494000
3495000
3496000
3497000
3498000
3499000
start training
Train on 2800000 samples, validate on 700000 samples
Epoch 1/100
2800000/2800000 [==============================] - 12s - loss: 1.5626 - acc: 0.4333 - val_loss: 1.4892 - val_acc: 0.4720
Epoch 2/100
2800000/2800000 [==============================] - 11s - loss: 1.5589 - acc: 0.4337 - val_loss: 1.4849 - val_acc: 0.4720
Epoch 3/100
2800000/2800000 [==============================] - 11s - loss: 1.5581 - acc: 0.4337 - val_loss: 1.4810 - val_acc: 0.4720
Epoch 4/100
2800000/2800000 [==============================] - 11s - loss: 1.5576 - acc: 0.4337 - val_loss: 1.4811 - val_acc: 0.4720
Epoch 5/100
2800000/2800000 [==============================] - 11s - loss: 1.5575 - acc: 0.4337 - val_loss: 1.4859 - val_acc: 0.4720
Epoch 6/100
2800000/2800000 [==============================] - 11s - loss: 1.5574 - acc: 0.4337 - val_loss: 1.4843 - val_acc: 0.4720
Epoch 7/100
2800000/2800000 [==============================] - 11s - loss: 1.5573 - acc: 0.4337 - val_loss: 1.4855 - val_acc: 0.4720
Epoch 8/100
2800000/2800000 [==============================] - 11s - loss: 1.5573 - acc: 0.4337 - val_loss: 1.4847 - val_acc: 0.4720
Epoch 9/100
2800000/2800000 [==============================] - 11s - loss: 1.5573 - acc: 0.4337 - val_loss: 1.4871 - val_acc: 0.4720
Epoch 10/100
2800000/2800000 [==============================] - 11s - loss: 1.5572 - acc: 0.4337 - val_loss: 1.4852 - val_acc: 0.4720
……
Epoch 91/100
2800000/2800000 [==============================] - 11s - loss: 1.5568 - acc: 0.4337 - val_loss: 1.4898 - val_acc: 0.4720
Epoch 92/100
2800000/2800000 [==============================] - 11s - loss: 1.5568 - acc: 0.4337 - val_loss: 1.4910 - val_acc: 0.4720
Epoch 93/100
2800000/2800000 [==============================] - 11s - loss: 1.5568 - acc: 0.4337 - val_loss: 1.4910 - val_acc: 0.4720
Epoch 94/100
2800000/2800000 [==============================] - 11s - loss: 1.5568 - acc: 0.4337 - val_loss: 1.4914 - val_acc: 0.4720
Epoch 95/100
2800000/2800000 [==============================] - 11s - loss: 1.5568 - acc: 0.4337 - val_loss: 1.4897 - val_acc: 0.4720
Epoch 96/100
2800000/2800000 [==============================] - 11s - loss: 1.5568 - acc: 0.4337 - val_loss: 1.4897 - val_acc: 0.4720
Epoch 97/100
2800000/2800000 [==============================] - 11s - loss: 1.5568 - acc: 0.4337 - val_loss: 1.4895 - val_acc: 0.4720
Epoch 98/100
2800000/2800000 [==============================] - 11s - loss: 1.5568 - acc: 0.4337 - val_loss: 1.4889 - val_acc: 0.4720
Epoch 99/100
2800000/2800000 [==============================] - 11s - loss: 1.5568 - acc: 0.4337 - val_loss: 1.4886 - val_acc: 0.4720
Epoch 100/100
2800000/2800000 [==============================] - 11s - loss: 1.5568 - acc: 0.4337 - val_loss: 1.4901 - val_acc: 0.4720
Judging from the training run, the loss on the validation set drops to around 1.4900 and the validation accuracy settles at about 47.20%, after which neither improves.
When the loss stops decreasing like this, there are fairly standard ways to respond: one is to adjust the network's parameters, the other is to add more input dimensions. Adjusting the parameters mainly means changing the network structure, the activation function, and the loss function. Adding dimensions means enriching the information carried by the input vector. So far the input contains only 60 minutes of raw candlestick (K-line) data and nothing else. Based on the hypothesis that currency pairs fluctuate with some periodicity, in the next attempt we can add three more dimensions: month, day of week, and hour.
(2) Second attempt
Redesign the network.
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import SGD
import os, sys, time
import numpy as np

print "Start to generate network"
# 243 input dimensions: 240 OHLC values plus hour, month and day of week
model = Sequential()
model.add(Dense(120, input_dim=243))
model.add(Activation('sigmoid'))
model.add(Dense(60, input_dim=120))
model.add(Activation('sigmoid'))
model.add(Dense(30, input_dim=60))
model.add(Activation('sigmoid'))
model.add(Dense(15, input_dim=30))
model.add(Activation('sigmoid'))
model.add(Dense(7, input_dim=15))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

print "start to load data"
records = open('./record_3500000.txt', 'r')
X_train = []
y_train = []
line_pointer = -1
for line in records.readlines():
    line_pointer = line_pointer + 1
    X_train.append([])
    y_train.append([])
    values = line.split(',')
    if (line_pointer <= 59):
        line_length = line_pointer
    else:
        line_length = 59
    the_time = time.strptime(str(values[0]), "%Y-%m-%d %H:%M:%S")
    # three extra time features: hour, month, day of week
    X_train[line_pointer].append(float(time.strftime("%H", the_time)))
    X_train[line_pointer].append(float(time.strftime("%m", the_time)))
    X_train[line_pointer].append(float(time.strftime("%w", the_time)))
    # current minute's OPEN, HIGH, LOW, CLOSE
    X_train[line_pointer].append(float(values[1]))
    X_train[line_pointer].append(float(values[2]))
    X_train[line_pointer].append(float(values[3]))
    X_train[line_pointer].append(float(values[4]))
    # copy the previous row's history to extend this row's 60-minute window
    for j in range(line_length):
        X_train[line_pointer].append(X_train[line_pointer-1][j*4+4])
        X_train[line_pointer].append(X_train[line_pointer-1][j*4+5])
        X_train[line_pointer].append(X_train[line_pointer-1][j*4+6])
        X_train[line_pointer].append(X_train[line_pointer-1][j*4+7])
    # zero-pad to a fixed 243 dimensions
    for i in range(243-len(X_train[line_pointer])):
        X_train[line_pointer].append(0)
    # one-hot label over the 7 classes
    for k in range(7):
        y_train[line_pointer].append(0)
    y_train[line_pointer][int(values[8])] = 1
    if line_pointer % 1000 == 0:
        print line_pointer

print "start training"
model.fit(X_train, y_train, nb_epoch=100, batch_size=2000, validation_split=0.15)
json_string = model.to_json()
open('./my_model_architecture.json', 'w').write(json_string)
model.save_weights('./my_model_weights.h5')
This revision changes the activation function to the sigmoid function, σ(x) = 1/(1 + e^(-x)), with the aim of introducing more non-linearity.
the_time = time.strptime(str(values[0]), "%Y-%m-%d %H:%M:%S")
X_train[line_pointer].append(float(time.strftime("%H", the_time)))
X_train[line_pointer].append(float(time.strftime("%m", the_time)))
X_train[line_pointer].append(float(time.strftime("%w", the_time)))
This is the part that adds the three new dimensions, month, day of week, and hour, before training again (a quick check of what the three format codes return is shown below).
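For reference, the following snippet shows what the three strftime format codes produce; the timestamp is an arbitrary example, not taken from the data set:

import time

# "%H" -> hour of day, "%m" -> month, "%w" -> day of week (0 = Sunday)
t = time.strptime("2016-11-18 15:30:00", "%Y-%m-%d %H:%M:%S")
print(time.strftime("%H", t))  # 15
print(time.strftime("%m", t))  # 11
print(time.strftime("%w", t))  # 5 (Friday)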
……
Train on 2975000 samples, validate on 525000 samples
Epoch 1/100
2975000/2975000 [==============================] - 15s - loss: 1.5305 - acc: 0.4356 - val_loss: 1.4402 - val_acc: 0.4623
Epoch 2/100
2975000/2975000 [==============================] - 13s - loss: 1.5071 - acc: 0.4377 - val_loss: 1.4264 - val_acc: 0.4623
Epoch 3/100
2975000/2975000 [==============================] - 13s - loss: 1.5047 - acc: 0.4377 - val_loss: 1.4297 - val_acc: 0.4623
Epoch 4/100
2975000/2975000 [==============================] - 13s - loss: 1.5030 - acc: 0.4377 - val_loss: 1.4218 - val_acc: 0.4623
Epoch 5/100
2975000/2975000 [==============================] - 13s - loss: 1.5014 - acc: 0.4377 - val_loss: 1.4288 - val_acc: 0.4623
Epoch 6/100
2975000/2975000 [==============================] - 13s - loss: 1.5001 - acc: 0.4377 - val_loss: 1.4178 - val_acc: 0.4623
Epoch 7/100
2975000/2975000 [==============================] - 13s - loss: 1.4991 - acc: 0.4377 - val_loss: 1.4165 - val_acc: 0.4623
Epoch 8/100
2975000/2975000 [==============================] - 13s - loss: 1.4983 - acc: 0.4377 - val_loss: 1.4335 - val_acc: 0.4623
Epoch 9/100
2975000/2975000 [==============================] - 13s - loss: 1.4976 - acc: 0.4377 - val_loss: 1.4125 - val_acc: 0.4623
Epoch 10/100
2975000/2975000 [==============================] - 13s - loss: 1.4969 - acc: 0.4377 - val_loss: 1.4089 - val_acc: 0.4623
……
Epoch 91/100
2975000/2975000 [==============================] - 13s - loss: 1.4645 - acc: 0.4380 - val_loss: 1.4858 - val_acc: 0.4405
Epoch 92/100
2975000/2975000 [==============================] - 13s - loss: 1.4642 - acc: 0.4379 - val_loss: 1.4997 - val_acc: 0.4252
Epoch 93/100
2975000/2975000 [==============================] - 13s - loss: 1.4640 - acc: 0.4380 - val_loss: 1.4827 - val_acc: 0.4447
Epoch 94/100
2975000/2975000 [==============================] - 13s - loss: 1.4638 - acc: 0.4379 - val_loss: 1.4780 - val_acc: 0.4330
Epoch 95/100
2975000/2975000 [==============================] - 13s - loss: 1.4635 - acc: 0.4380 - val_loss: 1.4947 - val_acc: 0.4308
Epoch 96/100
2975000/2975000 [==============================] - 13s - loss: 1.4633 - acc: 0.4380 - val_loss: 1.5014 - val_acc: 0.4367
Epoch 97/100
2975000/2975000 [==============================] - 13s - loss: 1.4631 - acc: 0.4380 - val_loss: 1.4762 - val_acc: 0.4432
Epoch 98/100
2975000/2975000 [==============================] - 13s - loss: 1.4629 - acc: 0.4379 - val_loss: 1.4838 - val_acc: 0.4398
Epoch 99/100
2975000/2975000 [==============================] - 13s - loss: 1.4629 - acc: 0.4379 - val_loss: 1.4888 - val_acc: 0.4400
Epoch 100/100
The loss is smaller than in the first design, but the accuracy does not improve. This shows that adding the month, day-of-week, and hour features does not help the model's accuracy.
(3) Third attempt
The first two attempts show that this classification scheme does not reduce entropy much. Why? If you backtest with the trained model, you find that the vast majority of samples fall into the class with the smallest price range, which becomes clear in the finer quantitative analysis below. So if we want a more pronounced reduction in entropy, we need to bring in more dimensions, and we should try not to make the classes too "skewed", that is, not let a few individual classes account for too large a share of the samples (the sketch after this paragraph shows a quick way to measure that share).
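A minimal sketch of such a check, assuming record_3500000.txt keeps the comma-separated layout used above with the class label in the ninth field (index 8); this is an illustration rather than code from the book:

# Count how many samples fall into each of the 7 classes
# to see how skewed the label distribution is.
counts = [0] * 7
for line in open('./record_3500000.txt', 'r'):
    values = line.split(',')
    if len(values) > 8:
        counts[int(values[8])] += 1
total = float(sum(counts))
for label, n in enumerate(counts):
    print("class %d: %d (%.2f%%)" % (label, n, 100 * n / total))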
Let's think about it: since predicting a full hour ahead may be rather complex, could we shorten that horizon? The longer the time span, the more pronounced the effects of chaos; the larger the entropy, the harder the prediction. This time we try analysing the price range within 15 minutes.
Create the table SLIDEWINDOW_15M.
CREATE TABLE SLIDEWINDOW_15M(
  DT DATETIME,
  RISE_PIP DECIMAL(10,4),
  FALL_PIP DECIMAL(10,4),
  PROPORTION DECIMAL(10,4)
);
To insert the analysis results into this temporary table, write the Python file slidewindow_15m_insert.py.
#!/usr/bin/python
import os, sys, MySQLdb
try:
    db = MySQLdb.connect(host='localhost', user='root', passwd='111111', db='FOREX')
    cursor = db.cursor()
    counter = 0
    cursor.execute('USE FOREX;')
    sql = 'SELECT * FROM EURUSD_1M'
    cursor.execute(sql);
    result = cursor.fetchall()
    for i in range(0, cursor.rowcount):
        startdt = str(result[i][0])
        startpip = str(result[i][4])
        cursor1 = db.cursor()
        cursor1.execute('USE FOREX;')
        # for each minute DT, record the maximum rise, the maximum fall and their ratio
        # over the following 15 minutes; the CASE avoids dividing by zero when the start
        # price equals MIN(LOW) by dividing by one pip (0.0001) instead
        sql1 = 'INSERT INTO SLIDEWINDOW_15M SELECT DT, MAX(HIGH)-' + startpip + ' AS RISE_PIP, ' \
               + startpip + '-MIN(LOW) AS FALL_PIP, CASE WHEN ' + startpip + '<>MIN(LOW) THEN (MAX(HIGH)-' \
               + startpip + ')/(' + startpip + '-MIN(LOW)) ELSE (MAX(HIGH)-' + startpip + ')/0.0001 END ' \
               + 'FROM EURUSD_1M WHERE DT BETWEEN "' + startdt + '" AND DATE_ADD("' + startdt + '", INTERVAL 15 MINUTE)'
        cursor1.execute(sql1)
        # commit in batches to keep the transaction size manageable
        if i % 1000 == 0:
            db.commit()
    db.commit()
except MySQLdb.Error, e:
    print "Error %s" % (str(e.args[0]) + ':' + str(e.args[1]))
    exit(1)
cursor1.close()
cursor.close()
db.close()