This uses a minimum interval of 10 pips for the split; the rise and fall amplitudes can be read from the F and R columns. For example, the row "| 3920354 | 0 | 0 |" says that more than 3.92 million records have both rise and fall within 10 pips — small amplitude and uncertain direction — accounting for 72% of all samples. That proportion is frankly far too large: the other 178 categories together account for only 28%, making this an extremely "skewed" classification scheme.
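As a side note, the bin indices F and R come from integer division (MySQL's DIV operator). A minimal sketch of the same binning in Python, working in whole pips to avoid floating-point surprises — `bin_index` is a hypothetical helper name, not from the book's code:

```python
# Integer-division binning, mirroring MySQL's DIV operator.
# Amplitudes are expressed in whole pips; bin_index is our own helper.

def bin_index(pips, width):
    """Map an amplitude in pips to a bin: 0..width-1 -> 0, width..2*width-1 -> 1, ..."""
    return pips // width

# With 10-pip bins, a 7-pip move and a 23-pip move land in bins 0 and 2:
print(bin_index(7, 10), bin_index(23, 10))   # -> 0 2
# With 5-pip bins the same moves split more finely:
print(bin_index(7, 5), bin_index(23, 5))     # -> 1 4
```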
We can try a finer split, using 5 pips per bin.
mysql> SELECT COUNT(*), F, R
    -> FROM (SELECT FALL_PIP DIV 0.0005 AS F, RISE_PIP DIV 0.0005 AS R
    ->       FROM EURUSD_TRAINING_15M) R
    -> GROUP BY F, R;
+----------+------+------+
| COUNT(*) | F    | R    |
+----------+------+------+
|  1916865 |    0 |    0 |
|   895584 |    0 |    1 |
|   334411 |    0 |    2 |
|   134729 |    0 |    3 |
|    60612 |    0 |    4 |
|    29028 |    0 |    5 |
...
|   894166 |    1 |    0 |
|   213739 |    1 |    1 |
|    71067 |    1 |    2 |
|    31033 |    1 |    3 |
|    15198 |    1 |    4 |
...
|        4 |   34 |    0 |
|        1 |   34 |    1 |
|        1 |   34 |    3 |
|        9 |   35 |    0 |
|        1 |   37 |    0 |
|        1 |   37 |    1 |
|        1 |   38 |    2 |
|        1 |   38 |    3 |
|        1 |   40 |    0 |
|        1 |   40 |    2 |
|        1 |   42 |    1 |
|        1 |   42 |    3 |
|        1 |   44 |    0 |
|        1 |   44 |   14 |
|        1 |   46 |    4 |
+----------+------+------+
513 rows in set (4.31 sec)
Under this division, the category "| 1916865 | 0 | 0 |" (rise and fall both within 5 pips) still holds about 1.92 million records, but its share has dropped to roughly 35%.
Next, let's bring in some other processing tricks to continue this entropy-reduction process.
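The skew the author is fighting can be made concrete with Shannon entropy of the label distribution. The sketch below uses a two-class simplification — "flat" bar versus everything else, with the 72% and 35% shares quoted above — so the exact values are illustrative, not figures from the book:

```python
import math

def entropy(p):
    """Shannon entropy (bits) of a discrete distribution."""
    return -sum(x * math.log(x, 2) for x in p if x > 0)

# Two-class simplification: "flat" bar vs "everything else".
# With 10-pip bins the flat class holds ~72% of samples; with
# 5-pip bins it drops to ~35% (shares taken from the text above).
print(entropy([0.72, 0.28]))  # low: one class dominates, labels are nearly degenerate
print(entropy([0.35, 0.65]))  # higher: the labels now carry more information
```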
ALTER TABLE EURUSD_TRAINING_15M ADD COLUMN (F INT);
ALTER TABLE EURUSD_TRAINING_15M ADD COLUMN (R INT);
UPDATE EURUSD_TRAINING_15M SET R=RISE_PIP DIV 0.0005, F=FALL_PIP DIV 0.0005;
COMMIT;
To attach labels to the EURUSD_TRAINING_15M table, we can first draw a table to see what the distribution looks like (see Figure 18-21).
1700511031
1700511032
1700511033
Figure 18-21 Classification distribution
Here is how I did it: plot the F and R values on a two-dimensional grid. Before any classification, the entire grid belongs to category "0" — the maximum-entropy case. Category "1" is the diagonal, where the rise and fall amplitudes are comparable; the part toward the lower right is where both amplitudes are large (unequal on the two sides, but both big), which is clearly a choppy market and unsuitable for placing orders. Categories "2" and "3" are where one side's amplitude is distinctly larger — "2" for rises, "3" for falls. Categories "4" and "5" are, respectively, a rise of 6–10 pips with a fall of 0–5 pips, and a fall of 6–10 pips with a rise of 0–5 pips: still reasonably certain single-direction cases, just with small amplitude.
The remaining categories can be labeled from their R and F values in the same way. In theory, every category except "0" and "1" is a candidate for entering a position.
Let's apply these labels with SQL.
#WHITE
UPDATE EURUSD_TRAINING_15M SET CLASSIFICATION=0;
#YELLOW
UPDATE EURUSD_TRAINING_15M SET CLASSIFICATION=1 WHERE R>=10 AND F>=10;
UPDATE EURUSD_TRAINING_15M SET CLASSIFICATION=1 WHERE ABS(R-F)<=1;
#RED-R
UPDATE EURUSD_TRAINING_15M SET CLASSIFICATION=2 WHERE R>=F+10;
#RED-F
UPDATE EURUSD_TRAINING_15M SET CLASSIFICATION=3 WHERE F>=R+10;
#CYAN-R
UPDATE EURUSD_TRAINING_15M SET CLASSIFICATION=4 WHERE R=1 AND F=0;
#CYAN-F
UPDATE EURUSD_TRAINING_15M SET CLASSIFICATION=5 WHERE R=0 AND F=1;
#GREEN-R
UPDATE EURUSD_TRAINING_15M SET CLASSIFICATION=6 WHERE R=2 AND F=0;
#GREEN-F
UPDATE EURUSD_TRAINING_15M SET CLASSIFICATION=7 WHERE R=0 AND F=2;
#BLUE-R
UPDATE EURUSD_TRAINING_15M SET CLASSIFICATION=8 WHERE R=3 AND F=0;
UPDATE EURUSD_TRAINING_15M SET CLASSIFICATION=8 WHERE R=2 AND F=1;
#BLUE-F
UPDATE EURUSD_TRAINING_15M SET CLASSIFICATION=9 WHERE R=0 AND F=3;
UPDATE EURUSD_TRAINING_15M SET CLASSIFICATION=9 WHERE R=1 AND F=2;
#PURPLE-R
UPDATE EURUSD_TRAINING_15M SET CLASSIFICATION=10 WHERE R=4 AND F=0;
UPDATE EURUSD_TRAINING_15M SET CLASSIFICATION=10 WHERE R=3 AND F=1;
#PURPLE-F
UPDATE EURUSD_TRAINING_15M SET CLASSIFICATION=11 WHERE R=0 AND F=4;
UPDATE EURUSD_TRAINING_15M SET CLASSIFICATION=11 WHERE R=1 AND F=3;
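Note that these UPDATE statements run in order, so a later statement overwrites an earlier label: (R=1, F=0) first matches ABS(R-F)<=1 and becomes 1, then the CYAN-R rule relabels it 4. The same last-match-wins logic can be sketched as a plain Python function, which may be handy for labeling new bars outside the database (`classify` is our name, not the book's):

```python
def classify(r, f):
    """Replay the UPDATE statements above in order; the last match wins."""
    c = 0                                     # WHITE: default
    if r >= 10 and f >= 10: c = 1             # YELLOW: large two-sided swings
    if abs(r - f) <= 1:     c = 1             # YELLOW: near-equal amplitudes
    if r >= f + 10:         c = 2             # RED-R: strongly one-sided rise
    if f >= r + 10:         c = 3             # RED-F: strongly one-sided fall
    if r == 1 and f == 0:   c = 4             # CYAN-R
    if r == 0 and f == 1:   c = 5             # CYAN-F
    if r == 2 and f == 0:   c = 6             # GREEN-R
    if r == 0 and f == 2:   c = 7             # GREEN-F
    if (r == 3 and f == 0) or (r == 2 and f == 1): c = 8    # BLUE-R
    if (r == 0 and f == 3) or (r == 1 and f == 2): c = 9    # BLUE-F
    if (r == 4 and f == 0) or (r == 3 and f == 1): c = 10   # PURPLE-R
    if (r == 0 and f == 4) or (r == 1 and f == 3): c = 11   # PURPLE-F
    return c

print(classify(0, 0), classify(1, 0), classify(12, 0), classify(5, 2))
# -> 1 4 2 0  ((5, 2) matches no rule and stays WHITE)
```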
Export the training samples as text by creating the file make_file_15m.py.
import os, sys, MySQLdb
import numpy as np

db = MySQLdb.connect(host='localhost', user='root', passwd='111111', db='FOREX')
cursor = db.cursor()
cursor.execute('USE FOREX;')
sql = 'SELECT * FROM EURUSD_TRAINING_15M;'
cursor.execute(sql)
result = cursor.fetchall()
for i in range(cursor.rowcount):
    # one comma-separated record per line
    print str(result[i][0])+','+str(result[i][1])+','+str(result[i][2])+','+ \
          str(result[i][3])+','+str(result[i][4])+','+str(result[i][5])+','+ \
          str(result[i][6])+','+str(result[i][7])+','+str(result[i][8])
cursor.close()
db.close()
Invoke it from the shell.
1700511050 python make_file_15m.py >> record_15M.txt
Truncate the file, keeping only the last 3.5 million records.
tail -n 3500000 record_15M.txt >> record_15M_3500000.txt
Create the training script train_1.py.
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import SGD
import os, sys, time
import numpy as np

print "Start to generate network"
model = Sequential()
model.add(Dense(120, input_dim=63))
model.add(Activation('sigmoid'))
model.add(Dense(80, input_dim=100))
model.add(Activation('sigmoid'))
model.add(Dense(60, input_dim=70))
model.add(Activation('sigmoid'))
model.add(Dense(50, input_dim=120))
model.add(Activation('sigmoid'))
model.add(Dense(40, input_dim=50))
model.add(Activation('sigmoid'))
model.add(Dense(30, input_dim=40))
model.add(Activation('sigmoid'))
model.add(Dense(12, input_dim=30))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

print "start to load data"
records = open('./record_15M_3500000.txt', 'r')
X_train = []
y_train = []
line_pointer = -1
for line in records.readlines():
    line_pointer = line_pointer + 1
    X_train.append([])
    y_train.append([])
    values = line.split(',')
    if(line_pointer <= 14):
        line_length = line_pointer
    else:
        line_length = 14
    # time-of-day features from the timestamp column
    the_time = time.strptime(str(values[0]), "%Y-%m-%d %H:%M:%S")
    X_train[line_pointer].append(float(time.strftime("%H", the_time)))
    X_train[line_pointer].append(float(time.strftime("%m", the_time)))
    X_train[line_pointer].append(float(time.strftime("%w", the_time)))
    X_train[line_pointer].append(float(values[1]))
    X_train[line_pointer].append(float(values[2]))
    X_train[line_pointer].append(float(values[3]))
    X_train[line_pointer].append(float(values[4]))
    # carry over the price features from up to 14 preceding bars
    for j in range(line_length):
        X_train[line_pointer].append(X_train[line_pointer-1][j*4+4])
        X_train[line_pointer].append(X_train[line_pointer-1][j*4+5])
        X_train[line_pointer].append(X_train[line_pointer-1][j*4+6])
        X_train[line_pointer].append(X_train[line_pointer-1][j*4+7])
    # zero-pad to a fixed 63-dimensional input vector
    for i in range(63-len(X_train[line_pointer])):
        X_train[line_pointer].append(0)
    # one-hot encode the classification label
    for k in range(12):
        y_train[line_pointer].append(0)
    y_train[line_pointer][int(values[8])] = 1
    if line_pointer % 10000 == 0:
        print line_pointer

print "start training"
#print X_train[0]
#print X_train[100]
model.fit(X_train, y_train, nb_epoch=20, batch_size=2000, validation_split=0.15)
json_string = model.to_json()
open('./my_model_architecture_1.json', 'w').write(json_string)
model.save_weights('./my_model_weights_1.h5')

# build a 12x12 matrix: rows are true classes, columns are predicted classes
pre = model.predict(X_train)
predicted = np.zeros((12, 12))
for i in range(len(pre)):
    max_train = 0
    max_pre = 0
    for m in range(12):
        if(y_train[i][m] == 1):
            max_train = m
    for m in range(12):
        if(pre[i][max_pre] < pre[i][m]):
            max_pre = m
    predicted[max_train][max_pre] = predicted[max_train][max_pre] + 1
for i in range(12):
    for j in range(12):
        print predicted[i][j],
    print ""
This code runs for only 20 epochs; executing it produces the following output.
Train on 2975000 samples, validate on 525000 samples
Epoch 1/20
2975000/2975000 [==============================] - 14s - loss: 1.8795 - acc: 0.3742 - val_loss: 1.6941 - val_acc: 0.4185
Epoch 2/20
2975000/2975000 [==============================] - 14s - loss: 1.8547 - acc: 0.3751 - val_loss: 1.6928 - val_acc: 0.4185
Epoch 3/20
2975000/2975000 [==============================] - 14s - loss: 1.8531 - acc: 0.3751 - val_loss: 1.6813 - val_acc: 0.4185
Epoch 4/20
2975000/2975000 [==============================] - 14s - loss: 1.8504 - acc: 0.3751 - val_loss: 1.6750 - val_acc: 0.4185
Epoch 5/20
2975000/2975000 [==============================] - 14s - loss: 1.8476 - acc: 0.3750 - val_loss: 1.6702 - val_acc: 0.4185
Epoch 6/20
2975000/2975000 [==============================] - 14s - loss: 1.8458 - acc: 0.3750 - val_loss: 1.6696 - val_acc: 0.4180
Epoch 7/20
2975000/2975000 [==============================] - 14s - loss: 1.8448 - acc: 0.3751 - val_loss: 1.6637 - val_acc: 0.4187
Epoch 8/20
2975000/2975000 [==============================] - 14s - loss: 1.8439 - acc: 0.3752 - val_loss: 1.6783 - val_acc: 0.4174
Epoch 9/20
2975000/2975000 [==============================] - 14s - loss: 1.8428 - acc: 0.3752 - val_loss: 1.6555 - val_acc: 0.4186
Epoch 10/20
2975000/2975000 [==============================] - 14s - loss: 1.8415 - acc: 0.3752 - val_loss: 1.6528 - val_acc: 0.4186
Epoch 11/20
2975000/2975000 [==============================] - 14s - loss: 1.8405 - acc: 0.3752 - val_loss: 1.6534 - val_acc: 0.4185
Epoch 12/20
2975000/2975000 [==============================] - 14s - loss: 1.8398 - acc: 0.3753 - val_loss: 1.6525 - val_acc: 0.4185
Epoch 13/20
2975000/2975000 [==============================] - 14s - loss: 1.8391 - acc: 0.3753 - val_loss: 1.6504 - val_acc: 0.4186
Epoch 14/20
2975000/2975000 [==============================] - 14s - loss: 1.8384 - acc: 0.3754 - val_loss: 1.6573 - val_acc: 0.4186
Epoch 15/20
2975000/2975000 [==============================] - 14s - loss: 1.8376 - acc: 0.3754 - val_loss: 1.6475 - val_acc: 0.4185
Epoch 16/20
2975000/2975000 [==============================] - 14s - loss: 1.8366 - acc: 0.3754 - val_loss: 1.6498 - val_acc: 0.4186
Epoch 17/20
2975000/2975000 [==============================] - 14s - loss: 1.8348 - acc: 0.3754 - val_loss: 1.6688 - val_acc: 0.4168
Epoch 18/20
2975000/2975000 [==============================] - 14s - loss: 1.8321 - acc: 0.3756 - val_loss: 1.6759 - val_acc: 0.4167
Epoch 19/20
2975000/2975000 [==============================] - 13s - loss: 1.8280 - acc: 0.3760 - val_loss: 1.6991 - val_acc: 0.4127
Epoch 20/20
2975000/2975000 [==============================] - 13s - loss: 1.8239 - acc: 0.3762 - val_loss: 1.7493 - val_acc: 0.4077
27434.0 144944.0 0.0 0.0 0.0 197.0 0.0 0.0 0.0 0.0 0.0 0.0
28574.0 1305778.0 0.0 0.0 9.0 1268.0 0.0 0.0 0.0 0.0 0.0 0.0
1115.0 5092.0 0.0 0.0 0.0 5.0 0.0 0.0 0.0 0.0 0.0 0.0
1067.0 5173.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
17999.0 533833.0 0.0 0.0 5.0 1337.0 0.0 0.0 0.0 0.0 0.0 0.0
17493.0 528829.0 0.0 0.0 36.0 1471.0 0.0 0.0 0.0 0.0 0.0 0.0
13725.0 205288.0 0.0 0.0 0.0 728.0 0.0 0.0 0.0 0.0 0.0 0.0
13521.0 202727.0 0.0 0.0 10.0 713.0 0.0 0.0 0.0 0.0 0.0 0.0
15244.0 135881.0 0.0 0.0 0.0 375.0 0.0 0.0 0.0 0.0 0.0 0.0
14773.0 134573.0 0.0 0.0 0.0 416.0 0.0 0.0 0.0 0.0 0.0 0.0
8522.0 61368.0 0.0 0.0 0.0 125.0 0.0 0.0 0.0 0.0 0.0 0.0
8434.0 61777.0 0.0 0.0 0.0 141.0 0.0 0.0 0.0 0.0 0.0 0.0
To show this more clearly, let's put the table into Excel, as shown in Figure 18-22.
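One quick way to read the 12x12 matrix (rows are true classes, columns are predicted classes, per the script above) is to normalize each row so it shows where samples of a given true class end up. A small sketch using the first row's figures from the output above (pure Python, no Excel needed):

```python
# Row-normalize one row of the confusion matrix: what fraction of the
# samples of a given true class fell into each predicted class.
# The counts below are the first row printed by train_1.py above.
row_class_0 = [27434, 144944, 0, 0, 0, 197, 0, 0, 0, 0, 0, 0]

total = float(sum(row_class_0))
rates = [n / total for n in row_class_0]
dominant = max(range(len(rates)), key=lambda j: rates[j])

print(dominant)                   # the column absorbing most predictions
print(round(rates[dominant], 2))  # its share of the row
```

Doing this for every row makes the collapse obvious: the network funnels almost everything into a couple of predicted classes, which is exactly the skew problem the classification scheme was trying to fight.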