Stock Price Prediction Using Python and Machine Learning

追夢文庫 2020-01-21

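In this article I will show you how to write a Python program that predicts stock prices using a machine learning technique called Long Short-Term Memory (LSTM). The program is quite simple, and it should not be relied on for trading gains; at best it does somewhat better than guessing. Keep in mind that a stock price can be affected by many different factors. The predictions made here are only predictions.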

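Long Short-Term Memory (LSTM) is a recurrent neural network (RNN) architecture used in deep learning. Unlike standard feedforward neural networks, an LSTM has feedback connections. It can process not only single data points (such as images) but also entire sequences of data (such as speech or video).

LSTMs are widely used for sequence prediction problems and have proven very effective. They work well because an LSTM can retain important information from the past while forgetting information that is not important.

The general architecture of an LSTM cell consists of three gates: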
Forget Gate
Input Gate
Output Gate
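
To make the three gates concrete, here is a minimal sketch of one LSTM time step for a single scalar unit, written in plain Python. The helper name `lstm_step`, the weight layout, and the numbers are illustrative assumptions; Keras works with vectorized weight matrices, but the gate equations below are the standard ones.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, w):
    """One time step of a scalar LSTM cell (hypothetical helper).

    w maps each gate name to (input weight, recurrent weight, bias).
    """
    def pre(name):
        wx, wh, b = w[name]
        return wx * x + wh * h_prev + b

    f = sigmoid(pre("forget"))        # forget gate: how much old cell state to keep
    i = sigmoid(pre("input"))         # input gate: how much of the candidate to write
    g = math.tanh(pre("candidate"))   # candidate value for the cell state
    o = sigmoid(pre("output"))        # output gate: how much cell state to expose
    c = f * c_prev + i * g            # updated cell state
    h = o * math.tanh(c)              # updated hidden state (the cell's output)
    return h, c

# Illustrative weights only; a trained network learns these values.
gate_names = ("forget", "input", "candidate", "output")
weights = {name: (0.5, 0.1, 0.0) for name in gate_names}

h, c = 0.0, 0.0
for price in [0.2, 0.4, 0.6]:   # a short, already-scaled input sequence
    h, c = lstm_step(price, h, c, weights)
print(h, c)
```

The cell state `c` is what lets the network carry information across many time steps: the forget gate decides how much of it survives, and the input gate decides how much new information is written into it.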

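Start programming:

I will begin by stating what I want the program to do. I want it to predict the closing price of Apple Inc. stock from the closing prices of the previous 60 days.

First, I will put a description of the program in a comment at the top of the file.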

# Description: This program uses an artificial recurrent neural network called Long Short Term Memory (LSTM) to predict the closing stock price of a corporation (Apple Inc.) using the past 60 day stock price.

Next, I will import the libraries used throughout the program.

#Import the libraries
import math
import pandas_datareader as web
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, LSTM
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')

I will get the stock quote for Apple Inc. (AAPL) from January 1, 2012 to December 20, 2019, using the Yahoo data source.

#Get the stock quote
df = web.DataReader('AAPL', data_source='yahoo', start='2012-01-01', end='2019-12-20')
#Show the data
df

Apple stock quote

The last line of the output shows the number of rows and columns in the data set: 2003 rows (trading days) of stock prices and 6 columns.

Create a chart to visualize the data.

plt.figure(figsize=(16,8))
plt.title('Close Price History')
plt.plot(df['Close'])
plt.xlabel('Date',fontsize=18)
plt.ylabel('Close Price USD ($)',fontsize=18)
plt.show()

The chart shows the closing price history of Apple Inc.

Create a new data frame containing only the closing price and convert it to an array.
Then create a variable to store the length of the training data set. I want the training data set to contain about 80% of the data.

#Create a new dataframe with only the 'Close' column
data = df.filter(['Close'])
#Convert the dataframe to a numpy array
dataset = data.values
#Compute the number of rows to train the model on
training_data_len = math.ceil(len(dataset) * .8)
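
As a quick check of the split arithmetic, the sketch below applies the same `math.ceil` expression to the row count of 2003 that the article quotes in its test-set code comment. The variable names `n_rows` and `n_test_rows` are assumptions introduced for illustration.

```python
import math

n_rows = 2003                                # row count quoted in the test-set comment
training_data_len = math.ceil(n_rows * .8)   # same expression as the article's code
n_test_rows = n_rows - training_data_len     # rows left over for testing
print(training_data_len, n_test_rows)        # prints: 1603 400
```

Rounding up with `ceil` means the training set gets the extra row whenever 80% of the data is not a whole number.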

Now scale the data set to values between 0 and 1 inclusive. I do this because it is generally good practice to scale data before feeding it to a neural network.

#Scale all of the data to be values between 0 and 1
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(dataset)
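
For intuition about what the scaler computes, here is a minimal plain-Python sketch of min-max scaling to the range 0 to 1 and its inverse. The function names and the prices are made up for illustration; this is not scikit-learn's implementation, only the same linear mapping for a single column.

```python
def min_max_fit(values):
    """Return the (min, max) pair learned from the data."""
    return min(values), max(values)

def min_max_transform(values, lo, hi):
    """Map each value linearly so that lo -> 0 and hi -> 1."""
    return [(v - lo) / (hi - lo) for v in values]

def min_max_inverse(scaled, lo, hi):
    """Undo the scaling to recover values in the original units."""
    return [s * (hi - lo) + lo for s in scaled]

closes = [100.0, 150.0, 125.0, 200.0]   # made-up closing prices
lo, hi = min_max_fit(closes)
scaled = min_max_transform(closes, lo, hi)
restored = min_max_inverse(scaled, lo, hi)
print(scaled)     # [0.0, 0.5, 0.25, 1.0]
print(restored)   # [100.0, 150.0, 125.0, 200.0]
```

The inverse mapping is what `scaler.inverse_transform` relies on later to turn the model's scaled outputs back into dollar prices. The article fits the scaler on the entire data set, test rows included; fitting on the training rows only is a common alternative that keeps test information out of the scaling.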

Create a training data set in which each sample holds the past 60 days of closing prices, which we use to predict the 61st closing price.

So the first entry (row) of "x_train" contains the values of the data set from index 0 to index 59 (60 values in total), the second entry contains the values from index 1 to index 60 (60 values), and so on.

The "y_train" data set contains the 61st value, located at index 60, as its first entry, the 62nd value, located at index 61, as its second entry, and so on.

#Create the scaled training data set
train_data = scaled_data[0:training_data_len, :]
#Split the data into x_train and y_train data sets
x_train = []
y_train = []
for i in range(60, len(train_data)):
    x_train.append(train_data[i-60:i, 0])
    y_train.append(train_data[i, 0])
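
The windowing logic is easier to see on a tiny series. The sketch below uses a window of 3 instead of 60 and plain Python lists instead of numpy arrays; `make_windows` is a hypothetical helper name, not something from the article.

```python
def make_windows(series, window=60):
    """Build (inputs, targets): each input is `window` consecutive values,
    and its target is the value that immediately follows."""
    xs, ys = [], []
    for i in range(window, len(series)):
        xs.append(series[i - window:i])
        ys.append(series[i])
    return xs, ys

toy = [0, 1, 2, 3, 4, 5]   # stand-in for scaled closing prices
xs, ys = make_windows(toy, window=3)
for x, y in zip(xs, ys):
    print(x, "->", y)
# [0, 1, 2] -> 3
# [1, 2, 3] -> 4
# [2, 3, 4] -> 5
```

A series of length n yields n minus window samples, because the first `window` values have no complete history behind them.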

Now convert the independent training data set "x_train" and the dependent training data set "y_train" to numpy arrays so they can be used to train the LSTM model.

#Convert x_train and y_train to numpy arrays
x_train, y_train = np.array(x_train), np.array(y_train)

Reshape the data into 3 dimensions in the form [number of samples, number of time steps, number of features]. The LSTM model expects a 3-dimensional data set.

#Reshape the data into the shape accepted by the LSTM
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))

Build the LSTM model with two LSTM layers of 50 neurons each and two Dense layers, one with 25 neurons and the other with 1 neuron.

#Build the LSTM network model
model = Sequential()
model.add(LSTM(units=50, return_sequences=True, input_shape=(x_train.shape[1], 1)))
model.add(LSTM(units=50, return_sequences=False))
model.add(Dense(units=25))
model.add(Dense(units=1))

Compile the model using the mean squared error (MSE) loss function and the adam optimizer.

#Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

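Train the model on the training data set. Note that fit is another name for train. The batch size is the number of training examples in a single batch, and an epoch is one complete forward and backward pass of the entire training data set through the network; the epochs argument sets how many such passes are made.

#Train the model
model.fit(x_train, y_train, batch_size=1, epochs=1)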
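
To see what `batch_size=1` implies, the following sketch counts the weight updates in one epoch. It assumes 1603 training rows (the index quoted in the article's test-set comment) minus the first 60, which have no complete window; `updates_per_epoch` is a hypothetical helper, and the batch size of 32 is shown only for comparison.

```python
import math

def updates_per_epoch(n_samples, batch_size):
    """Number of weight updates in one full pass over the training samples."""
    return math.ceil(n_samples / batch_size)

n_samples = 1603 - 60   # training rows minus the first 60, which lack a full window
for batch_size in (1, 32):
    print(batch_size, updates_per_epoch(n_samples, batch_size))
# 1 1543
# 32 49
```

With a batch size of 1 the weights are updated after every single sample, which makes each epoch slow but gives many updates per pass.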

Create a test data set.

#Test data set: start 60 rows early so the first test window is complete
test_data = scaled_data[training_data_len - 60:, :]
#Create the x_test and y_test data sets
x_test = []
#y_test holds the unscaled rows from index 1603 to the end and all of the columns
#(in this case only the 'Close' column), so 2003 - 1603 = 400 rows of data
y_test = dataset[training_data_len:, :]
for i in range(60, len(test_data)):
    x_test.append(test_data[i-60:i, 0])

Then convert the independent test data set "x_test" to a numpy array so it can be used to test the LSTM model.

#Convert x_test to a numpy array
x_test = np.array(x_test)

Reshape the data into 3 dimensions in the form [number of samples, number of time steps, number of features]. This is required because the LSTM model expects a 3-dimensional data set.

#Reshape the data into the shape accepted by the LSTM
x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1))

Now get the predicted values from the model using the test data.

#Get the model's predicted price values
predictions = model.predict(x_test)
#Undo the scaling
predictions = scaler.inverse_transform(predictions)

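Get the root mean squared error (RMSE), which is a good measure of the model's accuracy. A value of 0 means the model's predictions match the actual values in the test data set exactly.

The lower the value, the better the model performs. It is usually best to look at other metrics as well to really understand how the model is doing.

#Calculate the value of RMSE
rmse = np.sqrt(np.mean((predictions - y_test)**2))
rmse

6.70350807645975 (the RMSE value)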
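
As a small worked example of the metric, the sketch below computes RMSE on made-up numbers in plain Python, together with the mean absolute error (MAE) as one possible additional metric. The choice of MAE and both function names are assumptions, not something the article prescribes.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error: square the errors, average them, take the root."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def mae(predicted, actual):
    """Mean absolute error: average size of the errors, in the same units."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

actual = [100.0, 102.0, 104.0, 106.0]      # made-up actual prices
predicted = [101.0, 101.0, 107.0, 105.0]   # made-up model output
print(rmse(predicted, actual))   # square root of 3, about 1.732
print(mae(predicted, actual))    # 1.5
```

RMSE comes out larger than MAE here because squaring gives the single error of 3 more weight. In the numpy version, both `predictions` and `y_test` have shape (n, 1), so the subtraction is element by element; if one of them were 1-dimensional, broadcasting would produce an n by n matrix and the result would be wrong.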

Let's plot and visualize the data.

#Create the data for the graph
train = data[:training_data_len]
#Copy the slice so adding a column does not trigger a pandas SettingWithCopyWarning
valid = data[training_data_len:].copy()
valid['Predictions'] = predictions
#Visualize the data
plt.figure(figsize=(16,8))
plt.title('Model')
plt.xlabel('Date', fontsize=18)
plt.ylabel('Close Price USD ($)', fontsize=18)
plt.plot(train['Close'])
plt.plot(valid[['Close', 'Predictions']])
plt.legend(['Train', 'Val', 'Predictions'], loc='lower right')
plt.show()

Chart showing the training (blue), actual (red), and predicted (yellow) prices.

Show the actual and predicted prices.

#Show the valid and predicted prices
valid

The actual (closing) and predicted price values.

I want to test the model further by getting the predicted closing price of Apple Inc. for December 23, 2019.

So I will get the quote and convert the data to an array containing only the closing price. Then I will take the last 60 days of closing prices and scale them to values between 0 and 1 inclusive.

After that, I will create an empty list, append the past 60 days of prices to it, convert it to a numpy array, and reshape it so the data can be fed to the model.

Finally, I feed the data to the model and get the predicted price.

#Get the quote
apple_quote = web.DataReader('AAPL', data_source='yahoo', start='2012-01-01', end='2019-12-20')
#Create a new dataframe
new_df = apple_quote.filter(['Close'])
#Get the last 60 days of closing prices
last_60_days = new_df[-60:].values
#Scale the data to be values between 0 and 1
last_60_days_scaled = scaler.transform(last_60_days)
#Create an empty list
X_test = []
#Append the past 60 days
X_test.append(last_60_days_scaled)
#Convert the X_test data set to a numpy array
X_test = np.array(X_test)
#Reshape the data
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
#Get the predicted scaled price
pred_price = model.predict(X_test)
#Undo the scaling
pred_price = scaler.inverse_transform(pred_price)
print(pred_price)

[[269.60187]] (the predicted price for December 23, 2019)

Now let's see what the actual price was on that day.

#Get the quote
apple_quote2 = web.DataReader('AAPL', data_source='yahoo', start='2019-12-23', end='2019-12-24')
print(apple_quote2['Close'])

The actual price on December 23, 2019.

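Summary

In this article we used an LSTM to predict the stock price of Apple Inc. The RMSE was fairly large, which hurt the accuracy of the final prediction, but that is not the main point. A single model is rarely enough: it often helps to apply several models to the same problem and keep the one that performs best.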
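
When comparing models, one simple reference point is a naive forecast that predicts each day's close as the previous day's close. The sketch below computes its RMSE on made-up prices. This persistence baseline and the helper names are assumptions added for illustration; the article itself does not use them.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def persistence_forecast(series):
    """Predict each value as the previous day's value (naive baseline)."""
    return series[:-1]

closes = [100.0, 101.0, 103.0, 102.0, 105.0]   # made-up closing prices
baseline_pred = persistence_forecast(closes)    # forecasts for days 2 to 5
baseline_rmse = rmse(baseline_pred, closes[1:])
print(baseline_rmse)   # about 1.936
```

Computing the same figure on the real test rows would give a number to set beside the LSTM's RMSE: a model that cannot beat this baseline has learned little beyond yesterday's price.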
