HOW TO USE TENSORFLOW 2.X TO PREDICT BIKE RENTAL USING ANNs (Regression)
Dr LEBEDE Ngartera
1-07-2024
- SuperDataScience
- Laboratory of Artificial Intelligence and Decision Support (LIAAD), University of Porto, INESC Porto, Campus da FEUP, Rua Dr. Roberto Frias, 378, 4200-465 Porto, Portugal
The Bike Sharing dataset contains the following fields:
- instant: record index
- dteday: date
- season: season (1: spring, 2: summer, 3: fall, 4: winter)
- yr: year (0: 2011, 1: 2012)
- mnth: month (1 to 12)
- hr: hour (0 to 23; present only in the hourly version of the dataset)
- holiday: whether the day is a holiday or not (extracted from http://dchr.dc.gov/page/holiday-schedule)
- weekday: day of the week
- workingday: 1 if the day is neither a weekend nor a holiday, otherwise 0
- weathersit:
  1) Clear, Few clouds, Partly cloudy
  2) Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
  3) Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
  4) Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
- temp: normalized temperature in Celsius; the values are divided by 41 (max)
- hum: normalized humidity; the values are divided by 100 (max)
- windspeed: normalized wind speed; the values are divided by 67 (max); see the de-normalization sketch after this list
- casual: count of casual users
- registered: count of registered users
- cnt: count of total rental bikes including both casual and registered
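Because temp, hum, and windspeed are stored in normalized form, here is a quick sketch of how the raw units could be recovered once the DataFrame bike has been loaded further down (the divisors are the maxima quoted above):

# recover raw units from the normalized columns; purely illustrative,
# the model below works on the normalized values directly
raw_temp_c = bike['temp'] * 41        # degrees Celsius
raw_hum_pct = bike['hum'] * 100       # percent humidity
raw_wind = bike['windspeed'] * 67     # wind speed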
First, mount your Google Drive using the following commands:
For more information on mounting, see: https://stackoverflow.com/questions/46986398/import-data-into-google-colaboratory
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive

Next, import the required libraries and load the CSV file from the mounted drive, using the full path to your copy of the dataset:
# the notebook below relies on numpy, pandas, matplotlib, seaborn, and tensorflow
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf

bike = pd.read_csv('/content/drive/My Drive/bike_sharing_daily.csv')
bike
bike.info()
sns.heatmap(bike.isnull())
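If the heatmap is hard to read at a glance, the same check can be done numerically with pandas' standard null accounting:

bike.isnull().sum()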
# drop the record index, which carries no predictive signal
bike = bike.drop(labels=['instant'], axis=1)
# index by date so the time-based selections (asfreq) below work
bike.index = pd.DatetimeIndex(bike.dteday)
bike
bike['cnt'].asfreq('W').plot(linewidth = 3)
plt.title('Bike Usage Per week')
plt.xlabel('Week')
plt.ylabel('Bike Rental')
bike['cnt'].asfreq('M').plot(linewidth = 3)
plt.title('Bike Usage Per Month')
plt.xlabel('Month')
plt.ylabel('Bike Rental')
bike['cnt'].asfreq('Q').plot(linewidth = 3)
plt.title('Bike Usage Per Quarter')
plt.xlabel('Quarter')
plt.ylabel('Bike Rental')
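One caveat worth noting: asfreq samples the single value recorded at each weekly, monthly, or quarterly timestamp rather than aggregating over the period. If the intent is a period average, resample is the usual pandas tool; a minimal sketch:

# average (rather than point-sample) the daily counts over each week
bike['cnt'].resample('W').mean().plot(linewidth=3)
plt.title('Average Daily Bike Usage Per Week')
plt.xlabel('Week')
plt.ylabel('Bike Rental')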
sns.pairplot(bike)
X_numerical = bike[['temp', 'hum', 'windspeed', 'cnt']]
X_numerical
sns.pairplot(X_numerical)
sns.heatmap(X_numerical.corr(), annot =True)
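To read the heatmap numerically, the correlation of each numerical feature with the target can be pulled out of the same matrix:

X_numerical.corr()['cnt'].sort_values(ascending=False)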
X_cat = bike[['season', 'yr', 'mnth', 'holiday', 'weekday', 'workingday', 'weathersit']]
X_cat
from sklearn.preprocessing import OneHotEncoder
onehotencoder = OneHotEncoder()
X_cat = onehotencoder.fit_transform(X_cat).toarray()
X_cat.shape
(731, 32)
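The 32 columns are simply the category counts of the seven encoded features added up: season (4) + yr (2) + mnth (12) + holiday (2) + weekday (7) + workingday (2) + weathersit (3, since the fourth weather level never occurs in the daily data) = 32.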
X_cat = pd.DataFrame(X_cat)
X_numerical
# move the date index back into a column so both frames share a 0..n-1 integer index
X_numerical = X_numerical.reset_index()
X_all = pd.concat([X_cat, X_numerical], axis = 1)
X_all
X_all = X_all.drop('dteday', axis = 1)
X_all
X = X_all.iloc[:, :-1].values
y = X_all.iloc[:, -1:].values
X.shape
(731, 35)
y.shape
(731, 1)
from sklearn.preprocessing import MinMaxScaler

# scale the target to [0, 1]; the features need no extra scaling here, since they
# are either one-hot encoded or already normalized in the source data
scaler = MinMaxScaler()
y = scaler.fit_transform(y)
from sklearn.model_selection import train_test_split

# hold out 20% of the data for testing (add random_state=... for a reproducible split)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
X_train.shape
(584, 35)
X_test.shape
(147, 35)
model = tf.keras.models.Sequential()
# input layer matches the 35 engineered features
model.add(tf.keras.layers.Dense(units=100, activation='relu', input_shape=(35,)))
model.add(tf.keras.layers.Dense(units=100, activation='relu'))
model.add(tf.keras.layers.Dense(units=100, activation='relu'))
# single linear output unit for the regression target
model.add(tf.keras.layers.Dense(units=1, activation='linear'))
model.summary()
Model: "sequential"
dense (Dense) (None, 100) 3600
dense_1 (Dense) (None, 100) 10100
dense_2 (Dense) (None, 100) 10100
dense_3 (Dense) (None, 1) 101
=================================================================
Total params: 23,901
Trainable params: 23,901
Non-trainable params: 0
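The parameter counts follow directly from the layer sizes: the first layer has 35 × 100 weights plus 100 biases = 3,600; each hidden layer has 100 × 100 + 100 = 10,100; and the output layer has 100 × 1 + 1 = 101, giving 23,901 in total.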
model.compile(optimizer='Adam', loss='mean_squared_error')
epochs_hist = model.fit(X_train, y_train, epochs = 20, batch_size = 50, validation_split = 0.2)
Epoch 1/20
10/10 [==============================] - 1s 24ms/step - loss: 0.1408 - val_loss: 0.0675
Epoch 2/20
10/10 [==============================] - 0s 5ms/step - loss: 0.0410 - val_loss: 0.0331
Epoch 3/20
10/10 [==============================] - 0s 7ms/step - loss: 0.0199 - val_loss: 0.0213
Epoch 4/20
10/10 [==============================] - 0s 7ms/step - loss: 0.0135 - val_loss: 0.0176
Epoch 5/20
10/10 [==============================] - 0s 7ms/step - loss: 0.0104 - val_loss: 0.0182
Epoch 6/20
10/10 [==============================] - 0s 7ms/step - loss: 0.0095 - val_loss: 0.0169
Epoch 7/20
10/10 [==============================] - 0s 7ms/step - loss: 0.0076 - val_loss: 0.0156
Epoch 8/20
10/10 [==============================] - 0s 6ms/step - loss: 0.0064 - val_loss: 0.0154
Epoch 9/20
10/10 [==============================] - 0s 8ms/step - loss: 0.0057 - val_loss: 0.0159
Epoch 10/20
10/10 [==============================] - 0s 7ms/step - loss: 0.0053 - val_loss: 0.0148
Epoch 11/20
10/10 [==============================] - 0s 6ms/step - loss: 0.0049 - val_loss: 0.0144
Epoch 12/20
10/10 [==============================] - 0s 7ms/step - loss: 0.0047 - val_loss: 0.0143
Epoch 13/20
10/10 [==============================] - 0s 7ms/step - loss: 0.0043 - val_loss: 0.0143
Epoch 14/20
10/10 [==============================] - 0s 5ms/step - loss: 0.0042 - val_loss: 0.0147
Epoch 15/20
10/10 [==============================] - 0s 5ms/step - loss: 0.0042 - val_loss: 0.0149
Epoch 16/20
10/10 [==============================] - 0s 7ms/step - loss: 0.0041 - val_loss: 0.0152
Epoch 17/20
10/10 [==============================] - 0s 5ms/step - loss: 0.0038 - val_loss: 0.0145
Epoch 18/20
10/10 [==============================] - 0s 5ms/step - loss: 0.0035 - val_loss: 0.0159
Epoch 19/20
10/10 [==============================] - 0s 5ms/step - loss: 0.0035 - val_loss: 0.0153
Epoch 20/20
10/10 [==============================] - 0s 7ms/step - loss: 0.0034 - val_loss: 0.0145
epochs_hist.history.keys()
dict_keys(['loss', 'val_loss'])
plt.plot(epochs_hist.history['loss'])
plt.plot(epochs_hist.history['val_loss'])
plt.title('Model Loss Progress During Training')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend(['Training Loss', 'Validation Loss'])
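The validation loss flattens out around epoch 10-12 while the training loss keeps falling, a mild sign of overfitting. One standard remedy is Keras' built-in early-stopping callback; a minimal sketch of how the fit call could be rewritten (the patience of 5 epochs is an arbitrary choice here, not a tuned value):

from tensorflow.keras.callbacks import EarlyStopping

# stop training once val_loss has not improved for 5 consecutive epochs,
# and roll the weights back to the best epoch seen
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
epochs_hist = model.fit(X_train, y_train, epochs=100, batch_size=50,
                        validation_split=0.2, callbacks=[early_stop])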
y_predict = model.predict(X_test)
# the first argument to plot() goes on the x-axis, so label the axes accordingly
plt.plot(y_test, y_predict, "^", color='g')
plt.xlabel('True Values (scaled)')
plt.ylabel('Model Predictions (scaled)')
y_predict_orig = scaler.inverse_transform(y_predict)
y_test_orig = scaler.inverse_transform(y_test)
plt.plot(y_test_orig, y_predict_orig, "^", color='b')
plt.xlabel('True Values')
plt.ylabel('Model Predictions')
k = X_test.shape[1]  # number of predictors
n = len(X_test)      # number of test samples
n
147
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

MSE = mean_squared_error(y_test_orig, y_predict_orig)
RMSE = np.sqrt(MSE)
MAE = mean_absolute_error(y_test_orig, y_predict_orig)
r2 = r2_score(y_test_orig, y_predict_orig)
# adjusted R2 penalizes R2 for the number of predictors k relative to the sample size n
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(f'RMSE = {RMSE:.3f}\nMSE = {MSE}\nMAE = {MAE}\nR2 = {r2}\nAdjusted R2 = {adj_r2}')
RMSE = 855.646
MSE = 732130.1093405469
MAE = 614.8364324245323
R2 = 0.805684986705684
Adjusted R2 = 0.744414487018287
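As a closing usage sketch, the trained model and the fitted scaler together turn a scaled prediction back into a rental count; here the first held-out sample stands in for a new day:

# predict one held-out day and convert the scaled output back to a raw count
sample = X_test[0:1]                          # shape (1, 35)
pred_scaled = model.predict(sample)           # prediction in [0, 1]
pred_count = scaler.inverse_transform(pred_scaled)
print(f'Predicted daily rentals: {pred_count[0, 0]:.0f}')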