Introduction to RNN through Air Passengers Data Set
The main objective of this article is to provide an introduction to the Recurrent Neural Network (RNN) by implementing one on a data set using Keras. Keras is a powerful, free, open-source Python library mainly used for developing and evaluating deep learning models.
CONTENTS:
1. Recurrent Neural Networks
   1.1 Working of RNN
   1.2 Types of Recurrent Neural Network
   1.3 Gradient Problem in RNN
   1.4 Long Short-Term Memory Networks (LSTM)
2. Implementation through Keras
   2.1 Importing Libraries
   2.2 Data collection and normalization
   2.3 Creating the RNN: LSTM layers and dropout regularization
   2.4 Training the model
   2.5 Visualizing the predicted results
3. Summary
1. Recurrent Neural Networks
- Recurrent neural networks (RNN) are powerful artificial neural networks for modeling sequence data, such as time series or natural language.
- In an RNN, the output of the previous step becomes part of the input of the current step, which makes these networks applicable to tasks such as handwriting recognition, speech recognition and text processing.
- An RNN can also memorize previous inputs thanks to its internal memory.
- When the network is unrolled, the nodes in the different layers can be compressed into a single recurrent layer. A, B and C are the parameters of the network.
- The figure below depicts the model of an RNN.
1.1 Working of RNN
x, y and z are the input, hidden and output layers respectively, and A, B and C are the network parameters used to improve the model's output. At any given time t, the current input is the combination of the input x(t) and the previous input x(t-1). The output at each time step is fed back into the network to improve the model's performance.
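The recurrent step described above can be sketched in a few lines of numpy. This is a minimal illustration, not the implementation used later: `W_x`, `W_h` and `b` are illustrative parameter names standing in for the network parameters the article calls A, B and C, and tanh is assumed as the activation.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    # The new hidden state combines the current input x_t with the
    # previous hidden state h_prev -- this is the RNN's "memory".
    return np.tanh(x_t @ W_x + h_prev @ W_h + b)

rng = np.random.default_rng(0)
W_x = rng.normal(size=(3, 4))   # input -> hidden weights
W_h = rng.normal(size=(4, 4))   # hidden -> hidden weights (the recurrence)
b = np.zeros(4)

h = np.zeros(4)                      # initial hidden state
for x_t in rng.normal(size=(5, 3)):  # a sequence of 5 inputs
    h = rnn_step(x_t, h, W_x, W_h, b)
```

Because the same weights are reused at every time step, the network forms the single compressed recurrent layer described above.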
1.2 Types of Recurrent Neural Network
One to One RNN (Vanilla Neural Network): As the name suggests, it consists of a single input and a single output.
One to Many RNN: This type consists of a single input and multiple outputs. Image captioning is an example.
Many to One RNN: These networks take a sequence of inputs and produce a single output. Sentiment analysis, where a given sentence is classified as expressing positive or negative emotion, is a prime example.
Many to Many RNN: This type takes a sequence of inputs and generates a sequence of outputs. Machine translation is one example.
1.3 Gradient Problem in RNN
- There are certain gradient problems in RNNs which make the network hard to train.
- Poor performance and low accuracy are the major issues caused by gradient problems. They are mainly of two types.
- Vanishing gradients: Vanishing gradients are encountered when training neural networks with backpropagation and gradient-based learning methods. When the gradient becomes too small, the parameter updates are negligible, which makes learning from long data sequences difficult.
- Exploding gradients: If the gradient grows exponentially instead of decaying while training a neural network, the problem is called exploding gradients. It occurs when large error gradients accumulate, resulting in very large updates to the model weights during training.
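A common remedy for exploding gradients is gradient clipping: rescale the gradient whenever its norm exceeds a chosen threshold, so weight updates stay bounded. The helper below is a minimal numpy sketch of the idea (in Keras the same effect is available through the optimizers' `clipnorm` / `clipvalue` arguments).

```python
import numpy as np

def clip_by_norm(grad, max_norm):
    # If the gradient's norm exceeds max_norm, scale it down so its
    # norm equals max_norm; otherwise leave it unchanged.
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        return grad * (max_norm / norm)
    return grad

g = clip_by_norm(np.array([3.0, 4.0]), max_norm=1.0)  # norm 5 -> clipped
```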
1.4 Long Short-Term Memory Networks(LSTM)
LSTMs are a special kind of recurrent neural network designed to prevent back-propagated errors from vanishing or exploding. An LSTM can learn tasks that require remembering events that occurred thousands of time steps earlier.
In a standard RNN, the repeating module has a simple structure, such as a single tanh layer. LSTMs also have a chain-like structure, but the repeating module contains four interacting layers instead of one.
Working of LSTM:
The working of LSTM includes three steps.
Step 1: Decide the amount of past data to be remembered
The first step in the LSTM is to decide which information should be omitted from the cell state at that particular time step. This is determined by a sigmoid function (the forget gate), which looks at the previous hidden state h(t-1) together with the current input x(t).
Step 2: Decide how much unit adds to the current state
The second step has two parts. One is a sigmoid function (the input gate), which decides which values to let through (0 to 1); the other is a tanh function, which weights the values that are passed, deciding their level of importance (-1 to 1).
Step 3: Decide which part of the current cell state forms the output
The third step is to decide what the output will be. First, we run a sigmoid layer, which decides what parts of the cell state make it to the output. Then, we put the cell state through tanh to push the values to be between -1 and 1 and multiply it by the output of the sigmoid gate.
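The three steps above can be sketched as a single numpy function. This is an illustrative, bias-free simplification: `W_f`, `W_i`, `W_c` and `W_o` are assumed weight matrices for the forget, input, candidate and output layers, each acting on the concatenation of h(t-1) and x(t).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o):
    z = np.concatenate([h_prev, x_t])   # previous state h(t-1) with input x(t)
    f = sigmoid(W_f @ z)                # step 1: forget gate -- what to drop
    i = sigmoid(W_i @ z)                # step 2: input gate (0 to 1)
    c_tilde = np.tanh(W_c @ z)          # step 2: candidate values (-1 to 1)
    c = f * c_prev + i * c_tilde        # updated cell state
    o = sigmoid(W_o @ z)                # step 3: output gate
    h = o * np.tanh(c)                  # squash cell state, gate the output
    return h, c

rng = np.random.default_rng(1)
n_h, n_x = 4, 3
W_f, W_i, W_c, W_o = (rng.normal(size=(n_h, n_h + n_x)) for _ in range(4))
h, c = lstm_step(rng.normal(size=n_x), np.zeros(n_h), np.zeros(n_h),
                 W_f, W_i, W_c, W_o)
```

The four weight matrices are the "four interacting layers" of the repeating module mentioned earlier.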
2. Implementation through Keras
Keras is used for the implementation: a powerful and easy-to-use free, open-source Python library for developing and evaluating deep learning models.
2.1 Importing Libraries
The necessary libraries are: keras.models, sklearn.preprocessing, numpy and pandas.
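The imports might look as follows; the exact layer and model classes (Sequential, LSTM, Dense, Dropout) are the standard Keras names for the pieces used in the sections below.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler  # for normalization (2.2)
from keras.models import Sequential             # the model container (2.3)
from keras.layers import Dense, LSTM, Dropout   # the layers (2.3)
```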
2.2 Data collection and normalization
Data collection: The data set used in the implementation is the Air Passengers data set, which provides monthly totals of international airline passengers from 1949 to 1960. The task is to predict the number of passengers, in units of 1,000. The data set is available at the following link: https://www.kaggle.com/rakannimer/air-passengers
Data normalization: The data is rescaled to the range 0-1 for normalization. For this, the MinMaxScaler function from the sklearn library is used. Next, the available data must be split into training and test sets; since the data is a time series, the order of the sequence matters. Here, the first 67% of the data becomes the training set and the rest the test set.
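The normalization and split can be sketched as below. In practice the series would come from `pd.read_csv("AirPassengers.csv")`; here the first year of monthly totals stands in so the snippet is self-contained.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Monthly passenger totals for 1949 (stand-in for the full CSV).
data = np.array([112, 118, 132, 129, 121, 135,
                 148, 148, 136, 119, 104, 118], dtype="float32").reshape(-1, 1)

# Rescale the values to the range 0-1.
scaler = MinMaxScaler(feature_range=(0, 1))
data = scaler.fit_transform(data)

# Time series: keep the order. First 67% -> training, remainder -> test.
train_size = int(len(data) * 0.67)
train, test = data[:train_size], data[train_size:]
```

Keeping the scaler object around matters: its `inverse_transform()` is what maps predictions back to passenger counts later.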
2.3 Creating the RNN: LSTM layers and dropout regularization
The Long Short-Term Memory (LSTM) network expects the input data in a specific array structure: [samples, time steps, features]. Our current data is not of that form, so we have to convert it using the reshape() function from the numpy library. The LSTM network can then be fitted.
The network has a visible layer with 1 input, a hidden layer with 4 LSTM blocks (neurons), and an output layer that makes a single value prediction. The batch size used is 1, and the LSTM layer's default activations are kept (tanh for the output, sigmoid for the gates).
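A sketch of the reshaping and model construction, under the assumptions above. `create_dataset` and `look_back` are illustrative names: with a look-back of 1, the passenger count at month t is the input and the count at month t+1 is the target.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, LSTM

def create_dataset(series, look_back=1):
    # Turn the series into (input, target) pairs shifted by look_back.
    X, y = [], []
    for i in range(len(series) - look_back):
        X.append(series[i:i + look_back, 0])
        y.append(series[i + look_back, 0])
    return np.array(X), np.array(y)

series = np.linspace(0, 1, 12, dtype="float32").reshape(-1, 1)  # stand-in data
X, y = create_dataset(series)

# The LSTM expects input shaped [samples, time steps, features].
X = np.reshape(X, (X.shape[0], 1, X.shape[1]))

model = Sequential()
model.add(LSTM(4, input_shape=(1, 1)))  # hidden layer: 4 LSTM blocks
model.add(Dense(1))                     # output layer: a single prediction
model.compile(loss="mean_squared_error", optimizer="adam")
```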
2.4 Training the model
To train the model we call the model.fit() function, passing the training data, the number of epochs and the batch size; the loss function and optimizer were specified when compiling the model. Running the code displays the loss for each epoch.
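A minimal, self-contained training sketch; the tiny synthetic series stands in for the prepared training arrays from the previous steps, and the epoch count is kept small for illustration.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, LSTM

# Stand-in for the normalized training data: predict the next value.
series = np.linspace(0, 1, 20, dtype="float32")
trainX = series[:-1].reshape(-1, 1, 1)  # [samples, time steps, features]
trainY = series[1:]

model = Sequential()
model.add(LSTM(4, input_shape=(1, 1)))
model.add(Dense(1))
model.compile(loss="mean_squared_error", optimizer="adam")

# fit() reports the loss per epoch (verbose=2 prints one line per epoch;
# it is silenced here with verbose=0).
history = model.fit(trainX, trainY, epochs=5, batch_size=1, verbose=0)
```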
2.5 Visualizing the predicted results
Finally, we generate predictions for both the training and test data sets. The predictions need to be shifted along the x-axis so they line up with the original data set.
- Original data-set is shown in blue
- Prediction on training data-set is shown in red.
- Prediction on test data is shown in green.
3. Summary
In conclusion, we looked into the basics of recurrent neural networks (RNNs), the different types of RNN, the gradient problems that arise and how they are addressed by LSTMs. Finally, we created an RNN model using Keras, trained it on the Air Passengers data set and showed that the model fits both the training and the test data well.
References :
- https://www.analyticsvidhya.com/blog/2017/12/introduction-to-recurrent-neural-networks/
- https://towardsdatascience.com/recurrent-neural-networks-by-example-in-python-ffd204f99470
- https://ailabpage.com/2019/01/08/deep-learning-introduction-to-recurrent-neural-networks/
- https://youtu.be/y7qrilE-Zlc
Authors : Arya C.S. , Karthika , Murali Krishna