AI modeling techniques
In the following sections, we will introduce the Autoregressive Integrated Moving Average (ARIMA) model, one of the most traditional types of forecasting model. We will also introduce a neural network model. ARIMA is a class of statistical models used to forecast a time series from its past values. ARIMA is an acronym for the following:
- AR (autoregression): Autoregression is a process that takes previous data values as inputs, applies them to a regression equation, and generates the predicted data values.
- I (integrated): ARIMA differences the observations to make the time series stationary. This is done by subtracting the observation at the previous time step from the current observation.
- MA (moving average): A model that uses the dependency between an observation and the residual errors (forecast errors) from past observations.
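The differencing step behind the I component can be sketched in a few lines. This is only an illustration: the `difference` helper and the `sales` figures are hypothetical and do not come from the project data.

```python
def difference(series, d=1):
    """Apply d rounds of first-order differencing: y[t] - y[t-1]."""
    result = list(series)
    for _ in range(d):
        # Pair each value with its successor and keep the change between them.
        result = [curr - prev for prev, curr in zip(result, result[1:])]
    return result


# Hypothetical series with an accelerating upward trend.
sales = [100, 103, 107, 112, 118]

first_diff = difference(sales, d=1)   # period-over-period changes
second_diff = difference(sales, d=2)  # changes of the changes

print(first_diff)   # [3, 4, 5, 6]
print(second_diff)  # [1, 1, 1]
```

Each round of differencing shortens the series by one observation. In this made-up example, the second difference is constant, which is the kind of trend-free behavior differencing aims for.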
Introducing the time series model – ARIMA
For this project, we will fit data to a time series model called ARIMA. ARIMA is a specific type of time series model in statistics that is commonly used to predict future data points. Its parameters are the number of autoregressive terms (p), the number of non-seasonal differences (d), and the number of lagged forecast-error terms (q).
This ARIMA model is a parametric model, that is, a model described by a fixed set of parameters. Normally, we classify this type of model as a statistical model because we need to make assumptions about what the data looks like. This is considerably different from broader machine learning models, which make far fewer preset assumptions about what the data looks like.
In a real banking scenario, however, the statistical approach is still prevalent in the econometrics, quantitative finance, and risk management domains. This approach works when we have only a handful of data points, for example, around 30 to 100. When we have a wealth of data, it may not fare as well as other machine learning approaches.
ARIMA assumes that, once differenced, the series is stationary and follows a pattern that we can describe. The three parameters, p, d, and q, are each significant in their own way:
- Autoregressive terms (p) means the number of past periods that affect the current period value (for example, p = 1: Y in the current period = Y in the previous period * coefficient + constant).
- Non-seasonal difference (d) refers to the number of times the series is differenced against its past periods (for example, d = 1: the difference between Y now and Y in the past period).
- Lagged terms (q) means the number of past periods' forecast errors that affect the current period value.
Consider an example in which q = 1: Y is affected by the error in period t - 1. Here, error refers to the difference between the actual and predicted values.
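To make p, d, and q concrete, here is a minimal sketch of a one-step forecast with p = 1, d = 1, and q = 1. The function name, the coefficient values, and the input numbers are all hypothetical. The plus sign on the error term is one common convention, and some texts use a minus sign instead. In practice, the coefficients are estimated when the model is fitted rather than supplied by hand.

```python
def arima_111_forecast(history, last_error, constant, ar_coef, ma_coef):
    """One-step forecast for an ARIMA(1, 1, 1) model with given coefficients.

    history    -- observed values, most recent last (at least two values)
    last_error -- actual minus predicted value for the most recent period
    """
    # d = 1: work with the change between the last two periods.
    last_change = history[-1] - history[-2]
    # p = 1 and q = 1: the next change depends on the last change
    # and on the last forecast error.
    next_change = constant + ar_coef * last_change + ma_coef * last_error
    # Undo the differencing to return to the original scale.
    return history[-1] + next_change


# Hypothetical inputs, chosen only to keep the arithmetic easy to follow.
forecast = arima_111_forecast(
    history=[100.0, 104.0],
    last_error=2.0,
    constant=1.0,
    ar_coef=0.5,
    ma_coef=0.25,
)
print(forecast)  # 107.5 = 104 + (1 + 0.5 * 4 + 0.25 * 2)
```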
In a nutshell, ARIMA specifies how the previous period's coefficient, constant, error terms, and even predicted values impact the current predicted values. It sounds scary, but it is, in fact, very understandable.
After the model is fitted, it will be asked to make predictions, which are then compared against the actual testing data. The deviation of the predictions from the testing data measures the accuracy of the model. We will use a metric called the Mean Square Error (MSE) in this chapter to determine how well the model fits the data.
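As a quick illustration of the metric, the MSE is the average of the squared differences between actual and predicted values. The sketch below uses made-up numbers and a hand-written helper rather than a library function.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared gaps between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one data point is required")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical testing data and model predictions.
actual = [10.0, 12.0, 14.0]
predicted = [11.0, 12.0, 16.0]

mse = mean_squared_error(actual, predicted)
print(mse)  # (1 + 0 + 4) / 3
```

Because the errors are squared, a single large miss weighs more heavily than several small ones, and a lower MSE indicates a closer fit.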
Introducing neural networks – the secret sauce for accurately predicting demand
We may have a good data source, but we should not forget that we also need a smart algorithm. You may have read about neural networks thousands of times, but let's look at a short explanation before we use them extensively throughout the book. A neural network is an attempt by a computer to mimic how our brain works—it works by connecting different computing points/neurons with different settings.
Architecture-wise, it looks like layers of formulas. Those of you reading this book probably have some background in algebra, and can see how the outcome of interest, Y, is related to X, the variable, with b being the coefficient and c being the constant term:
Y = bX + c
Y, on the left-hand side, is what we wish to predict; on the right-hand side, bX + c is the form that describes how the feature (X) is related to Y. In other words, Y is the output, while X is the input. The neural network describes the relationship between the input and the output.
Suppose that Z is what we want to predict, and that it is related to Y in the same way, with d being the coefficient and e being the constant term:
Z = dY + e
It seems that the formulas are linked:
Z = d(bX + c) + e
This is the simplest form of a neural network, with one input layer, one hidden layer, and one output layer. Each of the layers has one neuron (point).
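A minimal sketch of this one-neuron-per-layer network follows, assuming the hidden layer computes Y = bX + c and the output layer computes Z = dY + e, where d and e are the output layer's coefficient and constant term. The function names and parameter values are hypothetical. Real networks usually also apply a nonlinear activation function at each neuron, which is left out here to stay close to the algebra in the text.

```python
def hidden_layer(x, b, c):
    """Hidden neuron: Y = bX + c."""
    return b * x + c


def output_layer(y, d, e):
    """Output neuron: Z = dY + e."""
    return d * y + e


def forward(x, b, c, d, e):
    """Feed the input through both layers: Z = d(bX + c) + e."""
    return output_layer(hidden_layer(x, b, c), d, e)


# Hypothetical parameter values, for illustration only.
z = forward(x=2.0, b=3.0, c=1.0, d=0.5, e=-1.0)
print(z)  # 2.5, since Y = 3 * 2 + 1 = 7 and Z = 0.5 * 7 - 1
```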
Backpropagation
There are other concepts in neural networks, such as backpropagation. This refers to the feedback mechanism that fine-tunes the neural network's parameters. Most of these parameters connect neurons within the network (the exception being the constant parameter at each layer). It works by comparing the output at the output layer, Z (predicted), with the actual value of Z (actual). The wider the gap between actual and predicted, the more adjustment of b, c, d, and e is needed.
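The feedback loop can be sketched for the two-formula network, assuming Y = bX + c and Z = dY + e. The sketch assumes the gap is measured as a squared error and that each parameter is adjusted by plain gradient descent. The learning rate, starting values, and single training example are all hypothetical.

```python
def train_step(x, z_actual, params, learning_rate):
    """One forward pass and one backpropagation update for b, c, d, and e."""
    b, c, d, e = params

    # Forward pass through the hidden and output layers.
    y = b * x + c
    z_predicted = d * y + e

    # The gap between predicted and actual drives the size of the update.
    gap = z_predicted - z_actual
    loss = gap ** 2

    # Chain rule: how the squared gap changes with each parameter.
    grad_e = 2 * gap
    grad_d = 2 * gap * y
    grad_c = 2 * gap * d
    grad_b = 2 * gap * d * x

    updated = (
        b - learning_rate * grad_b,
        c - learning_rate * grad_c,
        d - learning_rate * grad_d,
        e - learning_rate * grad_e,
    )
    return updated, loss


# Hypothetical starting parameters and a single training example.
params = (0.5, 0.0, 0.5, 0.0)
x, z_actual = 1.0, 2.0

losses = []
for _ in range(50):
    params, loss = train_step(x, z_actual, params, learning_rate=0.05)
    losses.append(loss)

print(losses[0])   # squared gap before any adjustment
print(losses[-1])  # squared gap after repeated adjustments
```

Every gradient is proportional to the gap, so a wider gap produces a larger adjustment, and the adjustments shrink as the prediction approaches the actual value.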
Understanding how gaps are measured is also an important piece of knowledge—this is called metrics and will be addressed in Chapter 3, Using Features and Reinforcement Learning to Automate Bank Financing.
Neural network architecture
Architecture concerns the number of layers and the number of neurons at each layer, as well as how the neurons are interconnected in a neural network. The input layer represents the features. The output layer can produce a single number or a series of numbers (called a vector); each number is either a value between 0 and 1 or a continuous value, depending on the problem domain.
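One way to see what an architecture implies is to count its parameters. The sketch below assumes a fully connected network in which every neuron has one coefficient per incoming connection plus one constant term. The layer sizes (two input features, three hidden layers of six neurons, and one output) are an illustrative assumption.

```python
def count_parameters(layer_sizes):
    """Count coefficients and constant terms in a fully connected network.

    layer_sizes -- number of neurons per layer, from input to output
    """
    total = 0
    for inputs, outputs in zip(layer_sizes, layer_sizes[1:]):
        # One coefficient per connection, plus one constant per neuron.
        total += inputs * outputs + outputs
    return total


# Hypothetical architecture: 2 inputs, three hidden layers of 6, 1 output.
layer_sizes = [2, 6, 6, 6, 1]
print(count_parameters(layer_sizes))  # 18 + 42 + 42 + 7 = 109
```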
For example, to understand the structure of a neural network, we can project that it will look like the following screenshot from TensorFlow Playground (https://playground.tensorflow.org/), which is the visualization of another network with the same hidden layers—three layers with a size of 6:
Using epochs for neural network training
Besides the design of the neural network, we also utilize the epoch parameter, which indicates the number of times the same set of data is fed to the neural network.
We need to increase the number of epochs if we do not have enough data relative to the number of parameters in the neural network. As a rough rule of thumb, if the neural network has X parameters, we want at least X data points to be fed to it. If we have only X/2 data points, we need to set epoch to 2 so that X data points (each of them fed twice) reach the network.
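The rule of thumb above can be written down directly. The helper names and the numbers below are hypothetical. The sketch encodes only the heuristic from the text; in practice, the number of epochs is usually tuned by watching the error on held-out data.

```python
import math


def epochs_needed(num_parameters, num_data_points):
    """Smallest epoch count that feeds at least num_parameters points."""
    if num_data_points <= 0:
        raise ValueError("num_data_points must be positive")
    return max(1, math.ceil(num_parameters / num_data_points))


def training_schedule(data, epochs):
    """Yield (epoch, point) pairs: every data point once per epoch."""
    for epoch in range(epochs):
        for point in data:
            yield epoch, point


# Hypothetical sizes: 100 parameters but only 50 data points.
data = list(range(50))
epochs = epochs_needed(num_parameters=100, num_data_points=len(data))
points_fed = sum(1 for _ in training_schedule(data, epochs))

print(epochs)      # 2
print(points_fed)  # 100
```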
Scaling
Before feeding the features to the machine learning model, we will normalize input features of different magnitudes so that they share the same magnitude. For example, the price and volume of goods are different types of numeric data. The scaling process will make sure that both of them are scaled to the same range, from 0 to 1. In classical statistical modeling processes, this step is very important because it prevents a feature with a larger scale from dominating the prediction.
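A minimal sketch of scaling to the 0 to 1 range, assuming simple min-max scaling applied to each feature separately. The `min_max_scale` helper and the price and volume figures are hypothetical.

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # A constant feature carries no spread; map every value to 0.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


# Hypothetical features with very different magnitudes.
prices = [10.0, 20.0, 30.0]
volumes = [1000.0, 5000.0, 9000.0]

print(min_max_scale(prices))   # [0.0, 0.5, 1.0]
print(min_max_scale(volumes))  # [0.0, 0.5, 1.0]
```

After scaling, both features span the same range, so neither dominates purely because of its units.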
Sampling
Apart from data column-level scaling, we also need to pay attention to the sampling bias of the model. Normally, we will set aside a portion of the data so that it remains unseen by the machine while it trains and learns on the rest of the data, which is called the training set. Later on, the testing set (the portion that was set aside) will be used to check the predictions made by the model.
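The split can be sketched as follows. Because this chapter deals with time series, the sketch assumes a chronological split in which the most recent observations form the testing set. The helper name and the 20 percent test fraction are illustrative assumptions.

```python
def chronological_split(series, test_fraction=0.2):
    """Keep the earliest observations for training and the latest for testing."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    if len(series) < 2:
        raise ValueError("at least two observations are required")
    # Hold out at least one observation, and leave at least one for training.
    test_size = min(len(series) - 1, max(1, round(len(series) * test_fraction)))
    split_index = len(series) - test_size
    return series[:split_index], series[split_index:]


# Hypothetical series of ten observations, ordered in time.
series = list(range(1, 11))
training_set, testing_set = chronological_split(series, test_fraction=0.2)

print(training_set)  # [1, 2, 3, 4, 5, 6, 7, 8]
print(testing_set)   # [9, 10]
```

Splitting by time order keeps future observations out of the training set, so the check against the testing set resembles a real forecast.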