Forecasting Sales with Neural Networks
We have 5 years of sales history from 10 stores selling 50 products. Our goal is to predict the future daily sales of each product.
The data comes from the Kaggle Store Item Demand Forecasting Challenge, where the original goal was to predict sales over the following 3 months. We are going to blow that goal out of the water by predicting daily sales over the next 2 years.
Deep Learning Neural Networks are designed to identify patterns in data. But Neural Networks will perform terribly when the data is not tagged with the right seasonality triggers. That is why most sales forecasting is done with traditional Time Series analysis.
But done right, Neural Networks have advantages over traditional Time Series analysis. A Time Series model must attribute all sales to either a daily, monthly, or annual trend; Neural Networks make no such rigid distinction. Furthermore, while a Time Series analysis must be manually pieced together, a Neural Network automatically returns a single model.
KNIME Installation Requirements:
This data process was first developed in Python then ported over to KNIME. Both Python and KNIME are required.
- KNIME Extension: KNIME Deep Learning – Keras Integration (Labs)
- Python with Keras: https://www.knime.com/deeplearning
Step 1 – Load Data
Clean Training Data and Test Data:
- Convert the String Date to a Date-Time field
- Calculate the ‘Day of Year’ as the number of days since the start of each year
- Remove the extra day (#366) from Leap Years
- Rename the Columns with properly formatted names
Data: 5 years of store-item sales data for 50 different items from 10 different stores
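The cleaning steps above can be sketched in pandas (the column names here are assumptions, not necessarily the exact Kaggle schema):

```python
import pandas as pd

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch of Step 1: parse dates, derive Day of Year, drop day #366."""
    df = df.copy()
    df["Date"] = pd.to_datetime(df["date"])        # String date -> Date-Time
    df["Day of Year"] = df["Date"].dt.dayofyear    # days since start of year
    df = df[df["Day of Year"] <= 365]              # remove leap-year day #366
    # Rename columns with properly formatted names
    return df.rename(columns={"store": "Store", "item": "Item", "sales": "Sales"})
```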
Step 2 – Check Data
Visualize Training Data:
- Look for trends in the data
- Distinguish Year-Over-Year Growth vs. Seasonality
Exploration shows that 80% of Products have a sales correlation of > 0.95 and 100% of Stores have a sales correlation of > 0.99! In other words, sales of all items across all stores are highly correlated. This indicates that sales are driven more by seasonal foot-traffic than by product characteristics. We used this finding to aggregate item sales across all stores to eliminate random noise from the input data.
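A correlation check like this is straightforward in pandas; this sketch assumes a tidy frame with `date`, `item`, and `sales` columns:

```python
import pandas as pd

def item_correlations(df: pd.DataFrame) -> pd.DataFrame:
    """Pivot sales into one column per item, then compute the
    item-by-item Pearson correlation matrix."""
    wide = df.pivot_table(index="date", columns="item",
                          values="sales", aggfunc="sum")
    return wide.corr()
```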
Step 3 – Annual Growth
Remove Explainable Trends (deprecated):
Data Scientists typically recommend that more traditional time-series analysis be used to model seasonality, or, if neural networks are necessary, that all the explainable trends first be removed from the data.
You should strip every single explainable phenomenon out of the data and leave only the artifacts before you start modeling. [source]
In an earlier version of this workflow an attempt was made to remove explainable phenomena, including the annual growth trend. To this end, all sales data was rescaled to 2017 levels. Regression was used to calculate the growth rate.
But after further analysis this step proved unnecessary. Neural Networks can do a great job of identifying all trends provided the data is properly tagged. This step is not used and should be removed from the workflow.
Step 4 – Normalize Sales
Normalize Training Sales Data:
Node weights are randomly assigned when training of a Neural Network first starts, so it is necessary to normalize all inputs to roughly the same order of magnitude. This step:
- Sums Sales by Store (to smooth out the random Store-to-Store variation),
- Calculates Max / Min / Mean / SD of Sales for all Items,
- Scales Historic Sales to Unit Distributions (Mean = 0.0, SD = 1.0).
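The scaling step is a standard z-score transform; a minimal sketch (the returned stats would be kept to de-normalize predictions later):

```python
import pandas as pd

def normalize_sales(daily_totals: pd.Series):
    """Scale sales to a unit distribution (mean = 0.0, SD = 1.0),
    returning the Max / Min / Mean / SD needed to scale back later."""
    mean, sd = daily_totals.mean(), daily_totals.std()
    scaled = (daily_totals - mean) / sd
    stats = {"mean": mean, "sd": sd,
             "min": daily_totals.min(), "max": daily_totals.max()}
    return scaled, stats
```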
Step 5 – Feature Enhancement
Tag Input Data:
There are three types of seasonality we are going to account for:
- Day of Week
- Monthly Season
- Long-Term Growth
Weekday sales are different from weekend sales, so we can one-hot encode the historic sales data with a day of the week tag. And to help the Neural Network identify the long-term growth rate, we can tag sales with the time since the beginning of the historic data.
But we cannot use these trivial methods to tag monthly seasonality.
We cannot one-hot encode each month as we lose the fact that March and April are adjacent (important if sales change in the spring). And we cannot use the numbers 1 through 12 to identify months as we lose the continuity between December and January.
Periodic Encoding solves these problems by informing the Neural Network of not only the current month, but also which other months are nearby and how near.
Periodic Encoding works by applying some energy (between 0.0 and 1.0) to each of the 12 month input-nodes depending upon how far the training day is from each month.
For example, if the historic sales data is from June-15 then the “June” energy would be highly concentrated, and the “June” node would be completely turned on (set to 1.0). The adjacent “May” and “July” nodes would also receive some energy (set to 0.6). But the more distant months would get much less.
Alternatively, if the sales data is from September-01 then the energy would be more distributed. Having just started, the September node would not yet be completely turned on (being set to only 0.9). And as August just finished, it would receive almost as much energy (set to 0.88).
Finally, to provide continuity, sales data from December would spill energy over into the January node.
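One way to implement this is a Gaussian kernel over circular distance in months; the exact falloff used in the workflow is not specified, so the `width` here is an assumption:

```python
import math

def month_energy(day_of_year: int, width: float = 1.0) -> list:
    """Periodic encoding sketch: 12 energies in [0, 1], one per month node.
    The Gaussian kernel is an assumption; months are treated as 12 equal
    slices of a 365-day year, and distance wraps December -> January."""
    pos = (day_of_year - 0.5) / 365.0 * 12.0  # fractional month position
    energy = []
    for m in range(12):
        center = m + 0.5                      # middle of month m
        d = abs(pos - center)
        d = min(d, 12 - d)                    # circular distance (Dec<->Jan)
        energy.append(math.exp(-(d / width) ** 2))
    return energy
```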
Step 6 – Collect Input
Prepare Training Data for the Neural Networks:
Training Data is grouped into past 7-day and past 30-day collections for Training the Neural Networks:
- Lag the Last 30 days of Normalized Sales data
- Reverse Lag the Next 30 days of Sales Data
- Group Training Data into Batches
- Group all Training Data into a single Collection ready for the Neural Network
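The lag and reverse-lag steps can be sketched with pandas `shift` (a stand-in for KNIME's Lag Column node); column names here are illustrative:

```python
import pandas as pd

def make_windows(sales: pd.Series, past: int = 30, future: int = 30) -> pd.DataFrame:
    """Build one row per training day: `past` lagged days plus 'today'
    as inputs, and the next `future` days as reverse-lagged targets."""
    cols = {}
    for k in range(past, -1, -1):              # past days plus 'today'
        cols[f"sales_t-{k}"] = sales.shift(k)
    for k in range(1, future + 1):             # reverse lag: future targets
        cols[f"target_t+{k}"] = sales.shift(-k)
    return pd.DataFrame(cols).dropna()         # keep only complete rows
```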
The final 30-day training set comprises:
- Days Since Start (1) +
- Encoded Day of Week (7) +
- Month Node Energy (12) +
- Past 31 Days Sales (31)
Total Fields: 1+7+12+31 = 51 columns
Step 7 – Train Networks
Train the Neural Networks:
A variety of Neural Networks were trained to predict future sales, including:
- Feed Forward networks
- Recurrent Neural Networks with Long Short-Term Memory (LSTM)
- Concatenated Networks, and
- Convolutional Networks
The size of the training data was varied between:
- Past-7 days of sales data, and
- Past-30 days
The number of hidden layers and the number of training epochs were also varied.
In all, 7 Neural Networks were trained, with the best selected to predict future sales (in Step 8).
An example Neural Network was this Concatenated LSTM model (shown above).
To the top branch was passed the last 30 days of historic data, plus sales data from ‘today’, totaling 31 input nodes in all. The Long Short-Term Memory (LSTM) layer remembered previous time-series inputs and so could more easily identify trends in the data.
To the bottom branch was passed the feature enhancements to the sales data, including the:
- Day of Week (to model intra-week trends)
- Periodic Encoded Month (to model intra-year seasonality)
- Days Since Start (to model long-term growth)
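A minimal sketch of such a two-branch model in the Keras functional API; the layer widths are assumptions, not the workflow's tuned values:

```python
# Sketch of a concatenated LSTM model; layer sizes are assumptions.
from tensorflow import keras
from tensorflow.keras import layers

# Top branch: last 31 days of sales as a univariate sequence
seq_in = keras.Input(shape=(31, 1), name="sales_history")
seq = layers.LSTM(64)(seq_in)

# Bottom branch: 20 feature columns (1 days-since-start +
# 7 one-hot weekday + 12 month-energy nodes)
feat_in = keras.Input(shape=(20,), name="features")
feat = layers.Dense(32, activation="relu")(feat_in)

merged = layers.concatenate([seq, feat])
out = layers.Dense(30, name="next_30_days")(merged)  # predict next 30 days

model = keras.Model([seq_in, feat_in], out)
model.compile(optimizer="adam", loss="mse")
```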
The modular Keras library made it easy to construct and tune the model. Some of the networks were instead built directly in Python using the DL Python Network nodes.
The best model was based upon the research paper “LSTM Fully Convolutional Networks for Time Series Classification” (by Fazle Karim, Somshubra Majumdar, Houshang Darabi, and Shun Chen). This model was then selected to predict the daily sales over the next two years.
Step 8 – Make Predictions
Make 2 Years of Future Sales Predictions:
The selected model was used to make predictions about future sales. Input data was rolled forward so that earlier predictions were fed back in to make even more remote predictions:
1. Get the last 31 days of sales data
2. Append the next day of future unknown data and predict sales for this day
3. Drop the oldest historic day of sales data
4. Return to step 2 and repeat
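The rolling loop can be sketched as follows; `predict_next_day` is a hypothetical placeholder standing in for the trained model:

```python
from collections import deque

def roll_forecast(last_31_days, predict_next_day, horizon=730):
    """Roll predictions forward: each predicted day joins the input
    window and the oldest day falls off (deque maxlen handles the drop)."""
    window = deque(last_31_days, maxlen=31)   # step 1: last 31 days
    predictions = []
    for _ in range(horizon):                  # 2 years of daily predictions
        nxt = predict_next_day(list(window))  # step 2: predict next day
        predictions.append(nxt)
        window.append(nxt)                    # steps 3-4: drop oldest, repeat
    return predictions
```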
Step 9 – Final Analysis
The “Day of the Week” sales trend shows two things:
First, that the selected Neural Network is correctly predicting long-term annual growth.
The historic sales data from 2013 to 2017 is represented by the bottom 5 curves in the chart. The predicted sales for 2018 and 2019 are represented by the top 2 curves. The upward shift indicates that there has been annual sales growth. Regressions of both the historic and predicted data agree that total sales are growing at about 5 units per day.
Second, that the “Day of Week” sales are consistent. Monday sales are typically the lowest, with sales growing each day through to a peak on Sunday. Historic sales and predicted sales agree on the shape of this curve.
The “Annual Sales” trend shows the seasonality of sales throughout the year. Again, the shape of the bottom 5 historic sales curves generally agrees with the shape of the top 2 predicted sales curves.
However, the predicted sales curves are smooth while the historic sales curves exhibit more of a step function shape. Upon closer inspection, the sales plateaus in the historic data seem to occur during the last week of every month. This may be due to stores regularly running Out-of-Stock.
The historic sales data was not tagged to catch this trend. If a Week-of-the-Month tag had been used, then this step-function shape would likely have been detected by the Neural Network as well.