Chaotic systems are highly nonlinear and extremely sensitive to initial conditions, which makes them notoriously hard to predict. Despite intense interest in the future behavior of financial markets, weather patterns, seismic activity, and similarly chaotic phenomena, researchers have found it difficult to generate accurate long-range predictions from measured time series data.
We and our colleagues have developed a system for improving the accuracy of long-term forecasts of chaotic time series. Our system uses a self-organizing map (SOM) neural network to select and combine predictive models. Designed, tuned, and validated with MATLAB® and Neural Network Toolbox™, this system analyzes the time series data to identify the best predictive models to use for various portions of the data and then uses the SOM to create an ensemble solution that outperforms any individual model.
Dr. Gómez-Gil had been using MATLAB for several years before she began working on chaotic time series prediction, and was well aware of its versatility. This versatility proved to be important for her current work, which involves not only neural networks but also statistical analysis and signal processing. Several other factors contributed to our decision to use MATLAB. We’ve found that students learn MATLAB quickly, which means that even complete beginners can rapidly come up to speed on our research projects. Most importantly, MATLAB makes it easy to experiment with and evaluate new ideas, algorithms, and models.
Preprocessing Time Series Data and Generating Basic Predictive Models
Our primary data preprocessing tasks were noise reduction and data reduction. To filter and reduce the data, we applied a number of signal processing techniques, including fast Fourier transforms, signal smoothing, moving averages, and Gaussian noise filters.
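Two of the smoothing steps named above can be sketched in a few lines. This is a plain-Python illustration, not the authors' MATLAB code. The function names, the 7-sample window, and the Gaussian width `sigma` are hypothetical, and the FFT-based filtering is not shown.

```python
import math


def moving_average(x, window=7):
    """Centered moving average; the window shrinks near the ends of the series."""
    half = window // 2
    out = []
    for i in range(len(x)):
        seg = x[max(0, i - half): i + half + 1]
        out.append(sum(seg) / len(seg))
    return out


def gaussian_smooth(x, sigma=2.0):
    """Convolve with a Gaussian kernel truncated at 3 sigma.

    Weights are renormalized at the edges so the output stays an average.
    """
    radius = int(3 * sigma)
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in offsets]
    out = []
    for i in range(len(x)):
        num = den = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < len(x):
                num += w * x[j]
                den += w
        out.append(num / den)
    return out


# Hypothetical noisy input: a sample-to-sample alternating signal.
noisy = [(-1.0) ** i for i in range(50)]
smoothed_ma = moving_average(noisy)
smoothed_gauss = gaussian_smooth(noisy)
```

Near the ends of the series both filters average over fewer samples, so edge values are smoothed less than interior ones.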
We incorporated two basic types of predictive models into our system:
- Autoregressive integrated moving average (ARIMA) models, which forecast each value as a linear combination of past (differenced) observations and past forecast errors
- Nonlinear autoregressive exogenous (NARX) models, which use a feed-forward neural network to approximate future values of the time series from its previous values
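The autoregressive core of these models can be sketched without any toolbox. The example below is a simplified ARIMA(p, 1, 0). It differences the series once, fits the AR coefficients by ordinary least squares, forecasts recursively, and undoes the differencing. It omits the moving-average (lagged-error) terms, which need iterative estimation, and all function names are hypothetical. The `lagged_pairs` helper builds delay-embedded inputs (p previous values predicting the next one), which is the kind of input a NARX network's delay line supplies.

```python
def difference(series, d=1):
    """Apply d rounds of first differencing (the "I" in ARIMA)."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def lagged_pairs(series, p):
    """Pair each value with its p predecessors (oldest first)."""
    X = [list(series[t - p:t]) for t in range(p, len(series))]
    y = [series[t] for t in range(p, len(series))]
    return X, y


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_ar(series, p):
    """Least-squares AR(p) fit: [intercept, oldest-lag coef, ..., newest-lag coef]."""
    X, y = lagged_pairs(series, p)
    X = [[1.0] + row for row in X]
    k = p + 1
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


def forecast_arima_p10(series, p, steps):
    """Simplified ARIMA(p,1,0): AR(p) on first differences, then re-integrate."""
    diffs = difference(series, 1)
    coef = fit_ar(diffs, p)
    hist, level, preds = list(diffs), series[-1], []
    for _ in range(steps):
        dhat = coef[0] + sum(c * v for c, v in zip(coef[1:], hist[-p:]))
        hist.append(dhat)  # recursive: each forecast feeds the later lags
        level += dhat      # undo the differencing
        preds.append(level)
    return preds


# Hypothetical series whose first differences follow d_t = 1 + 0.5 * d_(t-1).
d = [0.0]
for _ in range(20):
    d.append(1.0 + 0.5 * d[-1])
series = [10.0]
for step in d:
    series.append(series[-1] + step)
print(fit_ar(difference(series), 1))  # approximately [1.0, 0.5]
preds = forecast_arima_p10(series, 1, 3)
```

On real data the normal equations can be ill-conditioned, so a library least-squares solver is preferable to this hand-written elimination.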
We varied the parameters of these models to create a more diverse pool of models for the self-organizing map. For the ARIMA models, we varied the number of autoregressive terms, the number of lagged forecast errors, the nonseasonal differencing, and the seasonality parameters, generating 54 variants. We generated 27 NARX variants by varying the number of input delays, the number of hidden-layer neurons, and the training algorithm. We used three training algorithms from Neural Network Toolbox: Bayesian regularization backpropagation, conjugate gradient backpropagation with Fletcher-Reeves updates, and Levenberg-Marquardt backpropagation.
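The model pools can be enumerated as a parameter grid. The article reports only the totals, so the specific values below are hypothetical. They were chosen so that the grids reproduce 54 ARIMA and 27 NARX variants. The strings `trainbr`, `traincgf`, and `trainlm` are the Neural Network Toolbox names for the three training algorithms listed in the paragraph above.

```python
from itertools import product

# Hypothetical ARIMA settings (only the total of 54 comes from the article).
AR_TERMS = [1, 2, 3]              # p: autoregressive terms
MA_TERMS = [0, 1, 2]              # q: lagged forecast errors
DIFFERENCES = [0, 1]              # d: nonseasonal differences
SEASONAL_PERIODS = [None, 7, 30]  # None means no seasonal component

# Hypothetical NARX settings (only the total of 27 comes from the article).
INPUT_DELAYS = [2, 5, 10]
HIDDEN_NEURONS = [5, 10, 20]
# Toolbox names for Bayesian regularization, Fletcher-Reeves CG, Levenberg-Marquardt.
TRAINING_FUNCTIONS = ["trainbr", "traincgf", "trainlm"]

arima_grid = [
    {"p": p, "d": d, "q": q, "season": s}
    for p, q, d, s in product(AR_TERMS, MA_TERMS, DIFFERENCES, SEASONAL_PERIODS)
]
narx_grid = [
    {"delays": k, "hidden": h, "train_fn": f}
    for k, h, f in product(INPUT_DELAYS, HIDDEN_NEURONS, TRAINING_FUNCTIONS)
]
print(len(arima_grid), len(narx_grid))  # 54 27
```

Keeping each configuration as a small dictionary makes it easy to attach validation scores to it later and pass the whole pool to a model-selection step.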
Case Study: Predicting ATM Withdrawals
We validated our system against several chaotic time series data sets. These included measured series, such as airline passengers in transit, births and accidental deaths in the U.S., sunspots, and automatic teller machine (ATM) withdrawals, as well as series generated from well-known chaotic systems such as the Mackey-Glass equation, the Lorenz attractor, and the Hénon map.
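Synthetic test series of this kind are easy to generate. As one example, the sketch below iterates the Hénon map with its classic parameters a = 1.4 and b = 0.3. It runs two trajectories whose starting points differ by 1e-10, illustrating the sensitivity to initial conditions that makes long-range prediction hard. The function name and starting points are illustrative choices, not taken from the article.

```python
def henon(n, x0=0.0, y0=0.0, a=1.4, b=0.3):
    """Return n x-values of the Henon map: x' = 1 - a*x**2 + y, y' = b*x."""
    xs = []
    x, y = x0, y0
    for _ in range(n):
        x, y = 1.0 - a * x * x + y, b * x  # both updates use the old x
        xs.append(x)
    return xs


ref = henon(200)
pert = henon(200, x0=1e-10)  # nearly identical starting point
gap = [abs(p - q) for p, q in zip(ref, pert)]
```

The gap between `ref` and `pert` starts out negligible and grows until it is comparable to the size of the attractor, even though both trajectories stay bounded.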
The ATM data set provides a useful illustration of our approach. The values in this time series represent the total daily amount withdrawn from a set of individual ATMs. The data includes no information on factors that might influence withdrawal patterns, such as the day of the week, nearby holidays, or the weather at the time of the withdrawal. Our goal was to predict the total daily withdrawals for 56 consecutive days based solely on the withdrawal totals from the preceding 700+ days (Figure 1).
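Comparing models on this task requires scoring forecasts without touching the 56 held-out days. A minimal sketch, assuming Monte Carlo cross-validation over the earlier history, draws random forecast origins, forecasts 56 days from each, and reports the mean absolute error separately over short, medium, and long parts of the horizon. The window boundaries (days 1 to 7, 8 to 28, and 29 to 56), the number of splits, the error metric, the `model(history, steps)` interface, and `last_value_model` (a naive stand-in, not one of the article's models) are all hypothetical.

```python
import random
from statistics import mean

HORIZON = 56
# Hypothetical window boundaries, as [start, end) offsets into the horizon.
WINDOWS = {"short": (0, 7), "medium": (7, 28), "long": (28, 56)}


def window_errors(actual, predicted):
    """Mean absolute error of one forecast within each horizon window."""
    return {
        name: mean(abs(a - p) for a, p in zip(actual[lo:hi], predicted[lo:hi]))
        for name, (lo, hi) in WINDOWS.items()
    }


def monte_carlo_cv(series, model, n_splits=20, min_train=200, seed=0):
    """Average per-window MAE over randomly drawn forecast origins.

    `model(history, steps)` must return `steps` predicted values.
    """
    rng = random.Random(seed)
    scores = {name: [] for name in WINDOWS}
    for _ in range(n_splits):
        origin = rng.randint(min_train, len(series) - HORIZON)
        history, future = series[:origin], series[origin:origin + HORIZON]
        for name, err in window_errors(future, model(history, HORIZON)).items():
            scores[name].append(err)
    return {name: mean(errs) for name, errs in scores.items()}


def last_value_model(history, steps):
    """Naive stand-in forecaster: repeat the last observed value."""
    return [history[-1]] * steps


# Hypothetical 760-day stand-in for the ATM series (a pure linear trend).
series = [float(t) for t in range(760)]
train, test = series[:-HORIZON], series[-HORIZON:]  # test stays untouched
scores = monte_carlo_cv(train, last_value_model)
print(scores)  # {'short': 4.0, 'medium': 18.0, 'long': 42.5}
```

On a linear trend the naive forecast is off by k days' worth of growth at step k, so its error rises from window to window; a per-window breakdown exposes exactly this kind of horizon-dependent behavior.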
To build this forecast, our system employs a strategy called temporal validated combination, which splits the prediction horizon into short-term, medium-term, and long-term windows to account for changes in the dynamics of a chaotic time series that occur over different time scales. For each window, the system uses Monte Carlo cross-validation to evaluate every NARX and ARIMA model, computing two metrics for each: performance and representative error. The SOM neural network then clusters the models, grouping them by prediction skill. Finally, the system selects high-performing models from different clusters to create a diverse ensemble. The results of the ensemble are shown in Figure 2.
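The clustering and selection steps can be sketched with a small one-dimensional Kohonen map written from scratch. The article does not list the SOM's input features or map size. This sketch therefore assumes that each model is described by a vector of its short-, medium-, and long-term validation errors and uses a three-node map. It keeps the model with the lowest total error from each occupied node and averages the kept models' forecasts with equal weights. All function names, error values, and parameters are hypothetical.

```python
import math
import random


def best_matching_unit(nodes, x):
    """Index of the SOM node whose weight vector is closest to x."""
    return min(range(len(nodes)), key=lambda j: math.dist(nodes[j], x))


def train_som(vectors, n_nodes=3, epochs=200, lr0=0.5, seed=0):
    """Online training of a 1-D Kohonen map with a Gaussian neighborhood."""
    rng = random.Random(seed)
    nodes = [list(v) for v in rng.sample(vectors, n_nodes)]  # init from data
    sigma0 = n_nodes / 2
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                  # learning rate decays linearly
        sigma = max(sigma0 * (1 - frac), 0.5)  # neighborhood radius shrinks
        for x in rng.sample(vectors, len(vectors)):  # shuffled pass over data
            bmu = best_matching_unit(nodes, x)
            for j, w in enumerate(nodes):
                h = math.exp(-((j - bmu) ** 2) / (2 * sigma ** 2))
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
    return nodes


def select_ensemble(errors, n_nodes=3, seed=0):
    """Keep the lowest-total-error model from each occupied SOM node."""
    names = list(errors)
    vectors = [errors[name] for name in names]
    nodes = train_som(vectors, n_nodes=n_nodes, seed=seed)
    best = {}
    for name, vec in zip(names, vectors):
        node = best_matching_unit(nodes, vec)
        if node not in best or sum(vec) < sum(errors[best[node]]):
            best[node] = name
    return sorted(best.values())


def ensemble_forecast(forecasts, members):
    """Equal-weight average of the member models' forecasts, step by step."""
    return [sum(step) / len(members) for step in zip(*(forecasts[m] for m in members))]


# Hypothetical validation errors per model: [short, medium, long] MAE.
errors = {
    "arima_a": [1.0, 5.0, 9.0],
    "arima_b": [1.2, 5.1, 9.3],
    "narx_a": [9.0, 5.0, 1.5],
    "narx_b": [8.8, 5.2, 1.1],
    "narx_c": [4.0, 4.5, 5.0],
    "arima_c": [5.1, 4.9, 5.2],
}
members = select_ensemble(errors)
```

Because the best model in each node is kept, the overall best model always enters the ensemble. The other members come from different regions of the error space, which is what gives the ensemble its diversity.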
The system is not limited to ATM withdrawals. Given a completely different data set from another domain, it automatically identifies and combines the best underlying predictive models for that time series to maximize prediction accuracy.
Current Projects and Future Plans
As we refine and enhance the chaotic time series prediction system, we continue to apply it in new domains. Dr. Gómez-Gil's team is about to publish the results of a study in which we used the system to forecast the exchange rate between the U.S. dollar and the Mexican peso. We also plan to turn the MATLAB system into an application with an interface to its core capabilities, making it easier for other researchers to use.
The National Institute of Astrophysics, Optics, and Electronics in Mexico is one of nearly 1,000 universities worldwide that provide campus-wide access to MATLAB and Simulink. With the Total Academic Headcount (TAH) License, researchers, faculty, and students have access to a common configuration of products, at the latest release level, for use anywhere: in the classroom, at home, in the lab, or in the field.