MATLAB Examples

Commodities Trading with MATLAB - Importing and cleaning data

Importing data from a variety of sources and aligning / cleaning up the data consumes a significant portion of an analyst workflow. It can be challenging to align and synchronize data from multiple sources, and to transform the data into multiple useful formats, e.g. rolling up daily data into weekly and monthly data.

This script demonstrates one possible way to automate this workflow; commodity data is stored across multiple data sources: as a group of CSV files in the .\Datasets folder, as an SQLite database (ohlcdata.db) in the current folder, as well as index data from Yahoo! Finance . The script figures out what data to retrieve using the CommodityMetadata spreadsheet in the current folder (CommodityMetadata.xlsx), and then connects to the appropriate data source to retrieve the data. Once the data has been retrieved, the next step in the workflow is to align the data for each of the commodities to the same starting and ending dates, after which we can compute data sets on a weekly and monthly basis.

Finally, once the data has been retrieved, aligned and transformed, we save it into a MAT file for easy access later.

Contents

1. Read commodity metadata and initialize other boring housekeeping

In this section, commodity metadata is imported into the workspace, and literals are setup for use later. We also setup the connection parameters for the local SQLite database that we connect to.

clc;clear
ImportCommodityMetadata;
SetupLiterals;
SetupDBParameters;

2. Get all data into workspace

In this section, we iterate through the commodity metadata information in order to figure out which data source to access to retrieve that particular data. We set up separate struct arrays for commodities data and equities data, and add the retrieved data to the corresponding struct as a dataset, along with its metadata information.

fprintf('Getting data into workspace.\n');

DataContainer = struct;
EquitiesContainer = struct;

for i = 1:length(CommodityMetadata)
    symbol = CommodityMetadata.Symbol{i};
    source = CommodityMetadata.Source{i};

    switch source
        case 'CSV' % Import data from CSV files
            DataContainer.(symbol).Metadata = CommodityMetadata(i,:);
            ReadAllContractsFromCSV;
        case 'Database' % Import data from SQLite database
            DataContainer.(symbol).Metadata = CommodityMetadata(i,:);
            ReadAllContractsFromDB;
        case 'Yahoo' % Import data from Yahoo! Finance
            EquitiesContainer.(symbol) = currDataset;
            ReadDataFromYahoo;
    end
end
Getting data into workspace.
Reading from CSV: .\Datasets\CL1.csv
Reading from CSV: .\Datasets\CL2.csv
Reading from CSV: .\Datasets\CL4.csv
Reading from CSV: .\Datasets\BRN1.csv
Reading from CSV: .\Datasets\BRN2.csv
Reading from CSV: .\Datasets\BRN4.csv
Reading from CSV: .\Datasets\HO1.csv
Reading from CSV: .\Datasets\HO2.csv
Reading from CSV: .\Datasets\HO4.csv
Reading from CSV: .\Datasets\NG1.csv
Reading from CSV: .\Datasets\NG2.csv
Reading from CSV: .\Datasets\NG4.csv
Reading from CSV: .\Datasets\XRB1.csv
Reading from CSV: .\Datasets\XRB2.csv
Reading from CSV: .\Datasets\XRB4.csv
Reading from CSV: .\Datasets\GC1.csv
Reading from CSV: .\Datasets\GC2.csv
Reading from CSV: .\Datasets\GC4.csv
Reading from CSV: .\Datasets\HG1.csv
Reading from CSV: .\Datasets\HG2.csv
Reading from CSV: .\Datasets\HG4.csv
Reading from CSV: .\Datasets\SI1.csv
Reading from CSV: .\Datasets\SI2.csv
Reading from CSV: .\Datasets\SI4.csv
Reading from CSV: .\Datasets\C1.csv
Reading from CSV: .\Datasets\C2.csv
Reading from CSV: .\Datasets\C4.csv
Reading from CSV: .\Datasets\S1.csv
Reading from CSV: .\Datasets\S2.csv
Reading from CSV: .\Datasets\S4.csv
Reading from CSV: .\Datasets\BO1.csv
Reading from CSV: .\Datasets\BO2.csv
Reading from CSV: .\Datasets\BO4.csv
Reading from CSV: .\Datasets\W1.csv
Reading from CSV: .\Datasets\W2.csv
Reading from CSV: .\Datasets\W4.csv
Reading from CSV: .\Datasets\SB1.csv
Reading from CSV: .\Datasets\SB2.csv
Reading from CSV: .\Datasets\SB4.csv
Reading from CSV: .\Datasets\KC1.csv
Reading from CSV: .\Datasets\KC2.csv
Reading from CSV: .\Datasets\KC4.csv
Reading from CSV: .\Datasets\CC1.csv
Reading from CSV: .\Datasets\CC2.csv
Reading from CSV: .\Datasets\CC4.csv
Reading from CSV: .\Datasets\CT1.csv
Reading from CSV: .\Datasets\CT2.csv
Reading from CSV: .\Datasets\CT4.csv
Database: LH.1
Database: LH.2
Database: LH.4
Reading from CSV: .\Datasets\LC1.csv
Reading from CSV: .\Datasets\LC2.csv
Reading from CSV: .\Datasets\LC4.csv
Yahoo: SPY

4. Align the data to the same starting and ending dates

The retrieved data can have different starting and ending dates; in this section, we call a custom filtering function that aligns every dataset to the same start and end dates.

fprintf('Aligning data to same starting and ending dates.\n');
DataContainer = FilterByDate(DataContainer,FirstDay,LastDay);
Aligning data to same starting and ending dates.

5. Create monthly and weekly data containers

For parts of our analysis, we will need to "roll up" daily OHLC data into weekly and monthly formats. In this section, we call a custom filtering function that performs this task easily.

fprintf('Computing monthly and weekly OHLC data containers.\n');
DataContainerWeekly=RollupDailyData(DataContainer,'w');
DataContainerMonthly=RollupDailyData(DataContainer,'m');
Computing monthly and weekly OHLC data containers.

6. Divide into training and testing datasets

Finally, in order to protect ourselves against the worst excesses of data mining, we will split our data into separate training and test sets. Our test set will only be used at the very end of our analysis workflow.

fprintf('Creating training and test data containers.\n');
LastDayOfTrainingSet='12/31/2005';
SetupTrainingAndTestSets;
Creating training and test data containers.

7. Perform book-keeping and save clean data into MAT file

Once we are done with importing and cleaning up our data, we save everything into a MAT file (StageA.mat) for access by other scripts.

ClearBunchOfVariables;
save('StageA')