Immunisation Optimisation: When should I get a flu shot?

  • Created by: Hugo Carr
  • Latest result: Plot created
  • Created on: 22 Sep 2012

This plot reads weekly flu indicator data from Google Flu Trends and fits a linear model to predict next week's value. The predictors are the indicator values from one week, four weeks (about a month) and 52 weeks (a year) earlier.
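Read off the script's lag and fit steps, the model can be written as below. The symbols are notation introduced here, not the author's: y_t is the indicator in week t, n is the number of weeks with data, and the beta terms are the fitted coefficients (beta_0 being the intercept).

```latex
\hat{y}_{t} = \beta_1\, y_{t-1} + \beta_2\, y_{t-4} + \beta_3\, y_{t-52} + \beta_0,
\qquad
(\beta_0,\dots,\beta_3) = \arg\min_{\beta} \sum_{t=53}^{n} \left( y_t - \hat{y}_t \right)^2
```

The sum starts at week 53 because the 52-week lag is undefined before then.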

Link: Google Flu Trends

N.B. This plot does not use any trends from Trendy. Instead, it reads data from a URL provided by Google Flu Trends and parses the text as comma-separated values (CSV) with textscan.
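As a minimal sketch of this parsing step, the example below runs textscan on a small made-up CSV string rather than the real download. The sample values and the variable names (csvText, cols, dates, values) are hypothetical and chosen only for illustration.

```matlab
% Made-up sample in "date,value,value" form (not real Google Flu Trends data).
csvText = sprintf('2012-09-09,1.5,2.0\n2012-09-16,1.7,2.4\n');

% One %s column for the date, then two numeric columns, split on commas.
cols = textscan(csvText, '%s%f%f', 'Delimiter', ',');

% Convert the date strings to serial date numbers and pick one data column.
dates = datenum(cols{1}, 'yyyy-mm-dd');
values = cols{3};
```

Reading the numeric columns with %f avoids the string-to-number conversion loop, but it only works when every field in those columns is numeric or empty.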

Plot Image
%% Initialize variables.
fluData = urlread('http://www.google.org/flutrends/data.txt');
delimiter = ',';
startRow = 10;

%% Read columns of data as strings:
% For more information, see the TEXTSCAN documentation.
formatSpec = '%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%[^\n\r]';

%% Read columns of data according to format string.
% This call is based on the structure of the file used to generate this
% code. If an error occurs for a different file, try regenerating the code
% from the Import Tool.
dataArray = textscan(fluData, formatSpec, 'Delimiter', delimiter, 'ReturnOnError', false);

%% Convert the contents of columns containing numeric strings to numbers.
% Replace non-numeric strings with NaN.
raw = [dataArray{:,1:end-1}];
numericData = NaN(size(dataArray{1},1),size(dataArray,2));

for col = 2:29
    % Convert strings in the input cell array to numbers. Replace non-numeric
    % strings with NaN.
    rawData = dataArray{col};
    for row=1:size(rawData, 1);
        % Create a regular expression to detect and remove non-numeric prefixes and
        % suffixes.
        regexstr = '(?<prefix>.*?)(?<numbers>([-]*(\d+[\,]*)+[\.]{0,1}\d*[eEdD]{0,1}[-+]*\d*[i]{0,1})|([-]*(\d+[\,]*)*[\.]{1,1}\d+[eEdD]{0,1}[-+]*\d*[i]{0,1}))(?<suffix>.*)';
        try
            result = regexp(rawData{row}, regexstr, 'names');
            numbers = result.numbers;
            
            % Detect commas in non-thousands locations.
            invalidThousandsSeparator = false;
            if any(numbers==',');
                thousandsRegExp = '^\d+?(\,\d{3})*\.{0,1}\d*$';
                % Match the number string against the pattern (string first,
                % pattern second).
                if isempty(regexp(numbers, thousandsRegExp, 'once'));
                    numbers = NaN;
                    invalidThousandsSeparator = true;
                end
            end
            % Convert numeric strings to numbers.
            if ~invalidThousandsSeparator;
                numbers = textscan(strrep(numbers, ',', ''), '%f');
                numericData(row, col) = numbers{1};
                raw{row, col} = numbers{1};
            end
        catch me
        end
    end
end

% Convert the contents of column with dates to serial date numbers using
% date format string (datenum).
for row=1:size(dataArray{1}, 1);
    try
        numericData(row, 1) = datenum(dataArray{1}{row}, 'yyyy-mm-dd');
        raw{row, 1} = numericData(row, 1);
    catch me
    end
end


%% Replace non-numeric cells with NaN
R = cellfun(@(x) ~isnumeric(x) && ~islogical(x),raw); % Find non-numeric cells
raw(R) = {NaN}; % Replace non-numeric cells

%% Allocate imported array to column variable names
Date = cell2mat(raw(:, 1));
UnitedStates = cell2mat(raw(:, 28));

%% Clear temporary variables
clearvars delimiter startRow formatSpec dataArray ans raw numericData col rawData row regexstr result numbers invalidThousandsSeparator thousandsRegExp me R;

%% Prepare predictors
Date = Date(~isnan(UnitedStates));
UnitedStates = UnitedStates(~isnan(UnitedStates));

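% The data are weekly, so a lag of 1 row is one week, 4 rows is about a
% month and 52 rows is about a year.
Lag1Week = [NaN; UnitedStates(1:end-1)];
Lag4Weeks = [nan(4,1); UnitedStates(1:end-4)];
Lag1Year = [nan(52,1); UnitedStates(1:end-52)];

%% Linear Model fit
predictors = [Lag1Week Lag4Weeks Lag1Year];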

Y = UnitedStates(53:end, :);
x = predictors(53:end, :);
x = [x ones(size(x,1), 1)];
coeff = x\Y;

%% Plot
fitDates = Date(53:end);
plot(fitDates, Y, '-', 'LineWidth', 2, 'Color', [1, 0.5, 0.5]);
hold all
plot(fitDates, x*coeff, 'k-');

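% Predictors for next week: the values 1, 4 and 52 weeks before the
% forecast week, i.e. the last value and those 3 and 51 rows before it.
nextWeekPredictors = [UnitedStates(end) UnitedStates(end-3) UnitedStates(end-51) 1];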
prediction = nextWeekPredictors * coeff;
plot(fitDates(end)+7, prediction, 'rd', 'MarkerSize', 4, 'MarkerFaceColor', 'r')

title('Prediction for flu in United States for next week');
legend({'Flu Indicators', 'Previous Predictions', 'Next Week''s prediction'}, 'Location', 'NorthWest')

datetick
xlim([fitDates(104) fitDates(end)+56])
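
The lag construction and least-squares fit used in the script can be tried in isolation. The following is a minimal sketch on a synthetic weekly series, not on Google Flu Trends data: the series, its generating coefficients and the names y, X, nextX and forecast are all assumptions made for illustration.

```matlab
% Synthetic weekly series (made-up data):
% y(t) = 2 + 0.5*y(t-1) + 0.3*y(t-4) + 0.1*y(t-52), seeded with a seasonal curve.
n = 200;
y = zeros(n, 1);
y(1:52) = 10 + sin(2*pi*(1:52)'/52);
for t = 53:n
    y(t) = 2 + 0.5*y(t-1) + 0.3*y(t-4) + 0.1*y(t-52);
end

% Lagged predictors, padded with NaN where no history exists.
lag1  = [NaN;        y(1:end-1)];
lag4  = [nan(4, 1);  y(1:end-4)];
lag52 = [nan(52, 1); y(1:end-52)];

% Add an intercept column and drop the first 52 rows (incomplete history).
X = [lag1 lag4 lag52 ones(n, 1)];
X = X(53:end, :);
Y = y(53:end);

% Least-squares solution via the backslash operator.
coeff = X \ Y;

% One-step-ahead forecast for week n+1: lags 1, 4 and 52 relative to n+1.
nextX = [y(end) y(end-3) y(end-51) 1];
forecast = nextX * coeff;
```

Because the synthetic series follows the model exactly, the fit leaves essentially no residual and the forecast matches the value the recurrence would produce for week n+1. Real indicator data will not fit this cleanly.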