Daywise differences in array

Hello
I have the following time series
1982 5 1 3 25
1982 5 1 6 30
1982 5 1 12 35
1982 5 1 18 40
1982 5 2 0 45
1982 5 2 3 45
1982 5 2 6 50
1982 5 2 12 55
1982 5 2 18 55
1982 5 3 0 60
1982 5 3 3 65
1982 5 3 6 80
1982 5 3 12 90
1982 5 3 18 105
1982 5 4 0 115
1982 5 4 3 115
1982 5 4 6 115
1982 5 4 12 115
1982 5 4 18 115
1982 5 5 3 30
The first four columns show year, month, day and hour, and the last column shows the 3-hourly rainfall. I wish to find out whether the rainfall value at each 3-hourly time step changes by more than 30 mm within the next 24 hours. For example, the difference in rainfall between 1982/5/1 at 3 hrs and 1982/5/2 at 3 hrs should be greater than 30 mm. I need to search the entire series and locate the rows where the rainfall value differs by more than 30 mm from its value 24 hours earlier. I am trying caldays(1) but getting stuck on the end rows. Also, my approach involves a for loop, which is not efficient. Please help!

 Accepted Answer

dpb on 8 Oct 2024
Edited: dpb on 9 Oct 2024
R=[1982 5 1 3 25
1982 5 1 6 30
1982 5 1 12 35
1982 5 1 18 40
1982 5 2 0 45
1982 5 2 3 45
1982 5 2 6 50
1982 5 2 12 55
1982 5 2 18 55
1982 5 3 0 60
1982 5 3 3 65
1982 5 3 6 80
1982 5 3 12 90
1982 5 3 18 105
1982 5 4 0 115
1982 5 4 3 115
1982 5 4 6 115
1982 5 4 12 115
1982 5 4 18 115
1982 5 5 3 30];
ttR=timetable(datetime(R(:,1:3))+hours(R(:,4)),R(:,5),'VariableNames',{'Rainfall'})
ttR = 20x1 timetable
            Time            Rainfall
    ____________________    ________
    01-May-1982 03:00:00       25
    01-May-1982 06:00:00       30
    01-May-1982 12:00:00       35
    01-May-1982 18:00:00       40
    02-May-1982 00:00:00       45
    02-May-1982 03:00:00       45
    02-May-1982 06:00:00       50
    02-May-1982 12:00:00       55
    02-May-1982 18:00:00       55
    03-May-1982 00:00:00       60
    03-May-1982 03:00:00       65
    03-May-1982 06:00:00       80
    03-May-1982 12:00:00       90
    03-May-1982 18:00:00      105
    04-May-1982 00:00:00      115
    04-May-1982 03:00:00      115
LastTime=ttR.Time(end);
t=[]; r=[];
for i=1:height(ttR)
  nextDay=ttR.Time(i)+caldays(1);
  %[ttR.Time(i) nextDay LastTime]
  if nextDay>LastTime, break, end
  % account for not accumulating globally as initially thought, Umar's solution
  inDay=timerange(ttR.Time(i),nextDay);              % indexing expression for 24hr period
  %dRF=ttR.Rainfall(nextDay)-ttR.Rainfall(i);        % amount difference at now+24hr
  dRF=max(ttR.Rainfall(inDay))-ttR.Rainfall(i);      % amount difference in 24hr accounting for rollover
  %[ttR.Rainfall(i) ttR.Rainfall(nextDay) dRF];
  if dRF>30
    t=[t;ttR.Time(i)];
    r=[r;dRF];
  end
end
ttDRF=timetable(t,r,'VariableNames',{'24hr Rainfall'});
ttDRF.Properties.DimensionNames(1)={'Begin Date'};
ttDRF
ttDRF = 4x1 timetable
         Begin Date         24hr Rainfall
    ____________________    _____________
    02-May-1982 18:00:00         35
    03-May-1982 00:00:00         45
    03-May-1982 03:00:00         50
    03-May-1982 06:00:00         35
I couldn't think of a really clever way to use indexing, but I doubt this will be too slow unless your data file is really huge, although I did use dynamic reallocation here rather than preallocating space for the temporaries.
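For readers who want to avoid the explicit loop, here is a minimal sketch of one possible loop-free variant. It is not part of the answer above. It assumes the series is small enough to hold an N-by-N mask in memory, and that a calendar day equals exactly 24 hours (true for unzoned datetimes). The names h, dt, inWin, M, valid and hit are introduced only for this illustration.

```matlab
R = [1982 5 1 3 25
     1982 5 1 6 30
     1982 5 1 12 35
     1982 5 1 18 40
     1982 5 2 0 45
     1982 5 2 3 45
     1982 5 2 6 50
     1982 5 2 12 55
     1982 5 2 18 55
     1982 5 3 0 60
     1982 5 3 3 65
     1982 5 3 6 80
     1982 5 3 12 90
     1982 5 3 18 105
     1982 5 4 0 115
     1982 5 4 3 115
     1982 5 4 6 115
     1982 5 4 12 115
     1982 5 4 18 115
     1982 5 5 3 30];
T = datetime(R(:,1:3)) + hours(R(:,4));   % observation times
x = R(:,5);                               % accumulated rainfall
h = hours(T - T(1));                      % elapsed hours as plain doubles

% inWin(i,j) is true when observation j lies in [T(i), T(i)+24h),
% the same half-open window that timerange uses by default
dt = h.' - h;
inWin = dt >= 0 & dt < 24;

% Row-wise maximum over each window; values outside the window are masked out
M = repmat(x.', numel(x), 1);
M(~inWin) = -Inf;
dRF = max(M, [], 2) - x;

% Same stopping rule as the loop: skip starts whose window runs past the data
valid = h + 24 <= h(end);
hit = valid & dRF > 30;

ttDRF = timetable(T(hit), dRF(hit), 'VariableNames', {'24hr Rainfall'})
```

On the sample data this should flag the same four start times as the loop. The mask grows with the square of the number of observations, so for a long record the loop is likely the safer choice.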

11 Comments

Hi @Poulomi,

To address your request for analyzing the rainfall data and determining whether the difference in rainfall values exceeds 30 mm within a 24-hour period, we need to modify the approach slightly. Your initial code iterates through each time point but only checks the next day's value, rather than considering all values within the previous 24 hours. This requires a more efficient method that avoids nested loops. You can utilize MATLAB’s capabilities with timetables to streamline this process. The following code implements a more efficient approach using logical indexing and vectorized operations:

% Input data as provided
R = [1982	5	1	3	25;
   1982	5	1	6	30;
   1982	5	1	12	35;
   1982	5	1	18	40;
   1982	5	2	0	45;
   1982	5	2	3	45;
   1982	5	2	6	50;
   1982	5	2	12	55;
   1982	5	2	18	55;
   1982	5	3	0	60;
   1982	5	3	3	65;
   1982	5	3	6	80;
   1982	5	3	12	90;
   1982	5	3	18	105;
   1982	5	4	0	115;
   1982	5	4	3	115;
   1982	5	4	6	115;
   1982	5	4	12	115;
   1982	5	4	18	115;
   1982	5	5    3   30];
% Create a timetable
ttR = timetable(datetime(R(:,1:3)) + hours(R(:,4)), R(:,5), ...
    'VariableNames', {'Rainfall'});
% Initialize variables to hold results
t = [];
r = [];
% Loop through each time point
for i = 1:height(ttR)
    % Define the time window for the previous 24 hours
    startTime = ttR.Time(i) - caldays(1);
    endTime = ttR.Time(i);
    % Find all rainfall values within this time window
    rainInWindow = ttR.Rainfall(ttR.Time >= startTime & ttR.Time < endTime);
    if ~isempty(rainInWindow)
        % Calculate the difference between the maximum in the window
        % and the current rainfall
        dRF = max(rainInWindow) - ttR.Rainfall(i);
        if dRF > 30
            t = [t; ttR.Time(i)];
            r = [r; dRF];
        end
    end
end
% Create result timetable
ttDRF = timetable(t, r, 'VariableNames', {'24hr_Rainfall'});
ttDRF.Properties.DimensionNames(1) = {'Begin_Date'};
% Display results
disp(ttDRF);

First, a timetable is created from your data, which allows for easier time-based indexing. For each time point, we compute the start and end of the previous 24-hour window and use logical indexing to extract all rainfall values within it. We then compute the difference between the maximum rainfall in that window and the current rainfall. If the difference exceeds 30 mm, we store the timestamp and the difference.

Feel free to run this code snippet in your MATLAB environment, and it should yield results based on your specified criteria for identifying significant rainfall differences over a defined time frame.

@Umar - the code is mine, not @Poulomi's and the data are cumulative totals at the time, not rainfall in the three hour period. The code computes the difference in the 24hr period which is the cumulative difference over the time.
Well, the above isn't strictly true either: I now see something I missed in the original data, namely that the accumulation apparently does start over again on 5/5, but we don't have data for either 5/1 00hr or 5/5 00hr.
It appears that accumulations are collected until there is a 24-hr consecutive period without rainfall, but that's a surmise that is not provided; it would probably need some preprocessing with a more clear definition of what the data really are and without the missing observations to really do it absolutely correctly.
"the difference in rainfall value of 1982/5/1 at 3 hrs - 1982/5/2 at 3hrs should be greater than 30 mm."
@Poulomi -- I would disagree, based on the provided data, unless one were to impute the missing data for 5/1 at 00 hr and it were 0 (or at least <10). From what can be determined from the data provided, the difference from 5/1 3hr to 5/2 3hr is 45-25 = 20, which is <30.
As noted in the response to @Umar, we need a better definition of what the data truly represent and what is the criterion for resetting the accumulations, but most importantly, the missing data at 00 hrs.
@Umar - I agree you saw what I missed on the very last sample data that the gauge accumulator was reset at some point--given the data we have without further clarification the use of max() over the 24hr period will catch out those periods.
Missing the 00 hr datapoints is going to introduce some uncertainty/inaccuracy, but there's nothing that can be done about that if the data are truly missing and weren't just inadvertently left out in the posting...
Poulomi on 9 Oct 2024
Edited: Poulomi on 9 Oct 2024
The data are provided 3-hourly and there will be a few missing hours in between. I am filling those missing cases with NaN values.
I believe the above code does precisely what you asked for, then...to the best approximation available.
It seems unlikely coincidence that the missing data are both 00 hr in such a short sample--is there any chance that 0 amounts were inadvertently omitted when the gauge accumulation was reset and imputing a zero instead might be the more nearly correct answer?
I still don't see a way without either an explicit or implicit loop, though, as @Umar noted the inner computation over the day period can be vectorized by extracting those as a vector.
Dear @dpb,
Thank you for your detailed observations regarding the rainfall data and the associated code. Your insights into the nuances of the cumulative totals and the importance of accurate data handling are both appreciated and crucial for this analysis. Below, I’ll summarize the key points from your comments and provide some additional technical feedback.
Data Clarification: You rightly pointed out that the data represents cumulative totals rather than rainfall over specific intervals. This distinction is essential for accurately interpreting the results, especially when calculating differences over time.
Missing Data Points: The absence of data for 5/1 at 00 hr and 5/5 at 00 hr introduces uncertainty in calculations. As you mentioned, without these points, any imputation (like assuming zero rainfall) can lead to inaccuracies, particularly in understanding the accumulation resets.
Accumulation Resets: Your hypothesis regarding accumulations being collected until a 24-hour period without rainfall is a logical approach but requires explicit confirmation from the dataset documentation. Without clarity on this criterion, any analysis may yield misleading conclusions.
Vectorization vs. Looping: Your discussion around vectorizing the inner computation is particularly insightful. Leveraging vectorization can significantly enhance performance when processing larger datasets, avoiding unnecessary loops where possible.
In a nutshell, your attention to detail in addressing these complexities is commendable.
Actually, the
dRF=max(ttR.Rainfall(inDay))-ttR.Rainfall(i); % amount difference in 24hr accounting for rollover
way is not terribly robust; it works with the specific sample of data, but a case such as
Hr Accum
0 10
3 15
6 0
12 20
18 25
0 30
would extract the second maximum instead of the first, even though a rollover occurred. As one of my earlier comments noted, I think one would have to determine whether a rollover occurred within the 24hr period and make sure to choose the value from the section before it to get fully consistent results.
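To make the arithmetic concrete, here is a small sketch on the six-row example above. The variable names (accum, dNaive, ixReset, dBeforeReset) are chosen only for this illustration, and a reset is assumed to show up as a drop between consecutive samples.

```matlab
accum = [10; 15; 0; 20; 25; 30];        % Accum column from the example above
start = accum(1);                       % accumulation at the start of the window

dNaive = max(accum) - start;            % picks 30, a value from after the reset

ixReset = find(diff(accum) < 0, 1);     % index of the last sample before the reset
dBeforeReset = accum(ixReset) - start;  % uses 15, the peak before the reset

fprintf('max-based: %d, before reset: %d\n', dNaive, dBeforeReset);
```

The max-based difference comes out as 20 while the accumulation before the reset rose by only 5, which is the inconsistency described.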
As per the problem definition, the accumulated precipitation at a 3-hourly time point should increase by > 30 mm within 24 hrs, so instead of max(ttR.Rainfall(inDay)) it should be nanmin(ttR.Rainfall(inDay)): we are looking for an increase of > 30, and with max we may discard a few events. Also, missing values can't be imputed with zeros. Zero rainfall means no rain at all in that hour, so I have replaced them with NaN.
"it should be nanmin(ttR.Rainfall(inDay)) as we are looking whether there is an increase of > 30, with max, we may discard a few events."
No, min() won't find any events. max() returns the largest amount in the day in question; the delta computed is then the difference between it and the starting amount of the current 24hr period. min() will return the starting value for almost every 24hr period; only in periods in which the accumulator has been reset will it find another value, and then the computed difference will turn out to be negative.
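A quick sketch of that point, using two 24hr windows taken from the sample data: one with no reset (starting 02-May 18:00) and one containing the reset (starting 04-May 06:00). The variable names are illustrative only.

```matlab
% Window with no reset: accumulations only go up
w1 = [55; 60; 65; 80; 90];
dMax1 = max(w1) - w1(1);   % largest accumulation minus the starting value
dMin1 = min(w1) - w1(1);   % smallest accumulation minus the starting value

% Window that contains a reset of the accumulator
w2 = [115; 115; 115; 30];
dMax2 = max(w2) - w2(1);
dMin2 = min(w2) - w2(1);

fprintf('no reset: max-based %d, min-based %d\n', dMax1, dMin1);
fprintf('reset:    max-based %d, min-based %d\n', dMax2, dMin2);
```

With min() the difference is 0 in the first window and -85 in the second, so the test dRF>30 can never succeed, whereas max() gives 35 for the first window.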
max() silently discards NaN anyway, so the nanmin, nanmax functions aren't required...
max([10;20;nan])
ans = 20
The potential issue about the reset period is still the one outlined above: the (probably rare) event that the second set of data produces a larger accumulated total than the first set within the 24 hr period after a reset.
One could check for that by finding the locations in the overall series at which the sequential diff() of the rainfall is <0 and then checking whether any of those dates are within the 24hr band after any of the returned exceedances.
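A minimal sketch of that check, under the assumption that a reset appears as a negative step between consecutive observations. The begin times in tBegin are the four exceedances reported in the accepted answer; resetTimes and hasReset are names introduced here for illustration.

```matlab
R = [1982 5 1 3 25
     1982 5 1 6 30
     1982 5 1 12 35
     1982 5 1 18 40
     1982 5 2 0 45
     1982 5 2 3 45
     1982 5 2 6 50
     1982 5 2 12 55
     1982 5 2 18 55
     1982 5 3 0 60
     1982 5 3 3 65
     1982 5 3 6 80
     1982 5 3 12 90
     1982 5 3 18 105
     1982 5 4 0 115
     1982 5 4 3 115
     1982 5 4 6 115
     1982 5 4 12 115
     1982 5 4 18 115
     1982 5 5 3 30];
T = datetime(R(:,1:3)) + hours(R(:,4));
x = R(:,5);

% Times of the first observation after each drop in the accumulated total
resetTimes = T(find(diff(x) < 0) + 1);

% Begin times of the exceedances reported in the answer
tBegin = [datetime(1982,5,2,18,0,0); datetime(1982,5,3,0,0,0); ...
          datetime(1982,5,3,3,0,0);  datetime(1982,5,3,6,0,0)];

% Flag exceedances whose 24hr band contains a reset
hasReset = false(size(tBegin));
for k = 1:numel(tBegin)
    hasReset(k) = any(resetTimes > tBegin(k) & resetTimes < tBegin(k) + hours(24));
end
disp(hasReset.')
```

For the sample data the only reset is at 05-May 03:00, which lies outside all four bands, so none of the reported exceedances should be affected.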
dpb on 12 Oct 2024
Edited: dpb on 12 Oct 2024
"The potential issue about the reset period..."
Probably the cleanest solution is to add a little more logic...
...
inDay=timerange(ttR.Time(i),nextDay);   % indexing expression for 24hr period
rfDay=ttR.Rainfall(inDay);              % the accumulations in the 24hr
ixReset=find(diff(rfDay)<0,1);          % last sample before the first reset, if any
if isempty(ixReset)                     % no reset in this period
  dRF=rfDay(end)-ttR.Rainfall(i);       % amount difference in 24hr
else                                    % there is a reset in this period
  dRF=rfDay(ixReset)-ttR.Rainfall(i);   % amount difference before reset
end
...
The above does the comparison to the period before the reset; the question then is how to define which event(s) are actually wanted. The maximum total in the 24hr period would also include the amount after the reset, if any, in which case one would need the total of the two maxima in the period.
That would be more like
...
ixReset=find(diff(rfDay)<0,1);          % assumes at most one reset in the window
if isempty(ixReset)
  dRF=rfDay(end)-ttR.Rainfall(i);
else
  dRF=rfDay(ixReset)-ttR.Rainfall(i)+rfDay(end);  % amount before and after reset
end
...
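Putting the pieces together, here is a self-contained sketch of the whole loop with the reset logic of the first variant (difference taken up to the reset). It also preallocates the results instead of growing t and r, as mentioned in the answer. This is one possible arrangement rather than the definitive version; isHit, n and lastTime are names introduced here, and only the first reset in a window is considered.

```matlab
R = [1982 5 1 3 25
     1982 5 1 6 30
     1982 5 1 12 35
     1982 5 1 18 40
     1982 5 2 0 45
     1982 5 2 3 45
     1982 5 2 6 50
     1982 5 2 12 55
     1982 5 2 18 55
     1982 5 3 0 60
     1982 5 3 3 65
     1982 5 3 6 80
     1982 5 3 12 90
     1982 5 3 18 105
     1982 5 4 0 115
     1982 5 4 3 115
     1982 5 4 6 115
     1982 5 4 12 115
     1982 5 4 18 115
     1982 5 5 3 30];
ttR = timetable(datetime(R(:,1:3)) + hours(R(:,4)), R(:,5), ...
    'VariableNames', {'Rainfall'});

n = height(ttR);
isHit = false(n,1);                 % preallocated flags, one per start time
dRF = nan(n,1);                     % preallocated 24hr differences
lastTime = ttR.Time(end);
for i = 1:n
    nextDay = ttR.Time(i) + caldays(1);
    if nextDay > lastTime, break, end           % window would run past the data
    inDay = timerange(ttR.Time(i), nextDay);    % half-open 24hr window
    rfDay = ttR.Rainfall(inDay);                % accumulations in the window
    ixReset = find(diff(rfDay) < 0, 1);         % last sample before first reset
    if isempty(ixReset)
        dRF(i) = rfDay(end) - ttR.Rainfall(i);      % no reset in the window
    else
        dRF(i) = rfDay(ixReset) - ttR.Rainfall(i);  % difference up to the reset
    end
    isHit(i) = dRF(i) > 30;
end
ttDRF = timetable(ttR.Time(isHit), dRF(isHit), 'VariableNames', {'24hr Rainfall'});
ttDRF.Properties.DimensionNames(1) = {'Begin Date'};
ttDRF
```

None of the complete 24hr windows in the sample contains a reset, so this should return the same four rows as the max-based loop.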


Asked: on 8 Oct 2024
Edited: dpb on 12 Oct 2024
