Code covered by the BSD License  

Highlights from
Demo Files for "Data Analysis with Statistics and Curve Fitting Toolboxes" 2007 Webinar

image thumbnail

Demo Files for "Data Analysis with Statistics and Curve Fitting Toolboxes" 2007 Webinar

by

 

08 Aug 2007 (Updated )

Demo file from the August 7, 2007 Live Webinar

Analysis of Wing Stresses on a Prototype Aircraft

Analysis of Wing Stresses on a Prototype Aircraft

Demo file from August 7, 2007 webinar titled Data Analysis with Statistics Toolbox and Curve Fitting Toolbox. View the recorded webinar: http://www.mathworks.com/company/events/archived_webinars.html

We have test results from a prototype aircraft tested in a wind tunnel. The test focused on understanding the wing stresses during the climbing phases of the flight, where stresses are typically high. The wind tunnel's gust generator induces a controlled gust field to simulate severe wind conditions. We measure stress at 7 locations on each wing.

Our primary goals include:

  • Identify outliers or bad data
  • Compare the stress levels on the 2 wings to make sure they're not substantially different.
  • Calculate the shear force the wing experiences as a result of the wind gust
  • Determine if the max shear force is above acceptable limits (160 kN)

Contents

Show Configuration of Aircraft

Wind gusts are directed at the aircraft as shown in this image. Sensor locations are shown on the left/right wings.

imshow('TestConfig.jpg')

Raw Data File

The data is from testing is stored in a Microsoft Excel spreadsheet named 'WingStress.xls'. The data contains measurements of wind velocity (m/s), wind angle (degrees), and stress measurements (kPa) for the left (L) and right (R) wings where 1 denotes the sensor location near the base of the wing and 7 denotes the sensor near the tip of the wing.

imshow('WingStress.jpg')

Load in the Data

Take advantage of the dataset array provided in Statistics Toolbox for manipulating and storing my data. Note: You need to have Statistics Toolbox version 6.0 or higher to use dataset arrays.

clear all, close all % clean up previous work

% Load in data to a dataset array (Statistics Toolbox data type)
ds = dataset('XLSFile','WingStress.xls');

% Add units to my dataset
strainUnits = repmat({'kPa = kN/m^2'},1,14);
ds.Properties.Units = {'m/s','degrees',strainUnits{:}};

% Show summary
summary(ds)
Velocity: [72x1 double, Units = m/s]
     min         1st Q       median       3rd Q        max     
     4.4538      7.8577      12.0013      15.9402      20.5982 

Angle: [72x1 double, Units = degrees]
     min       1st Q      median      3rd Q      max 
     -120        -60           0         60      120 

L1: [72x1 double, Units = kPa = kN/m^2]
     min          1st Q         median        3rd Q         max      
     65.2751      140.8844      199.7904      252.5293      370.1140 

L2: [72x1 double, Units = kPa = kN/m^2]
     min          1st Q         median        3rd Q         max      
     60.9627      154.7150      201.6100      252.2235      341.5571 

L3: [72x1 double, Units = kPa = kN/m^2]
     min          1st Q        median       3rd Q         max      
     27.7622      75.4219      95.7420      128.4913      198.7315 

L4: [72x1 double, Units = kPa = kN/m^2]
     min          1st Q         median        3rd Q         max      
     61.0577      145.4255      206.3261      239.7442      404.9529 

L5: [72x1 double, Units = kPa = kN/m^2]
     min          1st Q         median        3rd Q         max      
     56.0411      134.7558      169.9957      217.6420      358.0392 

L6: [72x1 double, Units = kPa = kN/m^2]
     min          1st Q         median        3rd Q         max      
     46.0159      103.9926      128.3909      169.0685      248.7240 

L7: [72x1 double, Units = kPa = kN/m^2]
     min          1st Q        median       3rd Q        max      
     13.2258      40.1803      52.8888      61.8663      119.4480 

R1: [72x1 double, Units = kPa = kN/m^2]
     min          1st Q         median        3rd Q         max      
     44.1058      139.1650      185.5733      239.7453      384.9178 

R2: [72x1 double, Units = kPa = kN/m^2]
     min              1st Q            median        3rd Q         max      
     6.1899e-009      2.8778e-008      134.2293      213.2698      306.1857 

R3: [72x1 double, Units = kPa = kN/m^2]
     min          1st Q         median        3rd Q         max      
     67.9553      149.8835      197.8903      248.5895      402.9298 

R4: [72x1 double, Units = kPa = kN/m^2]
     min          1st Q         median        3rd Q         max      
     61.3721      149.9545      190.6260      257.7519      358.4477 

R5: [72x1 double, Units = kPa = kN/m^2]
     min          1st Q         median        3rd Q         max      
     51.9460      131.8459      168.3736      228.3369      366.4075 

R6: [72x1 double, Units = kPa = kN/m^2]
     min          1st Q         median        3rd Q         max      
     33.0364      101.2601      125.0345      170.1178      252.2392 

R7: [72x1 double, Units = kPa = kN/m^2]
     min          1st Q        median       3rd Q        max      
     12.8590      39.0228      52.9290      66.3789      105.4329 

You can see how the dataset loaded the data from the spreadsheet file and automatically used the column names as variable names within the dataset. The summary command lists the dataset contents, including the type, size, assigned units, and summary statistics for each variable. You can access the variables directly using the "dot" notation.

ds.Velocity(1:5)
ans =

    5.2848
    5.0441
    4.8292
    4.9279
    4.8810

Example of Data Management with Datasets

Datasets make it easy to access my data by name and operate on the data within the dataset as a group. For example, I can access only 90 degree data for L1/R1 and L7/R7.

L1R190 = ds(ds.Angle == 90,{'Angle','L1','R1','L7','R7'})
L1R190 = 

    Angle    L1          R1          L7         R7     
    90         159.36      156.15    39.5049     38.814
    90       160.7978     163.922    43.9022    42.7627
    90       150.5968    137.5926      41.79    51.4079
    90       173.6168    131.1749    45.1426    44.1388
    90       183.4965    160.3539    39.8596    53.0917
    90       151.0959    164.9913    50.1638    43.0593
    90       166.0129    123.9435    48.1589    39.2316
    90       155.5653    137.8742    45.9231    36.5771

I can also calculate descriptive statistics by Angle for L1 and R1.

ByAngle = grpstats(ds,{'Angle'},{@mean,@std},'DataVars',{'L1','R1'})
ByAngle = 

            Angle    GroupCount    mean_L1     std_L1     mean_R1     std_R1 
    -120    -120     8             100.1556    24.1595    103.0061    29.7064
    -90      -90     8             149.3126    26.5415    153.6784    12.1865
    -60      -60     8             230.3514    27.3318    222.6771     33.259
    -30      -30     8             262.2175      44.05    251.4625    35.3024
    0          0     8             275.3241    51.6807    277.3864    68.5707
    30        30     8             260.9625    60.5256    245.3882    46.0502
    60        60     8             216.2059    12.6545    216.3821    36.0055
    90        90     8             162.5677    11.3975    147.0003    16.1507
    120      120     8              102.593    29.6774    99.98004    39.1901

The dataset array provides much more functionality than I have shown here. I introduced the dataset array so that you would be familiar with it since I use it in my analysis that follows.

Show Plot by Angle

Let's start exploring my data. I'll use a scatter plot to visualize my data.

gscatter(ds.L1, ds.R1, ds.Angle)

This plot makes it easy to see how L1 trends with R1, in a linear fashion. I can also see how angle affects the stress level. It appears that -120/120 has the lowest stress and 0 is the highest. So flying directly into the wind causes the highest stress on the wings.

Show My Custom Plot

I created a custom plot using plottool to show left and right sensors plotted with a 45 degree line. I expect my data to follow the 45 degree line if the left and right sensors are behaving the same.

createfigure(ds.L1,ds.R1,[0 500]), title('L1 vs. R1')
createfigure(ds.L2,ds.R2,[0 500]), title('L2 vs. R2')

We can see from the plot of L1 vs. R1 that my data on the left and right wings are similar and do follow a 45 degree line. The plot of L2 vs. R2 shows that I have some bad data in R2, as can be seen by the zero values in the plot. I know from testing that R2 failed during the latter part of testing. I need to remove the bad data from R2 and keep the good data that follows the 45 degree line.

Hypothesis Testing of Left/Right Wing

Let's test the hypothesis that L1 = R1 and L2 = R2. This will perform the same test as my custom plot did, but does not require me to visually inspect the data.

We'll use the hypothesis test function kstest2 from the Statistics Toolbox to test this assumption.

h1 = kstest2(ds.L1,ds.R1,0.1)
h2 = kstest2(ds.L2,ds.R2,0.1)
h1 =

     0


h2 =

     1

Note that the hypothesis test (h1) returns 0 for the assumption that L1 = R1. This tells me that my assumption L1 = R1 is true, or in other words, the differences I observe in L1 and R1 are due to normal and expected variation.

The hypothesis test (h2) returns a 1 for the assumption that L2 = R2. This tells me L2 is not equal to R2, which we already know has bad data in it.

Hypothesis testing allows me to systematically test my data for difference in measured values where I normally would only detect this differences by visual inspection.

Use Hypothesis Testing to Screen Data

Combine hypothesis testing with my customized plot to automate the process of identifying differences across sensor locations. This next section of code will scan my data and only flag data that may contain bad or suspect data (i.e. any data that fails Lx = Rx will be shown in my custom plot).

% Left and right wing sensor labels
Left  = {'L1','L2','L3','L4','L5','L6','L7'};
Right = {'R1','R2','R3','R4','R5','R6','R7'};

for i = 1:length(Left)
    Lx = double(ds,Left{i});
    Rx = double(ds,Right{i});

    h(i) = kstest2(Lx,Rx,0.1); % test Lx = Rx

    if h(i) > 0 % Plot potentially bad data
        createfigure(Lx,Rx,[0 500]) % modified plot
        label = [Left{i},' vs. ',Right{i}];
        title(label)
        fprintf('%s failed test, h = %i\n',label,h(i))
    end
end
L2 vs. R2 failed test, h = 1
L3 vs. R3 failed test, h = 1

We can see that L2 and R2 were flagged, which we expect. We can also see that L3 and R3 were flagged. Looking at the L3 vs. R3 plot, we can see that R3 appears to have a higher stress measurement than L3. But which one is correct?

Add Flag for Sensor Health

At this point I would like to start removing data that I know is bad from further analysis. I'll use categorical arrays from Statistics Toolbox to categorize each sensor as Alive, Dead, or Suspect. This will allow me to filter my data based upon these categories.

I'll first declare that all sensors are good.

Health = nominal('Alive');
Health = repmat(Health,length(ds),14);

HLeft = {'HL1','HL2','HL3','HL4','HL5','HL6','HL7'};
HRight = {'HR1','HR2','HR3','HR4','HR5','HR6','HR7'};ds = [ds dataset({Health,HLeft{:},HRight{:}})];
HealthUnits = repmat({'[-]'},1,14);
ds.Properties.Units= [ds.Properties.Units HealthUnits];

Now I'll flag the data in R2 that I know for sure is bad. Note that the summary command now displays information about the categorical data. Specifically, if we look at HR2, we can see how many data points are good (Alive = 46) vs bad (Dead = 26).

R2L2 = ds.R2./ds.L2; % ratio and remove values below 50% (0.5)
indx = R2L2 < 0.5;
ds.HR2(indx) = 'Dead';

summary(ds)

% Plot to verify we can removed R2 bad data
createfigure(ds.L2(~indx),ds.R2(~indx),[0 500])
title('L2 vs. R2')
Warning: Categorical level 'Dead' being added.

Velocity: [72x1 double, Units = m/s]
     min         1st Q       median       3rd Q        max     
     4.4538      7.8577      12.0013      15.9402      20.5982 

Angle: [72x1 double, Units = degrees]
     min       1st Q      median      3rd Q      max 
     -120        -60           0         60      120 

L1: [72x1 double, Units = kPa = kN/m^2]
     min          1st Q         median        3rd Q         max      
     65.2751      140.8844      199.7904      252.5293      370.1140 

L2: [72x1 double, Units = kPa = kN/m^2]
     min          1st Q         median        3rd Q         max      
     60.9627      154.7150      201.6100      252.2235      341.5571 

L3: [72x1 double, Units = kPa = kN/m^2]
     min          1st Q        median       3rd Q         max      
     27.7622      75.4219      95.7420      128.4913      198.7315 

L4: [72x1 double, Units = kPa = kN/m^2]
     min          1st Q         median        3rd Q         max      
     61.0577      145.4255      206.3261      239.7442      404.9529 

L5: [72x1 double, Units = kPa = kN/m^2]
     min          1st Q         median        3rd Q         max      
     56.0411      134.7558      169.9957      217.6420      358.0392 

L6: [72x1 double, Units = kPa = kN/m^2]
     min          1st Q         median        3rd Q         max      
     46.0159      103.9926      128.3909      169.0685      248.7240 

L7: [72x1 double, Units = kPa = kN/m^2]
     min          1st Q        median       3rd Q        max      
     13.2258      40.1803      52.8888      61.8663      119.4480 

R1: [72x1 double, Units = kPa = kN/m^2]
     min          1st Q         median        3rd Q         max      
     44.1058      139.1650      185.5733      239.7453      384.9178 

R2: [72x1 double, Units = kPa = kN/m^2]
     min              1st Q            median        3rd Q         max      
     6.1899e-009      2.8778e-008      134.2293      213.2698      306.1857 

R3: [72x1 double, Units = kPa = kN/m^2]
     min          1st Q         median        3rd Q         max      
     67.9553      149.8835      197.8903      248.5895      402.9298 

R4: [72x1 double, Units = kPa = kN/m^2]
     min          1st Q         median        3rd Q         max      
     61.3721      149.9545      190.6260      257.7519      358.4477 

R5: [72x1 double, Units = kPa = kN/m^2]
     min          1st Q         median        3rd Q         max      
     51.9460      131.8459      168.3736      228.3369      366.4075 

R6: [72x1 double, Units = kPa = kN/m^2]
     min          1st Q         median        3rd Q         max      
     33.0364      101.2601      125.0345      170.1178      252.2392 

R7: [72x1 double, Units = kPa = kN/m^2]
     min          1st Q        median       3rd Q        max      
     12.8590      39.0228      52.9290      66.3789      105.4329 

HL1: [72x1 nominal, Units = [-]]
     Alive 
        72 

HL2: [72x1 nominal, Units = [-]]
     Alive 
        72 

HL3: [72x1 nominal, Units = [-]]
     Alive 
        72 

HL4: [72x1 nominal, Units = [-]]
     Alive 
        72 

HL5: [72x1 nominal, Units = [-]]
     Alive 
        72 

HL6: [72x1 nominal, Units = [-]]
     Alive 
        72 

HL7: [72x1 nominal, Units = [-]]
     Alive 
        72 

HR1: [72x1 nominal, Units = [-]]
     Alive 
        72 

HR2: [72x1 nominal, Units = [-]]
     Alive      Dead 
        46        26 

HR3: [72x1 nominal, Units = [-]]
     Alive 
        72 

HR4: [72x1 nominal, Units = [-]]
     Alive 
        72 

HR5: [72x1 nominal, Units = [-]]
     Alive 
        72 

HR6: [72x1 nominal, Units = [-]]
     Alive 
        72 

HR7: [72x1 nominal, Units = [-]]
     Alive 
        72 

Analysis of Variance (ANOVA)

I can use ANOVA techniques to test the assumption that all of my sensors have the same mean (i.e. L1 = R1 = R2 = ...L7 = R7). This test will tell me if the differences in stress measurements are different or a result of the normal and expected variation in the data.

stress = double(ds,[Left,Right]);
stress(ds.HR2 == 'Dead',9) = NaN;
[p,t,stats] = anova1(stress,[Left,Right]);

Let's dig deeper into the differences between L3 and R3. You can see a box plot is produced that shows me the amount of variation each sensor has in my data. The y-axis shows me the median (red bar), the quartile range of the data (end of boxes), the range of the data (dashed bars), and any outliers that might be in my data (red plus symbols). We can see a trend for each wing (looking left to right) that shows decreasing (median) values as we go from the wing base (L1 or R1) to the wing tip (L7 or R7). Also note that L3 and R3 do not behave similarly. We can conclude from this plot that L3 is the sensor that has different behavior.

The ANOVA table (p-value, prob>F) tells me that the differences across sensors are significant and not due to random variation. If the p-value was at or above 0.05, this test would tell me that my different sensors are really measuring the same thing. Now the ANOVA test is not all interesting in this case, and I used ANOVA to generate the box plot for visualization. But I also want to test for differences across sensors individually. I can do this using the multcompare function and the output stats structure from ANOVA.

multcompare(stats);

The multcompare function provides an interactive plot that I can use to test for differences across groups. For example, if you click on L3, you can see that it is different than all of the other sensors, 12 to be precise. So L3 must be the sensor that is wrong and R3 is good. Now I don't konw the casue of L3's difference. It could be bad data acquisition hardware, improper installation or placement of the L3 sensor, or some other reason. The point is that I do not trust L3's values and will remove it from further analysis.

ds.HL3(:) = 'Suspect'; % Label L3 as Suspect (I should look into why)
stress(:,3) = NaN;
Warning: Categorical level 'Suspect' being added.

Clustering (show groups, outliers)

We looked at our data by sensor location and found R2 to contain bad data. We also found L3 to be suspect, and should remove it from analysis based upon it near behaving as expected. Now let's take a look at angle effects on our data. I'll use cluster analysis here instead of hypothesis testing or ANOVA, to show a different method from Statistics Toolbox that can be used to identify difference in your data.

rmv = isnan(stress(1,:));
[d,p,stats] = manova1(stress(:,~rmv),ds.Angle);
manovacluster(stats)
xlabel('Wind Gust Angle (deg)')

You can see the dendrogram plot show that -120/120 is a group. You can see the same for -90/90. The height of the bars indicate how closely related the different cluster are to each other. For example, if I cut the bars at around a value of 6, I would get the grouping of -120/120, -90/90, and -60 for the left half of the plot.

60 is interesting in that it is more like -90 and 90 than it is like -60. This is surprising in that we should have a -60/60 grouping, based upon symmetry of the airplane. I need to investigate why this might be so.

Notice that 0 is in a cluster all by itself. This is expected, since it is the only measurement that does not have a symmetric counterpart.

My ultimate goal is to determine the maximum shear force that the wing was exposed to during testing. And from the gscatter plot shown earlier, I know the 0 degree condition produces the maximum stress on the wings. Let's focus attention on this test condition.

Curve Fitting Toolbox

I want to estimate the maximum shear force the wing was exposed to during testing. Shear force is simply the integration of the bending moment over the wing length, or span. The Curve Fitting Toolbox provides integration capability of curves, so let's use it here to calculate the shear force.

Bending moment (M) as a function of wing span can be calculated from my stress measurements:

 M = I/t * stress

where I is the cross-sectional moment of inertia for the wing and t is the location of the sensor from the neutral axis of the wing. So let's fit a curve to M, extrapolate down to the base of the wing (span = 0), and then integrate the curve to get Shear Force. I'll do this for the 0 degree wind gust data for the highest wind gust velocity tested (this is the maximum stress condition).

I    = [0.0127 0.0120 0.0113 0.0107 0.0095 0.0084 0.0073]; % m^4
t    = [0.1105 0.1085 0.1065 0.1045 0.1001 0.0962 0.920]; % m
span = [0.5, 1, 1.5, 2, 3, 4, 5]; % m

% Moment for max wind gust velocity at 0 degrees (head on)
maxV = max( ds.Velocity(ds.Angle == 0) );
s = stress(ds.Angle == 0 & ds.Velocity == maxV, :);

ML = I./t .* s(1:7);
MR = I./t .* s(8:end);

% Open curve fitting tool with left and right wing data loaded
cftool([span span],[ML MR])

Show cftool analysis.

cubicfit([span span],[ML MR]) % generated from cftool
open('analysis.fig') % show analysis figure (saved from Analysis in cftool)

We can see in the upper plot that the extrapolation of the cubic fit down to 0 appears to be reasonable. Also note that our confidence in this area has been reduced (wider confidence intervals).

The bottom plot shows the integration of the upper plot. The maximum shear force appears to be 151 kN.

Summary

We can se from the integration curve that the maximum shear force is around 151 kN, which is below our requirement of 160 kN. We also found the left and right wing measurements to be similar after we removed bad and suspect data.

Appendix

Custom plot createfigure

type createfigure
function createfigure(X1, Y1, X2)
%CREATEFIGURE(X1,Y1,X2)
%  X1:  vector of x data
%  Y1:  vector of y data
%  X2:  vector of x data

%  Auto-generated by MATLAB on 27-Jul-2007 12:20:13

% Create figure
figure1 = figure;

% Create axes
axes('Parent',figure1,'YGrid','on','XGrid','on');
% Uncomment the following line to preserve the X-limits of the axes
% xlim([0 5e-005]);
% Uncomment the following line to preserve the Y-limits of the axes
% ylim([0 5e-005]);
box('on');
hold('all');

% Create plot
plot(X1,Y1,'Marker','o','LineStyle','none');

% Create xlabel
xlabel('Left');

% Create ylabel
ylabel('Right');

% Create plot
plot(X2,X2,'DisplayName','[0 4e-5] vs [0 4e-5]','LineWidth',2,...
    'Color',[0 0 0]);

axis equal tight

Generated code from cftool

type cubicfit
function cubicfit(x,y)
%CUBICFIT    Create plot of datasets and fits
%   CUBICFIT(X,Y)
%   Creates a plot, similar to the plot in the main curve fitting
%   window, using the data that you provide as input.  You can
%   apply this function to the same data you used with cftool
%   or with different data.  You may want to edit the function to
%   customize the code and this help message.
%
%   Number of datasets:  1
%   Number of fits:  1

 
% Data from dataset "y vs. x":
%    X = x:
%    Y = y:
%    Unweighted
%
% This function was automatically generated on 06-Aug-2007 16:03:00

% Set up figure to receive datasets and fits
f_ = clf;
figure(f_);
set(f_,'Units','Pixels','Position',[852 350 680 477]);
legh_ = []; legt_ = {};   % handles and text for legend
xlim_ = [Inf -Inf];       % limits of x axis
ax_ = axes;
set(ax_,'Units','normalized','OuterPosition',[0 0 1 1]);
set(ax_,'Box','on');
axes(ax_); hold on;

 
% --- Plot data originally in dataset "y vs. x"
x = x(:);
y = y(:);
h_ = line(x,y,'Parent',ax_,'Color',[0.333333 0 0.666667],...
     'LineStyle','none', 'LineWidth',1,...
     'Marker','.', 'MarkerSize',12);
xlim_(1) = min(xlim_(1),min(x));
xlim_(2) = max(xlim_(2),max(x));
legh_(end+1) = h_;
legt_{end+1} = 'y vs. x';

% Nudge axis limits beyond data limits
if all(isfinite(xlim_))
   xlim_ = xlim_ + [-1 1] * 0.01 * diff(xlim_);
   set(ax_,'XLim',xlim_)
end


% --- Create fit "Cubic"
ok_ = isfinite(x) & isfinite(y);
ft_ = fittype('poly3');

% Fit this model using new data
cf_ = fit(x(ok_),y(ok_),ft_);

% Or use coefficients from the original fit:
if 0
   cv_ = { -0.5107375651142, 1.784383271041, -4.397432434511, 42.28260479588};
   cf_ = cfit(ft_,cv_{:});
end

% Plot this fit
h_ = plot(cf_,'fit',0.95);
legend off;  % turn off legend from plot method call
set(h_(1),'Color',[0 0 1],...
     'LineStyle','-', 'LineWidth',2,...
     'Marker','none', 'MarkerSize',6);
legh_(end+1) = h_(1);
legt_{end+1} = 'Cubic';

% Done plotting data and fits.  Now finish up loose ends.
hold off;
leginfo_ = {'Orientation', 'vertical', 'Location', 'NorthEast'}; 
h_ = legend(ax_,legh_,legt_,leginfo_{:});  % create legend
set(h_,'Interpreter','none');
xlabel(ax_,'');               % remove x label
ylabel(ax_,'');               % remove y label

Contact us