MATLAB Examples

# Regress_Bivariate

## Contents

Regress_Bivariate:

This code produces a weighted least squares fit of a straight line to a set of points with error in both coordinates. It can handle bivariate regression where the errors in both coordinates are correlated and is capable of performing force-fit regression.

For details on the algorithm used in the code, see Unified Equations for the slope, intercept, and standard errors of the best straight line (York et al., 2004)

Written by Kaustubh Thirumalai (kau@ig.utexas.edu) of the University of Texas Institute for Geophysics and Arvind Singh of the University of Kiel at Physical Research Laboratory, Ahmedabad, India.

Citation: Thirumalai, K., A. Singh, and R. Ramesh (2011), A MATLAB code to perform weighted linear regression with (correlated or uncorrelated) errors in bivariate data, Journal of the Geological Society of India, 77(4), 377380, doi: 10.1007/s12594-011-0044-1.

```%===================================== % INPUT %===================================== % Excel file/array with the following format: % % Column 1 | Column 2 | Column 3 | Column 4 | Column 5 | % X data | sigX | Y data | sigY | r^2(sigmas) | % %===================================== % OUTPUT %===================================== % %  Graph of Y vs. X data (with errorbars) %  Comparison of Weighted Linear Regression vs. Simple Linear Regression %  Errors on slope and intercept of the line of best-fit % % NOTE: Correlation coefficient between errors in X(i) and Y(i) has been % taken as zero in this program. If there is any correlation between the % errors, then you can include it directly in the worksheet in column 5 % and you can de-comment it in the code below. By default it will be taken % as zero. For force-fit regression, place a sufficiently low error (i.e. % ~infinite weight) on the coordinates of the point at which the line is % to be forced. ```

## 2. Input

A .xlsx (.xls) file with the following format:

• Column 1 - X Data
• Column 2 - X Error
• Column 3 - Y Data
• Column 4 - Y Error
• Column 5 - Corr.Coef. between errors (if needed)

Otherwise, similar input for an array in mat

```tic clc; clear; tol = 1e-8; % Tolerance %---- Excel Data Extraction ---- mat = xlsread('Pearson.xls'); % Note that data can be entered directly as an array in |mat| X = mat(:,1); sigX = mat(:,2); Y = mat(:,3); sigY = mat(:,4); ```

## 3. Simple Linear Regression

Performs simple linear regression (SLR) for comparison.

```R = corrcoef(X,Y); n = length(X); ri = 0; X1 = [n,sum(X);sum(X),sum(X.*X)]; % LSE Value of 'b' Y1 = [sum(Y);sum(X.*Y)]; % Polyfit can also be used Z1 = X1\Y1; % Matrix division a1 = Z1(1); b1 = Z1(2); sigres = sum((Y - a1 - b1.*X).^2)/(n-2); % Sigma Residual delta = det(X1); % Determinant varx = var(X); sigb1sq = (n^2)*(n-1)*varx*sigres/(delta^2); % Sigma(b) without weights sigb1 = sqrt(sigb1sq); Xbar = mean(X); siga1sq = (sigres/(varx*n))*(varx + Xbar^2); siga1 = sqrt(siga1sq); A_SLR = [a1 siga1]; % SLR Intercept B_SLR = [b1 sigb1]; % SLR Slope %---- SLR p-value ---- B1 = 0; t1=(b1-B1)/sigb1; Pval1=2*(1-tcdf(abs(t1),1)); ```

## 4. Maximum Likelihood Method (Weighted Linear Regression)

Implements the York et al. [2004] algorithm to perform weighted linear regression using the maximum likelihood method (MLM).

```%---- Weighting Errors ---- wX = abs(1./(sigX.^2)); wY = abs(1./(sigY.^2)); alpha = sqrt(wX.*wY); %ri = mat(:,5); % CorrCoef between sig(X) and sig(Y) b = b1; d = tol; i = 0; %---- York et al. (2004) Algorithm ---- while (d > tol || d == tol) % Tolerance check loop i = i+1; b2 = b; W = wX.*wY./((wX) + ((b^2).*wY) - (2.*b.*alpha.*ri)); meanX = sum(W.*X)/sum(W); meanY = sum(W.*Y)/sum(W); U = X(:) - meanX; V = Y(:) - meanY; Beta = W.*((U./wY)+((b.*V)./wX) - (b.*U + V).*(ri./alpha)); meanBeta = sum(W.*Beta)/sum(W); b = sum(W.*Beta.*V)/sum(W.*Beta.*U); dif = b - b2; d = abs(dif); end U2 = U.^2; V2 = V.^2; a = meanY - b.*meanX; x = meanX + Beta; meanx = sum(W.*x)/sum(W); u = x - meanx; sigbsq = 1./(sum(W.*(u.*u))); sigb = sqrt(sigbsq); sigasq = 1./(sum(W)) + meanx^2.*(sigbsq); siga = sqrt(sigasq); S = sum(W.*((Y - b.*X - a)).^2); %---- MLE p-value ---- B = 0; t=(b-B)/sigb; Pval=2*(1-tcdf(abs(t),length(X)-2)); ```

## 5. Plot

```figure(1) clf; hold on; errorbar_x(X,Y,sigX,'o'); set(gca,'FontSize',12,'FontName','Myriad Pro'); % See http://www.mathworks.com/matlabcentral/fileexchange/12751) h1 = errorbar(X,Y,sigY,'o','markersize',... 6,'markeredgecolor','k','markerfacecolor','w','linewidth',2); grid on; p = min(X)-(max(X)-min(X))/10:(max(X)-min(X))/10:max(X)+(max(X)-min(X))/10; q = a + (b.*p); h2 = plot(p,q,'r','linewidth',2); p = min(X)-(max(X)-min(X))/10:(max(X)-min(X))/10:max(X)+(max(X)-min(X))/10; q = a1 + (b1.*p); h3 = plot(p,q,'--k'); pl = legend([h1,h2,h3],'Data','Weighted Line (MLM)',... 'Simple Line (SLR)','Location','NorthEast'); htext=findobj(get(pl,'children'),'type','text'); set(htext,'FontName','Avenir','Fontsize',14); xl = xlabel('X'); set(xl,'FontName','Avenir','FontSize',14); yl = ylabel('Y'); set(yl,'FontName','Avenir','FontSize',14); ```

## 6. Output

```display(' '); display('Regress_Bivariate:'); display(' '); D = [X sigX Y sigY]; display(' X sigX Y sigY'); disp(D); A = [a siga]; B = [b sigb]; disp('--------------------------------------'); display('Weighted Linear Regression:');disp(' '); wr = sum(U.*V)./sqrt((sum(U2).*sum(V2))); display('Y = a + bX');display(' '); disp(' a siga');disp(A); disp(' b sigb');disp(B); disp(['r: ',num2str(wr)]); disp(['r^2: ',num2str(wr^2)]); disp(['p-value: ',num2str(Pval)]); disp('--------------------------------------'); display('Simple Linear Regression:');disp(' '); display('Y = a + bX');display(' '); disp(' a siga');disp(A_SLR); disp(' b sigb');disp(B_SLR); disp(['r: ',num2str(R(1,2))]); disp(['r^2: ',num2str(R(1,2)^2)]); disp(['p-value: ',num2str(Pval1)]); disp('--------------------------------------'); disp(['Total no. of iterations: ',num2str(i)]); toc ```
``` Regress_Bivariate: X sigX Y sigY 0 0.0316 5.9000 1.0000 0.9000 0.0316 5.4000 0.7454 1.8000 0.0447 4.4000 0.5000 2.6000 0.0354 4.6000 0.3536 3.3000 0.0707 3.5000 0.2236 4.4000 0.1118 3.7000 0.2236 5.2000 0.1291 2.8000 0.1195 6.1000 0.2236 2.8000 0.1195 6.5000 0.7454 2.4000 0.1000 7.4000 1.0000 1.5000 0.0447 -------------------------------------- Weighted Linear Regression: Y = a + bX a siga 5.4799 0.2950 b sigb -0.4805 0.0580 r: -0.98034 r^2: 0.96107 p-value: 3.3847e-05 -------------------------------------- Simple Linear Regression: Y = a + bX a siga 5.7612 0.1825 b sigb -0.5396 0.0421 r: -0.97648 r^2: 0.9535 p-value: 0.049602 -------------------------------------- Total no. of iterations: 6 Elapsed time is 3.461433 seconds. ```