Plotting from a large data set and identifying outliers

a. Write a script which loads the National Health Interview Survey 2017 data, extracts the heights and masses, and converts them from inches into cm (multiply by 2.54) and from pounds into kg (multiply by 0.454).
b. Make a plot of the data and remove any obvious outliers from the data set, making any outlier criteria clear.
c. Create a histogram for each of height and mass, setting the bin width in each case to 5 units. Overlay the pdf of the corresponding modelled normal distribution for each of height and mass. Ensure your plots are properly labelled, with suitable axis labels and a legend.
d. Use the corrcoef function to output the numerical value of the height-mass Pearson coefficient r.
I have done part (a), but when I try to plot the data it looks horrendous, so I am not sure how this can be done. (The data is attached.) Also, for part (c), how do we overlay the pdf of the corresponding normal distribution?
Thanks in advance
This is what I have so far:
% code for Q3
data1 = xlsread("NHIS2007data.xlsx"); % load the numeric data into a variable
height = data1(:,8);   % extract heights (8th column, in inches)
masses = data1(:,9);   % extract masses (9th column, in pounds)
y = height*2.54;       % convert heights to cm
x = masses*0.454;      % convert masses to kg
figure                 % separate figure so the second histogram does not replace the first
histogram(y,'BinWidth',5) % histogram of heights
figure
histogram(x,'BinWidth',5) % histogram of masses
r = corrcoef(x,y)      % 2x2 correlation matrix; r(1,2) is the Pearson coefficient
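
One possible way to approach parts (b) to (d) is sketched below. It is not a definitive solution, and several things in it are assumptions: the data are randomly generated stand-ins with a few junk values injected (not the NHIS file), the outlier criterion is the default of isoutlier (more than three scaled median absolute deviations from the median), and the names xc, yc, bad and normalPdf are made up for illustration. The normal pdf is written out by hand so that no toolbox is needed; normpdf from the Statistics and Machine Learning Toolbox would do the same job.

```matlab
% Stand-in data so the sketch runs on its own (NOT the NHIS data).
% Replace x and y with the converted masses (kg) and heights (cm).
rng(0);                                     % reproducible placeholder values
y = 170 + 10*randn(500,1);                  % placeholder heights in cm
x = 75 + 0.5*(y - 170) + 12*randn(500,1);   % placeholder masses in kg
y(1:5) = 250;  x(1:5) = 450;                % injected junk values acting as outliers

% (b) Flag outliers, then plot kept and removed points together.
% Criterion (an assumption, pick and state your own): a row is dropped if
% its height or mass is more than 3 scaled MADs from the median, or is NaN.
bad = isoutlier(x) | isoutlier(y) | isnan(x) | isnan(y);
xc = x(~bad);  yc = y(~bad);

figure
plot(xc, yc, '.', x(bad), y(bad), 'x')
xlabel('Mass (kg)'); ylabel('Height (cm)')
legend('Kept', 'Removed as outliers', 'Location', 'best')

% (c) Histogram with the fitted normal pdf on top.
% 'Normalization','pdf' scales the bars to unit area, so the bars and the
% curve share the same vertical scale.
normalPdf = @(t, mu, sigma) exp(-(t - mu).^2 ./ (2*sigma^2)) ./ (sigma*sqrt(2*pi));
vars  = {yc, xc};
names = {'Height (cm)', 'Mass (kg)'};
for k = 1:2
    v = vars{k};
    t = linspace(min(v), max(v), 200);      % points at which to draw the curve
    figure
    histogram(v, 'BinWidth', 5, 'Normalization', 'pdf')
    hold on
    plot(t, normalPdf(t, mean(v), std(v)), 'LineWidth', 1.5)
    hold off
    xlabel(names{k}); ylabel('Probability density')
    legend('Data', 'Fitted normal pdf')
end

% (d) corrcoef returns a 2x2 matrix; the off-diagonal entry is Pearson's r.
R = corrcoef(xc, yc);
r = R(1,2)
```

If the real data use special codes for missing or refused answers, those would show up as a tight cluster far from the rest of the scatter plot, and an explicit cutoff (for example keeping only heights and masses inside a stated range) may be easier to justify than a statistical rule.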
