# How can I extract a specific set of rows from a large dataset to compute statistics, with the set shifted by 1 for each new set of rows, and create a column for each extracted feature (machine learning)?

Ram on 10 Jun 2018
Edited: dpb on 11 Jun 2018
Hello all, I would like to compute some statistical features (for machine learning) such as mean, median, stdev and variance from a large dataset (.mat). Let's say it contains 1000*1 values. Here I would like to find the mean, median, etc. for each set of 10 rows and keep each in a new column, i.e.,
rows 1 to 10 have mean value x.
rows 2 to 11 have mean value x.
...
until rows 989 to 999 have mean value x.
The last row (1000) is eliminated. Similarly for median, stdev, etc.; these values are kept in
col1 (mean), col2 (median), col3 (variance) and col4 (stdev).
I am able to compute these features for my entire dataset. But here the index is incremented by 1 at both ends every time!
Should I write the mean, median, ... functions inside the FOR loop itself or separately? Can you give me an example?
I would also like to implement this task on other programming platforms such as Python.

dpb on 10 Jun 2018
"... 1000*1. Here, would like to find for each 10 rows"
N=10; % size of groups
L=size(x,1); % length of data array, x
n=ceil(L/N); % how many groups in array x (including odd remainder)
g=repmat(1:n,N,1); g=g(:); % grouping variable: N copies of each group number, in order
g=g(1:L); % trim grouping variable to the data length if L is not a multiple of N
t=table(x,g); % put data and grouping variable into a table for convenience
s=grpstats(t,'g',{'mean','median','var','std'}); % compute summary statistics by group
See
doc grpstats
for details on syntax and how to specify which statistics are desired.
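Since the question also asks about Python, here is a minimal sketch of the same block-wise grouping using only the standard-library `statistics` module. The function name `grouped_stats` and the handling of a one-element leftover block (variance and stdev set to 0.0) are assumptions made for illustration; they are not taken from grpstats.

```python
import statistics


def grouped_stats(x, n=10):
    """Return (mean, median, variance, stdev) for each consecutive block of n values."""
    rows = []
    for start in range(0, len(x), n):
        block = x[start:start + n]  # the last block may be shorter than n
        if len(block) > 1:
            var = statistics.variance(block)  # sample variance (divides by len - 1)
            std = statistics.stdev(block)     # sample standard deviation
        else:
            var = std = 0.0  # assumption: a single leftover value has zero spread
        rows.append((statistics.mean(block), statistics.median(block), var, std))
    return rows


x = [float(i) for i in range(1, 1001)]  # stand-in for the 1000*1 data
s = grouped_stats(x, 10)
print(len(s))  # 100 groups of 10
```

Each tuple in `s` is one row of the result, so the four positions play the role of col1 (mean) to col4 (stdev).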
dpb on 11 Jun 2018
"is it incremented by both sides(x = 1to10;x2 = 2 to 11)! "
What is "it", and both sides of what? The statistics in s are computed per value of the grouping variable, which was defined as consecutive sequences of 10 observations per the problem definition.
What's the point of Matlab without the builtin functions? If you don't have the Statistics Toolbox, then the old-timey way is either to use accumarray or, the historic way from before even that existed, to use Matlab's column-major internal storage order:
x=reshape(x,N,[]); % rearrange column-wise by N elements (numel(x) must be a multiple of N)
s=[mean(x);median(x);std(x)]; % compute statistics by column
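Note that the grpstats and reshape approaches both work on non-overlapping blocks of N. If the windows are instead meant to overlap and advance by one row at a time (rows 1 to 10, then 2 to 11, and so on), as the question describes, a loop over the window start index is one straightforward option. Below is a minimal Python sketch of that reading, using only the standard-library `statistics` module; `sliding_stats` is a hypothetical name, and dropping the last row via `x[:-1]` is an assumption based on the question's wording.

```python
import statistics


def sliding_stats(x, n=10):
    """Return (mean, median, variance, stdev) for every full window of n values, step 1."""
    rows = []
    for start in range(len(x) - n + 1):  # only windows that fit entirely inside x
        w = x[start:start + n]
        rows.append((
            statistics.mean(w),
            statistics.median(w),
            statistics.variance(w),  # sample variance (divides by n - 1)
            statistics.stdev(w),     # sample standard deviation
        ))
    return rows


x = [float(i) for i in range(1, 1001)]  # stand-in for the 1000*1 data
features = sliding_stats(x[:-1], 10)    # last row dropped, as in the question
print(len(features))  # 990 windows: rows 1-10, 2-11, ..., 990-999
```

The statistics are computed inside the loop here because each window is a different slice; the functions themselves stay separate library calls. In MATLAB, the moving-statistics functions movmean, movmedian, movvar and movstd (R2016a and later) cover the same kind of overlapping window without an explicit loop.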