Thread Subject:
Organizing My Data by First Column Entry

Subject: Organizing My Data by First Column Entry

From: Kevin

Date: 2 Jul, 2013 15:35:07

Message: 1 of 2

Hey guys,

I have a massive data set covering all 23 pairs of human chromosomes, and I'd like to divide it by chromosome. The first column reads 'chr1' through 'chr23' for approximately 1,300,000 entries, so I want to split the matrix into separate matrices based on that column, something like de-concatenating. I'm just not sure how to do that during the import; I think it would be faster to combine the two steps. The file has no header rows and is tab-delimited, and all other columns contain doubles.

I appreciate the advice!

Kevin

Subject: Organizing My Data by First Column Entry

From: dpb

Date: 2 Jul, 2013 17:09:08

Message: 2 of 2

OK, I'd suggest one of two ways. The first is to read the file in as a cell array and then create a structure that holds each chromosome's data in a field named after it.

For example, given something like this:

 >> l={'chr1';'chr1';'chr1';'chr2';'chr2';'chr3'}; % a list of first 3
 >> l(:,2)={1;1;1;2;2;3}; % identifiable data
 >> s=struct(l{1,1},[l{1:3,2}]')
s =
     chr1: [3x1 double]
 >> s.chr1
ans =
      1
      1
      1
 >>

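Then just find the start:stop row indices for each chromosome in the data (assuming the rows are grouped by chromosome) and assign each range to its field.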
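A minimal sketch of doing the import and the split together, assuming the file has exactly two numeric columns after the label (adjust the number of `%f` specifiers to match the real file) and that each chromosome's rows are contiguous. The demo file and its contents are hypothetical stand-ins for the real data.

```matlab
% Sketch: import a tab-delimited file whose first column is a chromosome
% label and split the rows into one struct field per label.
% Assumptions: no header, exactly two numeric columns after the label,
% and each chromosome's rows are contiguous. The demo file is hypothetical.

% Write a small hypothetical demo file so the sketch runs stand-alone.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'chr1\t10\t0.5\nchr1\t20\t0.7\nchr2\t5\t0.1\nchr3\t8\t0.9\nchr3\t9\t0.2\n');
fclose(fid);

% Read the label as a string and the remaining columns as doubles in one pass.
fid = fopen(fname, 'r');
C = textscan(fid, '%s %f %f', 'Delimiter', '\t');
fclose(fid);
delete(fname);

labels = C{1};              % N-by-1 cell array of labels, e.g. 'chr1'
data = [C{2:end}];          % N-by-2 numeric matrix

% First and last occurrence of each label give the start:stop rows of its block.
[names, first] = unique(labels, 'first');
[~, last] = unique(labels, 'last');

s = struct();
for k = 1:numel(names)
    % Dynamic field name: s.chr1, s.chr2, ...
    s.(names{k}) = data(first(k):last(k), :);
end
```

If the rows are not grouped by chromosome, `data(strcmp(labels, names{k}), :)` selects each chromosome's rows by logical indexing instead of a start:stop range.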

Or, if you have the Statistics Toolbox, its dataset array can handle such named groups quite nicely.
