
Thread Subject:
Sharing big data across parfor or spmd

Subject: Sharing big data across parfor or spmd

From: Chuck37

Date: 11 Dec, 2012 18:09:08

Message: 1 of 9

I have a big piece of never-changing data that I'd like to be used by all the workers in a parallel setting. Since I'm only working with local workers, I don't understand why I have to eat the huge latency associated with transferring the data to the workers each time the parfor loop is called. Can someone explain?

I tried to use spmd to send data once at the beginning and have it stay there, but the data is kind of big (~2 GB going to 10 workers), and I got "error during serialization" anyway. Is there a solution to this problem with local workers where they can all access the data from the same memory? Accesses are infrequent, so simultaneous access shouldn't cause a big slowdown.

Any ideas would be great.

My setup is something like this by the way:

M = <big data>;
for x = 1:m
   % ... stuff ...
   parfor y = 1:n
      a(y) = someFunction(M, a(y));   % ('function' is a reserved word, so use a named function)
   end
   % ... stuff ...
end

Parfor is presently worse than 'for' because of the overhead from sending M every time.

Subject: Sharing big data across parfor or spmd

From: Steven_Lord

Date: 11 Dec, 2012 18:41:28

Message: 2 of 9



"Chuck37 " <chuck3737@yahooremovethis.com> wrote in message
news:ka7ss4$4f6$1@newscl01ah.mathworks.com...
> I have a big piece of never changing data that I'd like to be used by all
> the workers in a parallel setting. Since I'm only working with local
> workers, I don't understand why I have to eat the huge latency associated
> with transferring the data to workers each time the parfor loop is called.
> Can someone explain?
>
> I tried to use spmd to send data once at the beginning and have it stay
> there, but the data is kind of big (~2 GB going to 10 workers), and I got
> "error during serialization" anyway. Is there a solution to this problem
> with local workers where they can all access the data from the same
> memory? Accesses are infrequent, so simultaneous access shouldn't cause a
> big slowdown.
>
> Any ideas would be great.
>
> My setup is something like this by the way:
>
> M = big data;
> for x = 1:m
> stuff
> parfor y = 1:n
> a(y) = function(M,a(y));
> end
> stuff
> end
>
> Parfor is presently worse than 'for' because of the overhead from sending
> M every time.

Can you use PARFOR for the loop over x instead of the loop over y?

--
Steve Lord
slord@mathworks.com
To contact Technical Support use the Contact Us link on
http://www.mathworks.com

Subject: Sharing big data across parfor or spmd

From: Chuck37

Date: 12 Dec, 2012 00:24:15

Message: 3 of 9

"Steven_Lord" <slord@mathworks.com> wrote in message
>
> Can you use PARFOR for the loop over x instead of the loop over y?
>

I can't, since each outer iteration depends on the previous one; only the inner loop is sliceable. Would that help my memory problem anyway?

Thanks.

Subject: Sharing big data across parfor or spmd

From: Edric M Ellis

Date: 12 Dec, 2012 08:39:34

Message: 4 of 9

"Chuck37 " <chuck3737@yahooremovethis.com> writes:

> I have a big piece of never changing data that I'd like to be used by
> all the workers in a parallel setting. Since I'm only working with
> local workers, I don't understand why I have to eat the huge latency
> associated with transferring the data to workers each time the parfor
> loop is called. Can someone explain?
>
> I tried to use spmd to send data once at the beginning and have it
> stay there, but the data is kind of big (~2 GB going to 10 workers),
> and I got "error during serialization" anyway. Is there a solution to
> this problem with local workers where they can all access the data
> from the same memory? Accesses are infrequent, so simultaneous access
> shouldn't cause a big slowdown.
>
> Any ideas would be great.
>
> My setup is something like this by the way:
>
> M = big data;
> for x = 1:m
> stuff
> parfor y = 1:n
> a(y) = function(M,a(y));
> end
> stuff
> end
>
> Parfor is presently worse than 'for' because of the overhead from sending M every time.

You could use my worker object wrapper, which is designed for exactly
this sort of situation. See

<http://www.mathworks.com/matlabcentral/fileexchange/31972-worker-object-wrapper>

In your case, you could use it like this:

spmd
  M = <big data>;
end
M = WorkerObjWrapper(M);

for ...
  parfor ...
    a(y) = someFunction(M.Value, ...);
  end
end

By building M on the workers directly and building the WorkerObjWrapper
from the resulting Composite, the data is never actually transmitted
over the wire at any stage, so you should experience no problems with
the current 2GB transfer limit.
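
For example, a slightly fuller sketch of the pattern (here loadBigData and
someFunction are placeholders for however M is actually built and used):

spmd
    M = loadBigData();          % build the data directly on each worker
end
w = WorkerObjWrapper(M);        % wrap the Composite; nothing is sent over the wire

for x = 1:m
    parfor y = 1:n
        a(y) = someFunction(w.Value, a(y));   % w.Value is the worker-local copy
    end
end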

Cheers,

Edric.

Subject: Sharing big data across parfor or spmd

From: Chuck37

Date: 12 Dec, 2012 16:23:08

Message: 5 of 9

Yeah, I also found an alternative using persistent variables, e.g.:

M = <big m-by-n array>

function Mout = getM(r, c)
  persistent M
  if isempty(M)
    M = <load M from file>;
  end
  Mout = M(r, c);
end

Each worker loads the persistent variable the first time it is called on to read from M (using getM instead of direct access).
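
A runnable version of that sketch might look like the following; the file name
bigM.mat and the variable name M inside it are assumptions:

function Mout = getM(r, c)
%GETM Return element M(r,c), loading M from disk once per worker.
persistent M
if isempty(M)
    S = load('bigM.mat', 'M');   % runs only once per worker process
    M = S.M;
end
Mout = M(r, c);
end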

This works, but it doesn't address the problem of having to store a big variable (e.g.) 12 times, using up the associated memory. I was mistaken earlier: in my case the big array is 7.5 GB, so my machine can't even store it that many times over. For now I've gotten around it because the matrix is sparse, so I can keep it in sparse form instead. Access is a little slower that way, I think. It still seems like workers on the same machine should be able to access the same memory.
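
In code, the sparse workaround is just standard sparse storage, roughly:

M = sparse(M);   % keep only the nonzeros; M(r,c) indexing still works, just a bit slower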

Thanks.


Subject: Sharing big data across parfor or spmd

From: Haoran Xu

Date: 16 Dec, 2012 11:47:21

Message: 6 of 9

Hi! I have the same problem.
When I tried using
"spmd
M = <big data>;
end"
the program ends in an error.
I suspect it is because the data is too big (over 2.5 GB) for spmd.
Does anyone know a solution?
Thanks!

Subject: Sharing big data across parfor or spmd

From: Haoran Xu

Date: 16 Dec, 2012 12:01:21

Message: 7 of 9

The spmd error is:
Error using distcompserialize
Error during serialization
So is this because the data is too large? If so, how can I solve it?

Subject: Sharing big data across parfor or spmd

From: Chuck37

Date: 16 Dec, 2012 17:08:18

Message: 8 of 9

If you can stand storing it N times, then my persistent variable trick seems to work. See my previous post in this thread.

"Haoran Xu" <haoran.x@gmail.com> wrote in message <kakd6h$elq$1@newscl01ah.mathworks.com>...
> The spmd error is:
> Error using distcompserialize
> Error during serialization
> So is this because the data is too large? Then how to solve this?

Subject: Sharing big data across parfor or spmd

From: Markella

Date: 6 Aug, 2013 08:48:06

Message: 9 of 9

Hi Chuck,
Did you try using distributed arrays? I am having the same issue: I need to process a rather large array, but only element by element, so I am trying to figure out whether distributed arrays would work better here.
Thanks
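
For reference, the distributed-array approach would look roughly like this (a
sketch only; note that distributed(M) still serializes the data from the
client, so for a very large M the array may need to be built directly on the
workers as a codistributed array instead):

D = distributed(M);          % partition M across the pool's workers
spmd
    L = getLocalPart(D);     % each worker sees only its own slice
    % ... process the elements of L ...
end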
