Discover MakerZone

MATLAB and Simulink resources for Arduino, LEGO, and Raspberry Pi

Learn more

Discover what MATLAB® can do for your career.

Opportunities for recent engineering grads.

Apply Today

Thread Subject:
Connection pooling & serializing connection for parfor

Subject: Connection pooling & serializing connection for parfor

From: yash

Date: 21 Jul, 2010 17:36:11

Message: 1 of 7

Hello,

I'm writing a matlab program that fetches lots of data from a
database. Given the structure of the tables and the nature of the
data, I'm forced to run multiple queries multiple times. For instance,
I may decide that I want to get a dozen attributes for each of 1000
stocks. Getting each attribute requires a separate DB query.

In order to get better performance, I'm using a parfor loop to loop
through the 1000 stocks and fetch data. Since I'll have multiple
parallel loops getting data, I don't want to have them all use a
single connection, and so I create a connection in the parfor loop and
close it before exiting.

As you can imagine, the repeated opening and closing of connections is
very costly and slows down the program. Ideally I'd love to leverage
some kind of connection pooling to avoid the connect/disconnect. I see
that there's no native connection pooling in Matlab and so wrote my
own class to do so.

The issue now is that I believe Matlab serializes objects when passing
data to workers using parfor. The connection object can't be
serialized. So, when the code in my parfor loop tries to request a
connection from the pool, it fails since the pool would have to be
serialized for the worker to see it, and connections can't be
serialized.

Thus, I can't use a connection in the parfor loop that was created
outside the loop, and creating connections in the loop is very costly.
Does anyone have any ideas on how I can implement some kind of
connection pooling in this scenario, or some other way to speed things
up by avoiding the frequent connect/disconnect.

FYI - the database is SQL 2008, which does not implement connection
pooling on the server side. If it did, I could leave the pooling to
SQLServer and not have this issue...

Thanks!

Subject: Connection pooling & serializing connection for parfor

From: Abhishek

Date: 26 Jun, 2012 21:27:07

Message: 2 of 7

@Yash, were you able to find a solution on how to pass the connection objects into a parfor loop? I have a very similar issue. Thanks.

Subject: Connection pooling & serializing connection for parfor

From: Markella

Date: 2 Aug, 2013 13:07:07

Message: 3 of 7

Hello,
I can't get fetch to work in a parfor loop - even if I initiate the connection within it, how did you manage to get the results back?

Regards
Markella

Subject: Connection pooling & serializing connection for parfor

From: Edric M Ellis

Date: 6 Aug, 2013 13:19:13

Message: 4 of 7

"Markella " <markella.skempri@onzo.com> writes:

> Hello, I can't get fetch to work in a parfor loop - even if I initiate the
> connection within it, how did you manage to get the results back?

I'm not sure what you mean: please could you post an example of some
code that you think should work and doesn't.

Cheers,

Edric.

Subject: Connection pooling & serializing connection for parfor

From: Markella

Date: 6 Aug, 2013 14:19:05

Message: 5 of 7

Thanks Edric - i didn't realise that in the end I had to pass and load the drive to every worker in order to initiate a connection. I have now included it in the worker startup file and I am waiting to try it out.
Regards

Subject: Connection pooling & serializing connection for parfor

From: Gilles

Date: 8 Oct, 2014 14:32:16

Message: 6 of 7

"Markella " <markella.skempri@onzo.com> wrote in message <ktr0kp$6sl$1@newscl01ah.mathworks.com>...
> Thanks Edric - i didn't realise that in the end I had to pass and load the drive to every worker in order to initiate a connection. I have now included it in the worker startup file and I am waiting to try it out.
> Regards

Dear all,

I encounter the same problem when trying to fetch data retrieved by a sql queries on a microsoft sql server database.

Here's what I do (using ODBC driver - but I get the same error using JDBC):
In the following I want to retrieve data specifi to each client defined by their client number (num_cli) (each client has a different number of lines that I finally reshape - this part of the code is not shown here since it is outside of the parfor loop.
For info, I have N = 1.2*10^6 clients (approximatively 6.6*10^6 lines) - thus a quite big amount of data: that is why I would like to parallelize queries.

Note hat the connection to the database is made within the PARFOR loop.

% The Work:
% -------------

x = []; % initialization

parfor ii = 1:N
    
    Num_Cli = number(ii);
    % Connect to DB
    conn = database('data_base', 'user_name', 'pass_word');

    % Read data from database.
     curs = exec(conn, ['SELECT * FROM table_of_database '...
                                 ' WHERE table_of_database .NUM_CLI=', int2str(Num_Cli) , 'table_of_database .filed1= ''aaa'' '] ) ;
     curs = fetch(curs);
    close(curs);
    x = [x ; curs.Data]; % concatenate data
    close(conn);

end

The PARFOR loop stops, with an error:
An UndefinedFunction error was thrown on the workers for 'fetch'. This might be because 'fetch' is not accessible on the
workers. Use addAttachedFiles(pool, files) to specify the required files to be attached. See the documentation for
'parallel.Pool/addAttachedFiles' for more details.
Undefined function 'fetch' for input arguments of type 'struct'.

For info, I use the MDCS to distribute my jobs to a cluster. The MDCS deals with all licenses used in the database toolbox, etc.
Therefore, I should not get the error of "fetch is an unknown function"...


Does anybody has the answer to this issue? Or at elast encountered the same problem?
Thanks a lot for your answers !

--
Guillaume

Subject: Connection pooling & serializing connection for parfor

From: Raymond Norris

Date: 8 Oct, 2014 21:32:21

Message: 7 of 7

I'm guessing that the call to database failed. Look at the message from conn:

conn.Message

If it looks ok, then what does the message from curs look like:

curs.Message

If conn is invalid, then MATLAB won't find an overloaded function for the string value of conn; hence the error message 'Undefined function...'

My suggestion anytime you interface with the file system or a 3rd party application is to check the return string/argument for success, then proceed.

I saw a similar issues with a user using the Database Toolbox on a cluster years ago. The issue was that the username/password wasn't registered on the local database running on the cluster.

Raymond

"Gilles " <gdefrance@lefigaro.fr> wrote in message <m13htg$99g$1@newscl01ah.mathworks.com>...
> "Markella " <markella.skempri@onzo.com> wrote in message <ktr0kp$6sl$1@newscl01ah.mathworks.com>...
> > Thanks Edric - i didn't realise that in the end I had to pass and load the drive to every worker in order to initiate a connection. I have now included it in the worker startup file and I am waiting to try it out.
> > Regards
>
> Dear all,
>
> I encounter the same problem when trying to fetch data retrieved by a sql queries on a microsoft sql server database.
>
> Here's what I do (using ODBC driver - but I get the same error using JDBC):
> In the following I want to retrieve data specifi to each client defined by their client number (num_cli) (each client has a different number of lines that I finally reshape - this part of the code is not shown here since it is outside of the parfor loop.
> For info, I have N = 1.2*10^6 clients (approximatively 6.6*10^6 lines) - thus a quite big amount of data: that is why I would like to parallelize queries.
>
> Note hat the connection to the database is made within the PARFOR loop.
>
> % The Work:
> % -------------
>
> x = []; % initialization
>
> parfor ii = 1:N
>
> Num_Cli = number(ii);
> % Connect to DB
> conn = database('data_base', 'user_name', 'pass_word');
>
> % Read data from database.
> curs = exec(conn, ['SELECT * FROM table_of_database '...
> ' WHERE table_of_database .NUM_CLI=', int2str(Num_Cli) , 'table_of_database .filed1= ''aaa'' '] ) ;
> curs = fetch(curs);
> close(curs);
> x = [x ; curs.Data]; % concatenate data
> close(conn);
>
> end
>
> The PARFOR loop stops, with an error:
> An UndefinedFunction error was thrown on the workers for 'fetch'. This might be because 'fetch' is not accessible on the
> workers. Use addAttachedFiles(pool, files) to specify the required files to be attached. See the documentation for
> 'parallel.Pool/addAttachedFiles' for more details.
> Undefined function 'fetch' for input arguments of type 'struct'.
>
> For info, I use the MDCS to distribute my jobs to a cluster. The MDCS deals with all licenses used in the database toolbox, etc.
> Therefore, I should not get the error of "fetch is an unknown function"...
>
>
> Does anybody has the answer to this issue? Or at elast encountered the same problem?
> Thanks a lot for your answers !
>
> --
> Guillaume

Tags for this Thread

What are tags?

A tag is like a keyword or category label associated with each thread. Tags make it easier for you to find threads of interest.

Anyone can tag a thread. Tags are public and visible to everyone.

Contact us