Why does my MDCS cluster on AWS fail to start in Cloud Center with the warning "Insufficient AWS Resources"?

I am running an MDCS cluster on the Amazon cloud through MathWorks Cloud Center.
Code that previously ran fine on the Amazon-hosted cluster suddenly stopped working after I tried running it with more workers. On checking with Amazon Web Services (AWS), it seems that the workers and the cluster are being shut down automatically by Cloud Center.
Why is MATLAB / Cloud Center doing this? How can I run my cluster with the desired number of workers?

Accepted Answer

MathWorks Support Team on 26 Aug 2015
If you log into Cloud Center, you can see the following error message next to your cluster:
Warning: Insufficient AWS Resources
You have requested more instances (6) than your current instance limit of 5 allows for the specified instance type. Please visit http://aws.amazon.com/contact-us/ec2-request to request an adjustment to this limit. Launching EC2 instance failed.
This issue comes down to what an "instance" actually is.
A node is the physical computer in the cloud server. It can run a maximum of 16 workers. An instance is a node that is currently active.
You may have a license for a large number of workers (e.g. 256), which means that you can have 256 workers, each running its own MATLAB session. However, Amazon still owns the hardware these workers run on. If you have 256 workers licensed but Amazon only allows you 1 instance, you can use at most 16 workers, because that is all you have the hardware for.
The reason you get this message is that you are requesting 256 workers, but Amazon only allows you to have 5 instances, which gives a maximum of 5 * 16 = 80 workers. When you start a cluster of 256 workers, the startup process fails after creating the head node plus 4 instances, because at that point you have used up your instance limit on Amazon. Cloud Center is then unable to fulfill your request for 256 workers. Instead of returning a cluster that is smaller than requested, it shuts down the whole cluster and deletes the workers from the cloud. This was a design decision by the developers so that you do not end up with a cluster that is smaller than expected.
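The arithmetic above can be checked with a short script before starting a cluster. This is only a rough sketch in Python, not anything Cloud Center provides: the function names are made up for illustration, the figure of 16 workers per instance is the one quoted in this answer, and the sketch ignores the head node, which may also count against your instance limit.

```python
import math

# Figure quoted in the answer above; it may differ for other instance types.
WORKERS_PER_INSTANCE = 16


def instances_needed(workers, workers_per_instance=WORKERS_PER_INSTANCE):
    """Number of instances required to host the requested workers."""
    return math.ceil(workers / workers_per_instance)


def max_workers(instance_limit, workers_per_instance=WORKERS_PER_INSTANCE):
    """Largest number of workers that fits within the instance limit."""
    return instance_limit * workers_per_instance


def fits_limit(workers, instance_limit):
    """True if the requested workers fit within the instance limit."""
    return instances_needed(workers) <= instance_limit


if __name__ == "__main__":
    requested_workers = 256  # example value from the answer
    instance_limit = 5       # example value from the warning message

    print("Instances needed:", instances_needed(requested_workers))
    print("Max workers under limit:", max_workers(instance_limit))
    print("Request fits limit:", fits_limit(requested_workers, instance_limit))
```

With the example values, 256 workers need 16 instances, while a limit of 5 instances covers at most 80 workers, so the request does not fit and the limit has to be raised first.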
The issue you are facing is caused by Amazon preventing you from starting the number of instances you need. To fix this, visit http://aws.amazon.com/contact-us/ec2-request and request an increase to your instance limit for the instance type you are using.
