There is no a priori way to choose the optimal number of hidden neurons for one hidden layer, much less three. However, you can get a good estimate of the minimum number needed for a single hidden layer by trial and error. Increasing the number of hidden layers tends to reduce the total number of hidden neurons required, so a reasonable first step is to design a single-hidden-layer model.
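A minimal sketch of such a trial-and-error search, assuming NumPy, a toy two-ring dataset, and a small hand-written network. The helper names (`train_mlp`, `error_rate`, `smallest_adequate_width`), the candidate widths, and the tolerance `tol` are all illustrative choices, not anything prescribed above. It trains one single-hidden-layer network per candidate width and keeps the smallest width whose validation error is within `tol` of the best one seen.

```python
import numpy as np

def train_mlp(X, y, n_hidden, epochs=2000, lr=0.5, seed=0):
    """Train a one-hidden-layer tanh network with a sigmoid output
    by full-batch gradient descent on the mean cross-entropy."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 1.0, (X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 1.0, (n_hidden, 1))
    b2 = np.zeros(1)
    t = y.reshape(-1, 1)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                      # hidden activations
        p = 1.0 / (1.0 + np.exp(-(H @ W2 + b2)))      # predicted probability
        d_out = (p - t) / len(X)                      # gradient at output pre-activation
        d_hid = (d_out @ W2.T) * (1.0 - H ** 2)       # backpropagate through tanh
        W2 -= lr * (H.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_hid)
        b1 -= lr * d_hid.sum(axis=0)
    return W1, b1, W2, b2

def error_rate(params, X, y):
    """Fraction of misclassified samples."""
    W1, b1, W2, b2 = params
    H = np.tanh(X @ W1 + b1)
    pred = (H @ W2 + b2).ravel() > 0.0
    return float(np.mean(pred != (y > 0.5)))

def smallest_adequate_width(X_tr, y_tr, X_val, y_val, candidates, tol=0.02):
    """Return the smallest width whose validation error is within tol of the best."""
    errors = {h: error_rate(train_mlp(X_tr, y_tr, h), X_val, y_val)
              for h in candidates}
    best = min(errors.values())
    chosen = min(h for h in candidates if errors[h] <= best + tol)
    return chosen, errors

# Toy data (an assumption for illustration): two noisy concentric rings.
rng = np.random.default_rng(1)
n = 400
radius = np.where(np.arange(n) % 2 == 0, 1.0, 2.5)
angle = rng.uniform(0.0, 2.0 * np.pi, n)
X = np.c_[radius * np.cos(angle), radius * np.sin(angle)]
X += rng.normal(0.0, 0.15, (n, 2))
y = (radius > 1.5).astype(float)

idx = rng.permutation(n)
tr, val = idx[:300], idx[300:]
candidates = [1, 2, 4, 8, 16]
chosen, errors = smallest_adequate_width(X[tr], y[tr], X[val], y[val], candidates)
print("validation error by width:", errors)
print("smallest adequate width:", chosen)
```

On real data it would be worth repeating each width over several random seeds, since a single run can land in a poor local minimum and make a width look worse than it is.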
A priori information can help, especially in classification when each class is known to consist of several subclasses. In that case a divide-and-conquer approach can be followed. I have only used this with elliptical basis functions, and most of the time with the special case of radial basis functions. A first step here could be to cluster each class into subclasses. I can't say much more without revealing proprietary info.
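As a generic illustration only (not the proprietary method mentioned above), here is one way the "cluster each class into subclasses" step could feed a radial basis function classifier. It assumes NumPy, integer class labels, a plain k-means, spherical Gaussian units with one shared `width`, and a least-squares output layer. The function names and the toy data are hypothetical.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns the k cluster centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):            # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers

def rbf_features(X, centers, width):
    """Gaussian activations, one column per center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

def fit_rbf_classifier(X, y, subclasses_per_class, width=1.0):
    """Cluster each class separately, use the centers as hidden units,
    then solve for the output weights by least squares."""
    classes = np.unique(y)
    centers = np.vstack([kmeans(X[y == c], subclasses_per_class[int(c)])
                         for c in classes])
    Phi = np.c_[rbf_features(X, centers, width), np.ones(len(X))]  # add bias column
    targets = (y[:, None] == classes[None, :]).astype(float)       # one-hot targets
    W, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
    return classes, centers, W

def predict(model, X, width=1.0):
    classes, centers, W = model
    Phi = np.c_[rbf_features(X, centers, width), np.ones(len(X))]
    return classes[(Phi @ W).argmax(axis=1)]

# Toy data (assumed): class 0 has two subclasses, class 1 sits between them.
rng = np.random.default_rng(0)
def blob(cx):
    return rng.normal(0.0, 0.5, (100, 2)) + np.array([cx, 0.0])

X = np.vstack([blob(-3.0), blob(3.0), blob(0.0)])
y = np.array([0] * 200 + [1] * 100)

model = fit_rbf_classifier(X, y, {0: 2, 1: 1})
pred = predict(model, X)
print("hidden units:", len(model[1]))
print("training accuracy:", np.mean(pred == y))
```

In this sketch the number of hidden units falls out of the subclass counts rather than being guessed. An elliptical variant would replace the single `width` with a per-center covariance estimated from each cluster.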
Both clustering and principal component analysis help you understand the data. Look at those first before deciding how to construct a divide-and-conquer approach.
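For the principal component side, a short sketch of how one might check how many directions carry most of the variance, assuming NumPy and synthetic data built from two latent factors. The function name, the data, and the 95% threshold are illustrative assumptions.

```python
import numpy as np

def explained_variance_ratio(X):
    """Share of total variance along each principal component,
    computed from the singular values of the centered data."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)   # singular values, largest first
    var = s ** 2
    return var / var.sum()

# Synthetic data (assumed): 5 observed features driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 5))

ratio = explained_variance_ratio(X)
n_needed = int(np.searchsorted(np.cumsum(ratio), 0.95) + 1)
print("explained variance ratio:", np.round(ratio, 3))
print("components needed for 95% of the variance:", n_needed)
```

A sharp drop in the ratios suggests the data lives near a low-dimensional subspace, which is useful to know before deciding how to split the problem.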
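Also take a look at cascade correlation, which grows the network by adding hidden units one at a time during training, so the size does not have to be fixed in advance.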