Training performance varies because the default train/val/test data division AND initial weights are pseudorandom.
One of many solutions, suitable for sufficiently large data sets:
1. Initialize the RNG so that the same stream of pseudorandom numbers can be repeated.
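2. Design 10 or more nets, each starting from different pseudorandom initial weights (and, if desired, a different data division).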
3. Choose the net with the smallest validation (NOT TRAINING) set error.
4. Estimate the performance on unseen data with the test set error.
5. If performance is unsatisfactory, try increasing the number of hidden nodes.
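
The steps above can be sketched in a few lines. This is only a rough illustration in plain Python (standard library only), not the toolbox's own training code. The toy sin(x) data, the 70/15/15 split sizes, the plain SGD trainer, and every function name here (make_data, init_net, train, run_trials) are assumptions made up for the example.

```python
import math
import random

def make_data(n, rng):
    # Hypothetical toy regression problem: t = sin(x) plus a little noise.
    xs = [rng.uniform(-3.0, 3.0) for _ in range(n)]
    return [(x, math.sin(x) + rng.gauss(0.0, 0.05)) for x in xs]

def init_net(hidden, rng):
    # One tanh hidden layer with `hidden` nodes and a linear output node.
    return {
        "w1": [rng.uniform(-1.0, 1.0) for _ in range(hidden)],
        "b1": [rng.uniform(-1.0, 1.0) for _ in range(hidden)],
        "w2": [rng.uniform(-1.0, 1.0) for _ in range(hidden)],
        "b2": 0.0,
    }

def predict(net, x):
    # Returns the output and the hidden activations (needed for backprop).
    acts = [math.tanh(w * x + b) for w, b in zip(net["w1"], net["b1"])]
    y = sum(v * a for v, a in zip(net["w2"], acts)) + net["b2"]
    return y, acts

def mse(net, data):
    return sum((predict(net, x)[0] - t) ** 2 for x, t in data) / len(data)

def train(net, data, rng, epochs=50, lr=0.02):
    # Plain stochastic gradient descent on squared error (an assumed trainer).
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, t in data:
            y, acts = predict(net, x)
            err = y - t
            for j, a in enumerate(acts):
                grad_hidden = err * net["w2"][j] * (1.0 - a * a)
                net["w2"][j] -= lr * err * a
                net["w1"][j] -= lr * grad_hidden * x
                net["b1"][j] -= lr * grad_hidden
            net["b2"] -= lr * err
    return net

def run_trials(seed=0, n_nets=10, hidden=5):
    rng = random.Random(seed)            # step 1: repeatable random stream
    data = make_data(200, rng)
    rng.shuffle(data)                    # pseudorandom data division
    train_set, val_set, test_set = data[:140], data[140:170], data[170:]
    results = []
    for i in range(n_nets):              # step 2: several nets, new weights each
        net = train(init_net(hidden, rng), train_set, rng)
        results.append((mse(net, val_set), mse(net, test_set), i))
    best_val, best_test, best_idx = min(results)   # step 3: smallest val error
    return best_idx, best_val, best_test, results  # step 4: report test error

if __name__ == "__main__":
    idx, val_err, test_err, _ = run_trials(seed=0)
    print("best net:", idx, "val MSE:", val_err, "test MSE:", test_err)
```

Calling run_trials with the same seed twice should give identical results, which is the point of step 1. Step 5 corresponds to raising the `hidden` argument and repeating the loop.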
How large are X and T?
Hope this helps.