Gaussian Process-based constrained Bayesian optimization does not work well when the feasible region occupies such a small fraction of the search space. The GPs will have trouble modeling the feasible region, and our implementation will also have numerical difficulty finding feasible points. In this situation, we recommend redefining your search space so that the feasible region occupies a larger fraction of it.
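As a hypothetical illustration of why redefining the space helps: suppose the constraint is satisfied only for x in [1e-6, 1e-4] within a search range of [1e-6, 1]. On a linear scale the feasible fraction is about 1e-4, but on a log scale it is a third of the space. (The numbers and the log-transform choice here are invented for illustration, not taken from any particular problem.)

```python
import math

# Hypothetical example: feasible region [1e-6, 1e-4] inside search range [1e-6, 1.0].
lo, hi = 1e-6, 1.0
feas_lo, feas_hi = 1e-6, 1e-4

# Fraction of the space that is feasible on a linear scale: tiny.
linear_fraction = (feas_hi - feas_lo) / (hi - lo)

# Same region after a log10 transform of the variable: one third of the space.
log_fraction = (math.log10(feas_hi) - math.log10(feas_lo)) / (
    math.log10(hi) - math.log10(lo))

print(round(linear_fraction, 6), round(log_fraction, 3))
```

In MATLAB terms, this corresponds to declaring the variable with a log transform (or re-parameterizing it by hand) so the GP sees the feasible region at a workable scale.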
For optimization with "unknown constraints" (where constraints must be learned from function evaluations), bayesopt() uses the algorithm of Gelbart, M., J. Snoek, and R. P. Adams, "Bayesian Optimization with Unknown Constraints," http://arxiv.org/abs/1403.5607, 2014.
At every step, to choose the next point, it tries to maximize the constraint-weighted acquisition function, which is essentially the acquisition function multiplied by the probability that the constraint is satisfied according to the current GP constraint model. This is a global optimization problem. Our implementation chooses 10,000 random points from the space, evaluates the constraint-weighted acquisition function at each, then takes the best 10 and runs a local optimization of the same function starting from each of them. If the constraint is satisfied on only a fraction 10^-5 of the space, then the probability that any of the 10,000 random points is feasible is less than 0.1. That is one problem.
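The scheme above can be sketched as follows. This is a minimal stand-in, not the bayesopt() implementation: the acquisition function, the constraint-probability model, and the local refinement are all invented placeholders, and the true code works on multi-dimensional spaces with a proper optimizer. The final lines also check the feasibility-probability claim: with a feasible fraction p = 1e-5, the chance that at least one of 10,000 uniform candidates is feasible is 1 - (1 - p)^10000 ≈ 0.095.

```python
import math
import random

random.seed(0)

def acquisition(x):
    """Placeholder for expected improvement under the objective GP."""
    return math.exp(-(x - 0.3) ** 2 / 0.02)

def prob_feasible(x):
    """Placeholder for P(constraint satisfied) under the constraint GP.
    Feasible region is deliberately tiny: [0.29, 0.31]."""
    return 1.0 if 0.29 <= x <= 0.31 else 1e-6

def weighted_acq(x):
    """Constraint-weighted acquisition: acquisition times feasibility probability."""
    return acquisition(x) * prob_feasible(x)

# Step 1: score 10,000 random candidates from the space.
candidates = [random.uniform(0.0, 1.0) for _ in range(10_000)]
scored = sorted(candidates, key=weighted_acq, reverse=True)

# Step 2: local optimization from the best 10 (crude shrinking-step search
# standing in for a real local optimizer).
def local_refine(x, step=0.01, iters=50):
    for _ in range(iters):
        best = max((x - step, x, x + step), key=weighted_acq)
        if best == x:
            step /= 2
        x = best
    return x

best_x = max((local_refine(x) for x in scored[:10]), key=weighted_acq)

# The feasibility-probability arithmetic from the text:
p = 1e-5
prob_any_feasible = 1 - (1 - p) ** 10_000
print(round(prob_any_feasible, 3))
```

With the feasible region here occupying 2% of the space, plenty of candidates land inside it and the refinement homes in on the optimum; shrink the region toward measure 1e-5 and, per the calculation, most batches of 10,000 candidates would contain no feasible point at all.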
A second problem is that it may be difficult for a global GP model over the space to represent the feasible region, since it occupies such a small fraction of the space. The GP kernel can only capture constraint-function changes up to a certain steepness; if your constraint function is steeper than that, the GP may not be able to learn it.
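The steepness limit can be made quantitative for a stationary RBF kernel (a common default, used here as an assumption rather than a statement about bayesopt()'s internals). Under a GP prior with kernel k(r) = s2 * exp(-r^2 / (2*l^2)), the increment f(x+w) - f(x) is Gaussian with mean 0 and variance 2*s2*(1 - exp(-w^2 / (2*l^2))), so a constraint that jumps by several units over a width w much smaller than the length-scale l is astronomically improbable under the prior:

```python
import math

def prob_jump_at_least(jump, w, l, s2=1.0):
    """Two-sided prior probability that f(x+w) - f(x) exceeds `jump` in
    magnitude, under a GP with RBF kernel (variance s2, length-scale l)."""
    var = 2 * s2 * (1 - math.exp(-w * w / (2 * l * l)))
    z = jump / math.sqrt(var)
    return math.erfc(z / math.sqrt(2))

# A jump of 2 units over 1% of the length-scale: essentially impossible
# under the prior, so the posterior cannot track such a step either.
tiny = prob_jump_at_least(2.0, w=0.01, l=1.0)

# The same jump spread over a full length-scale: quite plausible.
plausible = prob_jump_at_least(2.0, w=1.0, l=1.0)

print(tiny, plausible)
```

This is why shrinking the length-scale (or rescaling the space, as recommended above) is the usual remedy: it raises the steepness the model can express.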