I second Ashish's comment. It sounds to me like the main concern here is the ever-changing test set and the non-deterministic nature of the algorithm.
TDD and unit testing can mean a lot of things, but in general we want the things we test to be fixed and deterministic. In your case, that means trying to eliminate variables from the system. The ground truth approach suggested by Ashish is one way of doing that. Another is to incorporate statistics into your expected results. Say your algorithm produces variable results on a single image or a small set of images: you could increase the sample size and assert that an aggregate metric stays above a fixed lower bound (or below an upper bound), according to your requirements.
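To make the statistical idea concrete, here is a minimal sketch in Python. It assumes a binary classifier whose output is randomly wrong about 10% of the time; `noisy_classifier`, `accuracy_over_sample`, the alternating synthetic labels, the 0.9 success rate and the 0.85 bound are all hypothetical stand-ins for your real algorithm, ground-truth data and requirement. The test asserts on mean accuracy over a large sample rather than on any single image, and fixes the random seeds so each run is reproducible.

```python
import random
import statistics


def noisy_classifier(image_label: int, rng: random.Random) -> int:
    """Hypothetical stand-in for a non-deterministic vision algorithm.

    Returns the correct binary label about 90% of the time, the wrong one otherwise.
    """
    if rng.random() < 0.9:
        return image_label
    return 1 - image_label


def accuracy_over_sample(n_images: int, seed: int) -> float:
    """Run the algorithm over n_images ground-truth labels and return its accuracy."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    # Hypothetical ground truth: alternating binary labels instead of real images.
    ground_truth = [i % 2 for i in range(n_images)]
    correct = sum(noisy_classifier(label, rng) == label for label in ground_truth)
    return correct / n_images


def test_accuracy_lower_bound() -> None:
    # Asserting on one image would fail roughly 10% of the time (a flaky test).
    # Averaging over thousands of samples makes the estimate stable enough to bound.
    accuracies = [accuracy_over_sample(n_images=1000, seed=s) for s in range(5)]
    mean_accuracy = statistics.mean(accuracies)
    # Assumed requirement: mean accuracy of at least 0.85, safely below the ~0.9 expected.
    assert mean_accuracy >= 0.85, f"mean accuracy {mean_accuracy:.3f} is below 0.85"


if __name__ == "__main__":
    test_accuracy_lower_bound()
    print("accuracy lower bound satisfied")
```

The bound is deliberately set a few standard deviations below the expected value (with 1000 samples at p = 0.9 the standard deviation of the accuracy is roughly 0.01), so the test catches a real regression without failing on ordinary sampling noise.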