How I fixed this code:
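1. When the input image is smaller than 256 in width or height, an error occurs.
You can change the 256 in the for loops to nX or nY (the image dimensions), so the loops never index past the edge of the image.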
2. When the image is too small, registration accuracy gets worse.
A small image leaves the 256-bin histogram with too few samples per bin. We can use fewer intensity bins, say by rescaling from the uint8 range (0-255) to (0-80).
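
Here is a rough sketch of both fixes together, written in plain Python rather than the language of the original code. The helper names `rebin` and `joint_histogram` are made up for illustration, and 81 bins (0-80) is just the example value from above, not a tuned setting. I am also assuming the histogram in question is a joint intensity histogram of the two images.

```python
def rebin(image, n_bins=81, max_value=255):
    """Map intensities in 0..max_value onto bin indices 0..n_bins-1."""
    return [[(v * n_bins) // (max_value + 1) for v in row] for row in image]


def joint_histogram(a, b, n_bins=81):
    """Count co-occurring bin indices of two equally sized, rebinned images."""
    n_y, n_x = len(a), len(a[0])  # loop bounds come from the image, not a fixed 256
    if len(b) != n_y or any(len(row) != n_x for row in b):
        raise ValueError("images must have the same size")
    hist = [[0] * n_bins for _ in range(n_bins)]
    for y in range(n_y):
        for x in range(n_x):
            hist[a[y][x]][b[y][x]] += 1
    return hist


# Hypothetical tiny 2x3 image, far smaller than 256x256.
small = [[0, 128, 255], [10, 20, 30]]
quantized = rebin(small)
hist = joint_histogram(quantized, quantized)
```

With fewer bins, each bin collects more samples from the same number of pixels, so the histogram is less noisy for small images. The cost is coarser intensity resolution.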