The example does not match the solutions. First puzzle(30,3) has [0 1 1 0 ...] as next to last row with an expected [0 1 0]. The example shows this as [1 1 0]

Tim created another superb solution. The highlights are an elegant center of square's determination using ndgrid. The crux of his method is a convolution for each square not using "same" with an interesting centroid kernel. The comparison between all square convolutions for all rotations utilizes a concise norm metric function. The method should work on non-binary images. The best six matches from the 180x180 upper triangle error array are handily converted into the output format. Thank You Tim for this elegant solution.