Hardware design and implementation of an algorithm is largely about finding alternatives that meet design goals such as higher performance, smaller area, or lower power.
When designing hardware algorithms, one often stumbles upon an expensive operation such as division or modulo and needs to replace it with a low-cost alternative, such as repeated subtraction, to meet those goals.
Consider the (m mod n) operation, defined as follows:
(m mod n) = m - (n * floor(m/n))
In the above expression, the division operator consumes a large amount of FPGA resources and creates long critical paths, which reduces the performance of the algorithm.
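As a rough illustration of the identity above, the following Python sketch compares the floor-division form with the repeated-subtraction alternative for non-negative m and positive n. The function names are hypothetical and are not part of the demo.

```python
def mod_by_division(m: int, n: int) -> int:
    """m mod n via the identity m - n * floor(m / n)."""
    return m - n * (m // n)


def mod_by_subtraction(m: int, n: int) -> int:
    """m mod n by repeated subtraction (assumes m >= 0 and n > 0)."""
    while m >= n:
        m -= n
    return m
```

Repeated subtraction avoids the divider, but its iteration count grows with m/n, which is one reason to look for a more structured reduction.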
This demo shows how to compute a modulo in a hardware-friendly fashion by avoiding the expensive division. It shows how to compute mod3 of a 32-bit number as a tree of mod3 operations on 4-bit numbers, each of which can be implemented as an inexpensive multiplexer.
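This algorithm breaks the binary number (m) into equal-sized chunks, computes modulo(n) on each of the smaller chunks, concatenates the resulting values, and repeats until a single result remains. For n = 3 this is valid because 4 and 16 both leave a remainder of 1 when divided by 3, so concatenating 4-bit chunks or 2-bit mod3 results does not change the value modulo 3.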
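The chunk-and-repeat idea can be sketched in Python for n = 3 and a 32-bit input. This is only a software model, assuming 4-bit chunks and 2-bit residues; `mod3_tree` is a hypothetical name, and the `% 3` applied to each 4-bit value stands in for the small hardware lookup.

```python
def mod3_tree(m: int) -> int:
    """Software model of a mod-3 reduction tree for a 32-bit unsigned value."""
    assert 0 <= m < 2**32
    # Level 0: split the 32-bit word into eight 4-bit chunks (nibbles).
    chunks = [(m >> (4 * i)) & 0xF for i in range(8)]
    while True:
        # Reduce every 4-bit chunk to a 2-bit residue (0, 1 or 2).
        residues = [c % 3 for c in chunks]
        if len(residues) == 1:
            return residues[0]
        # Concatenate neighbouring 2-bit residues into new 4-bit chunks.
        # (hi << 2) | lo equals 4*hi + lo, which is congruent to hi + lo mod 3.
        chunks = [(residues[i + 1] << 2) | residues[i]
                  for i in range(0, len(residues), 2)]
```

In this model the tree has four levels of 4-bit mod3 operations, using 8, 4, 2 and 1 lookups respectively.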
This algorithm uses the bitslice and bitconcat functions to extract and recombine the smaller chunks (nibbles in this case) and a simple switch to calculate the mod3 of each 4-bit slice.
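One step of that tree might look like the Python sketch below. The helpers `bit_slice` and `bit_concat` are hypothetical stand-ins for the bitslice and bitconcat functions mentioned above (their real signatures are not assumed here), and the 16-entry table plays the role of the switch.

```python
# 16-entry table playing the role of the switch: the index is the 4-bit value.
MOD3_OF_NIBBLE = (0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0)


def bit_slice(value: int, msb: int, lsb: int) -> int:
    """Return bits msb..lsb (inclusive, bit 0 = least significant)."""
    width = msb - lsb + 1
    return (value >> lsb) & ((1 << width) - 1)


def bit_concat(hi: int, lo: int, lo_width: int) -> int:
    """Place hi above lo, where lo occupies lo_width bits."""
    return (hi << lo_width) | lo


def mod3_of_byte(byte: int) -> int:
    """One tree step: two nibbles in, one 2-bit residue out."""
    hi = MOD3_OF_NIBBLE[bit_slice(byte, 7, 4)]
    lo = MOD3_OF_NIBBLE[bit_slice(byte, 3, 0)]
    # Concatenating two 2-bit residues gives a 4-bit value for the same table.
    return MOD3_OF_NIBBLE[bit_concat(hi, lo, 2)]
```

Because every stage is a fixed table lookup on 4 bits, the same small block can be reused at each level of the tree.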