Why do we need to flip the kernel in 2D convolution in the first place? What's the benefit of this? So, why can't we leave it unflipped? What kind of terrible thing can happen if you don't flip it?
SEE: "First, flip the kernel, which is the shaded box, in both horizontal and vertical direction"
It's not meant to be a "benefit" or to avoid disastrous consequences. It's meant to be a definition. If you don't flip, then you violate the agreed-upon definition of convolution. Convolution without the flip has a name of its own: correlation (more precisely, cross-correlation).
What motivated people to define convolution with a flip? Well, in 1D it means, for example, that the convolution of causal signals will also be causal. Also, when you flip, convolving an input with the impulse response of a system gives you the response of that system to that input. If you don't flip, the response comes out backwards: feed in a single impulse and you get the impulse response reversed in time.
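To make the "comes out backwards" point concrete, here is a minimal sketch in plain Python (not MATLAB). The helper names `convolve` and `correlate` and the sample values are made up for illustration; they are not library functions. It feeds a unit impulse through a decaying impulse response, once with the flip and once without.

```python
def convolve(x, h):
    """Full 1D convolution: y[n] = sum_k x[k] * h[n - k] (the flip is implicit)."""
    y = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def correlate(x, h):
    """Same sliding sum of products, but h is slid over x without being flipped."""
    m = len(h)
    y = [0] * (len(x) + m - 1)
    for n in range(len(y)):
        for j, hj in enumerate(h):
            k = n - (m - 1) + j  # index into x under the unflipped kernel
            if 0 <= k < len(x):
                y[n] += x[k] * hj
    return y


impulse = [1, 0, 0, 0]  # unit impulse at time 0
h = [3, 2, 1]           # hypothetical decaying impulse response

with_flip = convolve(impulse, h)   # [3, 2, 1, 0, 0, 0]: the impulse response itself
no_flip = correlate(impulse, h)    # [1, 2, 3, 0, 0, 0]: the response, reversed
print(with_flip, no_flip)
```

With the flip, the output starts at the impulse and decays like `h`. Without it, the output builds up to its peak, which is not what the system would actually do.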
Why do the same in 2D? Using a different definition in 2D would make it inconsistent with 1D.
It doesn't need to be flipped, at least not by you. You pass in the array and the flipping is done internally, automatically, because that's the definition of convolution. If it didn't flip, it would be correlation, not convolution. If you flipped the kernel before passing it into conv2(), you'd be doing a correlation instead of a convolution, because the internal flip is counteracted by your manual flip beforehand. If you want, you can use imfilter() (which correlates by default) or xcorr2(), neither of which flips internally.
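The cancellation of the two flips can be checked numerically. The following is a rough sketch in plain Python rather than MATLAB; `flip2d`, `conv2d_full` and `corr2d_full` are hypothetical stand-ins written for this example, not the implementations behind conv2(), imfilter() or xcorr2(), whose argument order and output sizing may differ.

```python
def flip2d(k):
    """Flip a 2D kernel in both the vertical and horizontal direction."""
    return [row[::-1] for row in k[::-1]]


def conv2d_full(a, k):
    """Full 2D convolution: each pixel of a stamps a scaled copy of k into the output."""
    rows, cols = len(a) + len(k) - 1, len(a[0]) + len(k[0]) - 1
    out = [[0] * cols for _ in range(rows)]
    for i, a_row in enumerate(a):
        for j, a_ij in enumerate(a_row):
            for p, k_row in enumerate(k):
                for q, k_pq in enumerate(k_row):
                    out[i + p][j + q] += a_ij * k_pq
    return out


def corr2d_full(a, k):
    """Full 2D correlation: slide k over zero-padded a with no flip."""
    kr, kc = len(k), len(k[0])
    rows, cols = len(a) + kr - 1, len(a[0]) + kc - 1
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for i in range(kr):
                for j in range(kc):
                    rr, cc = r - (kr - 1) + i, c - (kc - 1) + j
                    if 0 <= rr < len(a) and 0 <= cc < len(a[0]):
                        out[r][c] += a[rr][cc] * k[i][j]
    return out


image = [[0, 0, 0],
         [0, 1, 0],
         [0, 0, 0]]      # a single bright pixel (2D impulse)
kernel = [[1, 2],
          [3, 4]]        # deliberately asymmetric so a flip is visible

conv = conv2d_full(image, kernel)   # contains a copy of the kernel
corr = corr2d_full(image, kernel)   # contains the kernel flipped both ways

# Flipping by hand before convolving undoes the built-in flip.
assert conv2d_full(image, flip2d(kernel)) == corr
```

For a kernel that is symmetric under a 180-degree rotation (a Gaussian or box blur, for example), `flip2d(kernel) == kernel`, so convolution and correlation give the same result and the distinction only matters for asymmetric kernels.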