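Convolution is associative, so applying several filters one after another is equivalent to applying a single filter that is the convolution of those filters.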
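Written out for this case, with I standing for the image and k for the 3x3 mean filter (both symbols are just notation introduced here, and * denotes full 2-D convolution), the identity would read roughly as:

```latex
\[
\bigl((I * k) * k\bigr) * k \;=\; I * (k * k * k)
\]
```

The right-hand side is the single combined kernel: each full convolution with a 3x3 filter grows the kernel by 2 in each dimension, so k * k is 5x5 and k * k * k is 7x7.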
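So, your 7x7 convolution kernel would be the 3x3 mean filter convolved with itself twice (three copies of the filter in total). Use the 'full' option of conv2 to do this, so that the result grows from 3x3 to 5x5 to 7x7 instead of being cropped.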
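A minimal sketch of how that might look, assuming the mean filter is ones(3)/9 (the variable names k3, k7 and img are made up for illustration, and the random test image is only a stand-in for real data):

```matlab
% Assumed 3x3 mean filter; substitute whatever filter you actually use.
k3 = ones(3) / 9;

% Combine three passes into one kernel: 3x3 -> 5x5 -> 7x7.
k7 = conv2(conv2(k3, k3, 'full'), k3, 'full');

% Sanity check on a stand-in image, using full convolution on both sides.
img = rand(64);
threePasses = conv2(conv2(conv2(img, k3, 'full'), k3, 'full'), k3, 'full');
onePass     = conv2(img, k7, 'full');

% Largest absolute difference between the two results.
maxDiff = max(abs(threePasses(:) - onePass(:)));
disp(size(k7));
disp(maxDiff);
```

The difference should be at the level of floating-point rounding error. If you use the 'same' option for the image convolutions instead, expect the two results to differ near the borders, because each intermediate 'same' pass crops the output before the next filter is applied.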
Note: I wouldn't necessarily expect convolving once with the 7x7 kernel to be faster than convolving three times with the 3x3 filter. A 7x7 filter costs 49 multiply-adds per output pixel, whereas three passes with a 3x3 filter cost 3 × 9 = 27 multiply-adds per output pixel.
There is some overhead (such as memory allocation and input processing) associated with each convolution operation, so execution time is not just about counting floating-point operations. Generally, though, decomposing a larger filter into a sequence of smaller filters is a technique used to speed up operations, not slow them down.
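If you want to check which variant is faster on your machine rather than rely on operation counts, something along these lines could work. It is only a sketch: the image size, the 'same' option, and the variable names are arbitrary choices made here, and the outcome will depend on your MATLAB version and hardware.

```matlab
% Assumed setup: 3x3 mean filter and its 7x7 combined equivalent.
k3 = ones(3) / 9;
k7 = conv2(conv2(k3, k3, 'full'), k3, 'full');
img = rand(1024);  % stand-in image; use your own data for a realistic test

% timeit runs each function handle several times and returns a typical time in seconds.
tThree = timeit(@() conv2(conv2(conv2(img, k3, 'same'), k3, 'same'), k3, 'same'));
tOne   = timeit(@() conv2(img, k7, 'same'));

fprintf('three 3x3 passes: %.4f s, one 7x7 pass: %.4f s\n', tThree, tOne);
```

Only the timings are comparable here; as with any 'same' convolution chain, the two outputs agree in the interior of the image but not at the borders.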