You can avoid the for loops altogether by vectorizing your code. MATLAB's array operations are heavily optimized, so the vectorized version should run several times faster. Here is a vectorized version of your code; replace all the for loops with it.
[phi_, thta_, nx_, ny_] = ndgrid(phi, thta, 1:Nx, 1:Ny);
step1 = exp(1i*((nx_-1).*k.*dx.*sin(thta_).*cos(phi_)+...
step2 = wmn*sum(step1,4);
final = abs(sum(step2, 3));
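If the ndgrid-then-sum pattern is unfamiliar, here is a minimal sketch on a toy problem. The names a, n, loopOut and vecOut and their values are made up for illustration and are not from your code. It only shows that summing over a grid dimension reproduces a nested loop.

```matlab
% Toy example (hypothetical values, not the original problem)
a = [0.1 0.2 0.3];   % stands in for an angle vector
n = 1:4;             % stands in for an element index

% Loop version: accumulate over n for every value of a
loopOut = zeros(numel(a), 1);
for ia = 1:numel(a)
    for in = 1:numel(n)
        loopOut(ia) = loopOut(ia) + exp(1i*(n(in)-1)*a(ia));
    end
end
loopOut = abs(loopOut);

% Vectorized version: a varies along dim 1, n along dim 2
[a_, n_] = ndgrid(a, n);
vecOut = abs(sum(exp(1i*(n_-1).*a_), 2));   % sum over the n dimension

% Both versions agree up to rounding error
maxDiff = max(abs(loopOut - vecOut));
```

The dimension passed to sum must match the position of that variable in the ndgrid call, which is why the code above sums over dimensions 4 and 3.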
Speed comparison: on my machine the vectorized version is about 5x faster. The original loop version:
Elapsed time is 1.840009 seconds.
The vectorized version:
Elapsed time is 0.362373 seconds.
However, this approach requires more memory because it creates 4D arrays of size numel(phi) x numel(thta) x Nx x Ny. It is a trade-off between speed and memory.
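To get a feel for the memory cost, here is a rough estimate. The sizes below are hypothetical, so plug in your own. It assumes double precision: the four ndgrid outputs are real (8 bytes per element) and the complex exponential array takes 16 bytes per element. Temporaries created while evaluating the expression are not counted, so treat the result as a lower bound.

```matlab
% Hypothetical sizes, replace with your own
nPhi = 181; nThta = 91; Nx = 10; Ny = 10;

nElem = nPhi * nThta * Nx * Ny;   % elements in each 4D array

gridBytes  = 4 * 8 * nElem;       % four real double grids from ndgrid
step1Bytes = 16 * nElem;          % one complex double array
totalBytes = gridBytes + step1Bytes;

fprintf('Approximate memory: %.1f MB\n', totalBytes / 2^20);
```

If that number is close to your available RAM, a middle ground is to keep one loop (for example over phi) and vectorize only the remaining dimensions.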