If you have multiple multispectral images, then together they form a 4D image. One example is a stack of color confocal microscopy images, where each slice (layer) is a 3-color (RGB) image. Another is the Visible Human data, though it may be stored in some special format on disk.
Another example would be a color video, where each color frame is a 3D image (height, width, color channel) and you have multiple frames taken at different times. So the 4th dimension is time.
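To make the two cases above concrete, here is a minimal sketch of how such data might be laid out as a NumPy array. The sizes and the axis order (slice or frame first, color channel last) are assumptions chosen for illustration; real file formats and libraries may order the axes differently.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
n_slices, height, width, n_channels = 5, 64, 48, 3

# A stack of RGB confocal slices: (slice, row, column, channel).
stack = np.zeros((n_slices, height, width, n_channels), dtype=np.uint8)

# A color video can use the same layout, with the first axis meaning time.
n_frames = 10
video = np.zeros((n_frames, height, width, n_channels), dtype=np.uint8)

# Indexing the first axis gives back one ordinary 3D color image.
one_slice = stack[2]
one_frame = video[0]

print(stack.ndim, stack.shape)
print(one_slice.shape, one_frame.shape)
```

Both arrays have four dimensions; what differs is only what the first axis means (depth in the stack, time in the video).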
It depends. The 4th component could be an alpha channel (see "RGBA"). Or the image may be stored not in RGB colors but in the CMYK color model. Or you have 4 different layers of grayscale images, e.g. stored in a TIFF file. Or it is any other 4-channel image taken by a camera that records 4 different wavelengths, perhaps green and 3 different infrared channels.
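As a rough sketch of why "it depends": the same 4-channel array can be read in several ways, and nothing in the array itself says which one is right. The example below assumes a channel-last NumPy layout and uses made-up sizes. Treating the 4th channel as alpha is just one interpretation, which in practice has to come from the file's metadata or documentation.

```python
import numpy as np

# Hypothetical 4-channel image with a channel-last layout: (row, column, channel).
height, width = 4, 6
image = np.zeros((height, width, 4), dtype=np.uint8)

# Interpretation 1 (assumption): RGBA, so channel 3 is alpha.
image[..., 3] = 255        # 255 = fully opaque under this reading
rgb = image[..., :3]       # the three color channels
alpha = image[..., 3]      # the transparency channel

# Interpretation 2 (assumption): four separate grayscale layers,
# e.g. four wavelengths. Move the channel axis to the front so that
# layers[i] is one 2D grayscale image.
layers = np.moveaxis(image, -1, 0)

print(rgb.shape, alpha.shape, layers.shape)
```

The slicing is identical whether the channels are RGBA, CMYK, or four recorded wavelengths. Only the meaning attached to each channel changes.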