We start with a sound file and create a spectrogram out of it. If you looked at the first page of Google results for "spectrogram", you would quickly decide they're only meant to be comprehended by math Ph.D.s. (This is a shame, because spectrograms are so neat just on their own. There's a lot to them.) But in the end, they're as simple as this: time runs from left to right, the top of a spectrogram is the high tones (think mosquito in your ear), and the bottom is the low bass tones (think bass singer, singing his lowest note). Here is a spectrogram of me speaking the letters of my name, "B-R-I-A-N". (mp3 file here.)
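For the curious, here is a minimal sketch of what "create a spectrogram" means in code. It is not necessarily how the image here was made: the function names, the frame size, and the hop size are all made up for illustration, and it uses a slow, plain discrete Fourier transform with no window function so that it needs nothing beyond the Python standard library.

```python
import cmath
import math

def frame_magnitudes(frame):
    """Return the strength of each frequency bin (low to high) in one frame."""
    n = len(frame)
    mags = []
    # Only bins 0..n/2 carry distinct information for real-valued audio.
    for k in range(n // 2 + 1):
        total = sum(
            frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        mags.append(abs(total))
    return mags

def spectrogram(samples, frame_size=64, hop=32):
    """Slice the samples into overlapping frames; each frame is one column.

    frame_size and hop are illustrative defaults, not values from this post.
    """
    columns = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        columns.append(frame_magnitudes(samples[start:start + frame_size]))
    return columns
```

Each entry of the result is one vertical slice of the picture, listed from the lowest frequency to the highest. A real tool would use a fast Fourier transform and a smoothing window, but the idea is the same.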
Glossing over some of the fascinating and peculiar things about spectrograms for the time being*, the low (bottom) pitches should map to the lowest frequency of light the eye can see: red. The high pitches should map to the highest visible frequency of light: blue/violet. In between should be all the colors of the rainbow (or the closest approximation a computer monitor can make).
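One way to sketch this pitch-to-color mapping in code is shown here. This is an illustration under assumptions, not necessarily the method used for the images in this post: it takes the 25 Hz to 10000 Hz range from the footnote, assumes pitches are spread logarithmically across that range, and approximates the rainbow by sweeping the hue from red (0 degrees) to violet (assumed here to be 270 degrees). The names `frequency_to_rgb` and `VIOLET_HUE` are hypothetical.

```python
import colorsys
import math

F_LOW = 25.0       # Hz, bottom of the spectrogram (from the footnote)
F_HIGH = 10000.0   # Hz, top of the spectrogram (from the footnote)
VIOLET_HUE = 0.75  # assumption: hue of 270 degrees stands in for violet

def frequency_to_rgb(freq_hz):
    """Map a sound frequency to an (r, g, b) tuple of floats in 0..1."""
    # Clamp anything outside the spectrogram's range to its edges.
    f = min(max(freq_hz, F_LOW), F_HIGH)
    # Position between 0 (lowest pitch) and 1 (highest pitch).
    # Assumption: a logarithmic scale, so each octave gets equal room.
    position = math.log(f / F_LOW) / math.log(F_HIGH / F_LOW)
    # Sweep the hue from red toward violet at full saturation and brightness.
    return colorsys.hsv_to_rgb(position * VIOLET_HUE, 1.0, 1.0)
```

For example, `frequency_to_rgb(25)` gives pure red, and `frequency_to_rgb(10000)` gives a violet with full blue and half red. A monitor can only approximate spectral colors, so any mapping like this is a stand-in for the real rainbow.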
Next, for each vertical slice of the image above (a column one pixel wide), we average the colors from bottom to top. Like mixing paint, this is what the eye would see if these colors were seen together. The progression below illustrates this "averaging" step.
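The averaging step can be sketched in a few lines. This is a plain, unweighted average of the red, green, and blue values in each column, which is an assumption about what "average color" means here. The function names are hypothetical, and each pixel is assumed to be an `(r, g, b)` tuple.

```python
def average_color(column):
    """Average one column of (r, g, b) pixels, listed bottom to top."""
    if not column:
        raise ValueError("column must contain at least one pixel")
    n = len(column)
    # Average each of the three channels separately.
    return tuple(sum(pixel[i] for pixel in column) / n for i in range(3))

def collapse_image(columns):
    """Reduce every column of the colored spectrogram to a single color."""
    return [average_color(column) for column in columns]
```

A column that is half pure red and half pure blue comes out as `(0.5, 0.0, 0.5)`, a dim purple. Running `collapse_image` over every column leaves one color per moment in time.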
Finally, we have a completed image, a stream of sound on the spectrum of visible light.
* For the mathematicians, the spectrogram above goes from 25 Hz sound on the bottom to 10,000 Hz on top -- on, oddly,