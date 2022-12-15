

All the Stable Diffusion features remain

The tweaks let the system generate unlimited variations of a prompt by varying the seed. All the out-of-the-box features remain, including img2img, negative prompts, and interpolation.

What is a spectrogram?

A spectrogram is a visual representation of audio, such as someone singing or talking. The sound is mapped onto a graph: the x-axis is time, and the y-axis is frequency.

The color of each pixel gives the amplitude of the audio at the frequency and time given by that pixel's row and column.
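As a rough illustration of that mapping, the sketch below turns a small grid of amplitudes into grayscale pixel rows. The function name `to_pixels`, the linear 0-255 scaling, and the toy values are assumptions made for this example; the article does not say how the real system scales amplitude into color.

```python
def to_pixels(magnitudes):
    """Convert magnitudes[time][freq] into rows of 0-255 grayscale pixels.

    Row 0 holds the highest frequency, so low notes sit at the bottom
    of the image. Linear scaling is an assumption for this sketch.
    """
    peak = max(max(frame) for frame in magnitudes) or 1.0
    n_freqs = len(magnitudes[0])
    image = []
    for f in reversed(range(n_freqs)):      # top row = highest frequency
        # One pixel per time step, so each column is a moment in time.
        row = [round(255 * frame[f] / peak) for frame in magnitudes]
        image.append(row)
    return image


# Three time steps, two frequency bins (toy values).
magnitudes = [[0.0, 5.0], [1.0, 0.0], [2.0, 0.0]]
image = to_pixels(magnitudes)
```

Here the loud high-frequency burst at the first time step becomes the brightest pixel in the top-left corner, while the quieter low-frequency sound fills the bottom row at later time steps.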

The system computes the spectrogram image from audio using the short-time Fourier transform (STFT). The STFT approximates the sound as a combination of sine waves of varying amplitudes and phases.

The STFT describes the frequency and phase content of local sections of a signal as it changes over time. It can be computed and displayed as a spectrogram, and because it is invertible, the original audio can be reconstructed from it.
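To make the STFT more concrete, here is a minimal sketch in plain Python. It is not the system's actual code: the frame size, hop, sample rate, Hann window, and the helper names `stft` and `magnitude_spectrogram` are all assumptions chosen for illustration, and the naive DFT is far slower than the FFT a real library would use.

```python
import cmath
import math


def stft(signal, frame_size, hop):
    """Return a list of frames; each frame is a list of complex DFT bins."""
    # A Hann window tapers each frame so its edges do not smear energy
    # across unrelated frequencies.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_size)]
        # Naive DFT, keeping only the non-negative frequencies.
        bins = [
            sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n in range(frame_size))
            for k in range(frame_size // 2 + 1)
        ]
        frames.append(bins)
    return frames


def magnitude_spectrogram(frames):
    """Amplitude of every (time, frequency) cell, i.e. the pixel values."""
    return [[abs(b) for b in bins] for bins in frames]


SAMPLE_RATE = 8000   # assumed value for the example
FRAME_SIZE = 64
HOP = 32
TONE_HZ = 1000       # falls exactly on bin 1000 / (8000 / 64) = 8

signal = [math.sin(2 * math.pi * TONE_HZ * n / SAMPLE_RATE)
          for n in range(512)]
spec = magnitude_spectrogram(stft(signal, FRAME_SIZE, HOP))

# Find the loudest frequency bin in the first time frame.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_SIZE
```

Each entry of `spec` is one time slice, and each value inside it is the amplitude at one frequency. For this pure 1000 Hz tone, the loudest bin in every slice is the one that corresponds to 1000 Hz.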

How a spectrogram becomes sound

The spectrogram images from the model contain only the amplitudes of the sine waves, not the phases of the audio. This is due in large part to the chaotic nature of phases, whose constant shifting is hard for the AI to learn.
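The sketch below shows what an amplitude-only image leaves out. Two tones that differ only in phase produce the same amplitude, so the spectrogram cannot tell them apart. The frame length, bin number, and helper names `dft_bin` and `tone` are assumptions made for this example.

```python
import cmath
import math

N = 64     # frame length, an assumed value
BIN = 8    # frequency bin of the test tone


def dft_bin(frame, k):
    """Single DFT coefficient of `frame` at bin k."""
    n_samples = len(frame)
    return sum(x * cmath.exp(-2j * math.pi * k * n / n_samples)
               for n, x in enumerate(frame))


def tone(phase):
    """One frame of a sine wave at bin BIN with the given starting phase."""
    return [math.sin(2 * math.pi * BIN * n / N + phase) for n in range(N)]


a = dft_bin(tone(0.0), BIN)
b = dft_bin(tone(math.pi / 2), BIN)

amp_a, amp_b = abs(a), abs(b)                       # kept in the image
phase_a, phase_b = cmath.phase(a), cmath.phase(b)   # not kept in the image
```

Both tones have the same amplitude, while their phases differ by a quarter cycle. Since the image stores only the amplitudes, the phase has to be recovered some other way before the spectrogram can be turned back into sound.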