Spectrogram (Standard Performance)

Craig Maier

Major Contributor

Join Date: Oct 1999

Posts: 11201

06-25-2023, 01:03 PM

Spectrograms

Both the DCArt10 and the DCForensics10 Audio Laboratory versions include spectrogram displays (sometimes referred to as “spectrographs”). The DCArt10 version includes a Standard Definition system, while the DCForensics10 Audio Laboratory version includes a High Definition system in which the user can trade off between optimal frequency resolution and optimal time resolution. Since the High Definition Spectrogram is similar in overall operation to the Standard Definition system, it is recommended that the user read the section pertaining to the Standard Definition system before proceeding to the High Definition section of this User’s Guide.
View Spectrogram

A Spectrogram (or spectrograph) provides a method for displaying waveform data including Time, Frequency and Amplitude (Loudness) all on the same graph. Time is represented on the horizontal (X) axis, frequency is represented on the vertical (Y) axis, and Amplitude is represented by color intensity or gray scale/brightness. The Spectrogram is displayed in the Destination window when View Spectrogram is checked. It is calculated for whatever portion of the waveform is displayed, highlighted or zoomed in on in the Source window, and is time aligned with it. What follows is a description of the Standard Definition Spectrogram found in the DCArt10 version of the product.

Figure 93 - The DC Forensics Spectrogram View
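As a rough illustration of the Time / Frequency / Amplitude layout described above, here is a minimal sketch of how a spectrogram can be computed. It is not the Diamond Cut implementation: the Hann window, the hop size, the function name spectrogram and the 1 kHz test tone are all assumptions made for the example, and a plain DFT stands in for a real FFT to keep it self-contained.

```python
import cmath
import math

def spectrogram(samples, fft_size=64, hop=32):
    """Return a list of time columns; each column holds fft_size // 2 levels in dB."""
    # Hann window (an assumption) reduces leakage between neighbouring bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / fft_size)
              for n in range(fft_size)]
    columns = []
    for start in range(0, len(samples) - fft_size + 1, hop):  # X axis: time
        frame = [samples[start + n] * window[n] for n in range(fft_size)]
        column = []
        for k in range(fft_size // 2):  # Y axis: frequency bins below Nyquist
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / fft_size)
                      for n in range(fft_size))
            # Scale so a full-scale sine centred on a bin reads 0 dB.
            magnitude = abs(acc) / (fft_size / 4)
            column.append(20 * math.log10(max(magnitude, 1e-12)))  # Z axis: level
        columns.append(column)
    return columns

sample_rate = 8000  # hypothetical sample rate for the test tone
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(256)]
spec = spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(peak_bin * sample_rate / 64)  # frequency of the brightest bin, in Hz
```

Each inner list is one vertical strip of the display, and the dB value in each cell is what the palette later turns into a color or brightness.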
Display Controls:
Right Vertical Slider: This slider sets the Maximum Amplitude Level (limit) used for the modulation of the spectrogram’s Z Axis (chroma level/brightness level). Its range of adjustment is from 0 dB down to as low as -90 dB. Signals above its setting are suppressed. This applies to the High-Definition version of the Spectrogram found in the Forensics product. Note that this control operates differently compared to the Standard spectrogram’s Right Vertical Slider.

Left Vertical Slider: This slider sets the Minimum Amplitude Level (limit) used for the modulation of the spectrogram’s Z Axis. Its range of adjustment is from -25 dB down to as low as -125 dB. Signals below its setting are suppressed. This applies to the High-Definition version of the Spectrogram found in the Forensics product. Note that this control operates differently compared to the Standard spectrogram’s Left Vertical Slider.

Zoom In*: Allows you to zoom in on a portion of the .wav file as displayed in the Source window. After zooming is completed, the system will re-calculate the spectrogram for the zoomed-in segment and display it in the Destination window.

Zoom Out*: Allows you to zoom out from a portion of the .wav file as displayed in the Source window. Similarly, the system will re-calculate the spectrogram for the zoomed-out data and display it in the Destination window.
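The effect of the two vertical sliders described above can be pictured as a window on the dB scale. The sketch below is only one possible reading: it assumes that "suppressed" means drawn at zero brightness and that levels inside the window are scaled linearly. The function name brightness and the default limits are hypothetical.

```python
def brightness(level_db, min_db=-100.0, max_db=0.0):
    """Map a level in dB to a 0.0-1.0 brightness for the Z axis."""
    if min_db >= max_db:
        raise ValueError("min_db must be below max_db")
    if level_db < min_db or level_db > max_db:
        return 0.0  # assumption: "suppressed" means drawn at zero intensity
    return (level_db - min_db) / (max_db - min_db)

print(brightness(-50.0))                 # halfway up the default window
print(brightness(-10.0, -100.0, -20.0))  # above the maximum limit: blanked
```

Lowering the maximum limit or raising the minimum limit narrows the window, so a smaller span of levels is stretched across the whole brightness range.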

The Spectrogram Display Preferences Menu
Display Preferences:
You can choose between a number of preferences associated with the spectrogram, either under the Preferences menu found under “Edit” or by double-clicking the spectrogram display area with the left mouse button. The following preferences are available to you:
Frequency Axis Selector:
Linear

Log
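To show what the Linear and Log choices change, here is a small sketch that maps a frequency to a vertical pixel row under each scale. The function name freq_to_row, the 101-row display height and the 100 Hz to 10 kHz range are assumptions for illustration, not values taken from the product.

```python
import math

def freq_to_row(freq_hz, f_min, f_max, height, log_scale=False):
    """Return the pixel row (0 = top) at which freq_hz is drawn."""
    if not 0 < f_min <= freq_hz <= f_max:
        raise ValueError("need 0 < f_min <= freq_hz <= f_max")
    if log_scale:
        # Equal vertical distance per octave or decade.
        fraction = math.log(freq_hz / f_min) / math.log(f_max / f_min)
    else:
        # Equal vertical distance per Hz.
        fraction = (freq_hz - f_min) / (f_max - f_min)
    return (1.0 - fraction) * (height - 1)

print(freq_to_row(1000, 100, 10000, 101))                  # linear scale
print(freq_to_row(1000, 100, 10000, 101, log_scale=True))  # log scale
```

On the log scale 1 kHz lands in the middle of a 100 Hz to 10 kHz display, while on the linear scale it sits near the bottom, which is why the Log setting gives more room to low-frequency detail.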

Amplitude Axis (Z Axis or Chroma/Intensity Modulation)

A. Linear
B. Gamma Scaling (coefficient of non-linearity ranging from 0.1 to 10, with 1.0 being linear)
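Gamma scaling is commonly implemented as a power law applied to a normalized intensity. The sketch below assumes that form (output = input raised to the coefficient); the guide does not say whether the software uses the coefficient or its reciprocal as the exponent, so treat the direction of the curve as an assumption. The function name apply_gamma is hypothetical.

```python
def apply_gamma(intensity, gamma=1.0):
    """Apply a power-law curve to a 0.0-1.0 intensity; gamma 1.0 is linear."""
    if not 0.1 <= gamma <= 10.0:
        raise ValueError("gamma must lie between 0.1 and 10")
    clamped = min(max(intensity, 0.0), 1.0)
    return clamped ** gamma  # assumption: the coefficient is used as the exponent

print(apply_gamma(0.25, 1.0))  # unchanged
print(apply_gamma(0.25, 0.5))  # low-level detail is brightened
print(apply_gamma(0.5, 2.0))   # low-level detail is darkened
```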
FFT Size:

Choose between 32, 64, 128, 512, 1024, 2048, and 4096. The Forensics High Definition version allows more choices, including 8192, 32768, 65536 and 131072. Small FFT sizes provide fast update times, while large FFT sizes provide improved frequency resolution. The basic frequency resolution is FFT size/2 frequency bins, spread from 0 Hz up to half the sample rate.
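The trade-off between update speed and frequency resolution follows from standard FFT arithmetic, sketched below. The 44100 Hz sample rate and the function name fft_resolution are assumptions chosen for the example.

```python
def fft_resolution(fft_size, sample_rate):
    """Return the bin count, bin width and window length for one FFT size."""
    if fft_size < 2 or fft_size & (fft_size - 1):
        raise ValueError("fft_size must be a power of two")
    return {
        "bins": fft_size // 2,                      # bins from 0 Hz to Nyquist
        "bin_width_hz": sample_rate / fft_size,     # spacing between bins
        "window_seconds": fft_size / sample_rate,   # audio consumed per column
    }

for size in (32, 1024, 4096):
    info = fft_resolution(size, 44100)
    print(size, info["bins"], round(info["bin_width_hz"], 2),
          round(info["window_seconds"] * 1000, 2), "ms")
```

A larger FFT narrows each bin but needs a longer stretch of audio per column, so fine frequency detail comes at the cost of time detail.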
Color Palette:

You have the choice of the following color gradients:
Grayscale
White to Blue
White to Red
White to Green
White to Red to Blue
White to Green to Red
White to Red to Blue to Black
White to Yellow to Red to Black
White to Yellow to Green to Aqua to Blue
Black to Blue to White
Black to Green to White

Inverse Palette:

This inverts the polarity of the video signal, providing a different visual perspective of the spectrogram which is sometimes more revealing than the normal polarity. For example, on the grayscale palette, black becomes white and white becomes black when the Inverse Palette checkbox is checked.
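A multi-stop gradient and its inverse can be sketched as linear interpolation between color stops. The RGB values, the choice of which end of the gradient represents low intensity, and the name palette_color are all assumptions for illustration; the software's actual lookup tables are not documented here.

```python
def palette_color(intensity, stops, inverse=False):
    """Interpolate an (R, G, B) color for a 0.0-1.0 intensity."""
    t = min(max(intensity, 0.0), 1.0)
    if inverse:
        t = 1.0 - t  # inverse palette: read the gradient from the other end
    position = t * (len(stops) - 1)
    index = min(int(position), len(stops) - 2)
    fraction = position - index
    low, high = stops[index], stops[index + 1]
    return tuple(round(a + (b - a) * fraction) for a, b in zip(low, high))

GRAYSCALE = [(0, 0, 0), (255, 255, 255)]
WHITE_RED_BLUE = [(255, 255, 255), (255, 0, 0), (0, 0, 255)]

print(palette_color(0.0, GRAYSCALE))                # black
print(palette_color(0.0, GRAYSCALE, inverse=True))  # white
print(palette_color(0.5, WHITE_RED_BLUE))           # the middle stop, red
```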
Display Frequency Range

Enter Value for Minimum Frequency in Hz.
Enter Value for Maximum Frequency in Hz, which is limited to the file Sample Rate / 2.
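The Sample Rate / 2 limit on the maximum frequency can be sketched as follows. The function name visible_bins and the example sample rate and FFT size are hypothetical; the code only shows how a requested range might be clamped and converted to FFT bin indices.

```python
import math

def visible_bins(f_min, f_max, sample_rate, fft_size):
    """Return the first and last FFT bin that fall inside the display range."""
    nyquist = sample_rate / 2
    f_max = min(f_max, nyquist)  # the maximum cannot exceed Sample Rate / 2
    if not 0 <= f_min < f_max:
        raise ValueError("need 0 <= f_min < f_max")
    bin_width = sample_rate / fft_size
    first = math.ceil(f_min / bin_width)
    last = min(math.floor(f_max / bin_width), fft_size // 2 - 1)
    return first, last

print(visible_bins(100, 3000, 8000, 64))  # range well inside the limit
print(visible_bins(100, 6000, 8000, 64))  # 6000 Hz is clamped to 4000 Hz
```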

Display Frequency Labels

This feature turns the Frequency Labels along the Vertical axis On or Off.

*Note: For more information on methods for Zooming-In and Zooming-Out, please refer to that section of the User’s Guide.

Important Note: The Sync files feature found under the View menu must be enabled so that the Spectrogram operates properly (stays in sync as you zoom the time display) and should be used in Classic Edit mode.

Spectrograms are useful for applications like spectrographic voice recognition or comparison (sometimes referred to as “voice-printing”). Physiologically, speech is produced by the interaction of two mechanisms: resonance and articulation. Resonance is produced by the nasal, pharyngeal and oral passages, while articulation is produced by the jaw muscles, lips, teeth, tongue, and soft palate. The human voice is acoustically modeled as a 4th-order cascaded resonant system with an excitation signal called F0 (produced by the vocal cords). The resulting acoustical signatures are referred to as formants. There are generally 5 identifiable formants, starting with the fundamental, which is usually designated F0. The resonances produce the formants designated F1 through F4, which are generally higher in frequency than the fundamental. All of these formant frequencies lie below roughly 3000 Hz, and F0 generally falls between about 70 Hz and 270 Hz. Vocal comparisons typically use audio samples of around 2.5 seconds or less, with the frequency display range running from 100 Hz up to somewhere between 3 kHz and 6 kHz. The so-called English “cue words” often used for comparison are as follows:

{The, To, And, Me, On, Is, You, I, It, A}

Here is a sentence that you can experiment with that incorporates all of the English cue words:

“It is important that I go to the bank on Friday to get a check for you and me”.
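The F0 range quoted above (roughly 70 Hz to 270 Hz) can be used to bound a simple pitch estimate on a short voiced segment. The sketch below uses a squared-difference search over candidate periods, which is a generic textbook approach and not the method used by the Diamond Cut Voice ID system. The function name estimate_f0 and the synthetic 125 Hz test signal are assumptions.

```python
import math

def estimate_f0(samples, sample_rate, f_low=70.0, f_high=270.0):
    """Estimate the fundamental by finding the lag at which the signal best repeats."""
    min_lag = int(sample_rate / f_high)  # shortest period considered
    max_lag = int(sample_rate / f_low)   # longest period considered
    span = len(samples) - max_lag
    if span <= 0:
        raise ValueError("segment is too short for the lowest F0")
    best_lag, best_error = min_lag, float("inf")
    for lag in range(min_lag, max_lag + 1):
        # Sum of squared differences between the signal and a delayed copy.
        error = sum((samples[n] - samples[n + lag]) ** 2 for n in range(span))
        if error < best_error:
            best_lag, best_error = lag, error
    return sample_rate / best_lag

sample_rate = 8000  # hypothetical sample rate
# Synthetic "voiced" test signal: a 125 Hz fundamental plus its second harmonic.
voiced = [math.sin(2 * math.pi * 125 * n / sample_rate)
          + 0.5 * math.sin(2 * math.pi * 250 * n / sample_rate)
          for n in range(800)]
print(estimate_f0(voiced, sample_rate))
```

Real speech is noisier than this test signal, so a practical estimator would add voicing detection and smoothing; the sketch only shows how the F0 range narrows the search.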

There are three pairs of demo files in the DCForensics10 demo wavefiles directory that contain the above English sentence, provided for user testing and experimentation. They include a male voice, a female voice and a child’s voice. The two files in each pair were recorded simultaneously: one through a low quality signal path and the other through a higher quality signal path. You can use these files in conjunction with the Spectrogram (and the Voice ID System) to study the differences between male, female and child voices speaking the same word or the entire sentence. You can also study and compare those same voices as recorded by high and low quality recording systems. The files are as follows:

Female Voice ID Test Sentence - High Quality.wav
Male Voice ID Test Sentence - High Quality.wav
Female Child (12) Voice ID - High Quality.wav
Female Voice ID Test Sentence - Low Quality.wav
Male Voice ID Test Sentence - Low Quality.wav
Female Child (12) - Low Quality.wav

To perform a voice-print comparison, it is necessary to observe the same spoken words in the two specimens (the display tile feature is helpful for this purpose). Your Diamond Cut Voice ID system is a useful tool for this purpose. You can use it to compare the vowel (generally F1 and F2) and consonant (generally F2, F3 and F4) formants of the human voice. Formants are created by highlighting a word in the spectrogram view: highlight the area of the spectrogram containing the word of interest and then apply the Voice ID function. The formants will be plotted on top of the spectrogram view. It is beneficial if the same recording equipment was used to record both samples to be compared, although this is impractical in most situations. It is also beneficial if the same ambient sound conditions are present in the two samples. If there is a lot of background noise, consider applying one of the Speech filters before measuring the spectrogram. Speech filters can be found in the Band-pass filter preset menu and the Forensics Brick Wall filter preset menu. See those specific sections of the User’s Guide for details. Lastly, the emotional state of the person(s) speaking should be similar. If, for example, one specimen has the person screaming and the other has the person sobbing or whispering, it will be difficult to draw any conclusions with a reasonable degree of certainty. Please note that spectrographic “voice-printing” is not admissible evidence in all court systems in the United States. Contact the court system or a legal expert in your area for details.

Note 1: Hot key access to the Spectrogram is available via ALT+"S". It toggles between the Spectrogram Display and the software’s Normal Display mode. Alternatively, you can use the “Esc” key to simply turn off the spectrogram.
Note 2: To print a spectrogram, use the Print commands found under the File Menu.

"Who put orange juice in my orange juice?" - - - William Claude Dukenfield