Voice ID (Forensics versions only)

Craig Maier

Major Contributor

Join Date: Oct 1999

Posts: 11202


06-25-2023, 04:14 PM

Voice ID

(Forensics Version Only)

Your Diamond Cut Forensics10 Audio Laboratory software contains a special feature called “Voice ID” to help you identify and rank the speech formants* of a highlighted cue word or verbal expression. It can also simultaneously display the frequency response, power cepstrum* (cepstrogram), and complex cepstrum* graphs of the highlighted signal.

The Voice ID Display Dialog Box
This feature works in conjunction with your high resolution spectrogram and is found under the Forensics menu. After you activate the Spectrogram and highlight a cue word or phrase, the Voice ID feature finds and automatically ranks the formants contained in that portion of the .wav file. It also provides a frequency response graph and cepstrum graphs within the Voice ID dialog box. The system can measure the formants and cepstrum for highlighted intervals of up to 5 seconds. Most useful vowel formant identifications are performed on a much shorter span, usually in the 40 to 100 ms range.

You can choose the number of formants that you want identified and ranked. Formants are always sorted based on the highest average amplitude of the formant. You can choose to have the results displayed in Amplitude or Frequency priority order. You can also choose the formant number at which the ranking begins by selecting either “Start at F1” or “Start at F0”. Maximum and minimum formant frequencies can also be set in the “Settings” dialog box.
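
The ranking options above can be pictured with a short Python sketch. This is only an illustration of one plausible reading of the text, not the software's actual algorithm: the `Formant` class, the `rank_formants` function and all parameter names are hypothetical, and the assumption is that the strongest candidates are kept first and then displayed in the chosen order.

```python
from dataclasses import dataclass


@dataclass
class Formant:
    freq_hz: float  # average frequency of the formant track
    amp_db: float   # average amplitude in dB (0 dB = maximum)


def rank_formants(candidates, sort="frequency", start_at=0,
                  max_formants=5, min_hz=100.0, max_hz=5500.0):
    """Return (label, formant) pairs; an illustrative guess at the ranking."""
    # Assumption: candidates outside the configured frequency range are ignored.
    in_range = [f for f in candidates if min_hz <= f.freq_hz <= max_hz]
    # Assumption: the strongest candidates (highest average amplitude) are kept.
    strongest = sorted(in_range, key=lambda f: f.amp_db, reverse=True)
    strongest = strongest[:max_formants]
    if sort == "frequency":
        ordered = sorted(strongest, key=lambda f: f.freq_hz)
    elif sort == "amplitude":
        ordered = strongest
    else:
        raise ValueError("sort must be 'frequency' or 'amplitude'")
    # "Start at F0" corresponds to start_at=0, "Start at F1" to start_at=1.
    return [(f"F{start_at + i}", f) for i, f in enumerate(ordered)]
```

With a Frequency sort and “Start at F0”, the lowest-frequency candidate that survives the amplitude cut is labeled F0, which matches the description of F0 as the fundamental excitation frequency.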

The system will generally show the first formant as F0 on the spectrogram (and Num 0 in the table of values). This F0 generally represents the fundamental excitation frequency. The component signals (formants) that comprise a cue word or phrase are displayed in terms of start and stop frequency, amplitude and start time by Fn.

Sometimes, you will encounter noisy files which can confuse the Voice ID System. In these cases, it may be useful to limit the range of frequencies used by the system or to raise the “Spectral Smoothing” value. By default the Voice ID frequency range is set to a lower limit of 100 Hz and an upper limit of 5,500 Hz. You can access and change the various internal parameters of the Voice ID system by using the Settings button on the Voice ID dialog. You can restore the Voice ID system to the factory settings by clicking on the “Restore Defaults” button.

The Voice ID Settings Dialog Box
It is recommended that files have a sample rate of at least 22 kHz, with 44.1 kHz being a better choice. If necessary, you can convert the file using the “Change Sample Rate/Resolution” feature found under the Diamond Cut Edit Menu.
You can adjust some of the internal parameters of the Voice ID function by clicking on the “Settings” button. The Voice ID Settings Dialog Box will appear, giving you control over the time window aperture, the degree of spectral smoothing, the maximum formant frequency sought and the application of pre-emphasis. The Window length sets the trade-off between time resolution and frequency resolution in the Voice ID system: a shorter window follows rapid changes more closely, while a longer window separates closely spaced frequencies better. The most common value is 20 ms; it may sometimes be useful to try other values depending on the length of the phoneme of interest.
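
As a rough guide to what the window length means in practice, the following Python sketch converts a window length into a sample count and the spacing between frequency points of an unpadded Fourier transform. The function name is hypothetical, and the calculation is general signal-processing arithmetic rather than a description of the software's internals.

```python
def window_resolution(window_ms, sample_rate_hz):
    """Return (samples in the window, frequency spacing in Hz)."""
    samples = round(window_ms * 1e-3 * sample_rate_hz)
    # Spacing between frequency points is the reciprocal of the window duration.
    spacing_hz = sample_rate_hz / samples
    return samples, spacing_hz


print(window_resolution(20, 44100))  # 882 samples, 50.0 Hz spacing
print(window_resolution(40, 44100))  # 1764 samples, 25.0 Hz spacing
```

Doubling the window length halves the frequency spacing, at the cost of averaging over twice as much time.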

The Pre-Emphasis option is available via the Settings dialog. It applies a +1 slope (+6 dB/octave) from 75 Hz to 5,000 Hz to the signal being analyzed. This compensates for the natural roll-off of the vocal tract and flattens the vocal formant spectrum.
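
A +6 dB/octave slope means the gain doubles each time the frequency doubles. The Python sketch below shows one way to model such a curve and apply it in the frequency domain. It is an assumption-laden illustration: the function names are hypothetical, the gain is assumed to be flat below 75 Hz and above 5,000 Hz, and the software may implement pre-emphasis differently (for example with a time-domain filter).

```python
import numpy as np


def pre_emphasis_gain_db(freq_hz, f_low=75.0, f_high=5000.0):
    """Gain in dB for a +6 dB/octave slope between f_low and f_high."""
    # Assumption: the curve is held flat outside the 75 Hz to 5,000 Hz band.
    f = np.clip(np.asarray(freq_hz, dtype=float), f_low, f_high)
    return 20.0 * np.log10(f / f_low)


def apply_pre_emphasis(signal, sample_rate_hz):
    """Apply the gain curve to a real signal via the FFT (illustrative only)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate_hz)
    gain = 10.0 ** (pre_emphasis_gain_db(freqs) / 20.0)
    return np.fft.irfft(spectrum * gain, n=len(signal))
```

Under these assumptions a 150 Hz component, one octave above 75 Hz, is boosted by about 6 dB, and a 5,000 Hz component by about 36.5 dB.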

After the “Find Formants” button has been clicked and the calculations have been completed, each formant and its trajectory is displayed on the spectrogram, labeled with a number in a rectangular box. The labels are annotated with both time and frequency coordinates that correspond to the values displayed in the Voice ID dialog box (“Num” column in the table of values). “Start” and “End” frequency values for each formant are listed in the table of values. Average amplitude values for each formant are given in dB relative to 0 dB, which is the maximum value that can be displayed. A column called “Delta Amp” (Delta Amplitude) displays the formant amplitudes normalized to 0 dB for easier comparison to a reference cue word or phrase.
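
The idea behind a normalized column such as “Delta Amp” can be shown in a few lines of Python. The sketch assumes that normalization means subtracting the strongest formant's amplitude so that it reads 0 dB; the manual does not state the exact rule, and the function name and sample values are made up.

```python
def delta_amp_db(amps_db):
    """Shift amplitudes so the strongest one reads 0 dB (assumed rule)."""
    peak = max(amps_db)
    return [a - peak for a in amps_db]


print(delta_amp_db([-12.0, -18.0, -20.0]))  # [0.0, -6.0, -8.0]
```

Normalizing this way removes overall level differences between two recordings, so only the relative strengths of the formants are compared.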

All of these data can be exported to a text file so that further analysis can be performed with programs such as Excel or equivalent data analysis systems. The exported data are the frequency and time values for each of the formant tracks.

The software supports two extensions, .txt and .csv, and they both provide you with the same exportable data. Csv (comma-separated values) files are comma delimited while .txt files are tab delimited. To export the data, just click on the “File” button, then select “Export to a file”. You can set the file path and extension to go to the directory of your choice. You can also choose “copy to clipboard” if you want to bypass writing the information to a file and copy it directly to another program.
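
If you want to read an exported file outside of Excel, a minimal Python sketch might look like the following. The only facts taken from the text are the delimiters (commas for .csv, tabs for .txt); the column layout of the export is not described here, so the function simply returns each row as a list of strings. The function name and the sample values are hypothetical.

```python
import csv
import io

# Delimiters as described above: .csv is comma delimited, .txt is tab delimited.
DELIMITERS = {".csv": ",", ".txt": "\t"}


def parse_export(text, extension):
    """Split exported text into rows of string fields, skipping blank lines."""
    delimiter = DELIMITERS[extension.lower()]
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [row for row in reader if row]


# Made-up sample content, not real output from the software.
print(parse_export("0.010,512.5\n0.020,515.0\n", ".csv"))
```

To read from disk, pass the file's contents and its extension, for example `parse_export(open(path).read(), ".txt")`, then convert the fields to numbers once you have confirmed the column order in your own export.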

Two cepstrum graphs and a frequency response plot are also provided with the Voice ID feature. The Frequency response of the highlighted portion of the file shows the relative amplitude in dB plotted vs. frequency, while the power and/or complex cepstrum can be simultaneously displayed in terms of amplitude vs. quefrency*. Use the “Graph” checkboxes to select the desired graphing mode. The Frequency Spectrum will be drawn in Yellow, the Power Cepstrum in Red and the Complex Cepstrum in White. You can select any or all of these graphical modes depending on your needs. The smoothing control applies to these graphs, with higher values producing higher levels of graphic smoothing. The smoothing scaling factor runs from 0 (which produces no smoothing) to 20 (which results in the maximum degree of smoothing). This smoothing control also affects the formant detection functionality.
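
For readers unfamiliar with the cepstrum, the Python sketch below computes a power cepstrum using the common textbook definition: the squared magnitude of the inverse Fourier transform of the log power spectrum. It is not taken from the software; the function name, the Hann window and the small constant added before the logarithm are all assumptions made for illustration, and the complex cepstrum (which also needs phase unwrapping) is left out.

```python
import numpy as np


def power_cepstrum(frame, sample_rate_hz):
    """Return (quefrency in seconds, power cepstrum) for one frame."""
    frame = np.asarray(frame, dtype=float)
    # Assumption: a Hann window is applied to reduce edge effects.
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.fft(windowed)) ** 2
    # Small offset avoids log(0); its value is an arbitrary choice.
    log_power = np.log(power + 1e-12)
    cepstrum = np.abs(np.fft.ifft(log_power)) ** 2
    quefrency_s = np.arange(len(frame)) / sample_rate_hz
    return quefrency_s, cepstrum


# A click followed by a weaker copy 64 samples later (a simple echo).
frame = np.zeros(1024)
frame[400] = 1.0
frame[464] = 0.5
quefrency, cep = power_cepstrum(frame, 8000)
peak = 1 + np.argmax(cep[1:512])  # skip quefrency 0
print(peak, quefrency[peak])  # peak index 64, i.e. 0.008 s
```

The peak appears at the quefrency equal to the echo delay, which is why repeating structure in a signal, such as the pitch period of voiced speech, shows up as a peak in a cepstrum graph.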

Six voice ID test files are provided including male (adult), female (adult) and female child in high and low quality versions. All files use the same test phrase for ease of comparison. These files can be found under the File menu/Open Demo Wave Files menu structure. All files include “Voice ID” as part of their file names for ease of location.

The Voice ID System Operating Procedure(s)

1. Bring up the file of interest in the time domain display.
2. If it is not 44.1 kHz, use the change sample rate feature to change it to 44.1 kHz. This feature is found under the Edit menu.
3. Optionally, go to the “View” menu and bring up the “Time Display” which makes it easier to see your highlighted “Span” time.
4. Go to the “Forensics” Menu and click on the “View Spectrogram” item.
5. Optionally modify the spectrogram properties by using the right mouse button menu to Edit the spectrogram properties.
6. Listen to the file and then highlight the cue word or phrase of interest on the spectrogram.
7. Click on the “Voice ID” function found under the Forensics Menu.
8. Set up the various “Voice ID” parameters to your preference. Generally, the Sort should be set for “Frequency” and the Max Formants would generally be set for 5 (which will allow the system to identify F0 through F4).
9. If you are interested in Frequency Spectrum or Cepstrum graphs, check the appropriate checkboxes in the “Voice ID” dialog box. If not, leave all checkboxes unchecked.
10. Lastly, click on the “Find Formants” button in the “Voice ID” dialog box.
11. The system will then calculate the various Formants and display their various numerical attributes in the table of values within the “Voice ID” dialog box.
12. Please be aware that the voice formant frequencies are very sensitive to the exact location of the selected speech. Small variations in time can cause different formants to be found. Likewise the window length time (under the settings dialog) will affect the type of formants found. Typical analysis is done with a 20 ms window (aperture).
13. The trajectories of the various Formants will appear on the Spectrogram along with their numerical formant labels.
14. If you want to analyze this data statistically or in any other way, you can click on the File->Export button and a file will be created which can be used external to your Diamond Cut software.
*Note 1: Definitions of Power Cepstrum, Complex Cepstrum, Formants and Quefrency can be found in the glossary section of this documentation.
Note 2: The Voice ID dialog box is user sizable; just use your mouse to drag the box margins to create the size that you desire.
Note 3: To print the Frequency Response and/or Cepstrum graphs, use the Alt Print Screen (Alt Prt Scr) command and it will be recorded on your system clipboard to be used as needed.
Note 4: Voices under stress may yield distorted Voice ID results.

"Who put orange juice in my orange juice?" - - - William Claude Dukenfield
Tags: cepstrum, export voice id, formants, quefrency, voice id
