Processing II: Acoustics

Overview

In this script, we work with the audio files we extracted from the XDF files. We will extract the following features:

  • intensity
  • f0
  • spectral centroid / spectral center of gravity
  • formants
Code to load packages and prepare the environment
# packages
import os
import glob
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy
from scipy.signal import butter, filtfilt
import librosa
import parselmouth
import IPython.display as ipd
import seaborn as sns
from scipy.signal import find_peaks, peak_widths

curfolder = os.getcwd()
print(curfolder)

# files to work with
ACfolder = curfolder + '\\..\\01_XDF_processing\\data\\Data_processed\\Data_trials\\Audio_48'

# folders to save the processed data
ACfolder_processed = curfolder + '\\TS_acoustics\\'

actotrack = glob.glob(ACfolder + "/*.wav", recursive=True)
#print(actotrack)

# get rid of the first file because it's faulty
actotrack = actotrack[1:]

First, we need to make sure that we are not working with wrongly cut audio files, but only with the corrected versions (where they exist). From the list of audio files, we hence exclude all files for which a corrected version is also in the list.

# Get all corrected audios from the list
ac_cor = []
for file in actotrack:
    if 'corrected' in file:
        ac_cor.append(file)

# Now get the name of this trial without the corrected part
ac_old = []
for file in ac_cor:
    ac_old.append(file.replace('_corrected', ''))

# From actotrack, remove trials that also have a corrected version (those collected in ac_old)
actotrack = [x for x in actotrack if x not in ac_old]

Here is an audio example
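The player above is produced by a hidden chunk; a minimal sketch, assuming the first file in actotrack is a suitable example:

# play an audio file inline in the notebook
ipd.Audio(actotrack[0])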

And here it is visualized as a waveform
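The waveform plot can be reproduced along these lines (a sketch; the file choice is arbitrary):

# load the audio at its native sampling rate and plot amplitude over time
y, sr = librosa.load(actotrack[0], sr=None, mono=True)
t = np.arange(len(y)) / sr
plt.figure(figsize=(10, 3))
plt.plot(t, y, linewidth=0.5)
plt.xlabel('time (s)')
plt.ylabel('amplitude')
plt.show()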

Custom functions
def chunk_and_smooth(df, var, window=25, order=3):
    """Split the non-NaN runs of df[var] into chunks and smooth each chunk with a Savitzky-Golay filter."""

    df['chunk'] = None

    chunk = 0
    for index, row in df.iterrows():
        if np.isnan(row[var]):
            continue
        else:
            df.loc[index, 'chunk'] = chunk
            # if this is the last row there is nothing to check; if the next value is NaN, the current chunk ends
            if index == len(df)-1:
                continue
            elif np.isnan(df.loc[index+1, var]):
                chunk += 1

    # now we can smooth the var values in each chunk;
    # drop the None placeholder (it marks NaN rows), so an empty list means there is no var trace
    chunks = [c for c in df['chunk'].unique() if c is not None]
    for chunk in chunks:
        # get the rows of the chunk
        chunkrows = df[df['chunk'] == chunk].copy()
        # don't smooth chunks shorter than 5 samples
        if len(chunkrows) < 5:
            continue
        # smooth var with a Savitzky-Golay filter
        chunkrows[var] = scipy.signal.savgol_filter(chunkrows[var], window, order)
        # put it back into the df
        df.loc[df['chunk'] == chunk, var] = chunkrows[var]

    # get rid of the chunk column
    df = df.drop('chunk', axis=1)

    return df

Extracting intensity (vocalic energy)

To extract the amplitude envelope of the acoustic signal, we follow a method by Tilsen and Arvaniti (2013), adapted by Pouw (2024): the signal is band-pass filtered (400-4000 Hz), rectified, and then smoothed with a 2nd-order 10 Hz low-pass zero-phase Butterworth filter.

Code with functions to extract the amplitude envelope
# Define the bandpass filter
def butter_bandpass(lowcut, highcut, fs, order=2):
    nyquist = 0.5 * fs
    low = lowcut / nyquist
    high = highcut / nyquist
    b, a = butter(order, [low, high], btype='band')
    return b, a

def butter_bandpass_filtfilt(data, lowcut, highcut, fs, order=2):
    b, a = butter_bandpass(lowcut, highcut, fs, order=order)
    y = filtfilt(b, a, data)
    return y

# Define the lowpass filter
def butter_lowpass(cutoff, fs, order=2):
    nyquist = 0.5 * fs
    normal_cutoff = cutoff / nyquist
    b, a = butter(order, normal_cutoff, btype='low')
    return b, a

def butter_lowpass_filtfilt(data, cutoff, fs, order=2):
    b, a = butter_lowpass(cutoff, fs, order=order)
    y = filtfilt(b, a, data)
    return y

# Function to extract amplitude envelope
def amp_envelope(audiofilename):
    # load audio with librosa
    audio, sr = librosa.load(audiofilename, sr=None, mono=True)
    # Bandpass filter 400-4000Hz
    data = butter_bandpass_filtfilt(audio, 400, 4000, sr, order=2)
    # Lowpass filter 10Hz
    data = butter_lowpass_filtfilt(np.abs(data), 10, sr, order=2)
    # scale from 0 to 1
    data = (data - np.min(data)) / (np.max(data) - np.min(data))
    return data, sr

Here is an example of how the vocalic energy is extracted
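For instance, a sketch applying amp_envelope() to a single (arbitrary) file and overlaying the result on the raw waveform:

# extract the envelope of one file and overlay it on the waveform
example = actotrack[0]
ampv, sr = amp_envelope(example)
raw, _ = librosa.load(example, sr=None, mono=True)
t = np.arange(len(raw)) / sr
plt.figure(figsize=(10, 3))
plt.plot(t, raw, linewidth=0.5, alpha=0.5, label='waveform')
plt.plot(t[:len(ampv)], ampv, color='red', label='amplitude envelope')
plt.xlabel('time (s)')
plt.legend()
plt.show()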

Now we loop over all the audio files and extract the vocalic energy

# Loop over wav files
for audiofile in actotrack:

    # get the trialid (the first, second, fourth, and eighth elements of the file name)
    trialid = audiofile.split('\\')[-1].split('.')[0]
    trialid = '_'.join(trialid.split('_')[0:1] + trialid.split('_')[1:2] + trialid.split('_')[3:4] + trialid.split('_')[7:8])

    print('working on ' + trialid)

    # apply the function
    ampv, sr = amp_envelope(audiofile)

    # Extract and plot the original signal
    rawaudio, sr = librosa.load(audiofile, sr=None)

    # create a time vector
    time_env = np.arange(0, len(rawaudio)/sr, 1/sr)
    
    # Ensure the lengths match by padding ampv if necessary (note that this is a quick fix)
    if len(ampv) < len(time_env):
        ampv = np.pad(ampv, (0, len(time_env) - len(ampv)), mode='constant')
    elif len(ampv) > len(time_env):
        ampv = ampv[:len(time_env)]

    # the same for rawaudio
    if len(rawaudio) < len(time_env):
        rawaudio = np.pad(rawaudio, (0, len(time_env) - len(rawaudio)), mode='constant')
    elif len(rawaudio) > len(time_env):
        rawaudio = rawaudio[:len(time_env)]
    
    # save the audio and envelope
    try:
        audio = pd.DataFrame({'time': time_env, 'audio': rawaudio, 'envelope': ampv, 'trialID': trialid})
        # convert time to ms
        audio['time'] = audio['time'] * 1000

        # perform also envelope change
        audio['envelope_change'] = np.insert(np.diff(audio['envelope']), 0, 0)
        # smooth
        audio['envelope_change'] = butter_lowpass_filtfilt(np.abs(audio['envelope_change']), 10, sr, order=2)
        
        # write as csv
        audio.to_csv(ACfolder_processed + '/env_' + trialid + '.csv', index=False)

    except ValueError:
        print('ValueError: ' + trialid)
        continue

This is an example of a file

time audio envelope trialID envelope_change
0 0.000000 -0.000031 0.016587 0_1_18_p0 -4.792690e-09
1 0.020833 -0.000031 0.016587 0_1_18_p0 -4.779707e-09
2 0.041667 -0.000031 0.016587 0_1_18_p0 -4.766729e-09
3 0.062500 -0.000031 0.016587 0_1_18_p0 -4.753755e-09
4 0.083333 -0.000031 0.016587 0_1_18_p0 -4.740787e-09
5 0.104167 -0.000031 0.016587 0_1_18_p0 -4.727823e-09
6 0.125000 -0.000031 0.016587 0_1_18_p0 -4.714864e-09
7 0.145833 -0.000031 0.016587 0_1_18_p0 -4.701909e-09
8 0.166667 -0.000031 0.016587 0_1_18_p0 -4.688959e-09
9 0.187500 -0.000031 0.016587 0_1_18_p0 -4.676015e-09
10 0.208333 -0.000031 0.016587 0_1_18_p0 -4.663075e-09
11 0.229167 -0.000031 0.016587 0_1_18_p0 -4.650139e-09
12 0.250000 -0.000031 0.016587 0_1_18_p0 -4.637209e-09
13 0.270833 0.000000 0.016587 0_1_18_p0 -4.624284e-09
14 0.291667 0.000000 0.016587 0_1_18_p0 -4.611364e-09


Here it is visualized
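A sketch of such a visualization, assuming the trial shown above (0_1_18_p0) has been written to disk:

# plot the saved envelope and its smoothed rate of change
example_df = pd.read_csv(ACfolder_processed + '/env_0_1_18_p0.csv')
fig, axes = plt.subplots(2, 1, figsize=(10, 5), sharex=True)
axes[0].plot(example_df['time'], example_df['audio'], linewidth=0.5, alpha=0.5, label='audio')
axes[0].plot(example_df['time'], example_df['envelope'], color='red', label='envelope')
axes[0].legend()
axes[1].plot(example_df['time'], example_df['envelope_change'], color='green', label='envelope change')
axes[1].set_xlabel('time (ms)')
axes[1].legend()
plt.show()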

Extracting fundamental frequency (f0)

Now we extract pitch using the parselmouth library (Jadoul, Thompson, and de Boer (2018)).

Because we need to take the sex of the participant into account to set the f0 range accordingly, prior to this script we extracted the speakers’ registers using the Praat script Get_Speakers_register.praat by Celine De Looze and saved the output in the file SpeakerRegister.txt.

Now, we first compute the mean min and max f0 values across all available data and set the range accordingly.

# this is where we store the min-max f0 values of each speaker
register = pd.read_csv(curfolder + '\\SpeakerRegister.txt', sep='\t') 

# here we store metadata for each session about sex
meta = pd.read_csv(curfolder + '\\..\\00_RAWDATA\\META_gender.txt', sep='\t')

# now we want to find out the range for males and females
register['sex'] = None

# make f0min and f0max numeric
register['f0min'] = pd.to_numeric(register['f0min'], errors='coerce')
register['f0max'] = pd.to_numeric(register['f0max'], errors='coerce')

# loop over rows in register,
for idx, row in register.iterrows():
    #  get sessionID from FILE (first part)
    sessionID = row['FILE'].split('_')[0]
    # get pcn id
    pcn = row['FILE'].split('_')[7]
    # merge it
    ID = sessionID + '_' + pcn
    # find this id in meta and save in sex the value in column sex
    sex = meta[meta['ID'] == ID]['sex'].values[0]
    # save value of sex in current row
    register.at[idx, 'sex'] = sex

# now group sex by each value and find the mean of f0min and f0max
f0min = register.groupby('sex')['f0min'].mean()
f0max = register.groupby('sex')['f0max'].mean()

# bind in df
df_register = pd.DataFrame({'f0min': f0min, 'f0max': f0max})
df_register.head(5)
    
f0min f0max
sex
f 185.895425 380.660131

Dyad 0 consists of two females, and the mean f0 min is 186 Hz and the mean f0 max is 381 Hz.

Code with function to extract the fundamental frequency
def extract_f0(locationsound, sex):

    # read the sound file as numpy array
    audio, sr = librosa.load(locationsound, sr=48000)

    # read the sound file into Python
    snd = parselmouth.Sound(audio, sampling_frequency=sr)

    if sex == 'f':
        f0min = 186      ## calculated by previous chunk
        f0max = 381
    else:
        f0min = 75      ## Note: there are no males in dyad 0, so this is only a placeholder
        f0max = 300

    pitch = snd.to_pitch(time_step = 0.002, pitch_floor=f0min, pitch_ceiling=f0max) # time_step to get 500Hz

    f0_values = pitch.selected_array['frequency']

    return snd, f0_values

Now we loop over all audio files and extract f0 from each. The resulting f0 contours are smoothed with a 3rd-order-polynomial Savitzky-Golay filter with a span of 50 ms (25 samples at 500 Hz, following Fuchs, Reichel, and Rochet-Capellan (2016)), applied separately to each continuous run of phonated vocalization so that the discontinuities typical of the f0 signal are preserved.
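As a quick illustration of the chunk-wise smoothing, here is a minimal synthetic check (hypothetical data): two noisy phonated runs separated by a NaN gap are smoothed independently, and the gap survives untouched.

# synthetic check of chunk_and_smooth: the NaN gap must survive the smoothing
demo = pd.DataFrame({'f0': np.concatenate([
    200 + 5 * np.random.randn(40),    # first phonated run
    np.full(10, np.nan),              # unvoiced gap
    220 + 5 * np.random.randn(40)])}) # second phonated run
demo_smoothed = chunk_and_smooth(demo.copy(), 'f0')
print(demo_smoothed['f0'].isna().sum())  # still 10 NaNs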

freq=48000    
meta = pd.read_csv(curfolder + '\\..\\00_RAWDATA\\META.txt', sep='\t')

# Loop over wav files
for audiofile in actotrack:

    # get the trialid
    trialid = audiofile.split('\\')[-1].split('.')[0]
    #trial id is the first, second, fourth and eighth element
    trialid = '_'.join(trialid.split('_')[0:1] + trialid.split('_')[1:2] + trialid.split('_')[3:4] + trialid.split('_')[7:8])

    print('working on ' + trialid)

    # first element is sessionid, fourth element is participantid
    sessionid = trialid.split('_')[0]
    participantid = trialid.split('_')[3]
    ID = sessionid + '_' + participantid

    # what sex has this ID in meta
    sex = meta[meta['ID'] == ID]['sex'].values[0]

    # apply the function
    snd, f0 = extract_f0(audiofile, sex)

    length = len(f0)

    # replace 0 values with NaN
    f0 = np.where(f0 == 0, np.nan, f0)

    # create time vector
    F0_time = np.linspace(0, snd.duration, len(f0)) * 1000  # Generate time vector

    # create df
    f0_df = pd.DataFrame({'time_ms': F0_time, 'f0': f0, 'ID': trialid})

    # Smooth the f0 values
    try:
        f0_df = chunk_and_smooth(f0_df, 'f0') # do it with window 25
    except ValueError:
        # unless there is only a tiny chunk of f0, in which case we need a window of 5
        print('ValueError: ' + trialid + ', f0 trace is shorter than the window length, falling back to window=5')
        f0_df = chunk_and_smooth(f0_df, 'f0', window=5)

    # write as csv
    f0_df.to_csv(ACfolder_processed + '/f0_' + trialid + '.csv', index=False)

Here is an example of a file

time_ms f0 ID
0 0.000000 NaN 0_1_10_p1
1 2.052699 NaN 0_1_10_p1
2 4.105398 NaN 0_1_10_p1
3 6.158097 NaN 0_1_10_p1
4 8.210796 NaN 0_1_10_p1
5 10.263495 NaN 0_1_10_p1
6 12.316194 NaN 0_1_10_p1
7 14.368892 NaN 0_1_10_p1
8 16.421591 NaN 0_1_10_p1
9 18.474290 NaN 0_1_10_p1
10 20.526989 NaN 0_1_10_p1
11 22.579688 NaN 0_1_10_p1
12 24.632387 NaN 0_1_10_p1
13 26.685086 NaN 0_1_10_p1
14 28.737785 NaN 0_1_10_p1


And here it is visualized
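A sketch of the f0 plot, assuming the trial shown above (0_1_10_p1) has been written to disk:

# plot the smoothed f0 trace
f0_example = pd.read_csv(ACfolder_processed + '/f0_0_1_10_p1.csv')
plt.figure(figsize=(10, 3))
plt.plot(f0_example['time_ms'], f0_example['f0'], '.', markersize=2)
plt.xlabel('time (ms)')
plt.ylabel('f0 (Hz)')
plt.show()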

Extracting spectral centroid

To extract the area of the main spectral energy, we compute the spectral centroid / spectral center of gravity using the parselmouth package (Jadoul, Thompson, and de Boer (2018)). We filter out the fundamental frequency (f0) to remove low-frequency components that might interfere with the analysis (code adapted from here).

Code with function to filter out f0
from scipy.signal import sosfilt

def remove_f0(x, fundamental, fs):
    # Normalize the frequency for digital filter design
    f_dig = 2 * np.pi * fundamental / fs
    
    # Define the notch filter parameters for F0
    p1 = 0.999 * np.exp(1j * f_dig)  # Pole near the F0 frequency
    z1 = np.exp(1j * f_dig)          # Zero near the F0 frequency
    
    # Define zeros and poles for the notch filter (complex-conjugate pairs)
    zeros = np.array([z1, np.conj(z1)])
    poles = np.array([p1, np.conj(p1)])
    
    # Create a 2nd-order section (sos) array: [b0, b1, b2, a0, a1, a2],
    # with the numerator built from the zeros and the denominator from the poles
    sos = np.array([[1, -2 * np.real(z1), np.abs(z1)**2,
                     1, -2 * np.real(p1), np.abs(p1)**2]])

    # Apply the notch filter to the signal
    y = sosfilt(sos, x)
    
    return y
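As an optional sanity check (not part of the pipeline), we can inspect the frequency response of this notch filter; with the pole radius of 0.999, the magnitude should dip sharply at the notch frequency. The 200 Hz value below is an arbitrary example in the female f0 range computed above.

from scipy.signal import sosfreqz

# rebuild the sos for an example f0 and plot its magnitude response
fs = 48000
f0_example = 200  # Hz, arbitrary example value
f_dig = 2 * np.pi * f0_example / fs
p1 = 0.999 * np.exp(1j * f_dig)
z1 = np.exp(1j * f_dig)
sos = np.array([[1, -2 * np.real(z1), np.abs(z1)**2,
                 1, -2 * np.real(p1), np.abs(p1)**2]])
w, h = sosfreqz(sos, worN=8192, fs=fs)
plt.plot(w, 20 * np.log10(np.abs(h) + 1e-12))
plt.xlim(0, 1000)
plt.xlabel('frequency (Hz)')
plt.ylabel('magnitude (dB)')
plt.show()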
# Parameters
window_length = 0.03 # 30ms analysis window

# Loop over wav files
for audiofile in actotrack:
    # Extract trial ID from filename
    trialid = os.path.basename(audiofile).split('.')[0]
    trialid = '_'.join(trialid.split('_')[0:1] + trialid.split('_')[1:2] + trialid.split('_')[3:4] + trialid.split('_')[7:8])

    print(f'Working on {trialid}')

    # Extract session and participant ID
    sessionid = trialid.split('_')[0]
    participantid = trialid.split('_')[3]
    ID = f"{sessionid}_{participantid}"

    # Load sound
    snd = parselmouth.Sound(audiofile)

    # Get sampling rate
    fs = snd.sampling_frequency

    # Load f0 file with the same trialid
    f0file = ACfolder_processed + '/f0_' + trialid + '.csv'
    f0_df = pd.read_csv(f0file)

    # Get the mean (from non-NaN values) of the f0
    f0_df = f0_df.dropna()
    mean_f0 = f0_df['f0'].mean()

    # Sound values
    sound_values = snd.values[0]

    # Remove F0 from the signal using the mean F0
    filtered_signal = remove_f0(sound_values, mean_f0, fs)

    # Recreate the sound object with the filtered signal
    filtered_sound = parselmouth.Sound(filtered_signal, sampling_frequency=fs)

    # Compute spectrogram of the filtered signal
    spectrogram = filtered_sound.to_spectrogram(window_length=window_length)

    # Extract time values from the spectrogram
    times = spectrogram.xs()  # Time vector in seconds

    # Compute CoG for each time step
    cog_values = [spectrogram.to_spectrum_slice(time=t).get_centre_of_gravity(power=2.0) for t in times]

    # Convert time to milliseconds
    time_cog = np.array(times) * 1000  

    # Convert CoG values to numpy array
    cog_values = np.array(cog_values)

    # Create DataFrame
    cog_df = pd.DataFrame({'time': time_cog, 'CoG': cog_values, 'TrialID': trialid})

    # Replace zeros with NaN
    cog_df['CoG'] = cog_df['CoG'].replace(0, np.nan)

    # Smooth the data
    try:
        cog_df = chunk_and_smooth(cog_df, 'CoG')
    except ValueError:
        print(f'ValueError: {trialid}, CoG trace is smaller than window length, using window=5')
        cog_df = chunk_and_smooth(cog_df, 'CoG', window=5)

    # Save DataFrame
    output_path = os.path.join(ACfolder_processed, f'cog_{trialid}.csv')
    cog_df.to_csv(output_path, index=False)

This is a visual example of a file
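A sketch of such a plot, assuming the corresponding cog_ file exists on disk (the trial id is only an example):

# plot the smoothed spectral centre of gravity trace
cog_example = pd.read_csv(os.path.join(ACfolder_processed, 'cog_0_1_10_p1.csv'))
plt.figure(figsize=(10, 3))
plt.plot(cog_example['time'], cog_example['CoG'], '.', markersize=2)
plt.xlabel('time (ms)')
plt.ylabel('spectral centre of gravity (Hz)')
plt.show()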

Extracting formants

To extract formant values, we use Chris Carignan’s Praat script (see GitHub), which optimizes the F1-F5 values.

To verify that the data are sensible, we will do some visual inspections. Moreover, we will consider taking formant values from the windows around envelope amplitude peaks.

Code to prepare the environment
# Here we store formants from praat
formantfolder = curfolder + '/TS_formants/Carignan_formants/'
formants = glob.glob(formantfolder + '*.Table')

# Here we store processed envelope 
envfiles = glob.glob(ACfolder_processed + '/env_*.csv')

Chris Carignan’s Praat script outputs formants as .Table files. Let’s therefore first read these files and resave them as .csv files.

for formant in formants:
    formant_df = pd.read_csv(formant, sep='\t')

    # get the name of the file
    filename = os.path.basename(formant)
    # get the name of the file without the extension
    filename = os.path.splitext(filename)[0]

    # add to df
    formant_df['filename'] = filename

    # get trialid from the file name 
    trialid = '_'.join(filename.split('_')[0:2] + filename.split('_')[3:4] + filename.split('_')[7:8]) 

    print('working on ' + trialid)

    # add a row at the beginning with time 0, copying the remaining values from the first row
    copy_row = formant_df.iloc[0].copy()
    # time of this row is 0
    copy_row['time'] = 0

    # add this row to the beginning of the df
    formant_df = pd.concat([pd.DataFrame(copy_row).T, formant_df], ignore_index=True)
    
    # add trialid to the df
    formant_df['trialid'] = trialid

    # write it as csv to the processed-data folder
    formant_df.to_csv(ACfolder_processed + 'praat_formants_' + trialid + '.csv', index=False)
# initiate an empty df
formants_df = pd.DataFrame()

# get all formant files we just created
formantfiles = glob.glob(ACfolder_processed + 'praat_formants_*.csv')

# loop over the formant csv files and concatenate them into one big df
for formant in formantfiles:
    print('working on ' + formant)
    for_df = pd.read_csv(formant)

    # get the name of the file
    filename = for_df['filename'][0]

    # in filename, look for c1, c2, c0
    if 'c1' in filename:
        for_df['correction'] = 'c1'
    elif 'c2' in filename:
        for_df['correction'] = 'c2'
    elif 'c0' in filename:
        for_df['correction'] = 'c0'
    else:
        for_df['correction'] = 'none'
    
    # concatenate
    formants_df = pd.concat([formants_df, for_df])

# get rid of rows with correction = none
formants_df = formants_df[formants_df['correction'] != 'none']

This is what the formants look like in a table

time f1 f2 f3 f4 f5 filename trialid correction
0 0.000000 1910.072640 3380.992452 5896.442699 0.0 0.0 0_2_pr_0_Mic_nominal_srate48000_p0_juichen_com... 0_2_0_p0 c0
1 0.026031 1910.072640 3380.992452 5896.442699 0.0 0.0 0_2_pr_0_Mic_nominal_srate48000_p0_juichen_com... 0_2_0_p0 c0
2 0.031031 1890.080560 3366.165092 5896.610205 0.0 0.0 0_2_pr_0_Mic_nominal_srate48000_p0_juichen_com... 0_2_0_p0 c0
3 0.036031 1910.066181 3380.974399 5896.442306 0.0 0.0 0_2_pr_0_Mic_nominal_srate48000_p0_juichen_com... 0_2_0_p0 c0
4 0.041031 1890.092319 3366.156404 5896.609863 0.0 0.0 0_2_pr_0_Mic_nominal_srate48000_p0_juichen_com... 0_2_0_p0 c0
5 0.046031 1910.073207 3380.962077 5896.441999 0.0 0.0 0_2_pr_0_Mic_nominal_srate48000_p0_juichen_com... 0_2_0_p0 c0
6 0.051031 1890.087965 3366.149616 5896.609697 0.0 0.0 0_2_pr_0_Mic_nominal_srate48000_p0_juichen_com... 0_2_0_p0 c0
7 0.056031 1910.062802 3380.953514 5896.441903 0.0 0.0 0_2_pr_0_Mic_nominal_srate48000_p0_juichen_com... 0_2_0_p0 c0
8 0.061031 1890.085211 3366.138214 5896.609723 0.0 0.0 0_2_pr_0_Mic_nominal_srate48000_p0_juichen_com... 0_2_0_p0 c0
9 0.066031 1910.063561 3380.941686 5896.442068 0.0 0.0 0_2_pr_0_Mic_nominal_srate48000_p0_juichen_com... 0_2_0_p0 c0
10 0.071031 1890.080307 3366.122801 5896.610225 0.0 0.0 0_2_pr_0_Mic_nominal_srate48000_p0_juichen_com... 0_2_0_p0 c0
11 0.076031 1910.048401 3380.925190 5896.443055 0.0 0.0 0_2_pr_0_Mic_nominal_srate48000_p0_juichen_com... 0_2_0_p0 c0
12 0.081031 1890.073211 3366.114427 5896.611974 0.0 0.0 0_2_pr_0_Mic_nominal_srate48000_p0_juichen_com... 0_2_0_p0 c0
13 0.086031 1910.043940 3380.916181 5896.446468 0.0 0.0 0_2_pr_0_Mic_nominal_srate48000_p0_juichen_com... 0_2_0_p0 c0
14 0.091031 1153.005672 2425.537494 3774.555756 0.0 0.0 0_2_pr_0_Mic_nominal_srate48000_p0_juichen_com... 0_2_0_p0 c0


This is how the formants look for a single trial.

Now let’s look at the vowel space area across all data.

And this is the distribution of f1 across all data.
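Both summary plots could be produced along these lines (a sketch using the seaborn import from above):

# vowel space (f2 x f1, axes inverted by phonetic convention) and f1 distribution
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
sns.scatterplot(data=formants_df, x='f2', y='f1', s=5, alpha=0.3, ax=axes[0])
axes[0].invert_xaxis()
axes[0].invert_yaxis()
axes[0].set_title('vowel space')
sns.histplot(data=formants_df, x='f1', bins=50, ax=axes[1])
axes[1].set_title('f1 distribution')
plt.tight_layout()
plt.show()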

This all looks reasonable. However, we should still be careful: formant values are most reliable where f0 is present, and since we work with non-speech sounds in this project, the vocalizations are frequently unvoiced. Because research shows that there are also weak ‘formants’ outside the f0 contour, resulting, for instance, from resonances of the sub- and supraglottal tract during breathing (Werner et al. 2024), we will also consider formant values in the moments of envelope peaks. This maximizes the number of data points we can use for analysis.

We can use the find_peaks() function from scipy.signal to find the peaks in the envelope. We can then use these peaks as reference points for formant extraction.

env_df = pd.DataFrame()

# loop over env files and make a giga df from all
for envfile in envfiles:
    df = pd.read_csv(envfile)
    env_df = pd.concat([env_df, df])

Using the peak_widths() function, we can extract the window of an envelope peak. Further, we can set the relative height at which the width is measured to adjust the window size. Here, we try relative heights of 0.5 and 0.9.

# rename trialID to trialid
env_df = env_df.rename(columns={'trialID': 'trialid'})

# pick a random trialid from env_df
trialid = '0_2_62_p1'

# get the env for this trialid from env_df
env_trial = env_df[env_df['trialid'] == trialid]

# find peaks; the minimum height is the mean envelope (across all data)
peaks, _ = find_peaks(env_trial['envelope'], height=np.mean(env_df['envelope']))

# get the width of the peaks
results_half = peak_widths(env_trial['envelope'], peaks, rel_height=0.5)
results_full = peak_widths(env_trial['envelope'], peaks, rel_height=0.9)

Now we can check the envelope peak widths against the formant values. In a merged dataframe containing both formants and envelope, we will annotate the peak windows so that we know which formant values to consider (the rest we set to NaN).

# get the formant rows with the same trialid
formants_trial = formants_df[formants_df['trialid'] == trialid].copy()

# convert time to ms
formants_trial['time'] = formants_trial['time'] * 1000

# merge envelope and formants on trialid and time, outer method
merged_df = pd.merge(env_trial, formants_trial, on=['trialid', 'time'], how='outer')
merged_df = merged_df.sort_values('time')

# formant columns to interpolate
colstoint = ['f1', 'f2', 'f3', 'f4', 'f5']

# interpolate against the time axis (method='index' uses the index values as x)
merged_df = merged_df.set_index('time')
for col in colstoint:
    merged_df[col] = merged_df[col].interpolate(method='index')
merged_df = merged_df.reset_index()

# delete rows where envelope is NaN and restore a clean positional index
merged_df = merged_df.dropna(subset=['envelope'])
merged_df = merged_df.reset_index(drop=True)

# check the width of the peaks
peaks, _ = find_peaks(merged_df['envelope'], height=np.mean(env_df['envelope'])) # minimum height of the peak is mean of the envelope (across all data)

# get the width of the peaks
results_width = peak_widths(merged_df['envelope'], peaks, rel_height=0.9)

# create column peak_width and put 1 everywhere between start and end of a peak
merged_df['peak_width'] = 0

# create a table from the peak start and end positions (results_width[2] and results_width[3])
peak_w = pd.DataFrame({'start': results_width[2], 'end': results_width[3]})

# loop over the rows of peak_w and put 1 in the peak_width column between start and end
# (peak_widths returns fractional positions, so round them to integer row positions)
for i, row in peak_w.iterrows():
    merged_df.loc[int(np.floor(row['start'])):int(np.ceil(row['end'])), 'peak_width'] = 1

# for each formant column, create new f_clean column and put the value of the formant where peak_width = 1
for col in colstoint:
    merged_df[col + '_clean'] = merged_df[col] * merged_df['peak_width']
    #instead of 0, put NaN
    merged_df[col + '_clean'] = merged_df[col + '_clean'].replace(0, np.nan)

Here we can see the overlap of formants and envelope peaks visualized. The darker part of the formant signal falls within the window of an envelope peak.
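A sketch of this overlay (the envelope scaling is arbitrary, purely for display on the Hz axis):

# plot full formant traces in light colours and the within-peak values in darker shades
plt.figure(figsize=(10, 4))
plt.plot(merged_df['time'], merged_df['envelope'] * 5000, color='grey', alpha=0.5, label='envelope (scaled)')
for col, light, dark in [('f1', 'lightblue', 'darkblue'), ('f2', 'lightsalmon', 'darkred')]:
    plt.plot(merged_df['time'], merged_df[col], color=light)
    plt.plot(merged_df['time'], merged_df[col + '_clean'], color=dark, label=col + ' in peak window')
plt.xlabel('time (ms)')
plt.ylabel('frequency (Hz)')
plt.legend()
plt.show()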

In the merging script, we will come back to this and use both envelope peaks and f0 to define the relevant formant windows.

References

Fuchs, Susanne, Uwe Reichel, and Amélie Rochet-Capellan. 2016. “F0 Declination and Speech Planning in Face to Face Dialogues.” In. https://doi.org/10.13140/RG.2.1.4909.0320.
Jadoul, Yannick, Bill Thompson, and Bart de Boer. 2018. “Introducing Parselmouth: A Python Interface to Praat.” Journal of Phonetics 71: 1–15. https://doi.org/10.1016/j.wocn.2018.07.001.
Pouw, Wim. 2024. “Wim Pouw’s EnvisionBOX Modules for Social Signal Processing.” https://github.com/WimPouw/envisionBOX_modulesWP.
Tilsen, Sam, and Amalia Arvaniti. 2013. “Speech Rhythm Analysis with Decomposition of the Amplitude Envelope: Characterizing Rhythmic Patterns Within and Across Languages.” The Journal of the Acoustical Society of America 134 (1): 628–39. https://doi.org/10.1121/1.4807565.
Werner, Raphael, Susanne Fuchs, Jürgen Trouvain, Steffen Kürbis, Bernd Möbius, and Peter Birkholz. 2024. “Acoustics of Breath Noises in Human Speech: Descriptive and Three-Dimensional Modeling Approaches.” Journal of Speech, Language, and Hearing Research 67 (10S): 3947–61. https://doi.org/10.1044/2023_JSLHR-23-00112.