In the previous scripts, we preprocessed various motion and acoustic data. In this script, we merge these data into a single file per trial. The data include:
Balance Board data
Kinematics
Joint angles
Joint moments
Amplitude envelope
f0
Formants
Spectral centroid
Code to prepare environment
```python
# packages
import os
import glob
import numpy as np
import pandas as pd
import scipy
import matplotlib.pyplot as plt
import seaborn as sns

curfolder = os.getcwd()

# folders with processed data
MTfolder_processed = curfolder + '\\TS_motiontracking\\'
ACfolder_processed = curfolder + '\\TS_acoustics\\'
# folder to save merged data
TSmerged = curfolder + '\\TS_merged\\'

# prepare all files
bbfiles = glob.glob(MTfolder_processed + '/bb*.csv')
idfiles = glob.glob(MTfolder_processed + '/id*.csv')
ikfiles = glob.glob(MTfolder_processed + '/ik*.csv')
mtfiles = glob.glob(MTfolder_processed + '/mt*.csv')
envfiles = glob.glob(ACfolder_processed + '/env*.csv')
f0files = glob.glob(ACfolder_processed + '/f0*.csv')
formants = glob.glob(ACfolder_processed + '/praat_formants*.csv')
scfiles = glob.glob(ACfolder_processed + '/cog*.csv')
```
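Before merging, it can help to confirm that each glob pattern actually matched files. This small check is not part of the original pipeline, just an optional sanity step using the variables defined above:

```python
# optional sanity check: report how many files each pattern matched
for name, files in [('bb', bbfiles), ('id', idfiles), ('ik', ikfiles),
                    ('mt', mtfiles), ('env', envfiles), ('f0', f0files),
                    ('formants', formants), ('cog', scfiles)]:
    print(f'{name}: {len(files)} files')
```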
When extracting and processing the acoustic and motion signals, we left the sampling rates untouched. This means we now have a collection of timeseries that are each sampled at a different frequency, as becomes clear when inspecting one trial per signal.
We opt for 500 Hz as the final sampling rate to merge on. This means that we interpolate all missing samples (using linear interpolation) to match this frequency.
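To make the idea concrete before the full pipeline below, here is a minimal sketch (toy sine data, not the project files) of linearly interpolating a lower-rate signal onto a common 500 Hz grid:

```python
import numpy as np
import pandas as pd

sr_in, sr_out = 100, 500                    # original and target rates (Hz)
t_in = np.arange(0, 1, 1 / sr_in)           # original time stamps (s)
signal = np.sin(2 * np.pi * 2 * t_in)       # toy 2 Hz sine

t_out = np.arange(0, 1, 1 / sr_out)         # common 500 Hz grid
resampled = np.interp(t_out, t_in, signal)  # linear interpolation

df_500 = pd.DataFrame({'time': t_out, 'signal': resampled})
print(df_500.head())
```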
Additionally, we adapt the formants so that we only keep values that fall within the width of an amplitude-envelope peak (see the acoustics processing script for details), where f0 is present, or both. These two situations can be considered to yield the most reliable formant values.
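The masking logic, illustrated on hypothetical toy data (the actual column construction happens in the merging loop below): a formant value survives only where f0 is voiced or an envelope peak is active.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'f1':             [500.0, 510.0, 520.0, 530.0, 540.0],
    'f0':             [np.nan, 120.0, 118.0, np.nan, np.nan],
    'env_peak_width': [0,      0,     1,     1,      0],
})

# keep f1 where f0 is present OR where an envelope peak interval is active
keep = (~toy['f0'].isna()) | (toy['env_peak_width'] == 1)
toy['f1_clean'] = toy['f1'].where(keep, np.nan)
print(toy)  # f1_clean is NaN in rows 0 and 4, where neither condition holds
```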
Finally, we also use the inverse kinematics and inverse dynamics output to calculate joint power (joint moment × joint angular velocity) and smooth it with a first-order (linear) Savitzky-Golay filter with a window of about 560 ms.
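A minimal sketch of that power computation on toy data (the names `moment` and `ang_vel` are placeholders; the real pipeline below picks the summed moment and angular-speed columns per body group). At 500 Hz, a 281-sample window covers roughly 560 ms:

```python
import numpy as np
import scipy.signal

sr = 500                               # Hz
t = np.arange(0, 5, 1 / sr)            # 5 s of toy data
moment = np.sin(2 * np.pi * 0.5 * t)   # toy joint moment (Nm)
ang_vel = np.cos(2 * np.pi * 0.5 * t)  # toy angular velocity (rad/s)

power = moment * ang_vel               # instantaneous joint power (W)
power_smooth = scipy.signal.savgol_filter(power, window_length=281, polyorder=1)
```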
Custom functions
```python
# Function to create chunks of non-NaN values in a dataframe
def create_chunks(df, var):
    df['chunk'] = None
    # annotate chunks of non-NaN values
    chunk = 0
    for index, row in df.iterrows():
        if np.isnan(row[var]):
            continue
        else:
            df.loc[index, 'chunk'] = chunk
            # if the next value is NaN or this is the last row, increase the chunk
            if index == len(df) - 1:
                continue
            elif np.isnan(df.loc[index + 1, var]):
                chunk += 1
    chunks = df['chunk'].unique()
    if len(chunks) > 1:  # skip if chunks are empty (that means there is no f0 trace)
        # ignore the first chunk (None)
        chunks = chunks[1:]
    return df, chunks


# Function to interpolate chunks of non-NaN values in a dataframe,
# to maintain discontinuities in the signal
def interpolate_chunks(df, chunks, var):
    # we ignore the None chunk above, so if there is some trace, None should not be within chunks
    if None not in chunks:
        for chunk in chunks:
            # get the first and last row of the chunk
            firstrow = df[df['chunk'] == chunk].index[0]
            lastrow = df[df['chunk'] == chunk].index[-1]
            # fill all rows in between with the chunk number
            df.loc[firstrow:lastrow, 'chunk'] = chunk
            # get the rows of the chunk
            chunkrows = df[df['chunk'] == chunk].copy()
            # interpolate
            chunkrows[var] = chunkrows[var].interpolate(method='linear', x=chunkrows['time'])
            # put the interpolated chunk back into the df
            df.loc[df['chunk'] == chunk, var] = chunkrows[var]
    # get rid of the chunk column
    df.drop('chunk', axis=1, inplace=True)
    return df
```
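A quick toy check of these two helpers (hypothetical numbers, assuming the imports and functions above; note the trace starts with NaN, as the helpers expect): an f0 trace with a voicing break is merged onto a denser time grid, and interpolation fills within each voiced chunk only, so the unvoiced gap stays NaN.

```python
import numpy as np
import pandas as pd

f0_toy = pd.DataFrame({
    'time': [0, 4, 8, 12, 24, 28, 32],
    'f0':   [np.nan, 100.0, 102.0, 104.0, np.nan, 110.0, 112.0],
})
f0_toy, chunks = create_chunks(f0_toy, 'f0')  # two chunks: 4-12 ms and 28-32 ms

grid = pd.DataFrame({'time': np.arange(0, 34, 2)})  # a denser 2 ms grid
merged = pd.merge(grid, f0_toy, on='time', how='outer').sort_values('time').reset_index(drop=True)
merged = interpolate_chunks(merged, chunks, 'f0')
print(merged)  # f0 filled at 6, 10, and 30 ms; the 14-26 ms gap stays NaN
```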
Merging signals on a common sampling rate
```python
desired_sr = 0.5  # sampling step to merge on: 0.5 samples per ms, i.e. a 2 ms step = 500 Hz
error_log = []

for file in bbfiles:
    bb_df = pd.read_csv(file)
    # get trial id
    trialid = bb_df['TrialID'][0]
    print('working on ' + trialid)

    # find the same trialid in idfiles
    id_file = [x for x in idfiles if trialid in x]
    try:
        id_df = pd.read_csv(id_file[0])
    except IndexError:
        print('IndexError: ' + trialid + ' not found')
        error_log.append('IndexError: ' + trialid + ' not found for ID')
        continue

    # find the same trialid in mtfiles
    mt_file = [x for x in mtfiles if trialid in x]
    try:
        mt_df = pd.read_csv(mt_file[0])
    except IndexError:
        print('IndexError: ' + trialid + ' not found')
        error_log.append('IndexError: ' + trialid + ' not found for MT')
        continue
    # rename Time to time
    mt_df.rename(columns={'Time': 'time'}, inplace=True)

    # find the same trialid in envfiles
    env_file = [x for x in envfiles if trialid in x]
    try:
        env_df = pd.read_csv(env_file[0])
    except IndexError:
        print('IndexError: ' + trialid + ' not found')
        error_log.append('IndexError: ' + trialid + ' not found for ENV')
        continue
    # rename trialID to TrialID
    env_df.rename(columns={'trialID': 'TrialID'}, inplace=True)

    # find the same trialid in f0files
    f0_file = [x for x in f0files if trialid in x]
    try:
        f0_df = pd.read_csv(f0_file[0])
    except IndexError:
        print('IndexError: ' + trialid + ' not found')
        error_log.append('IndexError: ' + trialid + ' not found for F0')
        continue
    # rename time_ms to time, and ID to TrialID
    f0_df.rename(columns={'time_ms': 'time'}, inplace=True)
    f0_df.rename(columns={'ID': 'TrialID'}, inplace=True)

    # find the same trialid in ikfiles
    ik_file = [x for x in ikfiles if trialid in x]
    try:
        ik_df = pd.read_csv(ik_file[0])
    except IndexError:
        print('IndexError: ' + trialid + ' not found')
        error_log.append('IndexError: ' + trialid + ' not found for IK')
        continue

    # find the same trialid in formants
    formants_file = [x for x in formants if trialid in x]
    try:
        formants_df = pd.read_csv(formants_file[0])
    except IndexError:
        print('IndexError: ' + trialid + ' not found')
        error_log.append('IndexError: ' + trialid + ' not found for formants')
        continue
    # rename trialid to TrialID
    formants_df.rename(columns={'trialid': 'TrialID'}, inplace=True)
    formants_df = formants_df[['time', 'f1', 'f2', 'f3', 'TrialID']]
    # convert time from seconds to milliseconds
    formants_df['time'] = formants_df['time'] * 1000

    # find the same trialid in CoG (spectral centroid) files
    sc_file = [x for x in scfiles if trialid in x]
    try:
        sc_df = pd.read_csv(sc_file[0])
    except IndexError:
        print('IndexError: ' + trialid + ' not found')
        error_log.append('IndexError: ' + trialid + ' not found for CoG')
        continue

    # write error log
    with open(TSmerged + '/error_log.txt', 'w') as f:
        for item in error_log:
            f.write("%s\n" % item)

    ############## MERGING ##############
    # regularize sampling rate in bb
    time_new = np.arange(0, max(bb_df['time']), 1 / desired_sr)
    bb_interp = pd.DataFrame({'time': time_new})
    # interpolate all columns in bb_df
    colstoint = bb_df.columns
    colstoint = [x for x in colstoint if 'time' not in x]
    colstoint = [x for x in colstoint if 'TrialID' not in x]
    colstoint = [x for x in colstoint if 'FileInfo' not in x]
    for col in colstoint:
        bb_interp[col] = bb_df[col].interpolate(method='linear', x=bb_interp['time'])
    # add trial id and file info
    bb_interp['TrialID'] = trialid
    bb_interp['FileInfo'] = bb_df['FileInfo'][0]

    ########### merge bb_interp with env
    merge1 = pd.merge(bb_interp, env_df, on=['time', 'TrialID'], how='outer')
    # interpolate missing values of envelope and audio
    colstoint = merge1.columns
    colstoint = [x for x in colstoint if 'audio' in x or 'envelope' in x]
    for col in colstoint:
        merge1[col] = merge1[col].interpolate(method='linear', x=merge1['time'])
    # now we can kick out all rows where COPc is NaN
    merge1 = merge1[~np.isnan(merge1['COPc'])]

    ########### merge with ID
    merge2 = pd.merge(merge1, id_df, on=['time', 'TrialID'], how='outer')
    # interpolate the ID columns
    colstoint = id_df.columns
    colstoint = [x for x in colstoint if 'time' not in x]
    colstoint = [x for x in colstoint if 'TrialID' not in x]
    for col in colstoint:
        merge2[col] = merge2[col].interpolate(method='linear', x=merge2['time'])
    # kick out all rows where COPc is NaN to get the sampling rate back to 500 Hz
    merge2 = merge2[~np.isnan(merge2['COPc'])]

    ########### merge with MT
    merge3 = pd.merge(merge2, mt_df, on=['time', 'TrialID'], how='outer')
    # interpolate the MT columns
    colstoint = mt_df.columns
    colstoint = [x for x in colstoint if 'time' not in x]
    colstoint = [x for x in colstoint if 'TrialID' not in x]
    for col in colstoint:
        merge3[col] = merge3[col].interpolate(method='linear', x=merge3['time'])
    merge3 = merge3[~np.isnan(merge3['COPc'])]

    ########### merge with F0
    # for interpolation, we need to parse f0 into chunks of non-NaN values
    f0_df, chunks = create_chunks(f0_df, 'f0')
    merge4 = pd.merge(merge3, f0_df, on=['time', 'TrialID'], how='outer')
    # interpolate the f0 signal while maintaining discontinuities
    merge4 = interpolate_chunks(merge4, chunks, 'f0')
    merge4 = merge4[~np.isnan(merge4['COPc'])]

    ########### merge with IK
    merge5 = pd.merge(merge4, ik_df, on=['time', 'TrialID'], how='outer')
    # interpolate the IK columns
    colstoint = ik_df.columns
    colstoint = [x for x in colstoint if 'time' not in x]
    colstoint = [x for x in colstoint if 'TrialID' not in x]
    for col in colstoint:
        merge5[col] = merge5[col].interpolate(method='linear', x=merge5['time'])
    merge5 = merge5[~np.isnan(merge5['COPc'])]

    ########### merge with formants
    merge6 = pd.merge(merge5, formants_df, on=['time', 'TrialID'], how='outer')
    colstoint = formants_df.columns
    colstoint = [x for x in colstoint if 'time' not in x]
    colstoint = [x for x in colstoint if 'TrialID' not in x]
    # formants currently have no NaNs, so we can interpolate the whole signal
    # instead of only non-NaN chunks
    for col in colstoint:
        merge6[col] = merge6[col].interpolate(method='linear', x=merge6['time'])
    merge6 = merge6[~np.isnan(merge6['COPc'])]

    ########### merge with CoG
    # for interpolation, we again parse the spectral centroid into chunks of non-NaN values
    sc, chunks = create_chunks(sc_df, 'CoG')
    merge7 = pd.merge(merge6, sc, on=['time', 'TrialID'], how='outer')
    # interpolate the CoG signal while maintaining discontinuities
    merge7 = interpolate_chunks(merge7, chunks, 'CoG')
    merge7 = merge7[~np.isnan(merge7['COPc'])]

    # this is the final df
    merge_final = merge7

    ############## FORMANT ADAPTATION ##############
    # find peaks in the envelope, with minimum height = mean
    peaks, _ = scipy.signal.find_peaks(merge_final['envelope'], height=np.mean(merge_final['envelope']))
    # get widths of the peaks
    widths = scipy.signal.peak_widths(merge_final['envelope'], peaks, rel_height=0.95)
    # peak width df with starts and ends
    peak_widths = pd.DataFrame({'start': widths[2], 'end': widths[3]})
    # create a new column env_peak_width with 0s everywhere, and 1s within the peak-width intervals
    merge_final['env_peak_width'] = 0
    for index, row in peak_widths.iterrows():
        merge_final.loc[int(row['start']):int(row['end']), 'env_peak_width'] = 1

    # formant columns where we keep only values where f0 is not NaN
    merge_final['f1_clean_f0'] = merge_final['f1']
    merge_final['f2_clean_f0'] = merge_final['f2']
    merge_final['f3_clean_f0'] = merge_final['f3']
    # where f0 is NaN, we put NaN - these are formants during f0 only
    merge_final.loc[np.isnan(merge_final['f0']), 'f1_clean_f0'] = np.nan
    merge_final.loc[np.isnan(merge_final['f0']), 'f2_clean_f0'] = np.nan
    merge_final.loc[np.isnan(merge_final['f0']), 'f3_clean_f0'] = np.nan

    # formant columns where we keep only values within the env_peak_width intervals
    merge_final['f1_clean_env'] = merge_final['f1']
    merge_final['f2_clean_env'] = merge_final['f2']
    merge_final['f3_clean_env'] = merge_final['f3']
    # where env_peak_width is 0, we put NaN - these are formants during envelope peaks only
    merge_final.loc[merge_final['env_peak_width'] == 0, 'f1_clean_env'] = np.nan
    merge_final.loc[merge_final['env_peak_width'] == 0, 'f2_clean_env'] = np.nan
    merge_final.loc[merge_final['env_peak_width'] == 0, 'f3_clean_env'] = np.nan

    ## combined formants: start from clean_env, then fall back to clean_f0
    merge_final['f1_clean'] = merge_final['f1_clean_env']
    merge_final['f2_clean'] = merge_final['f2_clean_env']
    merge_final['f3_clean'] = merge_final['f3_clean_env']
    # where the formant is still NaN, copy values from f_clean_f0 in case there is a value
    merge_final.loc[np.isnan(merge_final['f1_clean']), 'f1_clean'] = merge_final['f1_clean_f0']
    merge_final.loc[np.isnan(merge_final['f2_clean']), 'f2_clean'] = merge_final['f2_clean_f0']
    merge_final.loc[np.isnan(merge_final['f3_clean']), 'f3_clean'] = merge_final['f3_clean_f0']

    # calculate formant velocities (but only for the f_clean columns)
    merge_final['f1_clean_vel'] = np.insert(np.diff(merge_final['f1_clean']), 0, 0)
    merge_final['f2_clean_vel'] = np.insert(np.diff(merge_final['f2_clean']), 0, 0)
    merge_final['f3_clean_vel'] = np.insert(np.diff(merge_final['f3_clean']), 0, 0)
    # smooth
    merge_final['f1_clean_vel'] = scipy.signal.savgol_filter(merge_final['f1_clean_vel'], 5, 3)
    merge_final['f2_clean_vel'] = scipy.signal.savgol_filter(merge_final['f2_clean_vel'], 5, 3)
    merge_final['f3_clean_vel'] = scipy.signal.savgol_filter(merge_final['f3_clean_vel'], 5, 3)

    ########## POWER ##########
    groups = ['lowerbody', 'leg', 'head', 'arm']
    for group in groups:
        # get all columns that contain the group name
        cols = [x for x in merge_final.columns if group in x]
        # get the summed joint moment column (but not its change)
        torque = [x for x in cols if 'moment_sum' in x]
        torque = [x for x in torque if 'change' not in x][0]
        # get the summed angular speed column
        angSpeed = [x for x in cols if 'angSpeed_sum' in x][0]
        # power = moment * angular speed
        merge_final[group + '_power'] = merge_final[torque] * merge_final[angSpeed]
        # smooth; a window of 281 samples corresponds to 562 ms (+1 sample so the window length is odd)
        merge_final[group + '_power'] = scipy.signal.savgol_filter(merge_final[group + '_power'], 281, 1)

    # write to csv
    merge_final.to_csv(TSmerged + '/merged_' + trialid + '.csv', index=False)
```
Here is an example of a merged file containing all the data:
|   | time | left_back | right_forward | right_back | left_forward | COPXc | COPYc | COPc | TrialID | FileInfo | ... | f1_clean | f2_clean | f3_clean | f1_clean_vel | f2_clean_vel | f3_clean_vel | lowerbody_power | leg_power | head_power | arm_power |
|---|------|-----------|---------------|------------|--------------|-------|-------|------|---------|----------|-----|----------|----------|----------|--------------|--------------|--------------|-----------------|-----------|------------|-----------|
| 0 | 0.0 | 1.040234 | 0.922185 | 1.447832 | 1.439272 | 0.000153 | -1.427206e-05 | 0.000154 | 0_1_10_p1 | p1_auto_geluiden_corrected | ... | NaN | NaN | NaN | NaN | NaN | NaN | 22.337284 | 1.403824 | 8.267574 | 6.538422 |
| 1 | 2.0 | 1.040128 | 0.922236 | 1.448219 | 1.439506 | 0.000218 | -4.098744e-06 | 0.000218 | 0_1_10_p1 | p1_auto_geluiden_corrected | ... | NaN | NaN | NaN | NaN | NaN | NaN | 22.285626 | 1.400045 | 8.244890 | 6.524016 |
| 2 | 4.0 | 1.040021 | 0.922294 | 1.448528 | 1.439647 | 0.000270 | 5.396631e-08 | 0.000270 | 0_1_10_p1 | p1_auto_geluiden_corrected | ... | NaN | NaN | NaN | NaN | NaN | NaN | 22.233968 | 1.396267 | 8.222205 | 6.509609 |
| 3 | 6.0 | 1.039911 | 0.922355 | 1.448768 | 1.439706 | 0.000311 | -5.992720e-07 | 0.000311 | 0_1_10_p1 | p1_auto_geluiden_corrected | ... | NaN | NaN | NaN | NaN | NaN | NaN | 22.182311 | 1.392488 | 8.199520 | 6.495203 |
| 4 | 8.0 | 1.039796 | 0.922414 | 1.448945 | 1.439691 | 0.000342 | -4.980859e-06 | 0.000342 | 0_1_10_p1 | p1_auto_geluiden_corrected | ... | NaN | NaN | NaN | NaN | NaN | NaN | 22.130653 | 1.388709 | 8.176836 | 6.480797 |
| 5 | 10.0 | 1.039676 | 0.922470 | 1.449066 | 1.439612 | 0.000365 | -1.214347e-05 | 0.000365 | 0_1_10_p1 | p1_auto_geluiden_corrected | ... | NaN | NaN | NaN | NaN | NaN | NaN | 22.078995 | 1.384930 | 8.154151 | 6.466391 |
| 6 | 12.0 | 1.039550 | 0.922519 | 1.449138 | 1.439477 | 0.000381 | -2.126330e-05 | 0.000382 | 0_1_10_p1 | p1_auto_geluiden_corrected | ... | NaN | NaN | NaN | NaN | NaN | NaN | 22.027338 | 1.381152 | 8.131466 | 6.451984 |
| 7 | 14.0 | 1.039417 | 0.922559 | 1.449166 | 1.439293 | 0.000391 | -3.163325e-05 | 0.000392 | 0_1_10_p1 | p1_auto_geluiden_corrected | ... | NaN | NaN | NaN | NaN | NaN | NaN | 21.975680 | 1.377373 | 8.108782 | 6.437578 |
| 8 | 16.0 | 1.039277 | 0.922589 | 1.449157 | 1.439068 | 0.000396 | -4.265618e-05 | 0.000398 | 0_1_10_p1 | p1_auto_geluiden_corrected | ... | NaN | NaN | NaN | NaN | NaN | NaN | 21.924022 | 1.373594 | 8.086097 | 6.423172 |
| 9 | 18.0 | 1.039130 | 0.922607 | 1.449115 | 1.438808 | 0.000397 | -5.383815e-05 | 0.000401 | 0_1_10_p1 | p1_auto_geluiden_corrected | ... | NaN | NaN | NaN | NaN | NaN | NaN | 21.872364 | 1.369816 | 8.063413 | 6.408766 |
| 10 | 20.0 | 1.038976 | 0.922613 | 1.449046 | 1.438519 | 0.000395 | -6.478160e-05 | 0.000400 | 0_1_10_p1 | p1_auto_geluiden_corrected | ... | NaN | NaN | NaN | NaN | NaN | NaN | 21.820707 | 1.366037 | 8.040728 | 6.394359 |
| 11 | 22.0 | 1.038816 | 0.922605 | 1.448954 | 1.438206 | 0.000390 | -7.517861e-05 | 0.000397 | 0_1_10_p1 | p1_auto_geluiden_corrected | ... | NaN | NaN | NaN | NaN | NaN | NaN | 21.769049 | 1.362258 | 8.018043 | 6.379953 |
| 12 | 24.0 | 1.038648 | 0.922583 | 1.448844 | 1.437876 | 0.000383 | -8.480413e-05 | 0.000392 | 0_1_10_p1 | p1_auto_geluiden_corrected | ... | NaN | NaN | NaN | NaN | NaN | NaN | 21.717391 | 1.358479 | 7.995359 | 6.365547 |
| 13 | 26.0 | 1.038475 | 0.922548 | 1.448720 | 1.437532 | 0.000374 | -9.350916e-05 | 0.000386 | 0_1_10_p1 | p1_auto_geluiden_corrected | ... | NaN | NaN | NaN | NaN | NaN | NaN | 21.665734 | 1.354701 | 7.972674 | 6.351140 |
| 14 | 28.0 | 1.038295 | 0.922499 | 1.448586 | 1.437179 | 0.000365 | -1.012140e-04 | 0.000378 | 0_1_10_p1 | p1_auto_geluiden_corrected | ... | NaN | NaN | NaN | NaN | NaN | NaN | 21.614076 | 1.350922 | 7.949989 | 6.336734 |
| 15 | 30.0 | 1.038111 | 0.922437 | 1.448445 | 1.436821 | 0.000354 | -1.079016e-04 | 0.000370 | 0_1_10_p1 | p1_auto_geluiden_corrected | ... | NaN | NaN | NaN | NaN | NaN | NaN | 21.562418 | 1.347143 | 7.927305 | 6.322328 |
| 16 | 32.0 | 1.037923 | 0.922362 | 1.448301 | 1.436461 | 0.000343 | -1.136105e-04 | 0.000361 | 0_1_10_p1 | p1_auto_geluiden_corrected | ... | NaN | NaN | NaN | NaN | NaN | NaN | 21.510761 | 1.343364 | 7.904620 | 6.307922 |
| 17 | 34.0 | 1.037731 | 0.922276 | 1.448155 | 1.436101 | 0.000332 | -1.184283e-04 | 0.000352 | 0_1_10_p1 | p1_auto_geluiden_corrected | ... | NaN | NaN | NaN | NaN | NaN | NaN | 21.459103 | 1.339586 | 7.881936 | 6.293515 |
| 18 | 36.0 | 1.037537 | 0.922179 | 1.448012 | 1.435745 | 0.000321 | -1.224848e-04 | 0.000343 | 0_1_10_p1 | p1_auto_geluiden_corrected | ... | NaN | NaN | NaN | NaN | NaN | NaN | 21.407445 | 1.335807 | 7.859251 | 6.279109 |
| 19 | 38.0 | 1.037341 | 0.922072 | 1.447872 | 1.435395 | 0.000309 | -1.259452e-04 | 0.000334 | 0_1_10_p1 | p1_auto_geluiden_corrected | ... | NaN | NaN | NaN | NaN | NaN | NaN | 21.355788 | 1.332028 | 7.836566 | 6.264703 |

20 rows × 528 columns
And here we visualize some of the timeseries. We can also see that our interpolation maintained the original discontinuities in the f0 signal.
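The plotting code itself is not shown above; a minimal sketch along these lines would produce such a figure (the trial file name follows the merged output shown in the table; column choices are illustrative):

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv(TSmerged + '/merged_0_1_10_p1.csv')

fig, axes = plt.subplots(3, 1, sharex=True, figsize=(10, 6))
axes[0].plot(df['time'], df['COPc'])
axes[0].set_ylabel('COPc')
axes[1].plot(df['time'], df['envelope'])
axes[1].set_ylabel('envelope')
axes[2].plot(df['time'], df['f0'])  # NaN gaps in f0 remain visible as breaks in the line
axes[2].set_ylabel('f0 (Hz)')
axes[2].set_xlabel('time (ms)')
plt.tight_layout()
plt.show()
```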