Refer to task 5: edit distance is used to evaluate the accuracy of the model.
Each input sample is fed to the model and decoder, which produce a predicted output sequence (pred_seq): a list of strings containing symbols and relations. Each input sample also has a label sequence (label_seq), which is likewise a list of strings. Edit distance measures the difference between pred_seq and label_seq, i.e. the minimum number of insertions, deletions, and substitutions needed to turn pred_seq into label_seq.
Example:
pred_seq = ['\\\\phi', 'Right', '(', 'Right', '0', 'Right', '(', 'Right', 'n', 'Right', ')', 'Right', ')']
label_seq = ['\\\\phi', 'Right', '(', 'Right', '\\\\phi', 'Right', '(', 'Right', 'n', 'Right', ')', 'Right', ')']
For the example above, the edit distance is 1: a single substitution ('0' → '\\phi', the fifth token) turns pred_seq into label_seq.
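The computation can be sketched as a standard Levenshtein distance over token lists, with unit cost for each insertion, deletion, and substitution. This is a minimal sketch, not necessarily the implementation used in task 5; the function name `edit_distance` is chosen here for illustration.

```python
def edit_distance(pred_seq, label_seq):
    """Levenshtein distance between two lists of tokens (unit edit costs)."""
    m, n = len(pred_seq), len(label_seq)
    # dp[i][j] = number of edits needed to turn pred_seq[:i] into label_seq[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete all i tokens
    for j in range(n + 1):
        dp[0][j] = j  # insert all j tokens
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if pred_seq[i - 1] == label_seq[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # match or substitution
            )
    return dp[m][n]


pred_seq = ['\\\\phi', 'Right', '(', 'Right', '0', 'Right', '(', 'Right', 'n', 'Right', ')', 'Right', ')']
label_seq = ['\\\\phi', 'Right', '(', 'Right', '\\\\phi', 'Right', '(', 'Right', 'n', 'Right', ')', 'Right', ')']
print(edit_distance(pred_seq, label_seq))  # 1
```

Tokens are compared as whole strings, so a multi-character symbol such as the phi token counts as one unit, not as several characters.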
This measure, however, does not evaluate the accuracy of symbols and the accuracy of relations separately. To do that, the edit distance needs to be calculated over the symbols only and over the relations only. We can then track the word error rate (WER) for symbols and the WER for relations during training (bonus 1).
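One way to sketch the separate evaluation is to split each sequence into its symbol tokens and its relation tokens, then compute WER on each part. The sketch below assumes the usual definition of WER (edit distance divided by the length of the label sequence) and a known set of relation labels. Only 'Right' appears in the example above, so the `RELATIONS` set is an assumption to be extended with the dataset's actual relation vocabulary. The helper names are hypothetical.

```python
# Assumption: only 'Right' is confirmed by the example; add the other
# relation labels used by your dataset.
RELATIONS = {'Right'}


def levenshtein(a, b):
    """Edit distance between two token lists, keeping one DP row at a time."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # match or substitution
        prev = curr
    return prev[-1]


def split_tokens(seq, relations=RELATIONS):
    """Split a mixed sequence into (symbols, relations), preserving order."""
    symbols = [t for t in seq if t not in relations]
    rels = [t for t in seq if t in relations]
    return symbols, rels


def wer(pred, label):
    """Edit distance normalized by label length (convention assumed here)."""
    if not label:
        # Assumed convention for an empty label: 0 if pred is also empty, else 1.
        return 0.0 if not pred else 1.0
    return levenshtein(pred, label) / len(label)


pred_seq = ['\\\\phi', 'Right', '(', 'Right', '0', 'Right', '(', 'Right', 'n', 'Right', ')', 'Right', ')']
label_seq = ['\\\\phi', 'Right', '(', 'Right', '\\\\phi', 'Right', '(', 'Right', 'n', 'Right', ')', 'Right', ')']

pred_sym, pred_rel = split_tokens(pred_seq)
label_sym, label_rel = split_tokens(label_seq)

symbol_wer = wer(pred_sym, label_sym)    # 1 substitution over 7 symbols
relation_wer = wer(pred_rel, label_rel)  # 0 edits over 6 relations
print(symbol_wer, relation_wer)
```

For the example sequences, the single error falls entirely on the symbol side: the symbol WER is 1/7 and the relation WER is 0. During training, these two values could be averaged over a batch and logged alongside the loss.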