This function counts how often collocations from the provided transcript appear in the comments.
Arguments
- transcript_token
transcript token to act as baseline for notes, resulting from tokenize_source()
- note_token
tokenized document of notes, resulting from tokenize_derivative()
- collocate_length
the length of the collocation. Default is 5
Details
Collocations are sequences of consecutive words present in the source document. For example, the phrase "the blue bird flies" contains one collocation of length 4 ("the blue bird flies"), two collocations of length 3 ("the blue bird" and "blue bird flies"), and three collocations of length 2 ("the blue", "blue bird", and "bird flies"). This function counts the number of corresponding phrases in the notes, that is, the derivative documents. Matches between the two documents must be exact.
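The counting idea described above can be sketched in base R. This is only an illustration under stated assumptions, not the function's actual implementation: `word_ngrams()` is a hypothetical helper, the note text is made up, and splitting on whitespace after lowercasing is an assumed stand-in for whatever the tokenizers really do.

```r
# Hypothetical helper (not part of the package): list every run of
# n consecutive words in a string.
word_ngrams <- function(text, n) {
  # Assumption: lowercase and split on whitespace; real tokenization may differ.
  words <- strsplit(tolower(text), "[[:space:]]+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts,
         function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

source_text <- "the blue bird flies"
note_text <- "I saw the blue bird and the blue bird saw me"  # made-up note

# Collocations of length 2 in the source and in the note
source_grams <- word_ngrams(source_text, 2)
note_grams <- word_ngrams(note_text, 2)

# For each source collocation, count its exact occurrences in the note
counts <- vapply(source_grams,
                 function(g) sum(note_grams == g),
                 integer(1))
print(counts)
```

Here "the blue" and "blue bird" each occur twice in the note, while "bird flies" never occurs, so its count is zero. Because the comparison is exact string equality, a near match such as "blue birds" would not be counted.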
Examples
# Tokenize the derivative document
toks_comment <- tokenize_derivative(comment_example[1:100,], text_column="Notes")
# Tokenize source document
toks_source <- tokenize_source(transcript_example)
# Compute collocation frequencies
collocation_object <- collocate_comments(toks_source, toks_comment)
