This function computes how often collocations in the comments fuzzily match collocations in the provided transcript, using Jaccard similarity matching.
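To make the idea of fuzzy collocation matching concrete, here is a brute-force sketch in base R. It is not this package's implementation. The helpers make_collocations(), shingles(), jaccard() and fuzzy_collocation_counts() are hypothetical. The sketch assumes that collocations are space-joined word n-grams and that similarity is the exact Jaccard index over character bigrams. It compares every transcript collocation with every comment collocation. collocate_comments_fuzzy() instead delegates to zoomerjoin's MinHash join, which avoids the all-pairs comparison at the cost of occasionally missing a true match.

```r
# Hypothetical helper: word n-grams ("collocations") of length n from a token vector
make_collocations <- function(tokens, n = 5) {
  if (length(tokens) < n) return(character(0))
  vapply(seq_len(length(tokens) - n + 1),
         function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Hypothetical helper: unique character shingles of width k (assumed k = 2)
shingles <- function(x, k = 2) {
  if (nchar(x) < k) return(x)
  unique(substring(x, 1:(nchar(x) - k + 1), k:nchar(x)))
}

# Exact Jaccard similarity |A intersect B| / |A union B| of two shingle sets
jaccard <- function(a, b, k = 2) {
  sa <- shingles(a, k)
  sb <- shingles(b, k)
  length(intersect(sa, sb)) / length(union(sa, sb))
}

# For each transcript collocation, count comment collocations at or above threshold
fuzzy_collocation_counts <- function(transcript_tokens, comment_tokens,
                                     n = 5, threshold = 0.7) {
  tr_coll <- unique(make_collocations(transcript_tokens, n))
  cm_coll <- make_collocations(comment_tokens, n)
  freq <- vapply(tr_coll, function(tr) {
    sum(vapply(cm_coll, function(cm) jaccard(tr, cm) >= threshold, logical(1)))
  }, integer(1))
  data.frame(collocation = tr_coll, freq = unname(freq))
}

transcript <- c("the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog")
comment <- c("quick", "brown", "fox", "jumped", "over")
res <- fuzzy_collocation_counts(transcript, comment)
res
```

Here "quick brown fox jumped over" matches "quick brown fox jumps over" (Jaccard 23/28, about 0.82) even though one word differs, so only that transcript collocation receives a count.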
Usage
collocate_comments_fuzzy(
transcript_token,
note_token,
collocate_length = 5,
n_bands = 50,
threshold = 0.7
)
Arguments
- transcript_token
tokenized transcript that acts as the baseline for the notes, resulting from token_transcript()
- note_token
tokenized document of notes, resulting from token_comments()
- collocate_length
the length of the collocation (number of tokens). Default is 5.
- n_bands
number of bands used in the MinHash algorithm, passed to zoomerjoin::jaccard_right_join(). Default is 50.
- threshold
Jaccard similarity threshold above which two collocations are considered a match, passed to zoomerjoin::jaccard_right_join(). Default is 0.7.
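n_bands and threshold interact through MinHash banding. Under the standard locality-sensitive hashing analysis, a pair with Jaccard similarity s is only compared if all hashes in at least one band agree. With n_bands bands of band_width hashes each, that happens with probability 1 - (1 - s^band_width)^n_bands. The sketch below evaluates this curve. lsh_compare_prob() is a hypothetical helper. band_width = 8 is an assumption: we believe it is zoomerjoin's default, and collocate_comments_fuzzy() does not expose it.

```r
# Probability that MinHash LSH compares a pair with Jaccard similarity s,
# using n_bands bands of band_width hashes each (standard banding formula).
# band_width = 8 is assumed, not taken from this package's documentation.
lsh_compare_prob <- function(s, n_bands = 50, band_width = 8) {
  1 - (1 - s^band_width)^n_bands
}

# The curve is S-shaped: dissimilar pairs are rarely compared, similar ones almost always
sims <- c(0.3, 0.5, 0.7, 0.9)
round(lsh_compare_prob(sims), 3)
```

At the default threshold of 0.7 this gives roughly 0.95. That is consistent with the "only a 95% chance of being compared" warning in the example output.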
Examples
comment_example_rename <- dplyr::rename(comment_example[1:10,], page_notes=Notes)
toks_comment <- token_comments(comment_example_rename)
transcript_example_rename <- dplyr::rename(transcript_example, text=Text)
toks_transcript <- token_transcript(transcript_example_rename)
fuzzy_object <- collocate_comments_fuzzy(toks_transcript, toks_comment)
#> Joining with `by = join_by(unlist.descript_ngrams.)`
#> Joining with `by = join_by(collocation)`
#> Joining with `by = join_by(collocation)`
#> Warning: A pair of records at the threshold (0.7) have only a 95% chance of being compared.
#> Please consider changing `n_bands` and `band_width`.
#> Joining with `by = join_by(collocation.y)`
#> Joining with `by = join_by(collocation)`
#> Joining with `by = join_by(word_number)`
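The warning suggests changing n_bands or band_width. Only n_bands is an argument of collocate_comments_fuzzy(), so one option is to raise it until a pair exactly at the threshold is compared with the desired probability. The sketch below solves the standard banding formula 1 - (1 - s^band_width)^n_bands >= target for n_bands. min_bands() is a hypothetical helper, and band_width = 8 is again an assumed zoomerjoin default.

```r
# Smallest n_bands such that a pair with similarity == threshold is compared
# with probability >= target, holding band_width fixed (assumed to be 8).
min_bands <- function(threshold = 0.7, target = 0.99, band_width = 8) {
  ceiling(log(1 - target) / log(1 - threshold^band_width))
}

min_bands()  # bands needed for a 99% chance at threshold 0.7
```

Under these assumptions, about 78 bands are needed. You could then call collocate_comments_fuzzy(toks_transcript, toks_comment, n_bands = 78). More bands means more hashing work, so this trades run time for fewer missed matches.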