This note-cleaning method takes the length of the previous page's notes and compares those notes with the same number of characters at the beginning of the page in question. If the two correspond closely enough, the repeated notes are removed from the current page.
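The comparison described above can be sketched for a single pair of pages as follows. This is a minimal illustration, not the package's implementation: the helper name `strip_repeated_prefix` is hypothetical, and the use of Levenshtein distance via `utils::adist`, the `<=` threshold test, and the whitespace trimming are all assumptions.

```r
# Hypothetical helper (not part of the package): compares the previous page's
# notes with the equally long leading portion of the current page.
strip_repeated_prefix <- function(prev, curr, char_diff) {
  n <- nchar(prev)
  # Leading portion of the current page, as long as the previous page's notes
  head_of_curr <- substr(curr, 1, n)
  # Assumption: "correspondence" is measured as Levenshtein edit distance
  dist <- as.integer(utils::adist(prev, head_of_curr))
  if (dist <= char_diff) {
    # Close enough: treat the leading portion as carried forward and drop it
    # (trimming the leftover whitespace is also an assumption)
    curr <- trimws(substr(curr, n + 1, nchar(curr)))
  }
  list(page_notes = curr, edit_distance = dist)
}

# Page 3 starts with page 2's notes, so only the new text is kept
strip_repeated_prefix("The cat", "The cat ran", char_diff = 3)$page_notes
# "ran"

# The pages differ too much, so the current page is left unchanged
strip_repeated_prefix("the chicken was chased",
                      "The goat chased the chicken", char_diff = 3)$page_notes
# "The goat chased the chicken"
```

A larger `char_diff` tolerates more small edits in the carried-forward text, such as corrected typos, at the cost of occasionally stripping text that only resembles the previous page.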
Examples
test_dataset <- data.frame(
  ID = c("1", "1", "2", "2", "1", "3", "3"),
  Notes = c("The", "The cat", "The", "The dog", "The cat ran",
            "the chicken was chased", "The goat chased the chicken"),
  Page = c(1, 2, 1, 2, 3, 1, 2)
)
firstnchar(dataset = test_dataset, notes = "Notes", char_diff = 3,
           identifier = "ID", pageid = "Page")
#>   ID                       Notes Page                  page_notes edit_distance
#> 1  1                         The    1                         The            NA
#> 2  1                     The cat    2                         cat             0
#> 3  2                         The    1                         The            NA
#> 4  2                     The dog    2                         dog             0
#> 5  1                 The cat ran    3                         ran             0
#> 6  3      the chicken was chased    1      the chicken was chased            NA
#> 7  3 The goat chased the chicken    2 The goat chased the chicken            17