
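This is a method of note cleaning that removes text carried over from the previous page of notes. For each page after the first, the notes from the previous page (for the same identifier) are compared with the same number of characters at the beginning of the current page. If the two strings correspond closely enough, that is, if their character difference is within char_diff, the carried-over text is removed from the current page.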
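The comparison for a single pair of pages can be sketched in base R as follows. This is a minimal sketch, not the package's implementation. The helper strip_previous() is hypothetical. The sketch also assumes three things: the character difference is the generalized Levenshtein distance from utils::adist(), a match means a distance no greater than char_diff, and leftover whitespace is trimmed.

```r
# Hypothetical helper, not part of the package: strip the previous page's
# notes from the start of the current page when the two nearly match.
# Assumes utils::adist() (generalized Levenshtein distance) as the metric.
strip_previous <- function(prev, current, char_diff) {
  n <- nchar(prev)
  # Compare the previous notes with the first n characters of this page
  prefix <- substr(current, 1, n)
  dist <- drop(utils::adist(prev, prefix))
  if (dist <= char_diff) {
    # Drop the matched prefix and trim the whitespace left behind
    current <- trimws(substr(current, n + 1, nchar(current)))
  }
  list(page_notes = current, edit_distance = dist)
}

strip_previous("The cat", "The cat ran", char_diff = 3)$page_notes
# "ran"
```

Here the first 7 characters of "The cat ran" equal "The cat" exactly, so the distance is 0 and only "ran" remains.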

Usage

firstnchar(dataset, notes, char_diff, identifier, pageid)

Arguments

dataset

a data frame containing the notes

notes

the column name for the notes

char_diff

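the maximum allowable character difference (edit distance) between the previous page's notes and the beginning of the current page for the carried-over notes to be removed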

identifier

the column name of the identifier that uniquely identifies each set of notes; pages are compared only within the same identifier

pageid

the column name for the page number, used to order the pages within each identifier
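To illustrate how char_diff acts as a threshold, the snippet below computes one distance and checks it against several values of char_diff. It assumes utils::adist() as the distance and a rule of "remove when the distance is at most char_diff"; the package's exact metric and comparison are not documented on this page.

```r
# Previous page and current page differ only in capitalization
prev    <- "The cat"
current <- "the cat ran"
prefix  <- substr(current, 1, nchar(prev))  # "the cat"
dist    <- drop(utils::adist(prev, prefix))  # one substitution: "T" -> "t"

# Assumed rule: the previous notes are removed when dist <= char_diff
for (char_diff in c(0, 1, 3)) {
  cat(sprintf("char_diff = %d: distance = %d, removed = %s\n",
              as.integer(char_diff), as.integer(dist), dist <= char_diff))
}
```

Under this assumed metric, a char_diff of 0 requires an exact match, so a difference in capitalization alone would keep the carried-over text; a char_diff of 1 or more would remove it.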

Value

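a data frame: the input dataset with two added columns, page_notes (the cleaned notes for each page) and edit_distance (the character difference between the previous page's notes and the beginning of the current page; NA for the first page of each identifier)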
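The shape of the returned data frame can be illustrated with a hypothetical re-implementation, firstnchar_sketch(), written in base R. It is a sketch under stated assumptions, not the package's code. It assumes utils::adist() as the distance and removal when the distance is at most char_diff. It also assumes each page is compared against the previous page's original (uncleaned) notes, and that leftover whitespace is trimmed. On the example dataset it yields the same page_notes column shown in the Examples section.

```r
# Hypothetical re-implementation for illustration only; named
# firstnchar_sketch() so it does not mask the package function.
# Assumes utils::adist() as the distance and removal when dist <= char_diff.
firstnchar_sketch <- function(dataset, notes, char_diff, identifier, pageid) {
  out <- dataset
  out$page_notes <- as.character(out[[notes]])
  out$edit_distance <- NA_real_  # stays NA for each identifier's first page
  # Process each identifier separately, in page order
  for (id in unique(out[[identifier]])) {
    rows <- which(out[[identifier]] == id)
    rows <- rows[order(out[[pageid]][rows])]
    if (length(rows) < 2) next
    for (k in 2:length(rows)) {
      prev <- as.character(out[[notes]][rows[k - 1]])  # original previous page
      cur  <- as.character(out[[notes]][rows[k]])
      n    <- nchar(prev)
      # Distance between the previous notes and the first n characters
      dist <- drop(utils::adist(prev, substr(cur, 1, n)))
      out$edit_distance[rows[k]] <- dist
      if (dist <= char_diff) {
        # Remove the carried-over prefix and trim leftover whitespace
        out$page_notes[rows[k]] <- trimws(substr(cur, n + 1, nchar(cur)))
      }
    }
  }
  out
}

test_dataset <- data.frame(
  ID = c("1", "1", "2", "2", "1", "3", "3"),
  Notes = c("The", "The cat", "The", "The dog", "The cat ran",
            "the chicken was chased", "The goat chased the chicken"),
  Page = c(1, 2, 1, 2, 3, 1, 2)
)
res <- firstnchar_sketch(test_dataset, notes = "Notes", char_diff = 3,
                         identifier = "ID", pageid = "Page")
res$page_notes
```

Comparing against the original previous notes matters for ID "1": page 3 is checked against "The cat" rather than the cleaned "cat", which is what leaves only "ran".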

Examples

test_dataset <- data.frame(
  ID = c("1", "1", "2", "2", "1", "3", "3"),
  Notes = c("The", "The cat", "The", "The dog", "The cat ran",
            "the chicken was chased", "The goat chased the chicken"),
  Page = c(1, 2, 1, 2, 3, 1, 2)
)
firstnchar(dataset = test_dataset, notes = "Notes", char_diff = 3,
           identifier = "ID", pageid = "Page")
#>   ID                       Notes Page                  page_notes edit_distance
#> 1  1                         The    1                         The            NA
#> 2  1                     The cat    2                         cat             0
#> 3  2                         The    1                         The            NA
#> 4  2                     The dog    2                         dog             0
#> 5  1                 The cat ran    3                         ran             0
#> 6  3      the chicken was chased    1      the chicken was chased            NA
#> 7  3 The goat chased the chicken    2 The goat chased the chicken            17