R - tm corpus export includes some structural function words
Using the tm library, my corpus includes words that come from the VectorSource structure:
    text <- readLines("some.txt")
    finalcorpus <- Corpus(VectorSource(newcorpus))
    finalcorpus <- tm_map(finalcorpus, stripWhitespace)
    save(finalcorpus, file="data/debug.rda")  # debug
    df <- data.frame(lapply(finalcorpus, as.character), stringsAsFactors=FALSE)
    df

> protracted periods meditation fasting prayer ennui fever energy vigor
> married joseph lee dollars million canadian dollars gbp pastored african
> american church snow hill jersey children died infancy **meta list author
> character datetimestamp list sec min hour mday mon year wday yday isdst
> description character heading character id language en origin character
> x2 x3
> 1 list list**
The words between ** come from the corpus, not from the imported text. Why are they there, and how can I remove them (without using the tm removeWords function)?
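For reference, here is a minimal sketch of one way to export only the document text, assuming the goal is a data frame with one row per document. The sample strings and the `doc_text` name are made up for illustration, and `VCorpus` is used in place of `Corpus` so that each element is a plain text document. It does not explain where the extra words came from: if `newcorpus` already contained them as text before `VectorSource` was called, they would have to be fixed at that earlier step.

```r
library(tm)

# Made-up sample text standing in for the lines read from "some.txt".
text <- c("protracted  periods   meditation fasting prayer",
          "married joseph   lee")

# VCorpus keeps each element as a PlainTextDocument (content plus metadata).
finalcorpus <- VCorpus(VectorSource(text))
finalcorpus <- tm_map(finalcorpus, stripWhitespace)

# Pull out only the text of each document. as.character() on a
# PlainTextDocument returns its content, not its metadata.
doc_text <- vapply(content(finalcorpus),
                   function(d) paste(as.character(d), collapse = " "),
                   character(1), USE.NAMES = FALSE)

# One row per document, one column holding the text.
df <- data.frame(text = doc_text, stringsAsFactors = FALSE)
print(df)
```

Building the data frame from a character vector, rather than from `lapply()` over the corpus, gives a single `text` column instead of one column per document.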