Elimination of Stopwords
•Stopwords are extremely common words across document
collections that have no discriminatory power
–They may occur in 80% of the documents in a collection.
–They appear to be of little value in helping select documents
matching a user need, and need to be filtered out as potential
index terms
•Examples of stopwords are articles, prepositions, conjunctions,
etc.:
–Articles (a, an, the); pronouns (I, he, she, it, their, his)
–Some prepositions (on, of, in, about, besides, against),
conjunctions/connectors (and, but, for, nor, or, so, yet), verbs (is,
are, was, were), adverbs (here, there, out, because, soon, after)
and adjectives (all, any, each, every, few, many, some) can also be
treated as stopwords
•Stopwords are language dependent.
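
The points above can be illustrated with a minimal Python sketch. It is not taken from the slides: the function names (tokenize, index_terms, frequent_terms), the abbreviated English stopword list, and the use of the 80% document-frequency figure as a cutoff are all assumptions made for illustration. The first function filters stopwords out of candidate index terms; the second flags terms that occur in a large share of the documents as stopword candidates.

```python
import re
from collections import Counter

# Illustrative, non-exhaustive English list built from the examples above.
# A different language would need its own list.
STOPWORDS = {
    "a", "an", "the", "i", "he", "she", "it", "their", "his",
    "on", "of", "in", "about", "and", "but", "or",
    "is", "are", "was", "were", "here", "there",
    "all", "any", "each", "every",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (deliberately simplistic)."""
    return re.findall(r"[a-z]+", text.lower())

def index_terms(text, stopwords=STOPWORDS):
    """Return the tokens of `text` that are not stopwords, in their original order."""
    return [t for t in tokenize(text) if t not in stopwords]

def frequent_terms(docs, threshold=0.8):
    """Return the terms that occur in at least `threshold` of the documents."""
    doc_freq = Counter()
    for doc in docs:
        # Count each term once per document (document frequency, not term frequency).
        doc_freq.update(set(tokenize(doc)))
    n = len(docs)
    return {t for t, count in doc_freq.items() if count / n >= threshold}
```

For example, index_terms("The cat is on the mat") keeps only "cat" and "mat". In practice a curated, language-specific list is normally used; a frequency cutoff such as the one in frequent_terms is only one possible way to find candidates for it.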