Elimination of Stopwords
•Stopwords are words that are extremely common across a document collection and
therefore have little or no discriminatory power
–They may occur in 80% of the documents in a collection.
–They are of little value in helping select documents that match a
user need, so they need to be filtered out as potential index terms
•Examples of stopwords include articles, pronouns, prepositions, and conjunctions:
–Articles (a, an, the) and pronouns (I, he, she, it, their, his)
–Some prepositions (on, of, in, about, besides, against), conjunctions/connectors
(and, but, for, nor, or, so, yet, because), verbs (is, are, was, were), adverbs (here, there,
out, soon, after), and adjectives (all, any, each, every, few, many, some)
can also be treated as stopwords
•Stopwords are language-dependent: each language needs its own stopword list.
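
The points above can be illustrated with a minimal Python sketch. It is not a reference implementation: the STOPWORDS set is a small hypothetical English list built from the examples on this slide, the tokenizer is a simple assumption (lowercase alphabetic runs only), and the function names remove_stopwords and frequent_terms are made up for illustration. The second function shows one possible way to find stopword candidates from the collection itself, using the 80% document-frequency figure mentioned above as a default threshold.

```python
import re
from collections import Counter

# Hypothetical, deliberately small English stopword list taken from the
# examples on this slide; real systems use longer, language-specific lists.
STOPWORDS = frozenset({
    "a", "an", "the", "i", "he", "she", "it", "their", "his",
    "on", "of", "in", "about", "and", "but", "or",
    "is", "are", "was", "were", "here", "there",
    "all", "any", "each", "every", "few", "many", "some",
})


def tokenize(text):
    """Assumed tokenizer: lowercase the text and keep alphabetic runs."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stopwords(text, stopwords=STOPWORDS):
    """Return the tokens of text that are not in the stopword list."""
    return [tok for tok in tokenize(text) if tok not in stopwords]


def frequent_terms(docs, threshold=0.8):
    """Return terms that occur in at least `threshold` of the documents.

    These are only candidates for a stopword list; the threshold is an
    assumption and would normally be tuned per collection.
    """
    doc_freq = Counter()
    for doc in docs:
        # Count each term once per document (document frequency).
        doc_freq.update(set(tokenize(doc)))
    n_docs = len(docs)
    return {term for term, df in doc_freq.items() if df / n_docs >= threshold}
```

For example, remove_stopwords("The cat is on the mat") keeps only the tokens "cat" and "mat" as potential index terms. Because the list is just a set of strings, swapping in a list for another language changes nothing else in the sketch.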