Wednesday, January 16, 2008

Python Idiom: remove stop words

I always forget this simple idiom for removing a set of words from a string. In this particular example, given a list of stop words, the function strips them out of a keyword string.

BOTLIST_STOP_WORDS = ["abc", "123"]
keywords = "kjskldjfkljskdfksdkl abc lkjsdfksdklfd 123123"

def filterStopwords(keywords):
    """Find all of the stop words and remove them."""
    keywords_list = keywords.lower().split()
    # Set difference drops every word that appears in the stop list
    res = list(set(keywords_list).difference(set(BOTLIST_STOP_WORDS)))

    # Return the new keyword string
    return " ".join(res)

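In this example, "abc" is removed from the string, while "123123" is kept because only whole words are compared against the stop list. The result is lowercased, and because it passes through a set, the remaining words come back de-duplicated and in no particular order.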
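
When the original word order or repeated words matter, a list comprehension is one alternative to the set difference. This is only a minimal sketch: the name filter_stopwords_ordered and its stop_words parameter are made up for illustration and are not part of the code above.

```python
BOTLIST_STOP_WORDS = ["abc", "123"]

def filter_stopwords_ordered(keywords, stop_words=BOTLIST_STOP_WORDS):
    """Remove stop words but keep word order and duplicates.

    Hypothetical variant of filterStopwords, for illustration only.
    """
    # Build the set once so each membership test is fast
    stop_set = set(stop_words)
    # Keep every lowercased word that is not a stop word, in order
    kept = [word for word in keywords.lower().split() if word not in stop_set]
    return " ".join(kept)

print(filter_stopwords_ordered("kjskldjfkljskdfksdkl abc lkjsdfksdklfd 123123"))
# prints: kjskldjfkljskdfksdkl lkjsdfksdklfd 123123
```

The stop list is still turned into a set, so the lookup cost is the same as before; the only change is that the keywords stay in a list instead of passing through a set.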

