Monday, March 27, 2017

Counter in Python

Counter, from Python's collections module, is a simple way to count how often each word occurs in a given text. You can also use it to build a tag cloud. Let's look at an example to see how it works.

Let's count the word occurrences on a web page at a given URL. The same code can process any plain text too: just pass the text to Counter as a list of words.
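
For the plain-text case, here is a minimal sketch. The sample sentence is made up for illustration; any string split into a list of words would work the same way.

```python
from collections import Counter

# Hypothetical sample text, made up for this illustration
text = "the quick brown fox jumps over the lazy dog and the fox runs away"

# split() with no arguments breaks the text on any run of whitespace
words_counter = Counter(text.split())

# The two most frequent words, as (word, count) pairs
print(words_counter.most_common(2))   # [('the', 3), ('fox', 2)]
```

Counter behaves like a dictionary, so words_counter['fox'] gives the count for a single word.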


Let's look at the code for the URL example now:

# Imports (Python 3; in Python 2, urlopen lives directly in urllib)
>>> from urllib.request import urlopen
>>> from collections import Counter


# Point to the website you want to fetch
>>> loc = urlopen("http://www.lalitbhatt.net")

# Read the response and decode the bytes into text
>>> text = loc.read().decode("utf-8", errors="replace")

# Count the whitespace-separated words
>>> words_counter = Counter(text.split())

# Show the 10 most common words. You can pass any number as the parameter.
# Calling most_common() with no argument returns all the counts
>>> words_counter.most_common(10)

Run against raw HTML, this will not show any meaningful result, because tags and other markup are counted along with the words. You can use a library like BeautifulSoup or a regular expression to strip the HTML tags. You might also want to build a list of common words (stop words) to strip out before drawing any meaningful inference.
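
As a rough sketch of that cleanup, the snippet below strips tags with a regular expression and drops common words before counting. The HTML snippet, the STOP_WORDS set and the count_words helper are all hypothetical names made up for this illustration. The regular expression is deliberately crude (it does not skip script or style contents, for example), so a real parser such as BeautifulSoup is likely the more robust choice.

```python
import re
from collections import Counter

# Hypothetical HTML snippet standing in for the downloaded page
html = ("<html><body><h1>Python Counter</h1>"
        "<p>The Counter counts the words in a text.</p></body></html>")

# Hypothetical, deliberately tiny set of common words to ignore
STOP_WORDS = {"the", "a", "in", "of", "and"}

def count_words(html_text, stop_words=STOP_WORDS):
    # Crude tag removal: replace anything between < and > with a space
    text = re.sub(r"<[^>]+>", " ", html_text)
    # Lowercase the text and keep only runs of letters
    words = re.findall(r"[a-z]+", text.lower())
    # Count every word that is not in the stop-word set
    return Counter(w for w in words if w not in stop_words)

words_counter = count_words(html)
print(words_counter.most_common(1))   # [('counter', 2)]
```

Lowercasing before counting means "Counter" and "counter" are treated as the same word, which is usually what you want for a tag cloud.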
