Counter, from Python's collections module, is a simple way to count how often each word occurs in a given text. You can also use it to build a tag cloud. Let's look at an example to see how it works.
Let's count the word occurrences in a page fetched from a URL on the web. The same code can process any plain text as well: just pass the text to Counter as a list of words.
Let's look at the code now:
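>>> from collections import Counter
>>> from urllib.request import urlopen  # Python 3 location of urlopen
>>> loc = urlopen("http://www.lalitbhatt.net")
# Read the response body and decode the bytes to text (assuming UTF-8)
>>> text = loc.read().decode("utf-8", errors="replace")
# Split the text on whitespace and count each word
>>> words_counter = Counter(text.split())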
# Show the 10 most common words. You can pass any number as the parameter.
# Calling most_common() without a number returns the counts for every word.
>>> words_counter.most_common(10)
Run on a raw HTML page, this will not show any meaningful result, because the counts are dominated by markup rather than words. You can use a library such as BeautifulSoup, or a regular expression, to strip the HTML tags. You might also want to build a list of common words to strip out before drawing any meaningful inference.
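As a rough illustration of that cleanup, here is a minimal sketch that uses only regular expressions from the standard library. The function name count_words, the tiny STOP_WORDS set, and the sample HTML string are all made up for this example. A regular expression is only a crude way to strip tags, so treat this as a starting point rather than a robust HTML parser.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only; extend it for real use.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it"}

def count_words(html, stop_words=STOP_WORDS):
    """Count words in an HTML string, ignoring tags and stop words."""
    # Drop script and style blocks, whose contents are not visible text.
    html = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", html)
    # Replace the remaining tags with a space so adjacent words stay separate.
    text = re.sub(r"<[^>]+>", " ", html)
    # Lowercase, then keep runs of letters (apostrophes allowed inside a word).
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(w for w in words if w not in stop_words)

# Made-up sample page, so the example runs without network access.
sample = (
    "<html><body><h1>Counter</h1>"
    "<p>The counter counts the words in a text.</p></body></html>"
)
counts = count_words(sample)
print(counts.most_common(3))
```

In this sample, "counter" is counted twice because the text is lowercased first, and "the", "in" and "a" are dropped as stop words. To use it on a real page, pass the decoded text of the response to count_words instead of the sample string.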