Count Unique Words - Class
We’ll convert the text to lowercase and count the occurrences of each unique word using a class.
Case Study
Use:
- List
- Strings
- Classes and objects
We have a large dataset of customer reviews in the form of strings, and we want to extract useful information from them using three tasks:
Task1 - String to lowercase: Convert the feedback text to lowercase to standardize it.
Task2 - Count the frequency of all words: Count how often each word occurs in a given string to identify the most frequently used words, which indicate the key aspects or topics customers are commenting on.
Task3 - Count the frequency of a specific word: To analyze the reviews for one specific topic, we want to be able to input the topic and find out how often users commented on it.
Data
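Let’s pretend the data is in this given string:
givenstring = "Lorem ipsum dolor! diam amet, consetetur Lorem magna. sed diam nonumy eirmod tempor. diam et labore? et diam magna. et diam amet."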
Create Class
Create Constructor
- From the tasks at hand, we have to create a class with 3 methods
- Create class: TextAnalyzer
- Define the constructor method
# Create class to analyze the text
class TextAnalyzer:
    # Constructor method
    def __init__(self, text):
        self.text = text
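At this stage the constructor only stores the text it receives. A quick check that the attribute is set when an instance is created, using a hypothetical sample string and variable name:

```python
class TextAnalyzer:
    # Constructor method: store the raw text on the instance
    def __init__(self, text):
        self.text = text

# "sample" and its string are illustrative, not part of the case study data
sample = TextAnalyzer("Hello World")
print(sample.text)  # Hello World
```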
Create Methods
Convert to Lowercase
Replace Punctuation
- Inside the constructor, convert the text to lowercase using lower()
- Remove all punctuation marks (periods, exclamation marks, commas, question marks) from the text using replace()
- Assign the edited text to a new attribute, fmtText
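# Edit the constructor to convert text to lowercase
class TextAnalyzer:
    # Constructor method
    def __init__(self, text):
        # Convert the text to lowercase
        lowertext = text.lower()
        # Remove periods, exclamation marks, question marks and commas
        cleantext = lowertext.replace('.','').replace('!','').replace('?','').replace(',','')
        # Store the cleaned text on the instance
        self.fmtText = cleantext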
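Chaining replace() covers the four punctuation marks listed above, but it has to be extended by hand for every additional character. As a minimal sketch of one alternative (not the approach used in this case study), the standard library's str.maketrans() and str.translate() can strip every character in string.punctuation in one pass. The function name clean_text is hypothetical.

```python
import string

def clean_text(text):
    """Lowercase the text and strip ASCII punctuation (hypothetical helper)."""
    # Build a translation table that deletes every ASCII punctuation character
    table = str.maketrans('', '', string.punctuation)
    return text.lower().translate(table)

sample = "Lorem ipsum dolor! diam amet, consetetur Lorem magna."
cleaned = clean_text(sample)
print(cleaned)  # lorem ipsum dolor diam amet consetetur lorem magna
```

Note that string.punctuation also contains apostrophes and hyphens, so words such as "don't" would become "dont". Whether that is acceptable depends on the review data.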
Count All Unique Words
- Create freqAll() method to:
- split the text into individual words using split()
- create an empty dictionary to store the word frequency count, so every word will have its paired count
- iterate over the list of words and update the frequency count for each unique word
- use set(), which eliminates duplicates and provides a collection of unique words
- use count() to count the occurrences of each word
- return the dictionary
# Add the freqAll method to the class
class TextAnalyzer(object):
    # Constructor method
    def __init__(self, text):
        lowertext = text.lower()
        cleantext = lowertext.replace('.','').replace('!','').replace('?','').replace(',','')
        self.fmtText = cleantext

    # Construct freqAll method
    def freqAll(self):
        wordslist = self.fmtText.split() # split the string into a list of words
        Dict = {} # create an empty dictionary
        # loop through the unique words and count the occurrences of each
        for key in set(wordslist):
            Dict[key] = wordslist.count(key)
        return Dict
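Calling count() once per unique word rescans the whole list each time, which is fine for a short string but can become slow on a large set of reviews. As a hedged sketch of an alternative, collections.Counter from the standard library builds the same word-to-count mapping in a single pass. The function name freq_all is hypothetical, and it assumes the text has already been lowercased and stripped of punctuation.

```python
from collections import Counter

def freq_all(clean_text):
    # Counter tallies each word in one pass; dict() converts it to a plain dictionary
    return dict(Counter(clean_text.split()))

counts = freq_all("lorem ipsum dolor diam amet consetetur lorem magna sed diam")
print(counts["diam"], counts["lorem"], counts["sed"])  # 2 2 1
```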
Count Frequency of Specific Word
- create a method freqOf that receives the word to be found
- use the freqAll method to look for the input word count
- return the count if the word exists, and return 0 if not found
Note: class TextAnalyzer: can be used as well; the (object) base is optional in Python 3.
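To illustrate the note: in Python 3 every class inherits from object whether or not it is written explicitly, so both spellings behave the same. A quick check using two hypothetical placeholder classes:

```python
class WithObject(object):
    pass

class WithoutObject:
    pass

# In Python 3 both classes have object as their only base class
print(WithObject.__bases__ == (object,))     # True
print(WithoutObject.__bases__ == (object,))  # True
```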
# Add the freqOf method to the class
class TextAnalyzer(object):
    # Constructor method
    def __init__(self, text):
        lowertext = text.lower()
        cleantext = lowertext.replace('.','').replace('!','').replace('?','').replace(',','')
        self.fmtText = cleantext

    # Construct freqAll method
    def freqAll(self):
        wordslist = self.fmtText.split() # split the string into a list of words
        Dict = {} # create an empty dictionary
        # loop through the unique words and count the occurrences of each
        for key in set(wordslist):
            Dict[key] = wordslist.count(key)
        return Dict

    # Construct freqOf method
    def freqOf(self, word):
        # get the frequency dictionary for the whole text
        alldict = self.freqAll()
        if word in alldict:
            return alldict[word]
        else:
            return 0
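The if/else lookup in freqOf can also be written with the dictionary's get() method, which returns a default value when the key is missing. A minimal sketch on a standalone dictionary (the name word_counts and its contents are illustrative):

```python
# Illustrative word counts; inside the class these would come from self.freqAll()
word_counts = {'lorem': 2, 'et': 3, 'diam': 5}

# get() returns the stored count, or the default 0 when the word is absent
print(word_counts.get('diam', 0))    # 5
print(word_counts.get('review', 0))  # 0
```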
Create Instance of Class
- Call the class with the string
givenstring = "Lorem ipsum dolor! diam amet, consetetur Lorem magna. sed diam nonumy eirmod tempor. diam et labore? et diam magna. et diam amet."
analyzed = TextAnalyzer(givenstring)
Call Function to Convert
print("Lower Text: ", analyzed.fmtText)
Lower Text: lorem ipsum dolor diam amet consetetur lorem magna sed diam nonumy eirmod tempor diam et labore et diam magna et diam amet
Call Unique Words Count
- Call the method freqAll(), which returns the dictionary
- Assign the returned dictionary to Dict
- Print the dictionary
Dict = analyzed.freqAll()
Dict
{'lorem': 2,
'et': 3,
'diam': 5,
'nonumy': 1,
'amet': 2,
'sed': 1,
'eirmod': 1,
'tempor': 1,
'consetetur': 1,
'magna': 2,
'labore': 1,
'dolor': 1,
'ipsum': 1}
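Task2 is about spotting the most frequent words, and the dictionary above is not ordered by count. One possible way to rank the words, sketched here on a hand-copied subset of that dictionary (the names word_counts, ranked and top_three are hypothetical), is to sort the items by count:

```python
# Subset of the dictionary returned by freqAll(), copied by hand for illustration
word_counts = {'lorem': 2, 'et': 3, 'diam': 5, 'amet': 2, 'sed': 1}

# Sort (word, count) pairs by count, highest first; ties keep their original order
ranked = sorted(word_counts.items(), key=lambda pair: pair[1], reverse=True)

# Keep the three most frequent words
top_three = ranked[:3]
print(top_three)  # [('diam', 5), ('et', 3), ('lorem', 2)]
```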
Call Count of Word
- Get the number of occurrences of the word “lorem”
times = analyzed.freqOf("lorem")
times
2
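Because the constructor lowercases the text, the word passed to freqOf has to be lowercase as well, and a word that never appears returns 0. The sketch below shows both cases. It uses a condensed restatement of the class so that it runs on its own, with a shortened sample string; the condensed form is an assumption made for brevity, not the version built step by step above.

```python
class TextAnalyzer:
    def __init__(self, text):
        lowertext = text.lower()
        self.fmtText = lowertext.replace('.', '').replace('!', '').replace('?', '').replace(',', '')

    def freqAll(self):
        wordslist = self.fmtText.split()
        return {key: wordslist.count(key) for key in set(wordslist)}

    def freqOf(self, word):
        # get() returns 0 when the word is not in the dictionary
        return self.freqAll().get(word, 0)

analyzed = TextAnalyzer("Lorem ipsum dolor! diam amet, consetetur Lorem magna.")
print(analyzed.freqOf("lorem"))          # 2
print(analyzed.freqOf("price"))          # 0, the word is not in the text
print(analyzed.freqOf("Lorem"))          # 0, the query is not lowercase
print(analyzed.freqOf("Lorem".lower()))  # 2, lowercase the query first
```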
Simple Examples
class Points(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def print_point(self):
        print('x=', self.x, ' y=', self.y)

p1 = Points("A", "B")
p1.print_point()
x= A y= B
class Points(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def print_point(self):
        print('x=', self.x, ' y=', self.y)

p2 = Points(1, 2)
# Attributes can be reassigned after the object is created
p2.x = 'A'
p2.print_point()
x= A y= 2