Count Unique Words - Class

We’ll convert the text to lowercase and count the occurrences of every unique word using a class.

Case Study


Use:

  • List
  • Strings
  • Classes and objects

We have a large dataset of customer reviews in the form of strings, and we want to extract useful information from them using three related tasks:

Task1 - String to lowercase: Convert the feedback text to lowercase to standardize it.

Task2 - Count the frequency of all words in a given string to identify which words are used most often, indicating the key aspects or topics customers are commenting on.

Task3 - Count the frequency of a specific word: to analyze the reviews for one specific topic, we want to be able to input the topic and find out how often users commented on it.

Data


Let’s pretend the data is in this given string:

givenstring="Lorem ipsum dolor! diam amet, consetetur Lorem magna. sed diam nonumy eirmod tempor. diam et labore? et diam magna. et diam amet."

Create Class


Create Constructor

  • From the tasks at hand, we have to create a class with 3 methods
  • Create class: TextAnalyzer
  • Define the constructor method
# Create a class to analyze the text
class TextAnalyzer:

    # Constructor method
    def __init__(self, text):
        self.text = text

Create Methods


Convert to Lowercase

Remove Punctuation

  • Inside the constructor, convert the text to lowercase using lower()
  • Remove the punctuation marks (periods, exclamation marks, commas, question marks) from the text using replace()
  • Assign the edited text to a new attribute, fmtText
# Edit the constructor to convert the text to lowercase and remove punctuation
class TextAnalyzer:

    # Constructor method
    def __init__(self, text):
        lowertext = text.lower()
        cleantext = lowertext.replace('.','').replace('!','').replace('?','').replace(',','')
        self.fmtText = cleantext
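
Chaining replace() covers only the four marks listed above. A minimal sketch of an alternative, assuming it is acceptable to strip every ASCII punctuation character: build a translation table with str.maketrans() and string.punctuation. The helper name clean_text is hypothetical and is not part of the TextAnalyzer class.

```python
import string

# Hypothetical helper (not in the original class): lowercase the text and
# delete every ASCII punctuation character in a single translate() call.
def clean_text(text):
    # The third argument of maketrans lists the characters to delete.
    table = str.maketrans('', '', string.punctuation)
    return text.lower().translate(table)

sample = "Lorem ipsum dolor! diam amet, consetetur Lorem magna."
print(clean_text(sample))
```

Note that this also removes apostrophes and hyphens, so a word like "don't" would become "dont"; whether that is desirable depends on the reviews being analyzed.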

Count All Unique Words

  • Create a freqAll() method to:
    • split the text into individual words using split()
    • create an empty dictionary to store the word frequency count, so every word is paired with its count
    • iterate over the words and update the frequency count for each unique word
      • use set(), which eliminates duplicates and provides a collection of unique words
    • use count() to count the occurrences of each word in the list
    • return the dictionary
# Add the freqAll method to the class
class TextAnalyzer(object):

    # Constructor method
    def __init__(self, text):
        lowertext = text.lower()
        cleantext = lowertext.replace('.','').replace('!','').replace('?','').replace(',','')
        self.fmtText = cleantext

    # Construct freqAll method
    def freqAll(self):
        wordslist = self.fmtText.split()    # split the string into a list of words
        Dict = {}                           # create an empty dictionary

        # loop through the unique words and count the occurrences of each
        for key in set(wordslist):
            Dict[key] = wordslist.count(key)
        return Dict
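
Calling count() once per unique word rescans the whole list each time, which can get slow on a large set of reviews. A sketch of a single-pass alternative, assuming the standard library's collections.Counter is acceptable here; the function name freq_all is hypothetical and only mirrors what freqAll() returns.

```python
from collections import Counter

# Hypothetical standalone equivalent of freqAll(): Counter tallies every
# word in one pass over the list instead of one scan per unique word.
def freq_all(clean_text):
    return dict(Counter(clean_text.split()))

counts = freq_all("et diam magna et diam amet")
print(counts)
```

The result is an ordinary dictionary, so the rest of the class could use it the same way as the dictionary built with set() and count().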

Count Frequency of Specific Word

  • create a method freqOf that receives the word to be found
  • use the freqAll method to look up the count of the input word
  • return the count if the word exists, and return 0 if it is not found

Note: in Python 3, class TextAnalyzer: is equivalent to class TextAnalyzer(object):, so the (object) can be omitted.

# Add the freqOf method to the class
class TextAnalyzer(object):

    # Constructor method
    def __init__(self, text):
        lowertext = text.lower()
        cleantext = lowertext.replace('.','').replace('!','').replace('?','').replace(',','')
        self.fmtText = cleantext

    # Construct freqAll method
    def freqAll(self):
        wordslist = self.fmtText.split()    # split the string into a list of words
        Dict = {}                           # create an empty dictionary

        # loop through the unique words and count the occurrences of each
        for key in set(wordslist):
            Dict[key] = wordslist.count(key)
        return Dict

    # Construct freqOf method
    def freqOf(self, word):
        alldict = self.freqAll()

        # return the stored count, or 0 if the word never appears
        if word in alldict:
            return alldict[word]
        else:
            return 0
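
The "return the count or 0" lookup can also be written with the dictionary's get() method, which takes a default value. A minimal sketch, using a hypothetical standalone function freq_of that receives the dictionary directly:

```python
# Hypothetical standalone version of the lookup: dict.get() returns the
# stored count, or the default 0 when the word is not a key.
def freq_of(counts, word):
    return counts.get(word, 0)

counts = {"lorem": 2, "diam": 5}
print(freq_of(counts, "lorem"))
print(freq_of(counts, "pizza"))
```

This behaves the same as the if/else version; which one reads better is a matter of taste.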

Create Instance of Class

  • Create an instance by calling the class with the string
givenstring="Lorem ipsum dolor! diam amet, consetetur Lorem magna. sed diam nonumy eirmod tempor. diam et labore? et diam magna. et diam amet."
analyzed = TextAnalyzer(givenstring)

Print the Converted Text

print("Lower Text: ", analyzed.fmtText)
Lower Text:  lorem ipsum dolor diam amet consetetur lorem magna sed diam nonumy eirmod tempor diam et labore et diam magna et diam amet

Call Unique Words Count

  • Call the freqAll() method, which returns the dictionary
  • assign the returned dictionary to Dict
  • print the dictionary
Dict = analyzed.freqAll()
Dict
{'lorem': 2,
 'et': 3,
 'diam': 5,
 'nonumy': 1,
 'amet': 2,
 'sed': 1,
 'eirmod': 1,
 'tempor': 1,
 'consetetur': 1,
 'magna': 2,
 'labore': 1,
 'dolor': 1,
 'ipsum': 1}

Call Count of Word

  • Get the number of occurrences of the word “lorem”
times = analyzed.freqOf("lorem")
times
2
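
Because the constructor lowercases the text, a query such as "Lorem" would not match any key in the dictionary and would return 0. A sketch of one possible adjustment, assuming queries should be case-insensitive: lowercase the word before the lookup. The class below is a condensed restatement for illustration only, and the word.lower() call is an addition that is not in the class above.

```python
class TextAnalyzer:
    def __init__(self, text):
        cleantext = text.lower()
        # remove the same four punctuation marks as the original constructor
        for mark in '.!?,':
            cleantext = cleantext.replace(mark, '')
        self.fmtText = cleantext

    def freqAll(self):
        wordslist = self.fmtText.split()
        return {word: wordslist.count(word) for word in set(wordslist)}

    def freqOf(self, word):
        alldict = self.freqAll()
        key = word.lower()      # assumption: match queries case-insensitively
        if key in alldict:
            return alldict[key]
        else:
            return 0

analyzed = TextAnalyzer("Lorem ipsum dolor! diam amet, consetetur Lorem magna.")
print(analyzed.freqOf("Lorem"))   # 2
print(analyzed.freqOf("pizza"))   # 0
```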

Simple Examples

class Points(object):
    def __init__(self, x, y):
        self.x = x 
        self.y = y 


    def print_point(self): 
        print('x=', self.x, ' y=', self.y) 


p1 = Points("A", "B") 
p1.print_point()
x= A  y= B
class Points(object): 
    def __init__(self, x, y): 
        self.x = x 
        self.y = y 
    def print_point(self): 
        print('x=', self.x, ' y=', self.y) 


p2 = Points(1, 2) 
p2.x = 'A' 
p2.print_point()
x= A  y= 2