Python Challenge: Word Frequency Analysis

(OP)
Write a Python function that takes a string as input and returns a dictionary where the keys are unique words in the string, and the values are the frequencies of each word. The function should be case-insensitive and should ignore punctuation.

For example:

CODE --> python

def word_frequency_analysis(text):
    # Your code goes here
    pass

# Test the function
sample_text = "Python is a powerful, versatile programming language. Python is widely used for web development, data analysis, artificial intelligence, and more."
result = word_frequency_analysis(sample_text)
print(result) 

The expected output should be something like:

CODE --> python

{
    'python': 2,
    'is': 2,
    'a': 1,
    'powerful': 1,
    'versatile': 1,
    'programming': 1,
    'language': 1,
    'widely': 1,
    'used': 1,
    'for': 1,
    'web': 1,
    'development': 1,
    'data': 1,
    'analysis': 1,
    'artificial': 1,
    'intelligence': 1,
    'and': 1,
    'more': 1
} 

Provide a concise and efficient Python code solution along with any explanations or considerations. Thank you!

RE: Python Challenge: Word Frequency Analysis

Bard and ChatGPT are pretty good at Python.

RE: Python Challenge: Word Frequency Analysis

Sounds like a classroom assignment.

Skip,

Just traded in my OLD subtlety...
for a NUance!

"The most incomprehensible thing about the universe is that it is comprehensible" A. Einstein

You Matter...
unless you multiply yourself by the speed of light squared, then...
You Energy!

RE: Python Challenge: Word Frequency Analysis

It's certainly homework.
When I started learning Python 20 years ago, this was a demonstration example in an introductory book on "What Are Dictionaries Good For?"
If you want us to help you with this, show us the code you have tried so far and explain what doesn't work as you expected.

RE: Python Challenge: Word Frequency Analysis

Because I was bored and haven't done anything with Python in a while.

CODE --> Python

def word_frequency_analysis(text):
    output = {}
    for word in text.lower().split():
        if word in output:
            output[word] = output[word] + 1 
        else:
            output[word] = 1

    return output 

More compact, but perhaps less clear.

CODE --> Python

def word_frequency_analysis_2(text):
    output = {}
    for word in text.lower().split():  
        value = output[word] + 1 if word in output else 1
        output[word] = value
        
    return output 
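
Another compact spelling of the same loop, offered only as a sketch: `dict.get` with a default of 0 removes the explicit membership test. The function name here is made up for illustration, and like the two versions above it does not yet deal with punctuation.

```python
def word_frequency_get(text):
    # Hypothetical name; same counting loop as above, written with dict.get
    output = {}
    for word in text.lower().split():
        # get() returns 0 when the word has not been seen yet
        output[word] = output.get(word, 0) + 1
    return output

print(word_frequency_get("the cat and the hat"))
# {'the': 2, 'cat': 1, 'and': 1, 'hat': 1}
```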

RE: Python Challenge: Word Frequency Analysis

I realized that there is a punctuation problem with my original solutions. I cheated and found out about the 'string' library online.
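
To see the problem concretely, here is a small check on a made-up sentence (not the OP's sample): a plain `split()` leaves punctuation glued to the words, so 'python' and 'python,' would be counted as different keys.

```python
# Made-up sample sentence, used only to illustrate the issue
sample = "Python is powerful. Python, in short, is versatile."
words = sample.lower().split()
print(words)
# ['python', 'is', 'powerful.', 'python,', 'in', 'short,', 'is', 'versatile.']
```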

Helper function to strip out punctuation.

CODE --> Python

import string

def strip_punctuation(input_string):
    return input_string.translate(str.maketrans('', '', string.punctuation)) 

And implementing it.

CODE --> Python

def word_frequency_analysis_1(text):
    output = {}
    for word in strip_punctuation(text).lower().split():
        if word in output:
            output[word] = output[word] + 1 
        else:
            output[word] = 1

    return output 

CODE --> Python

def word_frequency_analysis_2(text):
    output = {}
    for word in strip_punctuation(text).lower().split():  
        value = output[word] + 1 if word in output else 1
        output[word] = value
        
    return output

result = word_frequency_analysis_2(sample_text) 

For good measure, one more way.

CODE --> Python

def word_frequency_analysis_3(text):
    output = {}
    for word in strip_punctuation(text).lower().split():
        try:
            output[word] = output[word] + 1  # raises KeyError the first time a word is seen
        except KeyError:
            output[word] = 1
            
    return output 
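
For comparison, the standard library's `collections.Counter` can do the tallying itself. This is a sketch rather than a drop-in replacement: the function name is hypothetical, the helper is repeated so the snippet runs on its own, and the sample text is shortened from the OP's.

```python
import string
from collections import Counter

def strip_punctuation(input_string):
    # Same helper as above: delete every ASCII punctuation character
    return input_string.translate(str.maketrans('', '', string.punctuation))

def word_frequency_counter(text):
    # Hypothetical name; Counter tallies the words, dict() converts back
    # to a plain dictionary as the problem statement asks
    return dict(Counter(strip_punctuation(text).lower().split()))

sample_text = "Python is a powerful, versatile programming language. Python is widely used."
result = word_frequency_counter(sample_text)
print(result)
```

Counter is a dict subclass, so the final `dict()` call is only needed if a plain dictionary is required.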

RE: Python Challenge: Word Frequency Analysis

Lastly, for kicks, which is fastest on a longer string?

Quote (Declaration of Independence)

We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness. --That to secure these rights, Governments are instituted among Men, deriving their just powers from the consent of the governed, --That whenever any Form of Government becomes destructive of these ends, it is the Right of the People to alter or to abolish it, and to institute new Government, laying its foundation on such principles and organizing its powers in such form, as to them shall seem most likely to effect their Safety and Happiness. Prudence, indeed, will dictate that Governments long established should not be changed for light and transient causes; and accordingly all experience hath shewn, that mankind are more disposed to suffer, while evils are sufferable, than to right themselves by abolishing the forms to which they are accustomed. But when a long train of abuses and usurpations, pursuing invariably the same Object evinces a design to reduce them under absolute Despotism, it is their right, it is their duty, to throw off such Government, and to provide new Guards for their future security. --Such has been the patient sufferance of these Colonies; and such is now the necessity which constrains them to alter their former Systems of Government. The history of the present King of Great Britain is a history of repeated injuries and usurpations, all having in direct object the establishment of an absolute Tyranny over these States. To prove this, let Facts be submitted to a candid world.

  1. word_frequency_analysis_1 took 4.990608100000827 seconds to run 100000 times
  2. word_frequency_analysis_2 took 5.240722300004563 seconds to run 100000 times
  3. word_frequency_analysis_3 took 9.201343699998688 seconds to run 100000 times
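
A comparison like this can be set up with the standard `timeit` module. The harness below is only a sketch of one way to do it: the function names, the short placeholder text, and the run count are all assumptions, and it is not the script that produced the numbers above.

```python
import string
import timeit

def strip_punctuation(input_string):
    # Helper from earlier in the thread: delete ASCII punctuation
    return input_string.translate(str.maketrans('', '', string.punctuation))

def count_if_else(text):
    # Membership test before each update
    output = {}
    for word in strip_punctuation(text).lower().split():
        if word in output:
            output[word] += 1
        else:
            output[word] = 1
    return output

def count_try_except(text):
    # Attempt the update and handle the first sighting via KeyError
    output = {}
    for word in strip_punctuation(text).lower().split():
        try:
            output[word] += 1
        except KeyError:
            output[word] = 1
    return output

# Placeholder input and run count (assumptions, not the values used above)
text = "We hold these truths to be self-evident, that all men are created equal. " * 5
runs = 1000

for func in (count_if_else, count_try_except):
    elapsed = timeit.timeit(lambda: func(text), number=runs)
    print(f"{func.__name__} took {elapsed} seconds to run {runs} times")
```

Absolute times depend on the machine and the Python version, so only the relative ordering is worth comparing.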

RE: Python Challenge: Word Frequency Analysis

Bard's solution, after correcting some indentation mistakes.

CODE --> Bard

import re

def word_frequencies(text):
  """
  Bard's solution
  Returns a dictionary of word frequencies in a text string.

  Args:
    text: A string containing the text to analyze.

  Returns:
    A dictionary where the keys are unique words in the text, and the values
    are the frequencies of each word.
  """
  # Lowercase the text and remove punctuation.
  text = text.lower()
  text = re.sub(r"[^\w\s]", "", text)

  # Split the text into words and count their frequencies.
  words = text.split()
  word_counts = {}
  for word in words:
    if word in word_counts:
      word_counts[word] += 1
    else:
      word_counts[word] = 1

  return word_counts 

8.129551300000458 seconds

RE: Python Challenge: Word Frequency Analysis

mintjulep,

But when you have the function

CODE

def strip_punctuation(input_string):
    return input_string.translate(str.maketrans('', '', string.punctuation)) 
then applying it on the string "foo,bar;baz:spam/eggs." delivers

CODE

>>> strip_punctuation("foo,bar;baz:spam/eggs.")
'foobarbazspameggs' 
which is not good, because then there is nothing to split()

IMO it would be better to use

CODE

def strip_punctuation(input_string):
    return input_string.translate(str.maketrans(string.punctuation, len(string.punctuation) * " "))
which applied on the same string delivers

CODE

>>> strip_punctuation("foo,bar;baz:spam/eggs.")
'foo bar baz spam eggs ' 
and then you can split() it.
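
The two approaches side by side, as a self-contained sketch. The function names are made up here so that both versions can coexist in one snippet.

```python
import string

def strip_delete(input_string):
    # Original approach: punctuation characters are deleted outright
    return input_string.translate(str.maketrans('', '', string.punctuation))

def strip_to_space(input_string):
    # Suggested approach: each punctuation character becomes a space
    table = str.maketrans(string.punctuation, ' ' * len(string.punctuation))
    return input_string.translate(table)

text = "foo,bar;baz:spam/eggs."
print(strip_delete(text).split())    # ['foobarbazspameggs']
print(strip_to_space(text).split())  # ['foo', 'bar', 'baz', 'spam', 'eggs']
```

One consideration with the space version: an apostrophe is also punctuation, so "don't" splits into 'don' and 't'. Whether that matters depends on the text being analyzed.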

RE: Python Challenge: Word Frequency Analysis

@mikrom

Thanks for the improvement.

RE: Python Challenge: Word Frequency Analysis

One more improvement to return the dictionary in alphabetical order. Change the return statement to

CODE --> Python

return dict(sorted(output.items())) 

The sort imposes a pretty big performance hit.

Returning the sorted items directly gives a list of tuples, which is faster but doesn't meet the problem statement.
Whether that is acceptable depends on the downstream use.

CODE --> Python

return sorted(output.items()) 
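
A small illustration of the difference between the two return values, using a hand-written dictionary in place of the function's `output`. The `dict(sorted(...))` form relies on dictionaries preserving insertion order, which is guaranteed from Python 3.7 onward.

```python
# Hand-written stand-in for the `output` dictionary built by the function
counts = {'python': 2, 'is': 2, 'a': 1, 'powerful': 1}

as_dict = dict(sorted(counts.items()))  # dictionary with keys in alphabetical order
as_list = sorted(counts.items())        # list of (word, count) tuples

print(as_dict)  # {'a': 1, 'is': 2, 'powerful': 1, 'python': 2}
print(as_list)  # [('a', 1), ('is', 2), ('powerful', 1), ('python', 2)]
```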

RE: Python Challenge: Word Frequency Analysis

You did a nice job, but unfortunately there has been no feedback so far from soni21 who asked this question.

RE: Python Challenge: Word Frequency Analysis

For my own interest.

CODE --> Python

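import re
import string
import timeit

# Defined elsewhere in my script: `declaration` (the quoted text above)
# and `doit` (the number of timing runs).

def strip_with_string(input_string):
    # Replace each ASCII punctuation character with a space, then split
    return input_string.translate(str.maketrans(string.punctuation, len(string.punctuation) * " ")).split()

def strip_with_regex(input_string):
    # Replace anything that is not a word character or whitespace with a space
    return re.sub(r"[^\w\s]", " ", input_string).split()

a = strip_with_string(declaration)
b = strip_with_regex(declaration)

# Check that both approaches produce the same word list for this text
print(a == b)

T_string = timeit.timeit(lambda: strip_with_string(declaration), number=doit)
T_regex = timeit.timeit(lambda: strip_with_regex(declaration), number=doit)

print(f"Strip with string: {T_string}\nStrip with regex: {T_regex}")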

CODE --> Results

True
Strip with string: 1.7044300000416115
Strip with regex: 4.372240500000771 

For stripping out punctuation, str.translate() with the string module is much faster than regex: roughly 2.5 times faster in this test.
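
One caveat worth noting: the two functions agree on the Declaration text, but they are not equivalent in general. `string.punctuation` includes the underscore, while the regex class `\w` treats the underscore as a word character. A quick check on a made-up input:

```python
import re
import string

def strip_with_string(input_string):
    # Replace each ASCII punctuation character (including "_") with a space
    return input_string.translate(str.maketrans(string.punctuation, len(string.punctuation) * " ")).split()

def strip_with_regex(input_string):
    # "_" matches \w, so the regex leaves it in place
    return re.sub(r"[^\w\s]", " ", input_string).split()

text = "snake_case names"  # made-up input for illustration
print(strip_with_string(text))  # ['snake', 'case', 'names']
print(strip_with_regex(text))   # ['snake_case', 'names']
```

`string.punctuation` also covers ASCII characters only, so text with typographic quotes or other non-ASCII punctuation may need the regex or a different table.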
