-
Notifications
You must be signed in to change notification settings - Fork 16
Create a pull request #12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
| # commonwords.py | ||
| Required packages: | ||
|
|
||
| $ pip install wikipedia |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would put more documentation on README file.
Besides required packages, you can put descriptions, relevant links, or how to use it
| for word in text_1: | ||
| geometric_mean = math.sqrt(histogram_1.get(word, 0)*histogram_2.get(word, 0)) | ||
| if geometric_mean != 0: | ||
| if word not in preposition and word not in pronoun and word not in article and word not in commonverb and word not in alphabet and word not in conjunction and word not in number: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stopwords = preposition+pronoun+number+article+commonverb+alphabet+conjunction
if word not in stopwords will be more concise code. I would make them as one list before using in the line 63. Also, some libraries already have those commonly ignored list of words as "stopwords". If you use those libraries, you don't have to write out all the words and might save your time.
| text = text.replace(')','') | ||
| text = text.replace("'",'') | ||
| text = text.replace(".",'') | ||
| text = text.replace(",",'') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just a minor suggestion, but if you want to make code a bit more scalable and concise code,
you might want to consider
for x, y in [('-',' '), ('(',''), (')',''), ("'",''), (".",''), (",",'')]:
text = text.replace(x, y)
| count[word] = 1 | ||
| else: | ||
| count[word] += 1 | ||
| return count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Also minor suggestion)
In python library, there is a special dictionary called Counter.
By default, this initializes the count to be 0, so that you wouldn't need to check if the keyword is inside the dictionary or not.
An equivalent code using Counter will be like following
from collections import Counter
count = Counter()
for word in unsorted_words:
count[word] += 1
return count
| count[word] += 1 | ||
| return count | ||
|
|
||
| #histogram(text_to_word('Subeen-is-(an)-idiot')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can remove the unnecessary comments before submitting
| geometric_mean = math.sqrt(histogram_1.get(word, 0)*histogram_2.get(word, 0)) | ||
| if geometric_mean != 0: | ||
| common_count[word] = geometric_mean | ||
| else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
else statement is unnecessary here
|
Overall, I think you did a great job on documenting functions using docstrings, but still there can be an improvement on README file. |
No description provided.