Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

initial commit of alliteration filter #251

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

ahoimarie
Copy link

A simple alliteration filter that checks if a sentence is an alliteration or not.

@ahoimarie
Copy link
Author

Sorry, I did not mean to close the pull request, I wanted to force-reload from the upstream branch to undo my erroneous changes to the wrong folder, which had caused the checks to fail.

@ahoimarie ahoimarie reopened this Sep 4, 2021
@james-simon
Copy link
Contributor

james-simon commented Sep 16, 2021

Simply superb submission! Alliteration's always an amusing attribute to amplify. I'm a reviewer, and I just have a few minor comments.

  1. In your README, you don't state what your exact criterion for alliterativity is. It's clear from your code, but it's worth mentioning towards the top of the README.

  2. That criterion's pretty strong - few people would say that "Peter Piper picked a peck of pickled peppers" isn't alliteration! I think it'd make sense to weaken it (e.g. allow one off-word for every $k$ alliterative words) or allow the user to set the filter strength (e.g. take $k$ as an argument).

  3. For multi-sentence input, what's your rationale for ignoring all but the first sentence? I'd intuitively expect that the filter would either check whether the entire passage was alliterative on the same letter or whether each sentence is independently alliterative.

Solid work! Let me know if you have any questions.

@rteehas
Copy link
Contributor

rteehas commented Sep 18, 2021

I enjoyed this submission as well. I agree with @james-simon, but on the last point I think that it would make the most sense to treat each sentence independently, since that's probably more likely than an entire passage being alliterative.

@mille-s
Copy link
Contributor

mille-s commented Sep 21, 2021

Thanks for the submission! I think it would be good to add a few words on why sentences with alliterations are a challenge, and for which tasks; right now this is not clear to me (even if I like the idea). As mentioned previously, please also add the method you apply to select the sentences. By the way, this is tagged as "Transformation" but should probably be "Filter".

@ahoimarie
Copy link
Author

ahoimarie commented Sep 22, 2021

Hi @james-simon, hello @rteehas,
gratitude and greetings! Alliterations always attract audiences.

  1. I added a few more lines in the README to explain my criterion for alliterativity. Fixed in 9940840
  2. I have added a new variable min_alliteration_length to alleviate it. Now, a sentence is deemed an alliteration if it contains at least min_alliteration_length words starting with the (first) phoneme found. Would it make sense not to use the first phoneme but the one that occurs most in each sentence?
  3. I wanted to avoid checking if a whole paragraph was alliterative because I thought that would practically never be the case. But I agree that ignoring the remaining sentences is not optimal either. I now fixed it to go through all the input sentences and to return true if any of the sentences are alliterative. I think this is what @rteehas suggested too? Alternatively, it would be possible to check each sentence individually but I think having separate outcomes for each input sentence is not the aim of this filter?

Does this make sense?
Thanks again!

@ahoimarie
Copy link
Author

Hello @mille-s,
thanks for your feedback! I have added a few more lines in the README to discuss alliterations and their use. Please see 9940840. Would you like me to expand on it?
Thanks again!

@james-simon
Copy link
Contributor

Ahoy! Although arguably ameliorating the aforementioned ails, your changes created certain contrasting concerns:

  • if the criterion for alliterativity is for any k words to start with the first phoneme, won't a lot of long sentences just satisfy this by accident? For example, if a sentence starts with "The" and includes two other "the"s later, wouldn't that count? This doesn't seem like it actually checks for alliteration; I think the alliterative words do have to be close together. If you agree, you should rethink the rule. As an example, a criterion I might choose would be to take two arguments, min_alliteration_length and allowable_offwords, and then check if a sentence contains any long run (not necessarily starting with the first word) of min_alliteration_length alliterative words, allowing allowable_offwords offwords breaking up the run. For example, the sentence Look, guys: Peter Piper picked a peck of pickled peppers would satisfy min_alliteration_length=6 with allowable_offwords=2 (i.e. "a" and "of"). This criterion seems like it gets much closer to what the typical reader would actually call an alliterative sentence. Does that make sense?
  • the header "Why is it a challenge?" doesn't seem right; I'd say something like "Why is this filter important?" instead
  • typo in the README: "will they not be removed"

I think checking for a single alliterative sentence makes sense.

@mille-s
Copy link
Contributor

mille-s commented Sep 23, 2021

@ahoimarie thank for the details, I agree it could be used in a generation setting to see if a model manages to create alliterations, although I suspect that the subpopulations with alliterations in the datasets we are using will be extremely small. But with other (more literary) datasets that could be nice.

@ahoimarie
Copy link
Author

Hello! Hopefully having handled your handy help here, I imagine it indeed improved its impact.

  • Instead of checking the first phoneme of the sentence, I now use a rolling window of size min_alliteration_length+allowable_offwords to check if it contains one phoneme of at least the frequency min_alliteration_length in it. If the sentence is shorter than min_alliteration_length+allowable_offwords, the window reduces to min_alliteration_length.
  • Agreed, and changed.
  • Yes, thanks!

Thanks! Thorough thoughtful things!
The thing is, the theme of this thrilling thread thrives without thumbing through a thick thesaurus.

@james-simon
Copy link
Contributor

Amazing! Apt augmentations, all. Two tiny things:

  • I think "If the windowlen is smaller than the length of the data" in your docstring is meaning-reversed
  • Is allowable_offwords a parameter the user can specify, or is it always 2? It'd be better if it could be chosen

After the aforementioned alterations are achieved, I'll animatedly accept.

@ahoimarie
Copy link
Author

Awesome as always. Appreciated!

  • Impressive inspection! I immediately edited it.
  • allowed_offwords is indeed a parameter the user can specify. For example print(Alliteration(stopwords = False, min_alliteration_length=5, allowed_offwords=3).filter("Illuminating illustration of this inane but interesting instruction.")) will return True but calling it with allowed_offwords=2 won't.

Any additional amendments?

@james-simon
Copy link
Contributor

Ja, I'm happy! Great filter; every doubt's crushed. Bravo! Accept.

Makefile Outdated Show resolved Hide resolved
@kaustubhdhole
Copy link
Collaborator

All looks good. Please fix the makefile and I am happy to merge :)

@kaustubhdhole
Copy link
Collaborator

The makefile still shows in the tracking as deleted.

@ahoimarie
Copy link
Author

Hi @kaustubhdhole, sorry for the delay, I have been on sick leave because of an injury.
I've added the Makefile again -- does it work now or shall I create a new PR?

@kaustubhdhole
Copy link
Collaborator

@ahoimarie I missed this message somehow. Please create a separate PR and link this PR in the comment.

@kaustubhdhole
Copy link
Collaborator

Ensure that when you click "Files Changed", the makefile remains untouched.

@ahoimarie
Copy link
Author

@kaustubhdhole I removed the makefile with this little trick mentioned on stackoverflow, as creating a separate PR kept failing. But this should solve it too, shouldn't it?

@ahoimarie ahoimarie mentioned this pull request Nov 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants