Court name issues #129
The comment above the buggy code indicates that it was trying to account for missing punctuation. I did this by allowing matches where the last character of the canonical form is different/missing. I also created a test file to make sure Eyecite gets the right court names when fed some of the most common courts.
This commit updates the unit test which ensures that Eyecite is returning the correct court names.
I ran the tests natively and they all passed. Most of the failing commit checks seem to be due to my linter allowing slightly wider lines than your linter (a problem I have noticed while committing to other FLP projects, and that I should probably address), but I don't know what about this commit could be affecting fast_diff_match_patch and causing build 3.10 to fail.
I'm actively working on the tests here in courts-db, as I'm putting in some medium-sized changes and thinking about how it should work and could work better.
@flooie can you also opine about how this bug affects CourtListener itself?
If there is anything I can do to help, I'd be happy to jump on a call to talk through it.
@bbernicker are you in our slack channel yet?
No, not yet.
Just sent an invite, @bbernicker.
Any updates on this PR @flooie?
@bbernicker I was just reviewing this PR in light of #144 and the changes made to Right now, we use This is problematic, as you point out, because something like This solves the I still think getting rid of
Thanks Matt. This is really helpful. Let me give this problem some thought.
This PR addresses #128. The previous code returned the court ID of the first court for which the courts-db cite string started with the court abbreviation eyecite detected in the parenthetical. There was a comment on that code explaining that it used startswith instead of requiring a match because the court abbreviations from eyecite were often missing final punctuation.
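To make the failure mode concrete, here is a minimal sketch of prefix-based lookup. It is not the code eyecite actually contained: the function name `court_by_prefix`, the record layout, and the two sample records are assumptions chosen for illustration.

```python
# Illustrative records only; the real courts-db data is far larger and
# its exact field layout is not reproduced here.
COURTS = [
    {"id": "massappct", "citation_string": "Mass. App. Ct."},
    {"id": "mass", "citation_string": "Mass."},
]


def court_by_prefix(abbrev, courts=COURTS):
    """Return the id of the first court whose citation string starts with abbrev.

    Hypothetical helper mirroring the behavior described above: the first
    prefix hit wins, so the result depends on record order.
    """
    for court in courts:
        if court["citation_string"].startswith(abbrev):
            return court["id"]
    return None


# "Mass." is a prefix of "Mass. App. Ct.", so the appellate court is
# returned even though an exact match exists later in the list.
print(court_by_prefix("Mass."))
```

With these sample records, both "Mass." and "Mass" resolve to the appellate court simply because it appears first.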
This PR looks for courts in courts-db for which the cite string, with or without its final character, matches. This solution is not perfect because the citation_string field in courts-db has many duplicates: for 1981 courts, there are only 1174 unique citation strings. On the other hand, allowing matches without regard to the last character does not seem to make the problem worse, as there are only two citation strings which are the same but for their last character.
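The matching rule described above can be sketched as follows. This is a simplified illustration rather than the code in this PR: `court_by_near_exact` and the sample records are hypothetical, and the sketch only covers the case where the final character is missing.

```python
# Illustrative records only; not taken from the real courts-db data.
SAMPLE_COURTS = [
    {"id": "massappct", "citation_string": "Mass. App. Ct."},
    {"id": "mass", "citation_string": "Mass."},
]


def court_by_near_exact(abbrev, courts=SAMPLE_COURTS):
    """Return the id of the first court whose citation string equals abbrev,
    either in full or with its final character dropped.

    Hypothetical helper: it tolerates a missing trailing period without
    accepting arbitrary prefixes.
    """
    if not abbrev:
        return None
    for court in courts:
        cite = court["citation_string"]
        if abbrev == cite or abbrev == cite[:-1]:
            return court["id"]
    return None


print(court_by_near_exact("Mass"))           # matches "Mass." minus its final character
print(court_by_near_exact("Mass. App. Ct"))  # matches the appellate court
print(court_by_near_exact("Ma"))             # bare prefixes no longer match
```

The first match still wins, so duplicate citation strings remain ambiguous under this rule; it only removes the false positives caused by prefix matching.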
Ultimately, ensuring that Eyecite can correctly determine which court rendered an opinion from a citation will require cleaning up courts-db so that duplicate citation_strings can be disambiguated using other information (such as the reporter) and so that alternate citation_strings are permitted for courts (see freelawproject/courts-db#53). We should also add a more robust system for determining the rendering court from the reporter (e.g. with vendor-neutral citations) or from neighboring parallel cites (especially any in the metadata.extras field).
Finally, I added some tests to check that Eyecite is handling court names reasonably well.