Add columbia ordering to management command #4325

Merged: 19 commits, Aug 30, 2024

Conversation

quevon24
Member

@quevon24 quevon24 commented Aug 20, 2024

This PR extends the update_opinions_order command so that it fills the ordering_key field for columbia opinions.

It searches for clusters whose source includes columbia (Z, ZL, ZU, ZLU, etc.), that have more than one opinion, and whose opinions have a file assigned in the local_path field.
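The selection criteria above can be sketched in plain Python. This is only an illustration, not the command's actual code: the `Cluster` and `Opinion` dataclasses and the `needs_columbia_ordering` helper are hypothetical stand-ins for the Django models, and the source codes are copied from the SQL query posted later in this conversation.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Source codes that include columbia ("Z"), copied from the query in this PR.
COLUMBIA_SOURCES = {
    "Z", "ZA", "ZD", "ZC", "ZH", "ZLC", "ZLR", "ZLCR", "ZR", "ZCR", "ZL",
    "ZM", "ZQ", "ZU", "ZLU", "ZDU", "ZLRU", "ZLCRU", "ZCU", "ZMU", "ZRU",
    "ZLCU",
}


@dataclass
class Opinion:
    """Hypothetical stand-in for search.Opinion."""
    id: int
    local_path: Optional[str] = None


@dataclass
class Cluster:
    """Hypothetical stand-in for search.OpinionCluster."""
    id: int
    source: str
    opinions: List[Opinion] = field(default_factory=list)


def needs_columbia_ordering(cluster: Cluster) -> bool:
    """Return True when a cluster matches the three criteria described above."""
    return (
        cluster.source in COLUMBIA_SOURCES            # columbia in the source
        and len(cluster.opinions) > 1                 # more than one opinion
        and any(op.local_path for op in cluster.opinions)  # an xml file is assigned
    )
```

Like the posted query, this sketch accepts a cluster as soon as at least one of its opinions has a non-empty local_path.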

The command to run it:

docker exec -it cl-django python /opt/courtlistener/manage.py update_opinions_order --action sort-columbia --xml-dir /opt/courtlistener/cl/assets/columbia

The xml files need to be placed in a directory whose path is passed to the command, so that it can read the xml file assigned to each opinion. This is essential: some opinions were created in the wrong order by the old columbia importer, so we rely on the original xml to infer the correct order.
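As a rough illustration of how the order could be inferred from the original xml, the sketch below reads opinion sections in document order and matches each stored opinion to the section that shares the most word n-grams with it. Everything here is an assumption made for illustration: the tag names in `OPINION_TAGS`, the helper names, the 1-based ordering value, and the matching strategy itself are not taken from the command's implementation.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, List, Set, Tuple

# Assumed tag names for opinion sections in the xml (hypothetical).
OPINION_TAGS = ("opinion", "concurrence", "dissent")


def clean_text(text: str) -> str:
    """Lowercase the text and replace punctuation with spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def ngrams(text: str, n: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of word n-grams of the cleaned text."""
    words = clean_text(text).split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def xml_sections(xml_string: str) -> List[str]:
    """Return the text of each opinion section, in document order."""
    root = ET.fromstring(xml_string)
    return ["".join(el.itertext()) for el in root.iter() if el.tag in OPINION_TAGS]


def infer_ordering_keys(xml_string: str, opinions: Dict[int, str]) -> Dict[int, int]:
    """Map opinion id -> 1-based position of its best-matching xml section.

    Opinions that share no n-gram with any section are left out. Two opinions
    may map to the same section; a real implementation would need to handle that.
    """
    sections = [ngrams(s) for s in xml_sections(xml_string)]
    if not sections:
        return {}
    ordering = {}
    for op_id, text in opinions.items():
        grams = ngrams(text)
        scores = [len(grams & section) for section in sections]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] > 0:
            ordering[op_id] = best + 1
    return ordering
```

For example, if the database holds a dissent with a lower id than the majority opinion but the xml lists the majority opinion first, the dissent would be assigned position 2 and the majority opinion position 1.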

To test it, you can clone these clusters:

docker exec -it cl-django python /opt/courtlistener/manage.py clone_from_cl --type search.OpinionCluster --id 9079419 4226317 8279828 8237637 8041148 4233235 4223647 8041161 8040857 8040920

Then place these files (test_xml.zip) in your local environment at this location: /opt/courtlistener/cl/assets/columbia

Then run the command exactly as shown above.

@quevon24 quevon24 marked this pull request as ready for review August 21, 2024 19:08
@quevon24
Member Author

This is the list of ids of the columbia clusters that have more than one opinion and have columbia in their source:

_SELECT_search_opinioncluster_id_FROM_search_opinioncluster_LEFT_202408260949.csv

This is the query I used to get the list:

SELECT "search_opinioncluster"."id"
FROM "search_opinioncluster"
LEFT OUTER JOIN "search_opinion"
  ON ("search_opinioncluster"."id" = "search_opinion"."cluster_id")
WHERE (
  EXISTS(
    SELECT 1 AS "a"
    FROM "search_opinion" U0
    WHERE (U0."cluster_id" = ("search_opinioncluster"."id")
      AND U0."local_path" > ''
      AND U0."local_path" IS NOT NULL)
    LIMIT 1
  )
  AND "search_opinioncluster"."source" IN ('Z', 'ZA', 'ZD', 'ZC', 'ZH', 'ZLC', 'ZLR', 'ZLCR', 'ZR', 'ZCR', 'ZL', 'ZM', 'ZQ', 'ZU', 'ZLU', 'ZDU', 'ZLRU', 'ZLCRU', 'ZCU', 'ZMU', 'ZRU', 'ZLCU')
)
GROUP BY "search_opinioncluster"."id"
HAVING COUNT("search_opinion"."id") > 1
ORDER BY "search_opinioncluster"."id" ASC

@quevon24 quevon24 requested a review from flooie August 30, 2024 17:59
@quevon24 quevon24 requested a review from flooie August 30, 2024 18:28
quevon24 and others added 2 commits August 30, 2024 12:28
refactor the ngram generator
change function name for getting cleaned columbia text
@flooie
Contributor

flooie commented Aug 30, 2024

Thanks @quevon24, this looks good to me.

@flooie flooie merged commit c1c8460 into main Aug 30, 2024
13 checks passed
@flooie flooie deleted the add-columbia-ordering branch August 30, 2024 19:40

sentry-io bot commented Aug 30, 2024

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ FieldError: Cannot resolve keyword 'opinions_count' into field. Choices are: arguments, attorneys, blocked, c... cl.search.models in filter View Issue

