AutoPR for paper information correction #4147

yufei118liu · 2024-12-11T16:37:48Z

This PR implements a semi-automated bulk processing of metadata corrections. We hope that it will vastly simplify the metadata correction experience for both users and Anthology staff. It includes the following pieces:

A link is added to each paper page which, when clicked, displays a modal dialog allowing for editing of the paper's title, author list, and abstract
Submitting the dialog takes the user to a Github issue template populated with the changed information in a JSON data block
A new script, process_bulk_metadata.py, processes these work items, makes the corrections, bundles them into a branch, and creates a unified PR

TODO items:

Replace the direct link with a dialog that allows for UI manipulation of the data
Finish the bulk processing script (may finish this in another PR so this can go live)
Create an LLM bot that attempts to validate each PR by checking whether the proposed changes match the PDF
~~Figure out why the text on the yellow "Fix metadata" button is white, not black~~
~~Close old metadata change requests with a note to use the new format (Clean up metadata issues #4175)~~

This change contains the following features: On each paper page, a collapsible is added at the bottom, with the indication "File a correction". When clicked on, it shows a form prefilled current of the paper Users can modify these information and click submit After submission, the user is taken to a window that creates pull request to the repository. The effect of the pr is to add file containing these new information that the user entered. The path to the file and the branch that the pr is aiming can be specified.

This merge contains the following features: On each paper page, a collapsible is added at the bottom, with the indication "File a correction". When clicked on, it shows a form prefilled current of the paper Users can modify these information and click submit After submission, the user is taken to a window that creates pull request to the repository. The effect of the pr is to add file containing these new information that the user entered. The path to the file and the branch that the pr is aiming can be specified.

mjpost · 2024-12-11T16:46:38Z

Thanks for starting this off! We can deal with the UI elements once we get the functionality working. I see that this creates a PR against a Markdown file. However, the PR needs instead to be created against the relevant portion of the the XML files. For example, if I edit K17-1003, it should create a PR against the file data/xml/K17.xml, for the <paper id="3"> block. Does that make sense?

Copilot reviewed 1 out of 3 changed files in this pull request and generated no suggestions.

Files not reviewed (2)

hugo/layouts/papers/single.html: Language not supported
hugo/layouts/partials/correction.html: Language not supported

mjpost · 2024-12-11T16:50:59Z

At the moment, it seems to be trying to create a Markdown file, e.g., as pictured. But this Markdown file is not part of the repository; instead, it is an artifact of the build process, created by running create_hugo_yaml.py. We need to edit he XML itself.

The trick here is whether a pull request can be automated. If we can find a way to (a) grab the current relevant XML file (b) modify it according to the parameters and (c) create a PR with the new file, that would be sufficient. I suggest we take these one by one.

yufei118liu · 2024-12-11T16:53:46Z

Yes it makes sense. However, I have not yet found a solution to locate and edit a sepecific section of a file in this way. My current vision is that we can put all such .md files in a specific directory, and use a python script to parse the .xml files and execute the change regularly.

mjpost · 2024-12-11T16:59:36Z

Yes, that makes sense, this could be a workaround for that (though note that the information is not all passed on correctly at the moment).

Can you summarize the research you've done about automating PRs? What is possible in this area? For example, we know that we can do the following in Javascript:

Identify the correct XML file in the ACL repo
Modify its DOM in the correct location

Would it be possible, then to create a PR, replacing the existing file with the new one?

github-actions · 2024-12-11T17:05:25Z

Build successful. Some useful links:

Complete site preview: https://preview.aclanthology.org/bulk-corrections-2024-12-28
Potential volumes of interest:

This preview will be removed when the branch is merged.

yufei118liu · 2024-12-11T17:46:36Z

The current version (with a tiny change) can already identify the .xml file and bring the user to the modifying window.
Modifying/parsing can be done through a script. However, the main challenge is that creating a pr this way request authentification, which I'm not sure can be done when the user visit the website. I'll look more into this.
Replacing the existing file with a new one would be suboptimal, since every .xml file are long, and in the current method, the content of the new file are converted to string and passed as a parameter.

hugo/layouts/papers/single.html

hugo/layouts/partials/correction.html

mbollmann · 2024-12-11T18:18:31Z

hugo/layouts/partials/correction.html

+    <label for="paper_author">Paper authors: </label>
+    {{ range $index, $element := .paper.author }}
+        {{ if $index }}
+          <label for="paper_author_first">First Name: </label>
+          <input type="text" id="paper_author_first_{{$index}}" name="paper_author_first_{{$index}}" value="{{ $element.first }}"><br><br>
+
+          <label for="paper_author_last">Last Name: </label>
+          <input type="text" id="paper_author_last_{{$index}}" name="paper_author_last_{{$index}}" value="{{ $element.last }}"><br><br>
+
+          <!-- Is full name always first + last name?  -->
+          <!-- <label for="paper_author_full">Full Name: </label>
+          <input type="text" id="paper_author_full_{{$index}}" name="paper_author_full_{{$index}}" value="{{ $element.full }}"><br><br> -->
+        {{ end }}
+      {{ end }}


Two potential complications:

What do we do with names that require disambiguation through author IDs? We probably don't want to expose the complexity of author IDs to end users...

I understand that this is a prototype/proof-of-concept, but just a note that we’d ultimately need functionality to remove, add, and ideally reorder author fields as well.

The ID could be a hidden field embedded into the web page. As for the aesthetics and design, I have a new mockup I'll post below.

mbollmann · 2024-12-11T18:36:55Z

Cool that you're looking into concrete ideas for this!

Regarding the .md files: I would definitely not want to add temporary Markdown files to the repository, if this is the suggestion. It clutters the commit history and I’m not sure it provides any benefit over simply opening a new issue containing this information. After all, we could also run a Python script that interacts with the Github API to read and parse issues, and then modifies the XML accordingly.

I also feel that requiring the user suggesting changes to authenticate is a good thing, as it's a potential hurdle for abuse of this system. We shouldn't discount the security implications of such automation.

Finally, if we need to find the line in an XML referring to a specific paper, there's always the option to embed that information into the website at build time. (Although this may run the risk that the information is outdated by the time the user clicks on the correction button.)

mbollmann · 2024-12-11T18:41:09Z

After all, we could also run a Python script that interacts with the Github API to read and parse issues, and then modifies the XML accordingly.

To add: This was kinda my original idea with the issue templates, that we might be able to automate the creation of a PR from those. Unfortunately, people currently don't use them consistently enough for that, but I wouldn't discount that idea entirely. :)

mjpost · 2024-12-11T18:57:46Z

I agree that we shouldn't create new temporary files in the repo. If we can't directly submit a PR, I'd suggest that we would create a queue outside the repo that could then be handled by a script. But I prefer to have the users themselves create a PR using their own Github ID.

mjpost · 2024-12-11T19:00:24Z

I'm attaching a mockup of the interface (generated with the help of Claude) which adds author list manipulation. I'd prefer to have a button that would pop up a dialog containing this code, so everything would be self-contained. You can download and test this file.

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Paper Editor</title>
    <link href="https://cdnjs.cloudflare.com/ajax/libs/tailwindcss/2.2.19/tailwind.min.css" rel="stylesheet">
    <style>
        .draggable:hover {
            cursor: move;
            background-color: #f3f4f6;
        }
        .dragging {
            opacity: 0.5;
        }
    </style>
</head>
<body class="bg-gray-100 p-6">
    <div class="max-w-4xl mx-auto bg-white rounded-lg shadow p-6">
        <form id="paperForm" class="space-y-6">
            <div>
                <label class="block text-sm font-medium text-gray-700">Title</label>
                <input type="text" id="title" class="mt-1 block w-full rounded-md border-gray-300 shadow-sm focus:border-indigo-500 focus:ring-indigo-500" value="Exploring the Syntactic Abilities of RNNs with Multi-task Learning">
            </div>

            <div>
                <div class="flex justify-between items-center">
                    <label class="block text-sm font-medium text-gray-700">Authors</label>
                    <button type="button" id="addAuthor" class="px-4 py-2 text-sm font-medium text-gray-700 bg-white border border-gray-300 rounded-md hover:bg-gray-50">
                        Add Author
                    </button>
                </div>
                <div id="authorsList" class="mt-2 space-y-3">
                    <!-- Authors will be added here -->
                </div>
            </div>

            <div>
                <label class="block text-sm font-medium text-gray-700">Abstract</label>
                <textarea id="abstract" rows="6" class="mt-1 block w-full rounded-md border-gray-300 shadow-sm focus:border-indigo-500 focus:ring-indigo-500">Recent work has explored the syntactic abilities of RNNs using the subject-verb agreement task, which diagnoses sensitivity to sentence structure. RNNs performed this task well in common cases, but faltered in complex sentences (Linzen et al., 2016). We test whether these errors are due to inherent limitations of the architecture or to the relatively indirect supervision provided by most agreement dependencies in a corpus. We trained a single RNN to perform both the agreement task and an additional task, either CCG supertagging or language modeling. Multi-task training led to significantly lower error rates, in particular on complex sentences, suggesting that RNNs have the ability to evolve more sophisticated syntactic representations than shown before. We also show that easily available agreement training data can improve performance on other syntactic tasks, in particular when only a limited amount of training data is available for those tasks. The multi-task paradigm can also be leveraged to inject grammatical knowledge into language models.</textarea>
            </div>

            <input type="hidden" id="anthology_id" value="K17-1003">

            <div class="flex justify-end space-x-4">
                <button type="button" class="px-4 py-2 text-sm font-medium text-gray-700 bg-white border border-gray-300 rounded-md hover:bg-gray-50">
                    Cancel
                </button>
                <button type="submit" class="px-4 py-2 text-sm font-medium text-white bg-indigo-600 border border-transparent rounded-md hover:bg-indigo-700">
                    Submit
                </button>
            </div>
        </form>
    </div>

    <script>
        // Initial authors data
        const initialAuthors = [
            { first: "Émile", last: "Enguehard", affiliation: "" },
            { first: "Yoav", last: "Goldberg", affiliation: "" },
            { first: "Tal", last: "Linzen", affiliation: "" }
        ];

        let authors = [...initialAuthors];
        let draggedItem = null;

        function createAuthorRow(author, index) {
            const div = document.createElement('div');
            div.className = 'draggable flex gap-4 items-center p-4 bg-gray-50 rounded-lg';
            div.draggable = true;
            div.innerHTML = `
                <input type="text" placeholder="First Name" class="flex-1 px-3 py-2 rounded-md border-gray-300" value="${author.first}">
                <input type="text" placeholder="Last Name" class="flex-1 px-3 py-2 rounded-md border-gray-300" value="${author.last}">
                <input type="text" placeholder="Affiliation" class="flex-1 px-3 py-2 rounded-md border-gray-300" value="${author.affiliation}">
                <button type="button" class="remove-author px-2 py-1 text-red-600 hover:text-red-800">×</button>
            `;

            // Add drag and drop event listeners
            div.addEventListener('dragstart', (e) => {
                draggedItem = index;
                div.classList.add('dragging');
            });

            div.addEventListener('dragend', () => {
                div.classList.remove('dragging');
            });

            div.addEventListener('dragover', (e) => {
                e.preventDefault();
            });

            div.addEventListener('drop', (e) => {
                e.preventDefault();
                const dropIndex = index;
                if (draggedItem !== dropIndex) {
                    const temp = authors[draggedItem];
                    authors.splice(draggedItem, 1);
                    authors.splice(dropIndex, 0, temp);
                    renderAuthors();
                }
            });

            // Add remove button listener
            div.querySelector('.remove-author').addEventListener('click', () => {
                authors.splice(index, 1);
                renderAuthors();
            });

            // Add input change listeners
            const inputs = div.querySelectorAll('input');
            inputs[0].addEventListener('change', (e) => authors[index].first = e.target.value);
            inputs[1].addEventListener('change', (e) => authors[index].last = e.target.value);
            inputs[2].addEventListener('change', (e) => authors[index].affiliation = e.target.value);

            return div;
        }

        function renderAuthors() {
            const authorsList = document.getElementById('authorsList');
            authorsList.innerHTML = '';
            authors.forEach((author, index) => {
                authorsList.appendChild(createAuthorRow(author, index));
            });
        }

        // Initialize authors list
        renderAuthors();

        // Add author button
        document.getElementById('addAuthor').addEventListener('click', () => {
            authors.push({ first: '', last: '', affiliation: '' });
            renderAuthors();
        });

        // Form submission
        document.getElementById('paperForm').addEventListener('submit', (e) => {
            e.preventDefault();
            const params = new URLSearchParams();
            
            params.append('title', document.getElementById('title').value);
            params.append('abstract', document.getElementById('abstract').value);
            
            authors.forEach((author, index) => {
                params.append(`author${index + 1}_first`, author.first);
                params.append(`author${index + 1}_last`, author.last);
                params.append(`author${index + 1}_affiliation`, author.affiliation);
            });
            
            params.append('anthology_id', document.getElementById('anthology_id').value);

            const url = `https://aclanthology.org/edit?${params.toString()}`;
            alert(url);
        });
    </script>
</body>
</html>

mjpost · 2024-12-11T21:02:28Z

@mbollmann I've only looked into this briefly, but I am starting to think that we won't be able to automate the creation of PRs.

With the issue templates, what strikes me is how often people fail to use them appropriate, for example, failing to substitute what are to me at least clearly variable templates.

What you wrote here suggests to me another workflow:

To add: This was kinda my original idea with the issue templates, that we might be able to automate the creation of a PR from those. Unfortunately, people currently don't use them consistently enough for that, but I wouldn't discount that idea entirely. :)

Based on this PR, we :

Add a link from each paper page that displays the dialog and allows editing.
Submitting this creates a Github issue under their account. Since we control the input, there is less risk that variables won't be filled. And since it's tied to a user account, it reduces the risk of abuse
We then write code that automatically parses these issues and commits the changes in batches.

I'm thinking that this workflow might actually work. It would also have additional advantages: we could batch the PRs, say on a weekly basis, instead of having to manage (approve and merge) each PR one-by-one. We could also manage a lot of the complexity of tag-handling and IDs in Python code on the server.

Our issues board is already swamped with metadata change requests. This could really help with this and would complete the work you started earlier.

(Apologies if this is what you had in mind, this is only now crystalizing for me).

mbollmann · 2024-12-12T12:43:36Z

Based on this PR, we :

Add a link from each paper page that displays the dialog and allows editing.

Submitting this creates a Github issue under their account. Since we control the input, there is less risk that variables won't be filled. And since it's tied to a user account, it reduces the risk of abuse

Yes, but then we're at a point where I'm not sure we even need to implement our own dialog at all. We could just embed a link that opens a new issue pre-filled with the current title/author list etc., like so:

https://github.com/acl-org/acl-anthology/issues/new?template=01-metadata-correction.yml&anthology_id=K17-1003&paper_title=Exploring%20the%20Syntactic%20Abilities%20of%20RNNs%20with%20Multi-task%20Learning&author_name=Émile%20Enguehard,%20Yoav%20Goldberg,%20Tal%20Linzen

Since we can't enforce that people don't change anything in the issue text, even if we generate it from a dialog on our website, it might be a better approach to validate that with a Github Action or script. It could even automatically post on these issues if there's a problem that people should fix...

mjpost · 2024-12-12T13:36:10Z

Okay, this is great, it seems we are finally homing in on a solution to this problem that has long dogged us. We may yet achieve this in Q4!

Directly linking to prefilled templates like this is a great idea. My main concern is that I don't trust users to make correct edits to those templates while maintaining machine-readability. The state of the current board is a good example of this (pictured above), where people interpret fields differently. We deal with lots of problems with author name formatting, and so on.

It seems better to me to use a modern web UI dialog where we present only the most basic fields that capture 90%+ of use cases and let the user make the changes that way. This avoids the manual error prone in unconstrained manipulation of text fields. This would also allow us to handle, e.g., author ID tags, which would just be hidden from the user (or could be a drop-down list), and any other arbitrary field

We would still instantiate the template, but it would be marked all over with language mentioning that the fields are meant to be read-only to preserve machine-readability. We couldn't stop users from changing them, of course, but likely at this point, only those who knew what they were doing would. In fact, the template could just be a single YAML block containing all the updated information:

==== YAML BLOCK ====
==== PLEASE DO NOT EDIT UNLESS YOU KNOW WHAT YOU'RE DOING ====
id: K17-1003
title: A new era in metadata editing for the <fixed-case>ACL</fixed-case> <fixed-case>A</fixed-case>anthology
abstract: ...
authors:
- name: Post, Matt
   affiliation: Microsoft
- name: Liu, Yang
   id: yang-liu-edinburgh
==== END YAML BLOCK ===

We could then have a weekly script that gathers these issues and creates a batch PR which would require human approval.

mbollmann · 2024-12-12T14:15:07Z

My main concern is that I don't trust users to make correct edits to those templates while maintaining machine-readability.

Oh, absolutely, but I wouldn't trust them not to add irrelevant comments in a web UI that we create either. :)

I think we have to expect and handle these cases in the downstream process either way.

mbollmann · 2024-12-12T14:15:41Z

We could then have a weekly script that gathers these issues and creates a batch PR which would require human approval.

What if we only want to approve some of these changes though?

mjpost · 2024-12-12T14:40:33Z

What if we only want to approve some of these changes though?

The script could require some sign of approval from an Anthology team member, e.g., the addition of a label. @nschneid often performs this approval service on PRs, for example, checking that metadata corrections fit the PDF (I'd also love to add an LLM bot that does the same). It would be good to start conservatively.

nschneid · 2024-12-12T15:00:03Z

Would be happy to be replaced by a bot. :)

mbollmann · 2024-12-12T15:09:40Z

FWIW, I was playing around once with making a Github Action that posts a direct link to the paper based on the Anthology ID mentioned in the issue, but I seemed a bit tricky to do. Maybe if you write a proper script it would be more feasible. A bot posting an image of the first page of the PDF would probably be even more helpful :)

mjpost · 2024-12-12T17:31:45Z

That's a nice idea that would simplify the checking, even if we still rely on @nschneid to do it.

A simple thing to do is just grab the thumbnail, e.g., https://aclanthology.org/thumb/K17-1003.jpg

mjpost · 2024-12-27T02:24:22Z

Sorry, to be more clear, I meant whether there is an id attribute on a particular paper, e.g.,

<author id="fei-liu-gga"><first>Fei</first><last>Liu</last></author>

I have implemented the following solution:

In the paper page creation, every paper has an ID, since these are dumped from the YAML file
These IDs are passed to the Github issue. If new authors are added in the form, they get an empty author ID
When bulk processing metadata, we take each ID, and when editing the XML with the corrections, we search for that ID. If it's present in the XML, we retain it. This is only true for a very small subset, like the above example

In the future, it sounds like I can use NameSpecification when switching to your library.

However, I am hoping that we can merge this PR, and port to the new library in Q1 next year, perhaps under the discussed "sink-or-swim" scenario.

Note that I also updated the bulk processing script to close old-style metadata correction issues with a note, asking authors to resubmit their issue using the new system. You can see an example on #4208. It's very nice and I hope will really clear out our huge number of issues.

mbollmann

I tried to read the bulk processing script now primarily from a security perspective, and I see one issue, which is that we cannot rely on the approval by a team member to determine that the JSON is "safe"—or even that the submitted information is correct!—as a malicious actor could edit the JSON block after approval from a team member. This seems like a security vulnerability that’s not immediately clear to me how to avoid.

K17-1003.html

bin/process_bulk_metadata.py

mbollmann · 2024-12-27T11:55:12Z

bin/process_bulk_metadata.py

+                for author in changes["authors"]:
+                    attrib = {}
+                    if "id" in real_ids:
+                        # if the ID was explicitly represented, preserve it
+                        attrib["id"] = author["id"]


This only checks if the ID was associated with some author, but it may not be the one that is meant here. The scenario I have in mind is the following:

Let's say the original author list is

[ {id: 'john-doe', first: 'John', last: 'Doe'}, {first: 'Mary', last: 'Jane'} ]

If someone now uses the web interface to edit the first author to "Mary Jane", then deletes the second one, wouldn't the submitted JSON block look like

[ {id: 'john-doe', first: 'Mary', last: 'Jane'} ]

Maybe this seems unlikely in this simple case, but I can easily see this happening when editing a list of 15+ authors.

But again maybe just something we need to be mindful of before approving?

I agree this is a possible situation. But my goal with this PR was to get something working and then refine it. Less than 0.5% of the <author> lines in the Anthology XML (1,908 / 354,582) have an id attribute. I'm not sure this is worth worrying about at this point; partly I was planning to revisit this next year when we figure out how to do explicit representation of people, rather than inferring people from names.

Sure, I'm not saying this is a blocker. I would expect an explicit representation of people to mean that essentially everyone has an ID, but of course we can wait until we have figured that out.

I haven't thought it through completely, but that's loosely what I have in mind. And then, separately, there will be an in-repo person database indexed by those IDs. Ingestion would then have to create new person entries, and we'd have to occasionally clean them out if any were orphaned.

bin/process_bulk_metadata.py

bin/create_hugo_pages.py

mjpost · 2024-12-27T13:17:39Z

Thanks for the security-minded review. A few points:

It's definitely worth considering security and potential abuse from the outset
That said, the final PR is still subject to manual merging and review, which mitigates the security threat somewhat
Still, it seems we could setup a Github action that clears approval status if an issue is edited. Let me put one together and push it up

Copilot

Copilot reviewed 9 out of 13 changed files in this pull request and generated 1 comment.

Files not reviewed (4)

.github/ISSUE_TEMPLATE/01-metadata-correction.yml: Language not supported
hugo/assets/css/_papers.scss: Language not supported
hugo/layouts/_default/baseof.html: Language not supported
hugo/layouts/papers/single.html: Language not supported

Comments suppressed due to low confidence (2)

.github/ISSUE_TEMPLATE/99-bulk-metadata-correction.yml:10

The phrase 'activated by following the 'Fix metadata' link from each paper page in the Anthology' should be 'activated by following the 'Fix metadata' button from each paper page in the Anthology' for consistency.

activated by following the 'Fix metadata' link from each paper page in the Anthology

.github/workflows/annotate-metadata-issue.yml:58

[nitpick] The removal of the paper title in the comment reduces clarity. Consider including the paper title for better user experience.

Found ACL Anthology entry: https://aclanthology.org/${anthology_id}

Copilot · 2024-12-27T14:58:01Z

.github/ISSUE_TEMPLATE/99-bulk-metadata-correction.yml

    validations:
      required: true
  - type: markdown
    attributes:
      value: |
-        **Note:** If you request an author name correction for yourself, please help ensure your name is correct for future publications by setting it correctly in Softconf/OpenReview.
+        **Important:** If you request an author name correction for yourself, please help ensure your name is correct for future publications by setting it correctly in Softconf/OpenReview.


The phrase 'If you request an author name correction for yourself, please help ensure your name is correct for future publications by setting it correctly in Softconf/OpenReview.' should be 'If you request an author name correction for yourself, please ensure your name is correct for future publications by setting it correctly in Softconf/OpenReview.' for clarity.

Suggested change

**Important:** If you request an author name correction for yourself, please help ensure your name is correct for future publications by setting it correctly in Softconf/OpenReview.

**Important:** If you request an author name correction for yourself, please ensure your name is correct for future publications by setting it correctly in Softconf/OpenReview.

bin/process_bulk_metadata.py

addressed in 50fe7f9

mbollmann · 2024-12-27T15:33:14Z

Still, it seems we could setup a Github action that clears approval status if an issue is edited. Let me put one together and push it up

Oh, how cool that this appears to be quite easy to do!

mbollmann · 2024-12-27T15:39:08Z

bin/process_bulk_metadata.py

+                collection_id, _, _ = deconstruct_anthology_id(anthology_id)
+                xml_path = f"data/xml/{collection_id}.xml"
+


What I usually do is

scriptdir = os.path.dirname(os.path.abspath(__file__)) xml_path = f"{scriptdir}/../data/xml/{collection_id}.xml"

so that this code is not sensitive to your current working directory that you're calling the script from. I wonder if one "security check" then could be

assert os.path.realpath(xml_path) == os.path.realpath(f"{scriptdir}/../data/xml/")

or something to that effect?

I usually do something similar, but I assumed that this was unnecessary since the file is being open by self.repo.get_contents, and would therefore be relative to the repo root. But I tried it from a different directory and it fails, so I'll add this.

Looks like I had to create both: one path relative to the repo root (for reading from a branch with the GitHub module), and another absolute path (for writing to the actual XML file). I tested this by running in both bin/ and root.

mbollmann · 2024-12-27T20:44:26Z

We should probably coordinate between this and #4192. Should we merge #4192 first, then merge master into autopr to run this against the new build?

mjpost · 2024-12-27T21:28:25Z

Sorry, @mbollmann, I merged this before seeing your note. I had the same thought, actually, but then counter-thought that aren't they mostly independent? Sorry if I complicated your merge.

mbollmann · 2024-12-27T21:41:43Z

I’m not afraid of merge conflicts 😆 but once I merge #4192, let's double-check that the new metadata fixing functionality is still working.

yufei118liu and others added 3 commits December 10, 2024 14:22

Merge branch 'master' into autopr

6831fe1

mjpost added this to the 2024Q4 milestone Dec 11, 2024

mjpost requested a review from Copilot December 11, 2024 16:47

Copilot AI reviewed Dec 11, 2024

View reviewed changes

mjpost mentioned this pull request Dec 11, 2024

Add correct link to Github issue from every paper page #3314

Closed

mjpost requested review from nschneid and mbollmann December 11, 2024 16:56

mjpost assigned yufei118liu Dec 11, 2024

mbollmann reviewed Dec 11, 2024

View reviewed changes

mjpost added 2 commits December 26, 2024 20:07

Add flag to close old issues

4252ded

Update note

fe50ccd

mbollmann requested changes Dec 27, 2024

View reviewed changes

mjpost added 2 commits December 27, 2024 07:02

Remove test code

3099ce7

Restore deleted lines

62181e2

mjpost added 5 commits December 27, 2024 07:31

Added workflow to remove approval status

68b9c04

Look for "approved" label

7e592c9

Cleanup

85e9ce7

black

26b29e7

Use try block around each issue

f6e173f

mjpost requested review from mbollmann and Copilot December 27, 2024 14:57

Copilot AI reviewed Dec 27, 2024

View reviewed changes

mjpost added 3 commits December 27, 2024 09:00

clean up header

f4c2d3e

Add missing anthology id key to javascript

6503e48

typos

50fe7f9

nschneid previously requested changes Dec 27, 2024

View reviewed changes

bin/process_bulk_metadata.py Outdated Show resolved Hide resolved

mbollmann reviewed Dec 27, 2024

View reviewed changes

mbollmann approved these changes Dec 27, 2024

View reviewed changes

mjpost added 2 commits December 27, 2024 10:16

Fix XML file paths

5540e45

Remove LLM checker

599ef1e

mjpost merged commit 8149399 into master Dec 27, 2024
2 checks passed

mjpost deleted the autopr branch December 27, 2024 20:51

	Important: If you request an author name correction for yourself, please help ensure your name is correct for future publications by setting it correctly in Softconf/OpenReview.
	Important: If you request an author name correction for yourself, please ensure your name is correct for future publications by setting it correctly in Softconf/OpenReview.

		collection_id, _, _ = deconstruct_anthology_id(anthology_id)
		xml_path = f"data/xml/{collection_id}.xml"

AutoPR for paper information correction #4147

AutoPR for paper information correction #4147

Conversation

yufei118liu commented Dec 11, 2024 • edited by mjpost Loading

mjpost commented Dec 11, 2024

Choose a reason for hiding this comment

mjpost commented Dec 11, 2024

yufei118liu commented Dec 11, 2024 • edited Loading

mjpost commented Dec 11, 2024 • edited Loading

github-actions bot commented Dec 11, 2024 • edited Loading

yufei118liu commented Dec 11, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mbollmann commented Dec 11, 2024

mbollmann commented Dec 11, 2024

mjpost commented Dec 11, 2024

mjpost commented Dec 11, 2024 • edited Loading

mjpost commented Dec 11, 2024

mbollmann commented Dec 12, 2024

mjpost commented Dec 12, 2024

mbollmann commented Dec 12, 2024

mbollmann commented Dec 12, 2024

mjpost commented Dec 12, 2024

nschneid commented Dec 12, 2024

mbollmann commented Dec 12, 2024

mjpost commented Dec 12, 2024

mjpost commented Dec 27, 2024

mbollmann left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mjpost commented Dec 27, 2024

Copilot AI left a comment

Choose a reason for hiding this comment

Copilot AI Dec 27, 2024

Choose a reason for hiding this comment

mbollmann commented Dec 27, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mbollmann commented Dec 27, 2024

mjpost commented Dec 27, 2024

mbollmann commented Dec 27, 2024

yufei118liu commented Dec 11, 2024 •

edited by mjpost

Loading

yufei118liu commented Dec 11, 2024 •

edited

Loading

mjpost commented Dec 11, 2024 •

edited

Loading

github-actions bot commented Dec 11, 2024 •

edited

Loading

mjpost commented Dec 11, 2024 •

edited

Loading