Input populated by spurious files from git repos #12

Open
aidanheerdegen opened this issue Feb 28, 2022 · 2 comments

Comments

@aidanheerdegen

This input line https://github.com/COSIMA/1deg_jra55_ryf/blob/master/config.yaml#L26 leads to lots of spurious files being linked into the work directory, e.g.

work/atmosphere/INPUT/make_ryf/.git/COMMIT_EDITMSG:
  fullpath: /g/data/ik11/inputs/JRA-55/RYF/v1-4/make_ryf/.git/COMMIT_EDITMSG
  hashes:
    binhash: 22ec7701562a961d8119c4af92a98044
    md5: 9bc314d387974df26448af23836f5f23
work/atmosphere/INPUT/make_ryf/.git/HEAD:
  fullpath: /g/data/ik11/inputs/JRA-55/RYF/v1-4/make_ryf/.git/HEAD
  hashes:
    binhash: 48c6644dd466a8ff8d22fbdda382e22a
    md5: 2b74885bd597136c4af661a85538b2a3

See https://github.com/COSIMA/1deg_jra55_ryf/blob/master/manifests/input.yaml#L84-L343

This happens because there are two code repository directories under /g/data/ik11/inputs/JRA-55/RYF/v1-4.

Possible solutions:

  1. Keep the code for generating the data in a different location
  2. Put the data inside a subdirectory and use that as the input location, e.g. /g/data/ik11/inputs/JRA-55/RYF/v1-4/data
  3. Directly link to the files required rather than the directory itself, e.g.
    - name: atmosphere
      model: yatm
      exe: /g/data/ik11/inputs/access-om2/bin/yatm_a227a61.exe
      input:
            - /g/data/ik11/inputs/access-om2/input_20201102/yatm_1deg
            - /g/data/ik11/inputs/JRA-55/RYF/v1-4/RYF.huss.1990_1991.nc
            - /g/data/ik11/inputs/JRA-55/RYF/v1-4/RYF.licalvf.1990_1991.nc
            - /g/data/ik11/inputs/JRA-55/RYF/v1-4/RYF.prra.1990_1991.nc
            - /g/data/ik11/inputs/JRA-55/RYF/v1-4/RYF.prsn.1990_1991.nc
            - /g/data/ik11/inputs/JRA-55/RYF/v1-4/RYF.psl.1990_1991.nc
            - /g/data/ik11/inputs/JRA-55/RYF/v1-4/RYF.rhuss.1990_1991.nc
            - /g/data/ik11/inputs/JRA-55/RYF/v1-4/RYF.rlds.1990_1991.nc
            - /g/data/ik11/inputs/JRA-55/RYF/v1-4/RYF.rsds.1990_1991.nc
            - /g/data/ik11/inputs/JRA-55/RYF/v1-4/RYF.tas.1990_1991.nc
            - /g/data/ik11/inputs/JRA-55/RYF/v1-4/RYF.uas.1990_1991.nc
            - /g/data/ik11/inputs/JRA-55/RYF/v1-4/RYF.vas.1990_1991.nc
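Writing out the file list for option 3 by hand is error-prone, so it could be generated instead. The following is a minimal sketch, not part of payu or the config repo: the helper names `list_input_files` and `as_yaml_list` are hypothetical, and it assumes the wanted inputs are exactly the `.nc` files sitting directly in the directory (a non-recursive match, so nothing inside the repository subdirectories is picked up).

```python
from pathlib import Path


def list_input_files(input_dir, pattern="*.nc"):
    """Return sorted paths of regular files directly inside input_dir.

    Path.glob with a pattern that has no '**' does not descend into
    subdirectories, so files inside nested code repositories are ignored.
    """
    directory = Path(input_dir)
    return sorted(str(p) for p in directory.glob(pattern) if p.is_file())


def as_yaml_list(paths, indent=12):
    """Format paths as YAML list items for pasting under an 'input:' key."""
    pad = " " * indent
    return "\n".join(f"{pad}- {p}" for p in paths)


if __name__ == "__main__":
    # Directory taken from the issue; adjust for other configs.
    files = list_input_files("/g/data/ik11/inputs/JRA-55/RYF/v1-4")
    print(as_yaml_list(files))
```

The printed lines can be pasted under `input:` in config.yaml, after checking that every listed file really is a model input.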

This also affects the other RYF configs.

@aekiss

aekiss commented Feb 28, 2022

Oops, looks like I was responsible for that.

Putting the repos in /g/data/ik11/inputs/JRA-55/RYF/v1-4 is not a great way to document the origins of the files, because nothing records which commit is relevant. It would be better to include the URL of the exact commit on GitHub as an attribute in each .nc file, so the files carry their provenance with them. I've been doing that with other inputs using scripts such as this: https://github.com/COSIMA/initial_conditions_access-om2/blob/master/finalise.sh. If we did that, we would no longer need to keep the repos there (if we trust that GitHub will last forever).
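As a rough illustration of stamping provenance into the files, the sketch below builds a commit URL and the matching NCO `ncatted` command for one file. It is not the content of finalise.sh: the function names and the attribute name `origin` are hypothetical, and it assumes NCO is installed and that a global character attribute is an acceptable place for the URL.

```python
def commit_url(repo_url, commit_hash):
    """Build a GitHub URL that pins a repository to one exact commit."""
    return f"{repo_url.rstrip('/')}/tree/{commit_hash}"


def ncatted_command(nc_file, url, attribute="origin"):
    """Return the argument list for an ncatted call that writes the URL.

    The -a argument has the form att_nm,var_nm,mode,att_type,att_val:
    'global' targets the global attributes, 'o' overwrites any existing
    value, and 'c' stores the value as a character string.
    -O lets ncatted overwrite the file in place without prompting.
    """
    return ["ncatted", "-O", "-a", f"{attribute},global,o,c,{url}", nc_file]


if __name__ == "__main__":
    # Hypothetical repository and commit hash, for illustration only.
    url = commit_url("https://github.com/example-org/example-repo", "0123abc")
    print(" ".join(ncatted_command("RYF.huss.1990_1991.nc", url)))
```

The returned list can be handed to `subprocess.run(cmd, check=True)` for each .nc file once the attribute name has been agreed on.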

Perhaps payu could also ignore .git directories in case this happens again (belt-and-braces)?

@aidanheerdegen

Agreed that adding information to the files themselves is a better solution.

Ignoring only the .git directories would still leave the repositories' code files in the input manifest, which is arguably undesirable. It is possible to do, though. Chuck an issue on https://github.com/payu-org/payu if you think it is worthwhile.
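To make the distinction concrete, here is a minimal sketch of a directory walk that skips whole repository working trees, not just their `.git` directories. It is not how payu collects inputs; `data_files` is a hypothetical helper, and it assumes a subdirectory counts as a code repository whenever it contains a `.git` entry.

```python
import os


def data_files(input_dir):
    """Yield files under input_dir, skipping nested git working trees.

    A subdirectory is treated as a repository if it contains a '.git'
    entry. Both the '.git' directory and the code checked out next to it
    are excluded, so neither would end up in an input manifest.
    """
    for root, dirs, files in os.walk(input_dir):
        # Editing dirs in place tells os.walk not to descend into them.
        dirs[:] = sorted(
            d for d in dirs
            if d != ".git"
            and not os.path.exists(os.path.join(root, d, ".git"))
        )
        for name in sorted(files):
            yield os.path.join(root, name)


if __name__ == "__main__":
    # Directory taken from the issue.
    for path in data_files("/g/data/ik11/inputs/JRA-55/RYF/v1-4"):
        print(path)
```

Dropping the `os.path.exists` condition gives the weaker behaviour discussed above, where only `.git` contents are ignored and the scripts beside them are still linked.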

There has always been a problem with the approach of linking everything in the input directories and implicitly assuming that each file is an actual input to the model. Absent a better approach, this was the best that could be done. As a result, it is the responsibility of whoever sets up the config to keep the input directories as clean as possible, assuming this is a shared (or even known) goal.
