Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove load from functions #8

Open
briochemc opened this issue May 27, 2020 · 2 comments
Open

Remove load from functions #8

briochemc opened this issue May 27, 2020 · 2 comments

Comments

@briochemc
Copy link
Contributor

@profseth and @hengdiliang

As I mentioned somewhere else, I think we should remove all the load calls from the functions, but I'm not sure what is the best approach, so we should discuss this here (if you even want to change it at all, of course)

I propose two alternatives:

  1. Carrying extra argument(s)

    In each function to hold these items. If it was all up to me, I think I'd try something like turning all the

    fun(A, b, do)

    into

    fun(A, b, BGC, OCIM, DATA)

    where

    • BGC is just my renamed do (because do is not self-explanatory) and holds the usual parameters, e.g., BGC.revscav.on, and so on.
    • OCIM holds the transport circulation and the grid information, so basically it's ao (but instead of loading it many times, we load it once and for all) and water_transport.
    • DATA holds the required external data, e.g., the Mahowald dust 2D map gridded to the OCIM. Again, having it passed as an argument means that it's not loaded every time the computation needs the dust source. DATA could also hold things like other tracers, like stuff for the cost function, etc. Basically it would hold everything else than OCIM and BGC.
  2. Use global variables.

    OCIM circulation and grid, BGC flags, and external data don't change at all while we run OCIM models. Following the 1st solution above, we could just use

    fun(A, b, p)

    everywhere, and turn DATA, OCIM, and BGC into global variables, except for a few "optimizable" parameters that would live in p. Then we could keep the load statements in each of these functions, but have the load statement only trigger if the data is not already available in DATA, OCIM, or BGC. So something like that, for example:

    if ~isfield(DATA, AEROSOLDEP)
        DATA.AEROSOLDEP = load('data/AEROSOLDEP')
    end

    I think this would prevent redundant load calls, and could save a lot of time.


FWIW, I prefer option 2 in theory, but I've been told to avoid globals, so there's that.

What do you guys think?

@profseth
Copy link
Collaborator

profseth commented May 27, 2020

I just ran a model with dust and nepheloid and hydrothermal sources (so at least 3 big 'loads' plus loading ao.mat) which took 0.5 seconds to set up the problem and 9 seconds to solve the matrix inversion. So even with all of that loading it's a relatively small total proportion of the time, right?

Unless I'm missing something and this is a much bigger time-sink than I have understood, I prefer to re-load the data in each function because it makes the functions easier to understand and modify for other uses.

@briochemc
Copy link
Contributor Author

You're right, I don't think it will matter for the current setup, because these are in fact rather "small" loads. But it's a good general practice to not compute or load the same things again and again! 😄

BTW, the changes I'm thinking of should be rather minimal and not obfuscate things at all. I think we could just replace the

load('data/AEROSOLDEP')

type of calls with something like

global AEROSOLDEP
isempty(AEROSOLDEP) && load('data/AEROSOLDEP')

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants