Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

first implementation of steel model with revised structure #8

Merged
merged 5 commits into from
Oct 2, 2024

Conversation

Merjo
Copy link
Collaborator

@Merjo Merjo commented Sep 16, 2024

Added new inflow driven model and adapted steel model to the new, more clear structure. Still not running completely smoothly (some problem in the use process) but overall a first start.

Copy link
Collaborator

@SallyDa SallyDa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, looks good, although I have not tried to run it, since I don't have the steel data. I made a few small comments/suggestions, so please check these, and we can merge :)

simson/common/data_transformations.py Outdated Show resolved Hide resolved
simson/steel/inflow_driven_steel_historic.py Outdated Show resolved Hide resolved
simson/steel/steel_model.py Outdated Show resolved Hide resolved


gi_values = gi_distribution.values.transpose()
at_a = np.matmul(gi_values.transpose(), gi_values)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have some more self-explanatory variable names here please?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, definitely. I wasn't really sure how though - gi_values I can definitely rename, at_a here means matrix a transposed times matrix a, basically I am solving for a parameter here according to a standard optimization approximation Jakob proposed (to minimize Ax-b). I guess the options are to either to

  1. squish all the lines into one to avoid using that variable at all
  2. Come up with some sort of variable name that describes this
  3. Keep it as it is but make a better explanatory comment before
    What do you think would be best? I'm tending towards option 3, but am happy to go with something else (I know it probably doesn't matter too much but wanted to ask cause I don't have a good feel for how to write very self-explanatory code yet).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, yes it's difficult when there isn't really a physical meaning, and squashing all of that into line 74 is probably too much. I also think a comment is the best option.

elif fitting_function_type == 'exponential':
prms_out = least_squares(
exponential_fitting_function,
x0=np.array([400, 1]),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this supposed to be the same x0 as used in the sigmoid case?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I think they should be different, at least the second one (the first one should be the saturation level which I guess should be similar with both approaches, but the way it is calculated in the sigmoid case is I think specific to the plastics model [I guess it might make sense to change that structure at some point as this is in the common section now - e.g. by specifying in the config whether and how the initial guess of saturation level and other parameters should be calculated?]).

In the former steel model, we had different version of this regression, one, where the parameters where optimised like here and one where we calculated it numerically. As I changed the inflow driven model calculation, I steel need to look at the numbers to make this decision so this might change soon again.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I didn't release this was a model specific initial guess. If you were using both options (exponential and sigmoid) for the steel model (e.g. to compare results from different approaches), you would then have the same x0, right? Or is it also dependent on the fitting function? Maybe, we could pass x0 as a parameter to the gdp_regression function, but I think I would keep hard-coded values somewhere (and not require the user to specify it in the config), maybe directly in the steel and plastic models, or putting (or calling) this gdp_regression in the steel / plastic directories rather than common. For now, please just add a little comment to the code with your explanation above and I will think what to do with it later!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The initial guess x0 would be different for the different fitting functions, at least for the second parameter. In both fitting functions here the first parameter describes the saturation level, whilst the second one defines some other valuable that influences the curvature of the function which will be very different depending on the form of the function. One could also imagine fitting functions which don't use a saturation level at all in the future, hence they should always be different. I will add the comment like you said and as said I also need to look at the numbers a little more myself - am happy to change the location of the x0 values with whatever you propose!

Comment on lines 74 to 75
for i_region in range(shape_out[1]):
for i_good in range(shape_out[2]):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure whether this would work, but:

Suggested change
for i_region in range(shape_out[1]):
for i_good in range(shape_out[2]):
for i_good in range(shape_out[2]):
historic_good = historic_stocks_pc[:, :, i_good]

If we do the above, could we take out the loop over region and calculate using arrays?
e.g.

def sigmoid_fitting_function(prms):
    return prms[0] / (1 + np.exp(prms[1]/gdppc[:n_historic])) - historic_good

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm good point, I actually just adopted this from what was here which I am assuming then stemmed from the plastics model (or my memory is getting quite bad and I coded this some time ago :) ). I can try it out, I guess it could even be possible to take out the loop over goods as well, we might need to reshape the initial parameters then which might look messy, depends a bit on scipy, I'm going to look into it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, yes I know this was not actually in your pr, but the whole function is a bit messy, so I was thinking how we can make it a bit cleaner. Yeah, reshaping parameters might be less readable than one loop (or maybe not, I'm not sure). It would likely be more efficient to reshape instead of looping, but I don't think we have an efficiency problem at the moment, so readability would be my priority!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I looked into it now and SciPy requires x0 to be maximum one dimensional, meaning that if we canceled one or both loops, we would need to 'flatten' x0 so that (if we just looped over regions) x0 would look something like this [sat_level_good_0, sat_level_good_1, sat_level_good_2, sat_level_good_3, curvature_prm_good_0, curvature_prm_good_1, curvature_prm_good_2, curvature_prm_good_3], which I think would severely effect readability and following up from what you said I think I would leave it as it is for now (unless we find another solution).

Merjo and others added 4 commits September 18, 2024 10:56
delete unnecessary lines

Co-authored-by: SallyDa <sallyda@pik-potsdam.de>
delete unnecessary init function

Co-authored-by: SallyDa <sallyda@pik-potsdam.de>
Co-authored-by: SallyDa <sallyda@pik-potsdam.de>
Copy link
Collaborator

@SallyDa SallyDa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thank you! And thanks for the extra explanations for me as well :)

@SallyDa SallyDa merged commit f63eff7 into main Oct 2, 2024
1 check failed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants