Skip to content

Conversation

@tltomi
Copy link

@tltomi tltomi commented Aug 25, 2025

Improving read and write speed, adding gdx.GdxFile.read_single_symbol, and adding disable_gc kwarg to gdx.GdxSymbol.load, read_gdx.to_dataframes, and read_gdx.to_dataframe

…, and adding disable_gc kwarg to gdx.GdxSymbol.load, read_gdx.to_dataframes, and read_gdx.to_dataframe
@elainethale elainethale self-requested a review September 12, 2025 21:05
Copy link
Member

@elainethale elainethale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few initial comments.

Comment on lines +325 to +326
target_symbol = GdxSymbol(name, data_type, dims=dims, file=self, index=index)
self.append(target_symbol)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is skipping this for symbols whose data you don't want to load costly enough in terms of time to warrant this separate function?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In large GDX files, it is a very significant improvement in runtime. We encounter regularly 100-1000 Mb GDX files with hundreds of symbols. This method is sometimes up to 15x faster when reading a single symbol from a large file. There are of course other overheads, like opening the file, etc and thus the old method is significantly better to read all symbols. The increase in code maintenance and documentation is the biggest drawback. Small files are always extremely fast.

Comment on lines +291 to +292
if not self.empty:
raise Error("GdxFile.read_single_symbol can only be used if the GdxFile is .empty")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that you can't call this function multiple times to read in the metadata for multiple symbols? So you can either read in the meta data for all symbols or for just one symbol?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That section should probably be improved. You can call that function multiple times, because it always closes the file and starts refreshed. That was a safeguard while developing and the code should not reach that section, but there might be some use case I can't think of and didn't remove that Error.

Comment on lines +1122 to +1124
# restore GC if changed
if disable_gc and gc_was_enabled and not gc.isenabled():
gc.enable()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm glad this is here. It makes me more comfortable that disabling is default behavior.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: "def load(self,load_set_text=False,disable_gc=True):" Carbage collector is Python's internal guard against memory leaks etc. and this disables it by default. "disable_gc=False" would be the python default behavior. The speedup with large GDX files is quite large and I didn't notice any issues while testing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants