Parallelism & chunking at convert step #68

@ml-evs

It might be nice to allow the convert step to run in parallel over chunks of entries, with the chunks combined at the end.

This should reduce peak memory usage (currently all entries need to fit in memory in the OPTIMADE format) and would also give us better control over concurrency: for now, the pymatgen CIF reader, for example, seems to happily use all cores and can lock up a system.
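A rough sketch of what chunked conversion could look like, assuming each chunk is written to its own temporary JSONL part file and the parts are concatenated in order at the end. Everything here is hypothetical: `convert_one` is a stand-in for the real per-file conversion, and `convert_in_chunks`, `chunk_size`, `max_workers` and `executor_cls` are illustrative names, not existing API.

```python
import concurrent.futures
import json
import shutil
import tempfile
from itertools import islice
from pathlib import Path
from typing import Iterable, Iterator


def chunked(items: Iterable[str], size: int) -> Iterator[list[str]]:
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def convert_one(path: str) -> dict:
    """Hypothetical stand-in for the real per-file conversion."""
    return {"id": Path(path).stem, "type": "structures", "attributes": {}}


def convert_chunk(chunk: list[str], part_path: str) -> str:
    """Convert one chunk and write it to its own JSONL part file."""
    with open(part_path, "w") as f:
        for path in chunk:
            f.write(json.dumps(convert_one(path)) + "\n")
    return part_path


def convert_in_chunks(
    paths: Iterable[str],
    out_path: str,
    chunk_size: int = 100,
    max_workers: int = 2,
    executor_cls=concurrent.futures.ProcessPoolExecutor,
) -> int:
    """Convert `paths` chunk by chunk; return the number of chunks written."""
    with tempfile.TemporaryDirectory() as tmp, executor_cls(
        max_workers=max_workers
    ) as pool:
        # Only file paths and part-file names are held in memory here;
        # the converted entries themselves live on disk.
        futures = [
            pool.submit(convert_chunk, chunk, str(Path(tmp) / f"part-{i:06d}.jsonl"))
            for i, chunk in enumerate(chunked(paths, chunk_size))
        ]
        with open(out_path, "w") as out:
            # Iterate in submission order so the output order is deterministic.
            for future in futures:
                with open(future.result()) as part:
                    shutil.copyfileobj(part, out)
    return len(futures)
```

With this shape, `max_workers` caps the number of worker processes, and each worker only ever holds one chunk of converted entries. It does not by itself limit threads started inside a worker by numerical libraries, so that may need separate handling.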

The only difficulty here will be how the properties are then assigned to each structure. We could consider changing this to a two-step process: first write a bare optimade.jsonl containing only the structures, then loop through that file and add properties where appropriate, writing the results out to a new file.
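The second pass could be as simple as the sketch below. It assumes one JSON object per line, that structure entries carry an `id` and an `attributes` dict, and that the properties are available as a mapping keyed by structure ID. The function name `add_properties` and the property name in the usage example are made up for illustration.

```python
import json
from typing import Mapping


def add_properties(
    bare_path: str, out_path: str, properties: Mapping[str, dict]
) -> int:
    """Stream `bare_path`, merge properties by entry ID, and write `out_path`.

    Returns the number of entries that received properties.
    """
    updated = 0
    with open(bare_path) as src, open(out_path, "w") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            entry = json.loads(line)
            # Lines without a matching ID (e.g. a header) pass through unchanged.
            extra = properties.get(entry.get("id"))
            if extra:
                entry.setdefault("attributes", {}).update(extra)
                updated += 1
            dst.write(json.dumps(entry) + "\n")
    return updated
```

For example, `add_properties("optimade.jsonl", "optimade-with-props.jsonl", {"s0": {"_demo_energy": -1.5}})` would only ever hold one entry in memory at a time, although the property mapping itself still has to fit in memory in this version.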
