GraphCNN and CPU memory usage #413
Hi AMPL staff. I am currently using a GraphCNN, and CPU memory usage is massive: at 2.5% of the final intended training data, memory usage has already reached about 150 GB. Questions:
Thanks.
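Since peak memory is the bottleneck here, one quick diagnostic is to log the process's peak resident set size each epoch and watch how it grows as more of the dataset is featurized. A minimal sketch using only the Python standard library (the epoch loop is illustrative, not part of AMPL):

```python
import resource
import sys

def peak_memory_gb():
    """Return the peak resident set size of this process in GB.

    ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    """
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    scale = 1e9 if sys.platform == "darwin" else 1e6
    return rss / scale

# Log memory after each (hypothetical) training epoch so you can see
# whether usage grows linearly with the fraction of data loaded.
for epoch in range(3):
    # ... train_one_epoch(model, data) would go here ...
    print(f"epoch {epoch}: peak RSS ~ {peak_memory_gb():.2f} GB")
```

If the growth is roughly linear in the data fraction, extrapolating from the 2.5% figure gives a rough estimate of the full-dataset requirement.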
Replies: 2 comments 4 replies
Hello,
Thanks Stewart. Any idea of the likely outcome and status of the DiskDataset? I've also tried to address this problem using ECFP4 fingerprints instead of graphs. While I saw a 10% decrease in memory and a 4x training speedup, training failed after 17 epochs, signified by a drop in r2 vs. epoch. I suspect this is because ECFP4s are binary vectors and gradient explosion problems occur, which the smoothing from graph featurizations can avoid (unless you have other insights). AMPL isn't really geared to troubleshoot this within its framework, is it? If so, it seems AMPL really isn't something that can train in the 100M, let alone 1B, range, is it?
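The failure mode described above, r2 climbing and then dropping after epoch 17, can at least be caught automatically with a simple patience-based check on the validation metric. A hedged sketch (the `should_stop` helper and the history values are illustrative, not AMPL API):

```python
def should_stop(r2_history, patience=3, min_delta=0.0):
    """Return True if validation r2 has failed to improve on its best
    value for `patience` consecutive epochs (a simple divergence guard).
    """
    if len(r2_history) <= patience:
        return False
    best = max(r2_history[:-patience])      # best r2 before the recent window
    recent = r2_history[-patience:]         # the last `patience` epochs
    return all(r2 <= best + min_delta for r2 in recent)

# Illustrative history: r2 rises, peaks, then collapses.
history = [0.10, 0.35, 0.52, 0.61, 0.60, 0.44, 0.30]
print(should_stop(history, patience=3))  # True: last 3 epochs never beat 0.61
```

Stopping at the best checkpoint doesn't fix the underlying instability, but it keeps the collapsed epochs from wasting compute while you experiment with the inputs.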
There aren't any plans to implement the DiskDataset. I don't think AMPL will be able to scale to the 100M or 1B range.
You can try experimenting with batch size, network size, weight decay, or learning rate. I don't think we have any tools for directly inspecting the weights or deltas.
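Even without built-in AMPL tooling, weight deltas can be inspected by snapshotting parameters between epochs and comparing their norms. A framework-agnostic sketch in plain Python (the snapshot dicts are illustrative; with a real model you would extract the arrays from the underlying DeepChem/TensorFlow model yourself):

```python
import math

def delta_norms(weights_before, weights_after):
    """Per-layer L2 norm of the weight change between two snapshots.

    Each snapshot maps layer name -> flat list of weights. A sudden jump
    in these norms between epochs can flag the kind of training collapse
    seen with binary ECFP inputs.
    """
    norms = {}
    for name, before in weights_before.items():
        after = weights_after[name]
        norms[name] = math.sqrt(sum((a - b) ** 2 for a, b in zip(after, before)))
    return norms

# Illustrative snapshots from two consecutive epochs.
epoch_10 = {"dense1": [0.5, -0.2, 0.1], "output": [1.0, 0.3]}
epoch_11 = {"dense1": [0.6, -0.1, 0.1], "output": [1.0, 0.2]}
print(delta_norms(epoch_10, epoch_11))
```

Logging these norms alongside r2 per epoch would show whether the post-epoch-17 drop coincides with an exploding update.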