Building a curated dataset from NREL’s High Throughput Experimental Materials Database (HTEM DB) and training a neural network to predict thin-film properties (starting with thickness). The project is structured in three stages:
Stage 1 — Dataset pipeline (complete):
Notebook-driven workflow to search HTEM libraries, download filtered libraries locally to reduce repeat API calls, and construct a flattened, ML-ready dataset (deposition parameters + composition + measurement outputs). Includes preliminary cleaning, EDA, and outlier handling.
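The flattening step in Stage 1 can be sketched as follows. The nested record layout below is a hypothetical stand-in, not the actual HTEM API schema; it only illustrates turning deposition parameters, composition, and measurement outputs into one flat row per sample:

```python
def flatten_sample(record):
    """Flatten one nested HTEM-style sample record into a single ML-ready row.

    The field names ("deposition", "composition", "measurements") are
    illustrative assumptions, not the real HTEM payload keys.
    """
    row = {}
    # Deposition parameters (e.g. temperature, power) become prefixed columns.
    for key, value in record.get("deposition", {}).items():
        row[f"dep_{key}"] = value
    # Composition: one column per element, holding its atomic fraction.
    for element, fraction in record.get("composition", {}).items():
        row[f"at_{element}"] = fraction
    # Measurement outputs (e.g. thickness) are the prediction targets.
    for key, value in record.get("measurements", {}).items():
        row[key] = value
    return row


sample = {
    "deposition": {"temperature_c": 400, "power_w": 60},
    "composition": {"Zn": 0.7, "Sn": 0.3},
    "measurements": {"thickness_nm": 210.5},
}
flat = flatten_sample(sample)
```

A list of such rows drops directly into `pandas.DataFrame` for cleaning and EDA.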
Stage 2 — Neural network modeling (in progress):
Training and refining a PyTorch regression model. Current focus:
- Simplify the architecture while preserving performance (parameter efficiency vs. accuracy)
- Tune batch size / learning rate tradeoffs; improve training stability
- Add regularization (dropout, weight decay) and compare impact on overfitting
- Implement learning-rate scheduling and early stopping
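The early-stopping item above can be sketched framework-agnostically; this is a minimal illustration of the logic, not the project's actual training loop, and in PyTorch it would sit alongside a scheduler such as `torch.optim.lr_scheduler.ReduceLROnPlateau`:

```python
class EarlyStopping:
    """Stop training when validation loss fails to improve for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta        # improvement required to reset the counter
        self.best_loss = float("inf")
        self.counter = 0
        self.should_stop = False

    def step(self, val_loss):
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss     # new best: checkpoint the model here
            self.counter = 0
        else:
            self.counter += 1
            if self.counter >= self.patience:
                self.should_stop = True
        return self.should_stop


# Usage inside an epoch loop (validation losses are made-up numbers):
stopper = EarlyStopping(patience=2)
history = []
for val_loss in [1.0, 0.9, 0.95, 0.97]:
    history.append(stopper.step(val_loss))
```

Calling `step()` once per epoch keeps the stopping criterion decoupled from the model and optimizer code.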
Stage 3 — Web interface (planned):
A lightweight web app for model access and inference once the training pipeline is finalized.
Repo structure
- notebooks/ — HTEM querying, filtering, dataset creation, EDA
- neuralnet/ — PyTorch training code and experiments
- config/ — local config.yaml (ignored by git) for system-specific paths
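A local YAML config might look like this; the keys below are hypothetical examples, since the real file is git-ignored and machine-specific:

```yaml
# Example only — real paths are system-specific and not committed.
data_dir: /path/to/htem_downloads      # cached raw library downloads
dataset_path: /path/to/dataset.csv     # flattened ML-ready dataset
checkpoint_dir: /path/to/checkpoints   # saved model weights
```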
HTEM API: https://htem.nrel.gov/api-docs
- Pebble (https://github.com/Vision84/Pebble)
- Parky (https://github.com/Plate1/Parky)
