Skip to content

Latest commit

 

History

History
13 lines (10 loc) · 631 Bytes

File metadata and controls

13 lines (10 loc) · 631 Bytes

ASMS Data Curation Pipeline

This repository contains a Python-based data curation pipeline for processing Affinity Selection Mass Spectrometry (ASMS) datasets. The pipeline prepares data for machine learning by performing cleaning, labeling, and fingerprint extraction.

Main Features

  • Splits protein-specific data into separate files
  • Detects and filters out anomalous entries
  • Handles isomer corrections
  • Adds negative samples from a master list
  • Generates binary labels for machine learning
  • Extracts chemical fingerprints (e.g., ECFP4, FCFP6, MACCS)
  • Saves curated data in both CSV and Parquet formats