A service that automatically transfers microscope data to a DataFed repository.
- Automatically detects new files in the microscope data directory
- Processes files daily at midnight
- Tracks processed files to avoid duplicates (see the sketch after this list)
- Uploads data to DataFed with metadata
- Comprehensive logging
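
As a rough illustration of the new-file detection and duplicate tracking, here is a minimal sketch. The real logic lives in the package; the file layout and function names below are assumptions, not the actual implementation:

```python
import json
from pathlib import Path

# Assumed location of the processed-files log (see the logging notes below).
PROCESSED_LOG = Path("logs/processed_files.json")


def find_new_files(watch_dir: str) -> list[Path]:
    """Return files in the watch directory that have not been uploaded yet."""
    done = set(json.loads(PROCESSED_LOG.read_text())) if PROCESSED_LOG.exists() else set()
    return [p for p in sorted(Path(watch_dir).iterdir()) if p.is_file() and str(p) not in done]


def mark_processed(path: Path) -> None:
    """Record an uploaded file so later runs skip it."""
    done = set(json.loads(PROCESSED_LOG.read_text())) if PROCESSED_LOG.exists() else set()
    done.add(str(path))
    PROCESSED_LOG.write_text(json.dumps(sorted(done), indent=2))
```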
Installation:

- Clone the repository:

  ```bash
  git clone <repository-url>
  cd diatoms-to-datafed
  ```

- Create and activate a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install the package:

  ```bash
  pip install -e .
  ```
Configuration: edit `config.yaml` (a sketch of how these values are read follows this list).

- Set `watch_directory` to your microscope data directory
- Set `datafed.repo_id` to your DataFed repository ID
- Adjust other settings as needed
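
For reference, here is a minimal sketch of how the service might load and validate these settings at startup, assuming PyYAML and the key names listed above; the loader function name is hypothetical:

```python
from pathlib import Path

import yaml  # PyYAML


def load_config(path: str = "config.yaml") -> dict:
    """Read the YAML configuration and sanity-check the required keys."""
    cfg = yaml.safe_load(Path(path).read_text())
    watch_dir = Path(cfg["watch_directory"])
    repo_id = cfg["datafed"]["repo_id"]
    if not watch_dir.is_dir():
        raise FileNotFoundError(f"watch_directory does not exist: {watch_dir}")
    if not repo_id:
        raise ValueError("datafed.repo_id must be set")
    return cfg
```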
Create the logs directory:

```bash
mkdir logs
```

Run the service:

```bash
diatoms-to-datafed
```

The service will:
- Start running in the background
- Check for new files every day at midnight
- Process and upload new files to DataFed
- Log all activities to `logs/diatoms_to_datafed.log`
- Track processed files in `logs/processed_files.json` (a schematic of one daily run follows this list)
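
The package itself implements this loop; the following is only a schematic of what one cycle might look like. It assumes the `datafed` Python client's `CommandLib.API` with `dataCreate`/`dataPut` (check the exact signatures, including the `repo_id` keyword, against your installed version), illustrative metadata fields, and paths that would normally come from `config.yaml`. The duplicate check from the earlier sketch is repeated inline so this block stands alone:

```python
import json
import logging
import time
from datetime import datetime, timedelta
from pathlib import Path

from datafed.CommandLib import API  # DataFed Python client

PROCESSED_LOG = Path("logs/processed_files.json")


def run_once(watch_dir: str, repo_id: str) -> None:
    """Upload every file in watch_dir that has not been processed before."""
    df_api = API()
    done = set(json.loads(PROCESSED_LOG.read_text())) if PROCESSED_LOG.exists() else set()
    for path in sorted(Path(watch_dir).iterdir()):
        if not path.is_file() or str(path) in done:
            continue
        # Create a DataFed record carrying simple file metadata, then transfer the file.
        meta = {"filename": path.name, "size_bytes": path.stat().st_size}
        # repo_id keyword assumed; adjust to match your DataFed client version.
        reply = df_api.dataCreate(path.stem, metadata=json.dumps(meta), repo_id=repo_id)
        record_id = reply[0].data[0].id
        df_api.dataPut(record_id, str(path), wait=True)
        done.add(str(path))
        logging.info("Uploaded %s as DataFed record %s", path, record_id)
    PROCESSED_LOG.write_text(json.dumps(sorted(done), indent=2))


def main(watch_dir: str, repo_id: str) -> None:
    """Wake up at every midnight and process whatever is new."""
    while True:
        now = datetime.now()
        next_run = (now + timedelta(days=1)).replace(hour=0, minute=0, second=0, microsecond=0)
        time.sleep((next_run - now).total_seconds())
        run_once(watch_dir, repo_id)
```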
Log files (a minimal logging setup is sketched below):

- Application logs: `logs/diatoms_to_datafed.log`
- Processed files log: `logs/processed_files.json`
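
The actual logging configuration is defined inside the package; a plausible equivalent using only the standard library, writing to the application log named above, would be:

```python
import logging


def setup_logging(log_file: str = "logs/diatoms_to_datafed.log") -> None:
    """Send log records to the application log file and to the console."""
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        handlers=[logging.FileHandler(log_file), logging.StreamHandler()],
    )
```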
Requirements:

- Python 3.9 or higher
- DataFed account and repository
- Access to the microscope data directory
Globus endpoint setup:

- Build the Globus container:

  ```bash
  docker build -t globus_container -f Dockerfile.globus-connect .
  ```

- Set up the Globus endpoint (create a local config directory first, e.g. `mkdir config`):

  ```bash
  docker run \
    -e DataPath="{your local data directory}" \
    -e ConfigPath="{your config directory}" \
    -v "{your config directory}:/home/gridftp/globus_config" \
    -v "{your local data directory}:/home/gridftp/data" \
    -it globus_container
  ```

- Test the Globus endpoint:

  ```bash
  docker run \
    -e DataPath="{your local data directory}" \
    -e ConfigPath="{your config directory}" \
    -v "{your config directory}:/home/gridftp/globus_config" \
    -v "{your local data directory}:/home/gridftp/data" \
    -e START_GLOBUS="true" \
    -it globus_container
  ```

- Bring up the application:

  ```bash
  docker-compose up --build
  ```

  Check that the app is running at http://localhost:5006/app (a quick check is sketched below).
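
Once the containers are up, one quick way to confirm the app responds at that address, using only the Python standard library:

```python
from urllib.request import urlopen

# Expect an HTTP 200 once the app container is serving requests.
with urlopen("http://localhost:5006/app", timeout=10) as resp:
    print(resp.status, resp.reason)
```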