Merged
23 changes: 23 additions & 0 deletions AGENTS.md
@@ -0,0 +1,23 @@
## Developer Workflow
Follow SOLID principles.
Always ensure that tests pass before committing.


## Coding Standards
- Keep routers thin; logic in `services/`.
- Repositories are the only DB touchpoint from services.
- DTOs (schemas) are versioned; never leak ORM models.
- Return 201 on creates, 202 on async accepted; 429 on limits; 422 on validation.
- Docstrings: Google style; type hints everywhere.
- Avoid defining `__all__` unless absolutely necessary, and never use `from module import *`; prefer explicit imports (ruff F403/F405 guard against this).
- Keep package `__init__.py` files empty aside from optional module docstrings; avoid re-exporting symbols from them.
- Never add `from __future__ import annotations`; native Python 3.12 typing is required.
- Prefer native union syntax (e.g., `str | None`) instead of `typing.Optional[...]` or `typing.Union`.

## Development Workflow

- All PRs require: ✅ lint, ✅ type-check, ✅ tests.
- Before committing, run all relevant project checks locally and ensure they pass.
- Write concise PR descriptions (why + what), not just code diffs.
- Users may give instructions in any language, but code and responses should be in English.
- Do not use `docker compose up -d` (or any detached Compose command) in development-facing scripts or docs.
19 changes: 11 additions & 8 deletions README.md
@@ -143,18 +143,21 @@ print clear instructions for manual download.
--dataset-dir data/images --out results
```

- This will create a JSONL file in `results/` whose name begins with the
- model name and includes the current timestamp. Each line contains the
- image file name, the extracted text and any additional metadata returned by
- the model. The extractor includes a template prompt that instructs the
- model to convert the chemical diagram into a SMILES string. You may
+ This command now performs **holistic page analysis**. Each JSONL line
+ corresponds to a single patent page and stores the prompt, raw model
+ response and any parsed JSON payload containing `structures` (with
+ model-assigned identifiers and SMILES strings) and `reactions`
+ (reactant/product identifier sets plus optional conditions). The default
+ prompt keeps the model blind to MolMole annotations so it must discover the
+ number of structures and reactions directly from the image. You may
customise the prompt or provide an alternative generation function by
editing `extractor.py`.

5. **Evaluate predictions**: use the evaluator script to compare your
- predictions against the ground truth SMILES strings. The evaluator
- canonicalises SMILES using RDKit (if available) and computes accuracy
- using both SMILES and InChI‑Key matching:
+ predictions against the ground truth SMILES strings and reaction graphs.
+ The evaluator canonicalises SMILES using RDKit (if available), matches
+ predictions per structure identifier, and reports SMILES/InChI accuracy,
+ mean Tanimoto similarity and reaction precision/recall/F1:

```bash
python -m molmole_research.evaluator --pred results/gpt-4o-vision_*.jsonl \
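
The per-page record and the reaction metrics described in the README change above can be sketched roughly as follows. Only the `structures` and `reactions` payload names come from the README text; the other keys (`prompt`, `response`, `parsed`, `id`, `smiles`, `reactants`, `products`) are assumptions made for illustration, and the real schema in `extractor.py` may differ. The sketch also assumes predicted identifiers have already been aligned with the ground-truth identifiers, and it scores a reaction as correct only on an exact match of its reactant and product sets, which is not necessarily how `molmole_research.evaluator` does it.

```python
import json

# A reaction is compared as (reactant identifier set, product identifier set).
Reaction = tuple[frozenset[str], frozenset[str]]


def parse_page_record(line: str) -> dict:
    """Decode one JSONL line into structures and reactions (key names assumed)."""
    record = json.loads(line)
    payload = record.get("parsed") or {}  # parsing may have failed for this page
    return {
        "structures": {s["id"]: s["smiles"] for s in payload.get("structures", [])},
        "reactions": {
            (frozenset(r["reactants"]), frozenset(r["products"]))
            for r in payload.get("reactions", [])
        },
    }


def reaction_prf(
    predicted: set[Reaction], truth: set[Reaction]
) -> tuple[float, float, float]:
    """Return exact-match precision, recall and F1 over two reaction sets."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical page record with three structures and two predicted reactions.
line = json.dumps(
    {
        "prompt": "Describe every structure and reaction on this page.",
        "response": "(raw model text)",
        "parsed": {
            "structures": [
                {"id": "s1", "smiles": "CCO"},
                {"id": "s2", "smiles": "CC=O"},
                {"id": "s3", "smiles": "CC(=O)O"},
            ],
            "reactions": [
                {"reactants": ["s1"], "products": ["s2"]},
                {"reactants": ["s2"], "products": ["s3"], "conditions": "oxidation"},
            ],
        },
    }
)
page = parse_page_record(line)

# Hypothetical ground truth: one reaction agrees with the prediction, one does not.
truth: set[Reaction] = {
    (frozenset({"s1"}), frozenset({"s2"})),
    (frozenset({"s1"}), frozenset({"s3"})),
}
print(reaction_prf(page["reactions"], truth))  # (0.5, 0.5, 0.5)
```

Using frozensets makes the comparison independent of the order in which reactants or products are listed, which matches the README's description of reactions as identifier sets. Optional conditions are ignored in this sketch.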
24 changes: 1 addition & 23 deletions src/molmole_research/__init__.py
@@ -1,23 +1 @@
-"""Top‑level package for MolMole OCSR research environment.
-
-This package contains modules to download the MolMole benchmark dataset,
-extract chemical structures from images using vision‑language models,
-evaluate predictions against ground truth and orchestrate experiments.
-
-The package is designed to be executed via the command line, for example:
-
-python -m molmole_research.downloader --help
-python -m molmole_research.extractor --help
-python -m molmole_research.evaluator --help
-python -m molmole_research.runner --help
-
-"""
-
-__all__ = [
-"downloader",
-"extractor",
-"evaluator",
-"runner",
-]
-
-__version__ = "0.1.0"
+"""Top-level package for MolMole OCSR research environment."""