Complete Type Safety: Eliminating All Pyright Errors #245
Conversation
- Improve type hints in stack_states method
- Add robust handling of log rewards in stacking
- Remove pyright ignore comments
- Use cast for type safety in DiscreteStates
- Improve type conversion and error handling
- Implement type-safe method to generate batch of initial states
- Ensure return type is DiscreteStates with an assertion
- Extends base class method with discrete environment specifics
…tioning

- Implement a generic container for storing and manipulating pairs of states
- Support optional conditioning tensors for intermediary and terminating states
- Provide methods for extending, indexing, and accessing state pairs
- Designed to support flow matching and other algorithms requiring state pair processing
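Based on this description and the `__init__` diff quoted later in the thread, a minimal sketch of the container's shape (conditioning concatenation is omitted from `extend`; details here are illustrative, not the actual PR code):

```python
from typing import Generic, Optional, TypeVar

import torch

from gfn.states import States

StateType = TypeVar("StateType", bound=States)


class StatePairs(Generic[StateType]):
    """Pairs of (intermediary, terminating) states with optional conditioning."""

    def __init__(
        self,
        intermediary_states: StateType,
        terminating_states: StateType,
        intermediary_conditioning: Optional[torch.Tensor] = None,
        terminating_conditioning: Optional[torch.Tensor] = None,
    ) -> None:
        self.intermediary_states = intermediary_states
        self.terminating_states = terminating_states
        self.intermediary_conditioning = intermediary_conditioning
        self.terminating_conditioning = terminating_conditioning

    def extend(self, other: "StatePairs[StateType]") -> None:
        # Concatenate another container's pairs onto this one; States.extend
        # handles the batch-dimension concatenation.
        self.intermediary_states.extend(other.intermediary_states)
        self.terminating_states.extend(other.terminating_states)
```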
- Implement a helper method to compute loss directly from trajectories
- Support different GFlowNet types by handling training sample conversion
- Provide a flexible way to compute loss with optional recalculation of log probabilities
- Enhance loss computation workflow for various GFlowNet implementations
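In outline, such a helper reduces to the following sketch (not the exact signature; the real method must also handle loss methods that do not accept `recalculate_all_logprobs`):

```python
class GFlowNet:
    # ... existing methods (to_training_samples, loss) elided ...

    def loss_from_trajectories(self, env, trajectories, recalculate_all_logprobs=False):
        """Compute the loss directly from sampled trajectories (sketch)."""
        # Each subclass converts trajectories into its own sample type:
        # full trajectories (TB), transitions (DB), or state pairs (FM).
        training_samples = self.to_training_samples(trajectories)
        return self.loss(
            env, training_samples, recalculate_all_logprobs=recalculate_all_logprobs
        )
```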
- Add import for StatePairs from state_pairs module
- Extend container module to include the new StatePairs class
- Modify EnumPreprocessor and OneHotPreprocessor to use DiscreteStates type hint
- Update type annotations for get_states_indices and preprocess methods
- Improve type safety for discrete state preprocessing
- Update `to_non_initial_intermediary_and_terminating_states` method to return a StatePairs instance
- Improve type safety by asserting DiscreteStates type for intermediary and terminating states
- Enhance method documentation to clarify its purpose and usage
- Simplify state pair generation with direct StatePairs constructor
- Update FMGFlowNet to use StatePairs instead of tuple for state handling
- Modify loss method to work with StatePairs container
- Simplify type annotations and state processing logic
- Improve type safety by asserting DiscreteStates types
- Update to_training_samples method to return StatePairs
- Update expected_output_dim methods to use @property decorator
- Remove pyright ignore comments in various modules
- Improve type safety and code clarity in samplers, modules, and utility functions
- Simplify state and action processing in sampling and training methods
- Update type hints in discrete environment and estimator classes
- Implement generic ReplayBuffer with type-safe container handling
- Remove objects_type parameter and use type inference
- Simplify initialization and sampling methods
- Add support for dynamic buffer type detection
- Improve type hints and remove pyright ignore comments
- Update test cases to work with new generic buffer implementation
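A minimal sketch of the inferred-type buffer this commit describes (capacity eviction and edge cases omitted; the import path follows the `containers/base.py` reference later in this thread, and `Container.sample` is quoted there too):

```python
from typing import Generic, Optional, TypeVar

from gfn.containers.base import Container

ContainerType = TypeVar("ContainerType", bound=Container)


class ReplayBuffer(Generic[ContainerType]):
    """Sketch: a buffer whose container type is inferred from the first add()."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # used for eviction, omitted here for brevity
        self.training_objects: Optional[ContainerType] = None

    def add(self, training_objects: ContainerType) -> None:
        if self.training_objects is None:
            # The first insertion fixes the buffer's container type; no
            # objects_type argument is needed at construction time.
            self.training_objects = training_objects
        else:
            self.training_objects.extend(training_objects)

    def sample(self, n_samples: int) -> ContainerType:
        assert self.training_objects is not None, "The buffer is empty."
        return self.training_objects.sample(n_samples)
```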
```diff
     states = env._step(states, actions)

     # Step 4 fails due to an invalid input action.
     actions = env.actions_from_tensor(format_tensor(failing_actions_list))
     with pytest.raises(NonValidActionsError):
-        states = env._step(states, actions)  # pyright: ignore
+        states = env._step(states, actions)
```
What about exchanging the names of `env.step` and `env._step`? Currently, `env._step` is the one that needs to be called externally (as seen in `samplers.py`), which seems unusual for something with a private-style naming convention. The same goes for `env._backward_step`.
Last year, we used to have `maskless_step` and `step`. We then changed them to `step` and `_step`, respectively. A user that defines their environment needs to define `step` only, and `_step` handles the masking for them.

When I don't work on the codebase for 1-2 months and go back to it, I agree that `_step` is confusing. What do you think of changing it to `safe_step`? (Obviously, in a new environment, the user would still need to write `step` only.)

@josephdviviano, your opinion here would be appreciated too. Thanks
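To make the convention concrete, the division of labor reads roughly like this (a sketch; `is_action_valid` is an illustrative name, not necessarily the library's actual API):

```python
class NonValidActionsError(Exception):
    """Raised when actions are invalid in their source states (as in the tests above)."""


class Env:
    def step(self, states, actions):
        """User-defined: the raw transition dynamics, with no validity checks."""
        raise NotImplementedError

    def _step(self, states, actions):
        """Library-side wrapper: validate actions against masks, then delegate to step()."""
        if not self.is_action_valid(states, actions):
            raise NonValidActionsError("Some actions are not valid in their states.")
        return self.step(states, actions)

    def is_action_valid(self, states, actions) -> bool:
        # Hypothetical helper: check actions against the states' forward masks.
        raise NotImplementedError
```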
I don't really have a strong preference on this; `safe_step` seems to be fine also.
I don't love the name `safe_step`, which implies the existence of `unsafe_step`.

I understand the use of `env._step` to be correct in this case (how it is called by the `Sampler`) - of course this is all subjective, but I'm comfortable with the current naming --

Let me know what you think:
https://claude.ai/share/8e7a4b6a-7347-4b8e-b064-2f510c2a6d3e
One option might be to call this method `env._base_step` -- but I think we should keep the `_`, which denotes to the user of the library "you shouldn't call this method unless you really know what you're doing".
Change `args.replay_capacity` to `args.replay_buffer_size` to align with parameter naming convention
Improve documentation for the `__getitem__` method in StatePairs to clarify batch dimension indexing, and note potential differences in intermediary and terminating states batch shapes
Thank you for your review @hyeok9855. I have addressed all your points, and left a question.
…which to calculate PB. This fixes that
Extend test coverage for hypergrid training by introducing a new parametrized test that checks different loss functions and replay buffer sizes. Also add new configuration options to HypergridArgs and CommonArgs classes to support these variations.
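A sketch of what such a parametrized test could look like (the loss names, buffer sizes, and helper function are illustrative stand-ins, not the actual test code):

```python
import math

import pytest


def run_short_training(loss_name: str, replay_buffer_size: int) -> float:
    # Stand-in for a short training run of the hypergrid example script,
    # configured with the given loss and replay buffer size.
    return 0.0


@pytest.mark.parametrize("loss_name", ["FM", "TB", "DB", "SubTB"])
@pytest.mark.parametrize("replay_buffer_size", [0, 100])
def test_hypergrid_losses_and_replay_buffer(loss_name: str, replay_buffer_size: int):
    final_loss = run_short_training(loss_name, replay_buffer_size)
    assert math.isfinite(final_loss)
```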
…hods

Refactor the States class to remove the _log_rewards attribute and associated methods. Update StatePairs and related classes to handle log rewards more explicitly, including modifications to initialization, concatenation, and indexing methods.
Modify the `initialize` method to simplify type checking and initialization of training objects. Move the initialization logic outside of the condition and ensure the buffer is only initialized when no training objects exist.
I approved with one concern; see below.
I really appreciate this PR, @saleml !!
```diff
@@ -60,7 +60,7 @@ def __init__(
         self.terminating_conditioning = terminating_conditioning

     def __len__(self) -> int:
-        return len(self.intermediary_states) + len(self.terminating_states)
+        return min(len(self.intermediary_states), len(self.terminating_states))
```
This approach feels a bit like a workaround. Using the sum of the two lengths seems more reasonable to me.

It looks like we might need to refactor this a bit more. For example, instead of storing `intermediary_states` and `terminating_states` in separate variables, we could combine them into one and use two sets of indices to track whether a state is intermediate or terminating.

If you agree, I'll go ahead and create an issue for this. Let me know if you have any better suggestions!
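Hypothetically, the combined-storage version could look like this (all names invented for illustration):

```python
import torch


class CombinedStatePairs:
    """Hypothetical refactor: one states container plus a terminating mask."""

    def __init__(self, states, is_terminating: torch.Tensor) -> None:
        self.states = states
        self.is_terminating = is_terminating  # bool tensor of shape (len(states),)

    def __len__(self) -> int:
        # Unambiguous: one batch, one length.
        return len(self.states)

    @property
    def intermediary_states(self):
        return self.states[~self.is_terminating]

    @property
    def terminating_states(self):
        return self.states[self.is_terminating]
```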
I agree it's a bit hacky. This is only because we have to have a `len` function in

```python
def sample(self, n_samples: int) -> Container:
    """Samples a subset of the container."""
    return self[torch.randperm(len(self))[:n_samples]]
```

in `containers/base.py`.
Please go ahead and raise the issue. Thanks for your review.
Partial review of the core features. I didn't check everything. Thanks very much to @hyeok9855 for his thorough review. I do however have some important comments to address.
`.github/workflows/pre-commit.yml` (Outdated)
Not sure I agree with removing this file completely. What I agree with is skipping the tests (but leaving in black, etc.).
This was a duplicate of `ci.py`. Everything is still tested for on GitHub (e.g., this PR).
I suppose this is fine; this was duplicated effort more or less. But I worry that PyTorch Geometric might require us to be much more bound to conda, so I wonder if this is premature.
ditto
`README.md` (Outdated)
Awesome
`pyproject.toml` (Outdated)
```diff
@@ -25,11 +25,12 @@ classifiers = [
 einops = ">=0.6.1"
 numpy = ">=1.21.2"
 python = "^3.10"
-torch = ">=1.9.0"
+torch = "==2.6.0"
```
`>=`, I think.
ok
`pyproject.toml` (Outdated)
```diff
 # dev dependencies.
 black = { version = "24.3", optional = true }
 flake8 = { version = "*", optional = true }
+pyright = { version = "*", optional = true }
```
I think we should pin this to a version, in case the rules change, and it leads to changes being requested across the library.
ok
```python
from typing import ClassVar, List, Sequence
```
I think we rely too much on nonstandard types throughout the code for us to bother imho, but consistency is important.
```python
# We know this is safe because PFBasedGFlowNet's loss accepts these arguments
return self.loss(env, training_samples, recalculate_all_logprobs=True)
```
Can recalculation be automatically configured iff we're doing on-policy training? We once had a flag to this effect. If a warning is thrown here, I'm not sure what the user can actually do about it.
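A sketch of what that automatic configuration could look like, using an invented `on_policy` flag (the library's actual flag, if reinstated, may differ):

```python
class PFBasedGFlowNet:
    # ... existing methods (to_training_samples, loss) elided ...

    def loss_from_trajectories(self, env, trajectories, on_policy: bool = False):
        # On-policy: the log-probs cached during sampling are still valid, so
        # reuse them. Off-policy (e.g., replay buffer samples): recompute them.
        training_samples = self.to_training_samples(trajectories)
        return self.loss(env, training_samples, recalculate_all_logprobs=not on_policy)
```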
`src/gfn/utils/modules.py` (Outdated)
```diff
@@ -34,6 +34,7 @@ def __init__(
         self._output_dim = output_dim

         if trunk is None:
+            hidden_dim = hidden_dim or 256
```
I'm not a fan of this magic number. Init has a default value for `hidden_dim`, so why do we need this raw `256`?
yes true
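The follow-up commit later in the thread ("Enforce hidden_dim requirement in MLP initialization") points at the resolution; a minimal sketch of that check, with the trunk construction simplified:

```python
import torch.nn as nn


class MLP(nn.Module):
    """Sketch of the fix: require hidden_dim explicitly when no trunk is passed."""

    def __init__(
        self,
        input_dim: int,
        output_dim: int,
        hidden_dim: int | None = None,
        trunk: nn.Module | None = None,
    ) -> None:
        super().__init__()
        self._output_dim = output_dim
        if trunk is None:
            # Fail loudly instead of silently falling back to a magic 256.
            if hidden_dim is None:
                raise ValueError("hidden_dim is required when trunk is None.")
            trunk = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.trunk = trunk
```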
- Relax torch version constraint to >=2.6.0
- Pin pyright version to 1.1.395
- Modify warning message in base.py to provide clearer guidance on log probability recalculation
- Enforce hidden_dim requirement in MLP initialization
No strong opinion here. We can have this discussion outside this PR. Thanks for your review. I have addressed your comments.
Additional comment
```python
# We know this is safe because PFBasedGFlowNet's loss accepts these arguments
return self.loss(env, training_samples, recalculate_all_logprobs=True)
```
@saleml I don't understand what the purpose of this warning is. If we're throwing a warning, this implies the user can actually do something to improve efficiency. Right now, the only thing the user can do is subclass this GFlowNet base class and override this method completely.
I'm going to file an issue about this.
🎯 Complete Type Safety: Eliminating All Pyright Errors

🌟 Major Achievement
- `reportOptionalMemberAccess` and `reportArgumentType` set to "error"

🏗️ Key Architectural Improvements

1. New Type-Safe Containers 📦
- `StatePairs[DiscreteStates]` for robust state pair handling
- Generic `ReplayBuffer[ContainerType]` implementation

2. Enhanced Type Safety in Core Components ⚡
- Removed `pyright: ignore` comments by fixing the underlying issues
- Proper handling of `DiscreteStates` in training examples

3. Configuration & Quality Assurance 🛠️
- Type checking extended to `tutorials/examples/` and `testing/`

💫 Impact & Importance

Why This Matters

Strategic Timing ⏰

🔄 Next Steps
While this PR achieves complete pyright compliance, future improvements could include:

🎓 Technical Details

🚀 Call to Action
This PR represents a crucial milestone in code quality. Merging it now will:

Ready for immediate review and high-priority merge 🔥