Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ml for synthesis prototype - Refactoring #260

Merged
merged 31 commits into from
Jan 8, 2024

Conversation

ryukzak
Copy link
Owner

@ryukzak ryukzak commented Dec 29, 2023

To: https://github.com/ryukzak/nitta/pull/167/files

Changes overview:

  • Add Makefile with key project actions (build, test, lint, format).
  • Reorganise CI flow. Separate ml and ui jobs from nitta build/test.
  • Add popover with all scores to subforest screen.
  • Bump stackage snapshot, to be hls-powered.
  • Some haskell related refactoring.
  • Update tensorflow from 2.12 to 2.13. Add mac os specific dependencies.
  • Add hack to specify NITTA_RUN_COMMAND from env variable.

TODOs:

  • Docker image for development: add make target to multi-platform build and push docker to docker-hub
  • Replace black via ruff autoformatter in CI, Makefile, and dev-docker-image
  • Rename score field in NodeView to defScore (UI, Haskell, maybe ML backend)
  • Add score selector to subForest screen.
  • Change CLI argument port type from Int with magic -1 to Maybe Int
  • Push ML scores to STM during synthesis process (NITTA.Synthesis.Explore:subForestIO).
subForestIO
    BackendCtx{nodeScores, mlBackendGetter}
    tree@Tree{sSubForestVar} = do
        (firstTime, subForest) <-
            atomically $
                tryReadTMVar sSubForestVar >>= \case
                    Just subForest -> return (False, subForest)
                    Nothing -> do
                        subForest <- exploreSubForestVar tree
                        putTMVar sSubForestVar subForest
                        return (True, subForest)

        when firstTime $ traceProcessedNode tree

        -- FIXME: ML scores are evaluated here every time subForestIO is called. how to cache it like the default score? IO in STM isn't possible.
        -- also it looks inelegant, is there a way to refactor it?
        let modelNames = mapMaybe (T.stripPrefix mlScoreKeyPrefix) nodeScores
        if
            | null subForest -> return subForest
            | null nodeScores -> return subForest
            | null modelNames -> return subForest
            | otherwise -> do
                MlBackendServer{baseUrl} <- mlBackendGetter
                case baseUrl of
                    Nothing -> return subForest
                    Just mlBackendBaseUrl -> do
                        -- (addMlScoreToSubforestSkipErrorsIO subForestAccum modelName) gets called for each modelName
                        foldM (addMlScoreToSubforestSkipErrorsIO mlBackendBaseUrl) subForest modelNames
  • Dockerfile with dev containers and other related stuff (vscode workspace) should be relocated from ml/synthesis to a separate dir, perhaps.
  • Currently ML if the config is invalid, the new evaluate script will fail with generic errors (TypeError, KeyError, etc.). It would be useful to write a JSON Schema describing evaluation configs. It will help to provide relevant field descriptions in editors and to implement a config validation in the script itself.
  • Simplify environment setup and nitta -- MLBackend integration. It shouldn't have such complex, like it happens in Makefile:ml-nitta
  • During the model training, if we use completely different synthesis trees for validation dataset instead of a random split over all shuffled training data, the validation loss and mae will be significantly worse. It means that the current model doesn't generalize well to other synthesis trees, although synthesis may still succeed, as it was shown. Improving those new validation metrics is crucial for further ML synthesis development. Need more input node parameters? Adjusting random descent in MCTS-like style too to fix negative labels share? UPD: some related work was done and yielded significant results in regards to fixing this, but mode's generalization abilities still require improvement.
  • Add tensorflow version check during dev-image build (should be the same with pyproject.toml)
  • I think the best solution here is to add NITTA_ML_BACKEND_RUN_COMMAND to provide more control. But I move it to the new issue because:
  • Ml for synthesis prototype - Refactoring #260 (comment)

@ryukzak ryukzak self-assigned this Dec 29, 2023
@ryukzak ryukzak changed the base branch from master to ml-for-synthesis-prototype December 29, 2023 19:06
@ryukzak ryukzak force-pushed the ml-for-synthesis-prototype-wip branch from 0a05375 to a8843d9 Compare December 30, 2023 13:01
@ryukzak ryukzak force-pushed the ml-for-synthesis-prototype-wip branch from a8843d9 to 3add8bd Compare December 30, 2023 16:10
@ryukzak ryukzak force-pushed the ml-for-synthesis-prototype-wip branch from 3add8bd to de72eb9 Compare December 30, 2023 16:20
@ryukzak ryukzak changed the title Draft: Ml for synthesis prototype - Refactoring Ml for synthesis prototype - Refactoring Dec 30, 2023
@ryukzak ryukzak force-pushed the ml-for-synthesis-prototype-wip branch from 9ebb5c2 to 0d5d1e6 Compare January 6, 2024 01:39
@ryukzak ryukzak force-pushed the ml-for-synthesis-prototype-wip branch from 0d5d1e6 to 3945599 Compare January 6, 2024 01:41
Copy link
Collaborator

@iburakov iburakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly minor stuff, yet important enough.

Suggested new TODOs:

  • write all make docker-dev-* commands and update the ml/synthesis/README.md

This used to be (VarValTime v x t, UnitTag tag). See below for more info.
-}
type SynthesisMethodConstraints tag v x t = (VarValTimeJSON v x t, ToJSON tag, UnitTag tag)

-- FIXME: Validate the type above, its usages and meaning in the context of changes described below.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment should've been either moved with the SynthesisMethodConstraints or removed

Copy link
Owner Author

@ryukzak ryukzak Jan 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't sure is I answer to this question or not, but the main purpose of explicit type signatures here is: https://ryukzak.github.io/nitta/haddock/nitta/NITTA-Synthesis-Method.html

Without type alias it transform in a mess.

Comment deleted

Copy link
Collaborator

@iburakov iburakov Jan 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, got it. Yes, it's an answer, thank you.

It's a shame this type pinning caused me to spend another ~3-4 hours on debugging/leaning when ToJSON constraints had to be added. Compiler errors looked completely cryptic to a Haskell junior that was completely unaware those constraints exist :D

Probably, I would've stumbled upon them somewhere else anyway.

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
Makefile Outdated Show resolved Hide resolved
Makefile Show resolved Hide resolved
ml/synthesis/pyproject.toml Outdated Show resolved Hide resolved
ml/synthesis/Dockerfile Outdated Show resolved Hide resolved
Makefile Show resolved Hide resolved
Makefile Outdated Show resolved Hide resolved
ryukzak and others added 6 commits January 6, 2024 20:40
Co-authored-by: Ilya Burakov <speedwatson@gmail.com>
Co-authored-by: Ilya Burakov <speedwatson@gmail.com>
Co-authored-by: Ilya Burakov <speedwatson@gmail.com>
Co-authored-by: Ilya Burakov <speedwatson@gmail.com>
@ryukzak ryukzak requested a review from iburakov January 7, 2024 14:33
Copy link
Collaborator

@iburakov iburakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there

Makefile Show resolved Hide resolved
@@ -49,3 +49,6 @@ def _find_root_dir():
MODELS_DIR = Path(_models_dir_env) if _models_dir_env else ML_SYNTHESIS_DIR / "models"

ML_BACKEND_BASE_URL_FILEPATH = ".ml_backend_base_url"

# high priority env var which overrides command provided in the config file
NITTA_RUN_COMMAND_OVERRIDE = os.environ.get("NITTA_RUN_COMMAND", None)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

used env var name (NITTA_RUN_COMMAND) doesn't match

env:
NITTA_RUN_COMMAND_OVERRIDE: nitta

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also it may be better to do it like

class EnvVarNames:
    MODELS_DIR = "NITTA_ML_SYNTHESIS_MODELS_DIR"
    NITTA_RUN_COMMAND = "NITTA_RUN_COMMAND"

...

NITTA_RUN_COMMAND_OVERRIDE = os.environ.get(EnvVarNames.NITTA_RUN_COMMAND, None)

it's consistent with the

_models_dir_env = os.environ.get(EnvVarNames.MODELS_DIR)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't fully understand the idea behind EnvVarNames class, but consistency is an argument. Thank you!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is quite boring, EnvVarNames.MODELS_DIR had to be reused in tests:

async with run_nitta_server(
EXAMPLES_DIR / "fibonacci.lua",
nitta_args=f'--score="ml_{model_name}" --method=NoSynthesis -e',
env={EnvVarNames.MODELS_DIR: tmp_models_dir.resolve()},
) as nitta:
ml_scores = await _get_scores(await nitta.get_base_url())
assert non_ml_scores != ml_scores

So I preferred to do it this way instead of hardcoding :)
Class is just for namespacing.

Regardless, I think it's good to have a single place where all the env var names (that python modules understand and use) are listed together, so they're not scattered across the code as literals. If there ever will be a need to generate docs for them, it'll make it simpler.

Makefile Outdated Show resolved Hide resolved
Makefile Outdated Show resolved Hide resolved
ml/synthesis/pyproject.toml Outdated Show resolved Hide resolved
@ryukzak ryukzak merged commit 5966a6b into ml-for-synthesis-prototype Jan 8, 2024
13 checks passed
@ryukzak ryukzak deleted the ml-for-synthesis-prototype-wip branch January 8, 2024 08:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants