Ml for synthesis prototype - Refactoring #260

ryukzak · 2023-12-29T19:06:11Z

To: https://github.com/ryukzak/nitta/pull/167/files

Changes overview:

Add a Makefile with key project actions (build, test, lint, format).
Reorganise the CI flow. Separate the ml and ui jobs from the nitta build/test.
Add a popover with all scores to the subforest screen.
Bump the Stackage snapshot to a GHC version supported by HLS.
Some Haskell-related refactoring.
Update TensorFlow from 2.12 to 2.13. Add macOS-specific dependencies.
Add a hack to specify NITTA_RUN_COMMAND via an environment variable.

TODOs:

Docker image for development: add a make target for a multi-platform build and for pushing the image to Docker Hub
Replace black with the ruff autoformatter in CI, Makefile, and dev-docker-image
Rename score field in NodeView to defScore (UI, Haskell, maybe ML backend)
Add score selector to subForest screen.
Change CLI argument port type from Int with magic -1 to Maybe Int
Push ML scores to STM during synthesis process (NITTA.Synthesis.Explore:subForestIO).

subForestIO
    BackendCtx{nodeScores, mlBackendGetter}
    tree@Tree{sSubForestVar} = do
        (firstTime, subForest) <-
            atomically $
                tryReadTMVar sSubForestVar >>= \case
                    Just subForest -> return (False, subForest)
                    Nothing -> do
                        subForest <- exploreSubForestVar tree
                        putTMVar sSubForestVar subForest
                        return (True, subForest)

        when firstTime $ traceProcessedNode tree

        -- FIXME: ML scores are evaluated here every time subForestIO is called. how to cache it like the default score? IO in STM isn't possible.
        -- also it looks inelegant, is there a way to refactor it?
        let modelNames = mapMaybe (T.stripPrefix mlScoreKeyPrefix) nodeScores
        if
            | null subForest -> return subForest
            | null nodeScores -> return subForest
            | null modelNames -> return subForest
            | otherwise -> do
                MlBackendServer{baseUrl} <- mlBackendGetter
                case baseUrl of
                    Nothing -> return subForest
                    Just mlBackendBaseUrl -> do
                        -- (addMlScoreToSubforestSkipErrorsIO subForestAccum modelName) gets called for each modelName
                        foldM (addMlScoreToSubforestSkipErrorsIO mlBackendBaseUrl) subForest modelNames

Dockerfile with dev containers and other related stuff (vscode workspace) should be relocated from ml/synthesis to a separate dir, perhaps.
Currently, if the config is invalid, the new ML evaluation script fails with generic errors (TypeError, KeyError, etc.). It would be useful to write a JSON Schema describing evaluation configs. It would help to provide relevant field descriptions in editors and to implement config validation in the script itself.
Simplify environment setup and the nitta <-> MLBackend integration. It shouldn't be as complex as it currently is in Makefile:ml-nitta
During model training, if we use completely different synthesis trees for the validation dataset instead of a random split over all shuffled training data, the validation loss and MAE are significantly worse. This means that the current model doesn't generalize well to other synthesis trees, although synthesis may still succeed, as was shown. Improving those new validation metrics is crucial for further ML synthesis development. Do we need more input node parameters? Should we also adjust random descent in an MCTS-like style to fix the share of negative labels? UPD: some related work was done and yielded significant results in fixing this, but the model's generalization abilities still require improvement.
Add a TensorFlow version check during the dev-image build (the version should be the same as in pyproject.toml)
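To illustrate the validation-split item above, here is a minimal sketch of holding out whole synthesis trees rather than randomly shuffled samples. It assumes each training sample records which synthesis tree it came from under a hypothetical `tree_id` key; the function name `split_by_tree` and the sample layout are illustrative and not taken from ml/synthesis.

```python
import random
from collections import defaultdict


def split_by_tree(samples, val_fraction=0.2, seed=0):
    """Split samples so that no synthesis tree contributes to both sets.

    `samples` is a list of dicts with a hypothetical "tree_id" key.
    """
    by_tree = defaultdict(list)
    for sample in samples:
        by_tree[sample["tree_id"]].append(sample)

    # Shuffle tree identifiers, not individual samples.
    tree_ids = sorted(by_tree)
    random.Random(seed).shuffle(tree_ids)

    # Hold out whole trees; with more than one tree, keep at least one
    # tree in each of the two sets.
    if len(tree_ids) > 1:
        n_val = min(len(tree_ids) - 1, max(1, round(len(tree_ids) * val_fraction)))
    else:
        n_val = 0

    val = [s for t in tree_ids[:n_val] for s in by_tree[t]]
    train = [s for t in tree_ids[n_val:] for s in by_tree[t]]
    return train, val


# Toy data: 5 trees with 10 samples each.
samples = [{"tree_id": f"tree{i % 5}", "label": i} for i in range(50)]
train, val = split_by_tree(samples)
```

With a split like this, the validation metrics measure generalization to unseen trees, which is the harder setting described above. If scikit-learn is available, its `GroupShuffleSplit` offers a similar group-wise split.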
I think the best solution here is to add NITTA_ML_BACKEND_RUN_COMMAND to provide more control. But I moved it to a new issue because:
Ml for synthesis prototype - Refactoring #260 (comment)


iburakov

Mostly minor stuff, yet important enough.

Suggested new TODOs:

write all make docker-dev-* commands and update the ml/synthesis/README.md

iburakov · 2024-01-06T12:59:30Z

src/NITTA/Synthesis/Method.hs

-This used to be (VarValTime v x t, UnitTag tag). See below for more info.
-}
-type SynthesisMethodConstraints tag v x t = (VarValTimeJSON v x t, ToJSON tag, UnitTag tag)
-
 -- FIXME: Validate the type above, its usages and meaning in the context of changes described below.


This comment should've been either moved with the SynthesisMethodConstraints or removed

I'm not sure whether this answers the question, but the main purpose of the explicit type signatures here is the generated documentation: https://ryukzak.github.io/nitta/haddock/nitta/NITTA-Synthesis-Method.html

Without the type alias it turns into a mess.

Comment deleted

I see, got it. Yes, it's an answer, thank you.

It's a shame this type pinning caused me to spend another ~3-4 hours on debugging/learning when the ToJSON constraints had to be added. Compiler errors looked completely cryptic to a Haskell junior who was completely unaware those constraints exist :D

Probably, I would've stumbled upon them somewhere else anyway.

README.md

Makefile

ml/synthesis/src/components/data_crawling/nitta/nitta_running.py

ml/synthesis/pyproject.toml

ml/synthesis/Dockerfile

Makefile


iburakov

Almost there

Makefile

iburakov · 2024-01-07T15:02:09Z

ml/synthesis/src/consts.py

@@ -49,3 +49,6 @@ def _find_root_dir():
 MODELS_DIR = Path(_models_dir_env) if _models_dir_env else ML_SYNTHESIS_DIR / "models"

 ML_BACKEND_BASE_URL_FILEPATH = ".ml_backend_base_url"
+
+# high priority env var which overrides command provided in the config file
+NITTA_RUN_COMMAND_OVERRIDE = os.environ.get("NITTA_RUN_COMMAND", None)


The env var name used here (NITTA_RUN_COMMAND) doesn't match the one set in CI:

nitta/.github/workflows/ci.yml

Lines 246 to 247 in ecd35f9

    env:
      NITTA_RUN_COMMAND_OVERRIDE: nitta

also it may be better to do it like

class EnvVarNames:
    MODELS_DIR = "NITTA_ML_SYNTHESIS_MODELS_DIR"
    NITTA_RUN_COMMAND = "NITTA_RUN_COMMAND"

...

NITTA_RUN_COMMAND_OVERRIDE = os.environ.get(EnvVarNames.NITTA_RUN_COMMAND, None)

it's consistent with the

_models_dir_env = os.environ.get(EnvVarNames.MODELS_DIR)

Don't fully understand the idea behind EnvVarNames class, but consistency is an argument. Thank you!

The idea is quite boring, EnvVarNames.MODELS_DIR had to be reused in tests:

nitta/ml/synthesis/src/tests/test_smoke.py

Lines 80 to 86 in ffa9302

async with run_nitta_server(
    EXAMPLES_DIR / "fibonacci.lua",
    nitta_args=f'--score="ml_{model_name}" --method=NoSynthesis -e',
    env={EnvVarNames.MODELS_DIR: tmp_models_dir.resolve()},
) as nitta:
    ml_scores = await _get_scores(await nitta.get_base_url())
    assert non_ml_scores != ml_scores

So I preferred to do it this way instead of hardcoding :)
Class is just for namespacing.

Regardless, I think it's good to have a single place where all the env var names (that the Python modules understand and use) are listed together, so they're not scattered across the code as literals. If there is ever a need to generate docs for them, it'll make that simpler.
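A minimal sketch of the namespacing pattern discussed in this thread. The two names in `EnvVarNames` come from the suggestion above; the helpers `get_run_command` and `list_env_var_names` are hypothetical and only illustrate the override and the docs-generation idea, they are not taken from consts.py.

```python
import os


class EnvVarNames:
    """Single place listing the env var names the Python modules understand."""

    MODELS_DIR = "NITTA_ML_SYNTHESIS_MODELS_DIR"
    NITTA_RUN_COMMAND = "NITTA_RUN_COMMAND"


def get_run_command(config_command, environ=os.environ):
    """Hypothetical helper: the env var, when set, overrides the config value."""
    return environ.get(EnvVarNames.NITTA_RUN_COMMAND) or config_command


def list_env_var_names():
    """Collect all declared names, e.g. as input for generated docs."""
    return sorted(
        value
        for name, value in vars(EnvVarNames).items()
        if not name.startswith("_") and isinstance(value, str)
    )
```

Because tests and application code both refer to `EnvVarNames.NITTA_RUN_COMMAND`, a mismatch like the one between the Python constant and the CI workflow can only happen at the boundary where the literal name is written outside Python.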

Makefile

ml/synthesis/pyproject.toml


ryukzak added 3 commits December 28, 2023 02:05

Bump stackage snapshot (for ghc-9.2.8 compatible with latest hls-powe…

513ecd0

…red)

Improve CLI documentation.

7fec9b2

Web/UI safe upgrades

6117beb

ryukzak self-assigned this Dec 29, 2023

ryukzak changed the base branch from master to ml-for-synthesis-prototype December 29, 2023 19:06

ryukzak added 2 commits December 29, 2023 23:35

Refactoring haskell code

3f66495

Bump tensorflow and add macos specific dependencies for ml.

44f471f

ryukzak force-pushed the ml-for-synthesis-prototype-wip branch from 0a05375 to a8843d9 Compare December 30, 2023 13:01

Add scores in popover in subforest table.

71a01cb

ryukzak force-pushed the ml-for-synthesis-prototype-wip branch from a8843d9 to 3add8bd Compare December 30, 2023 16:10

ryukzak requested a review from iburakov December 30, 2023 16:20

Rework CI and add Makefile

de72eb9

ryukzak force-pushed the ml-for-synthesis-prototype-wip branch from 3add8bd to de72eb9 Compare December 30, 2023 16:20

ryukzak changed the title ~~Draft: Ml for synthesis prototype - Refactoring~~ Ml for synthesis prototype - Refactoring Dec 30, 2023

ryukzak mentioned this pull request Dec 30, 2023

Add ML for synthesis prototype to repository #167

Merged

ryukzak added 5 commits January 5, 2024 15:54

Update root README file: add Makefile info, add ML setup comments.

f0b91ab

Bump node version in docker container for dev deps.

fdd7ae6

Refactoring. Move type helpers to *.Types modules.

69aef37

Fix CI/Makefile conflict

174ad21

Add linux specific deps for tensorflow. Bump py deps.

0636193

ryukzak force-pushed the ml-for-synthesis-prototype-wip branch from 9ebb5c2 to 0d5d1e6 Compare January 6, 2024 01:39

Fix Makefile: build-prod, ML_MODEL search. Add docker-dev-{build,run}

3945599

ryukzak force-pushed the ml-for-synthesis-prototype-wip branch from 0d5d1e6 to 3945599 Compare January 6, 2024 01:41

iburakov requested changes Jan 6, 2024


ryukzak and others added 6 commits January 6, 2024 20:40

Update README.md

f85e126

Co-authored-by: Ilya Burakov <speedwatson@gmail.com>

Update README.md

3601755

Co-authored-by: Ilya Burakov <speedwatson@gmail.com>

Update TENSORFLOW_VER in dev image

aeb2c04

Co-authored-by: Ilya Burakov <speedwatson@gmail.com>

Update ml/synthesis/src/components/data_crawling/nitta/nitta_running.py

f0941f3

Co-authored-by: Ilya Burakov <speedwatson@gmail.com>

Add tensorflow/poetry configuration source

c2d1de1

Apply suggestions and small refactoring from Ilya Burakov <speedwatso…

f268fb1

…n@gmail.com>

Extend Makefile to support all docker-dev-image related targets.

a93a304

ryukzak requested a review from iburakov January 7, 2024 14:33

ryukzak added 2 commits January 7, 2024 15:35

Fix formatting.

83c68ab

Remove .stack-work cache (weeder find removed files in cache and fail)

ecd35f9

iburakov requested changes Jan 7, 2024


iburakov and others added 4 commits January 7, 2024 21:16

Clarify/update comments about TENSORFLOW_VER

b2135a6

Fix the tf deps in pyproject.toml for native (cpu-only) Windows

8e976f5

Sync the poetry.lock

6d73b06

Merge pull request #261 from ryukzak/ml-for-synthesis-prototype-wip-t…

5cd88ac

…f-fix Fix Tensorflow deps for Windows

ryukzak mentioned this pull request Jan 7, 2024

Extract todos to issues. #262

Open

ryukzak and others added 5 commits January 7, 2024 23:22

Fix Makefile: shell interaction for docker-build

47c7497

Co-authored-by: Ilya Burakov <speedwatson@gmail.com>

Improve Makefile:ml-clean

8a421b5

Refactoring: store envvar name in EnvVarNames

ffa9302

Fix Makefile: shell interaction for docker-build (part 2)

64edcc4

Fix formatting

8306454

iburakov approved these changes Jan 8, 2024


ryukzak merged commit 5966a6b into ml-for-synthesis-prototype Jan 8, 2024
13 checks passed

ryukzak deleted the ml-for-synthesis-prototype-wip branch January 8, 2024 08:50
