Initial refactoring for stateless_llama.py
#213
Conversation
def run_vmfb_comparison(args):
Can you move run_vmfb_comparison out of stateless_llama into a completely separate runner that just takes a vmfb and runs it? Also, have a way to disable and enable the torch comparison.
Good idea, on it!
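A minimal sketch of what that standalone runner could look like, assuming the IREE Python runtime bindings (iree.runtime) and a hypothetical --compare_vs_torch flag; the exported module/function names and input shapes below are placeholders, not the final API:

```python
# Hypothetical standalone vmfb runner (sketch only): loads a compiled vmfb,
# runs one exported function, and optionally compares against eager torch.
import argparse

import numpy as np
import iree.runtime as ireert


def run_vmfb(vmfb_path: str, driver: str, inputs):
    config = ireert.Config(driver)
    ctx = ireert.SystemContext(config=config)
    vm_module = ireert.VmModule.mmap(config.vm_instance, vmfb_path)
    ctx.add_vm_module(vm_module)
    # "module" and "run_initialize" stand in for whatever the exported
    # module and entry point are actually named.
    return ctx.modules.module["run_initialize"](*inputs)


def main():
    parser = argparse.ArgumentParser(description="Run a compiled LLM vmfb.")
    parser.add_argument("--vmfb_path", required=True)
    parser.add_argument("--device", default="local-task")
    parser.add_argument(
        "--compare_vs_torch",
        action="store_true",
        help="Also run the eager torch model and diff the outputs.",
    )
    args = parser.parse_args()

    example_input = [np.zeros((1, 1), dtype=np.int64)]  # placeholder token ids
    turbine_output = run_vmfb(args.vmfb_path, args.device, example_input)
    print(turbine_output)

    if args.compare_vs_torch:
        # Torch path elided: load the HF model here and diff against
        # turbine_output only when the flag is set.
        pass


if __name__ == "__main__":
    main()
```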
@@ -2,6 +2,8 @@
import sys
Ultimately I think we just want stateless_llama to only contain llama-specific code like the StateUpdateModule class and, for now, json_schema, and have the rest as separate, more generic LLM exporter code that uses model_builder.py to generate IR/vmfb, plus an LLM vmfb runner. Also, we need to clean out the examples dir that has old pathing, referenced here: #207
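Roughly, the split being described (file names other than stateless_llama.py are illustrative):
- stateless_llama.py: llama-specific pieces only (StateUpdateModule and, for now, json_schema)
- a generic LLM exporter built on model_builder.py that produces IR/vmfb
- a standalone LLM vmfb runner that takes a vmfb and runs it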
Also thinking about including an easy command to be able to chat with LLAMA, since that's what people would try when they approach a new LLM anyway. @IanNod, what do you think of this as a first task after I get the refactoring done?
Reading exporter.py and model_builder.py, it sounds like I need to:
"--iree-input-type=torch", | ||
"--iree-vm-bytecode-module-output-format=flatbuffer-binary", | ||
"--mlir-print-debuginfo", | ||
"--mlir-print-op-on-diagnostic=false", | ||
"--iree-llvmcpu-target-cpu-features=host", | ||
"--iree-llvmcpu-target-triple=x86_64-linux-gnu", | ||
"--iree-llvmcpu-enable-microkernels", | ||
"--iree-llvmcpu-stack-allocation-limit=256000", | ||
"--iree-stream-resource-index-bits=64", | ||
"--iree-vm-target-index-bits=64", | ||
"--iree-vm-bytecode-module-strip-source-map=true", | ||
"--iree-util-zero-fill-elided-attrs", | ||
"--iree-vm-target-truncate-unsupported-floats", | ||
"--iree-codegen-check-ir-before-llvm-conversion=false", | ||
"--iree-vm-bytecode-module-output-format=flatbuffer-binary", | ||
"--iree-opt-const-expr-hoisting=False", |
So these flags are something I copy-pasted offhandedly from a vicuna.py custom compile command in a Google doc somewhere; I don't think they're a canonical, solid list to just make the default for all compiled models in turbine. Some of them are probably fine, but they're worth looking through.
)
with open(path, "wb+") as f:
    f.write(flatbuffer_blob)
print("saved to ", path)
use a logger
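For instance, a sketch using the standard logging module in place of the bare print; the save_vmfb wrapper is illustrative, not the existing function name:

```python
import logging

logger = logging.getLogger(__name__)


def save_vmfb(path, flatbuffer_blob):
    # Same behavior as the diff above, but emits a log record instead of a
    # print so callers can control verbosity via the logging config.
    with open(path, "wb+") as f:
        f.write(flatbuffer_blob)
    logger.info("Saved vmfb to %s", path)
```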
flatbuffer_blob = ireec.compile_str(
    module_str,
    target_backends=["llvm-cpu"],
the target backend should probably be a function argument
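A sketch of that change, keeping ireec.compile_str but threading the backend (and any extra flags) through as parameters; compile_to_vmfb and its defaults are illustrative:

```python
import iree.compiler as ireec


def compile_to_vmfb(module_str, target_backends=("llvm-cpu",), extra_args=()):
    # The caller chooses the backend (e.g. "llvm-cpu", "vulkan-spirv", "cuda")
    # instead of it being hard-coded inside the exporter.
    return ireec.compile_str(
        module_str,
        target_backends=list(target_backends),
        extra_args=list(extra_args),
    )
```

The existing call site would then become flatbuffer_blob = compile_to_vmfb(module_str) for the CPU default.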
python/shark_turbine/aot/exporter.py (Outdated)
    return all_pkv_tensors

def export_llama(
Don't put any llama-specific code here. This should be for general exporting only. If we want to make abstractions for what we did in the stateless_llama model (i.e. exporting pkvs as a global), we can do that, but it shouldn't be llama-specific and probably warrants its own py file somewhere.
# TODO (Dan): replace this with a file once I figure out paths on windows exe
This needs to not exist more than once in the whole codebase, and could come from its own python file
(as opposed to json file, which is harder to bundle in an exe)
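A sketch of the single-source-of-truth version: keep the schema string in one small Python module and import it wherever it is needed. The module name, constant name, and contents below are placeholders, not the real schema:

```python
# llm_schemas.py (hypothetical module): the one place the schema string lives,
# so it is defined exactly once and bundles into an exe more easily than a
# loose .json file would.
import json

DEFAULT_STATE_SCHEMA = json.dumps(
    {
        # Placeholder content; the actual json_schema string from
        # stateless_llama.py would live here verbatim.
        "example_key": "example_value",
    }
)
```

Consumers would then do from llm_schemas import DEFAULT_STATE_SCHEMA instead of re-declaring the string.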
Let's make a Python runner wrapper that can do both compile and inference from one CLI invocation. Bash scripts aren't OS-agnostic.
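A rough sketch of such a wrapper as a single argparse CLI with compile and run subcommands, built on the IREE Python tools so nothing shells out to bash; subcommand and flag names are illustrative:

```python
import argparse

import iree.compiler as ireec


def compile_cmd(args):
    # Compile torch-dialect MLIR straight to a vmfb via the Python API.
    flatbuffer_blob = ireec.compile_file(
        args.mlir_path,
        target_backends=[args.target_backend],
        extra_args=["--iree-input-type=torch"],
    )
    with open(args.vmfb_path, "wb") as f:
        f.write(flatbuffer_blob)


def run_cmd(args):
    # Would delegate to the standalone vmfb runner sketched earlier.
    raise NotImplementedError("hook up the vmfb runner here")


def main():
    parser = argparse.ArgumentParser(description="Compile and/or run an LLM module.")
    sub = parser.add_subparsers(dest="command", required=True)

    compile_p = sub.add_parser("compile")
    compile_p.add_argument("--mlir_path", required=True)
    compile_p.add_argument("--vmfb_path", required=True)
    compile_p.add_argument("--target_backend", default="llvm-cpu")
    compile_p.set_defaults(func=compile_cmd)

    run_p = sub.add_parser("run")
    run_p.add_argument("--vmfb_path", required=True)
    run_p.set_defaults(func=run_cmd)

    args = parser.parse_args()
    args.func(args)


if __name__ == "__main__":
    main()
```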
Can see some of my comments, but broadly, nothing model-specific should go into exporter.py. Chat functionality already exists in SHARK, which imports this; you can add code to do it here too, but it may be unnecessary.
Current plan:
Force-pushed from 50c8032 to 7999ddb
Force-pushed from f478670 to a827607
All of this would be easier to do once we have some end-to-end tests. I've taken notes on Dan & Ian's comments and will open another PR later.
This partially addresses #184 and #185.