Initial refactoring for stateless_llama.py #213

Closed · wants to merge 8 commits

Conversation

@renxida (Contributor) commented Dec 1, 2023

This partially addresses #184 and #185.




def run_vmfb_comparison(args):
Contributor:

Can you move run_vmfb_comparison out of stateless_llama into a completely separate runner that just takes a vmfb and runs it? Also add a way to enable and disable the torch comparison.
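
A minimal sketch of what such a standalone runner could look like; this is an assumption-laden illustration, not code from this PR. It assumes iree.runtime's load_vm_flatbuffer_file helper, and the file name, flag names, and exported-function usage are placeholders.

```python
# vmfb_runner.py -- hypothetical standalone runner, separate from stateless_llama.py.
import argparse

import iree.runtime as ireert


def main():
    parser = argparse.ArgumentParser(description="Run a compiled llama vmfb.")
    parser.add_argument("--vmfb_path", required=True, help="Path to the compiled .vmfb")
    parser.add_argument("--device", default="local-task", help="IREE driver to run on")
    parser.add_argument(
        "--compare_torch",
        action="store_true",
        help="Also run the eager torch model and compare outputs (off by default)",
    )
    args = parser.parse_args()

    # Load the flatbuffer and bind its exported functions.
    vm_module = ireert.load_vm_flatbuffer_file(args.vmfb_path, driver=args.device)
    # ... tokenize a prompt and call the module's exported functions here ...

    if args.compare_torch:
        # ... run the same inputs through the torch model and diff the outputs ...
        pass


if __name__ == "__main__":
    main()
```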

Contributor Author:

Good idea, on it!

@@ -2,6 +2,8 @@
import sys
Contributor:

Ultimately I think we want stateless_llama to contain only llama-specific code, like the StateUpdateModule class and, for now, json_schema, and to move the rest into more generic LLM exporter code that uses model_builder.py to generate IR/vmfb, plus an LLM vmfb runner. We also need to clean out the examples dir that references old paths; see #207.

@renxida (Contributor Author) commented Dec 4, 2023

I'm also thinking about adding an easy command to chat with LLaMA, since that's the first thing people try when they approach a new LLM anyway. @IanNod, what do you think of this as a first task after I get the refactoring done?

@renxida (Contributor Author) commented Dec 4, 2023

Reading exporter.py and model_builder.py, it sounds like I need to:

  • break compile_hf_transformer_model() down further and move as much as I can into model_builder.py and exporter.py instead.
  • make stateless_llama do the following:
    • generate and run the VMFB by default (potentially making a chat interface the default; people LOVE chat)
    • when a special argument is supplied, run the VMFB vs. torch comparison.
  • maybe remove compile_hf_transformer_model entirely

Comment on lines 436 to 451
"--iree-input-type=torch",
"--iree-vm-bytecode-module-output-format=flatbuffer-binary",
"--mlir-print-debuginfo",
"--mlir-print-op-on-diagnostic=false",
"--iree-llvmcpu-target-cpu-features=host",
"--iree-llvmcpu-target-triple=x86_64-linux-gnu",
"--iree-llvmcpu-enable-microkernels",
"--iree-llvmcpu-stack-allocation-limit=256000",
"--iree-stream-resource-index-bits=64",
"--iree-vm-target-index-bits=64",
"--iree-vm-bytecode-module-strip-source-map=true",
"--iree-util-zero-fill-elided-attrs",
"--iree-vm-target-truncate-unsupported-floats",
"--iree-codegen-check-ir-before-llvm-conversion=false",
"--iree-vm-bytecode-module-output-format=flatbuffer-binary",
"--iree-opt-const-expr-hoisting=False",
Member:

These flags are something I copy-pasted offhandedly from a vicuna.py custom compile command in a Google doc somewhere; I don't think they're a canonical list to make the default for all compiled models in Turbine. Some of them are probably fine, but they're worth looking through.
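
One possible direction (an assumption, not something settled in this thread) is to keep only a small set of broadly applicable flags as the shared default and let each model pass its own extras. The flag names below are taken from the list above; which ones belong in the default set is exactly the open question here.

```python
# Hypothetical split between shared defaults and per-model overrides.
DEFAULT_IREE_FLAGS = [
    "--iree-input-type=torch",
    "--iree-vm-bytecode-module-output-format=flatbuffer-binary",
    "--iree-vm-target-index-bits=64",
]


def build_compile_flags(model_specific_flags=None):
    """Combine the shared defaults with caller-supplied, model-specific flags."""
    return DEFAULT_IREE_FLAGS + list(model_specific_flags or [])
```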

)
with open(path, "wb+") as f:
    f.write(flatbuffer_blob)
print("saved to ", path)
Member:

Use a logger.
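
A minimal sketch of the suggested change, using the standard logging module; the function name save_vmfb and the logger setup are illustrative, not part of the PR.

```python
import logging

logger = logging.getLogger(__name__)


def save_vmfb(flatbuffer_blob: bytes, path: str) -> None:
    """Write the compiled flatbuffer to disk and log where it went."""
    with open(path, "wb+") as f:
        f.write(flatbuffer_blob)
    logger.info("Saved vmfb to %s", path)
```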


flatbuffer_blob = ireec.compile_str(
    module_str,
    target_backends=["llvm-cpu"],
Member:

The target backend should probably be a function argument.
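
A hedged sketch of what that parameterization could look like; the function name and defaults are assumptions rather than code from this PR, and it relies on iree.compiler.compile_str accepting target_backends and extra_args.

```python
import iree.compiler as ireec


def compile_to_vmfb(module_str, target_backends=("llvm-cpu",), extra_args=()):
    """Compile an MLIR module string to a vmfb, letting the caller pick backend(s) and flags."""
    return ireec.compile_str(
        module_str,
        target_backends=list(target_backends),
        extra_args=list(extra_args),
    )
```

A caller targeting a GPU could then pass something like target_backends=["vulkan-spirv"] without touching the exporter.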


return all_pkv_tensors

def export_llama(
Member:

Don't put any llama-specific code here; this file should be for general exporting only. If we want abstractions for what we did in the stateless_llama model (i.e., exporting pkvs as a global), we can add them, but they shouldn't be llama-specific and probably warrant their own .py file somewhere.




# TODO (Dan): replace this with a file once I figure out paths on windows exe
Member:

This shouldn't exist more than once in the whole codebase; it could come from its own Python file.

Member:

(As opposed to a JSON file, which is harder to bundle into an exe.)
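
A minimal sketch of the single-source idea; the module name and import path are purely hypothetical.

```python
# llama_schemas.py -- hypothetical single home for the schema constant.
# Keeping it as a Python string (rather than a .json file) sidesteps the
# Windows-exe pathing issue mentioned in the TODO above.
json_schema = r"""
... the existing schema string, unchanged ...
"""

# Every other module then imports it instead of re-declaring it, e.g.:
# from turbine_models.custom_models.llama_schemas import json_schema
```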

Member:

Let's make a Python runner wrapper that can do both compile and inference from one CLI invocation. Bash scripts aren't OS-agnostic.
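
A hedged sketch of such a wrapper using argparse subcommands, so one invocation can compile, run, or chain both; every name here (file, subcommands, flags) is illustrative rather than part of this PR.

```python
# runner.py -- hypothetical OS-agnostic wrapper replacing the bash scripts.
import argparse


def main():
    parser = argparse.ArgumentParser(description="Compile and/or run stateless llama.")
    sub = parser.add_subparsers(dest="command", required=True)

    compile_p = sub.add_parser("compile", help="Export the model and compile a vmfb")
    compile_p.add_argument("--hf_model_name", required=True)
    compile_p.add_argument("--target_backend", default="llvm-cpu")

    run_p = sub.add_parser("run", help="Run an already-compiled vmfb")
    run_p.add_argument("--vmfb_path", required=True)
    run_p.add_argument("--compare_torch", action="store_true")

    args = parser.parse_args()
    if args.command == "compile":
        pass  # ... call the exporter / model_builder code ...
    else:
        pass  # ... load the vmfb with iree.runtime and generate tokens ...


if __name__ == "__main__":
    main()
```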

@dan-garvey (Member) replied, quoting the plan above:

You can see some of my comments, but broadly, nothing model-specific should go into exporter.py.

Chat functionality already exists in SHARK, which imports this; you can add code to do it here too, but it may be unnecessary.

@renxida (Contributor Author) commented Dec 5, 2023

Current plan:

  • move llama-specific code out of exporter.py and into turbine_models/custom_models/export_stateless_llama.py
  • continue to reuse as much code as possible from shark_turbine.aot.compiled_module and exporter.py for exporting
  • look through the compilation flags
  • make the IREE backend in compiled_module.save_vmfb a parameter
  • use loggers to replace print
  • I already made a CLI chat interface for LLaMA; make that the default when somebody runs python stateless_llama.py
  • make a separate vmfb comparison entrypoint

renxida force-pushed the refactor-stateless_llama.py branch from 50c8032 to 7999ddb on December 6, 2023 at 16:55
renxida force-pushed the refactor-stateless_llama.py branch from f478670 to a827607 on December 6, 2023 at 17:47
@renxida (Contributor Author) commented Dec 6, 2023

All of this would be easier to do once we have some end-to-end tests. I've taken notes on Dan and Ian's comments and will open another PR later.

renxida closed this on Dec 6, 2023.