Conversation

@bryce13950
Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.

Fixes # (issue)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Screenshots

Please attach before and after screenshots of the change if applicable.

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

bryce13950 and others added 30 commits November 15, 2023 23:46
This reverts commit beb014e.
Remove the PyTorch versioning fix, as this has been solved in the latest PyTorch version. Also format with Even Better TOML so that the pyproject is easier to read.
…sformerLensOrg#477)

* Fixing numerical issues

* Added qwen lol

* setup local

* allclose

* Added qwen

* Cleaned up implementation

* removed untested models

* Cleaned up implementation

removed untested models

* commented untested models

* formatting

* fixed mem issues + trust_remote_code

* formatting

* merge

* Force rerun checks

---------

Co-authored-by: Andy Arditi <andyrdt@gmail.com>
* Add a function to convert nanogpt weights (see the loading sketch after this commit group)

* Remove need for bias parameter
* Add Support for CodeLlama-7b

* Reformat

---------

Co-authored-by: Neel Nanda <neelnanda27@gmail.com>
---------

Co-authored-by: Alan <41682961+alan-cooney@users.noreply.github.com>
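A minimal sketch of what the nanoGPT conversion enables. The module path and signature of `convert_nanogpt_weights`, and the `"model"` checkpoint key, are assumptions based on these commit messages and nanoGPT's conventions, not verified against the merged code:

```python
import torch

from transformer_lens import HookedTransformer, HookedTransformerConfig
# Assumed location of the converter added in these commits; check the loading
# utilities in your installed version for the real entry point.
from transformer_lens.loading_from_pretrained import convert_nanogpt_weights

# nanoGPT's train.py stores the weights under the "model" key (an assumption
# about your checkpoint layout; adjust as needed).
checkpoint = torch.load("ckpt.pt", map_location="cpu")

# GPT-2-small-shaped config as an example; match your nanoGPT training config.
cfg = HookedTransformerConfig(
    n_layers=12, d_model=768, n_ctx=1024, n_heads=12, d_head=64,
    d_vocab=50257, act_fn="gelu", normalization_type="LN",
)
model = HookedTransformer(cfg)

# Map nanoGPT parameter names onto TransformerLens names and load them.
state_dict = convert_nanogpt_weights(checkpoint["model"], cfg)
model.load_state_dict(state_dict, strict=False)
```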
collingray and others added 30 commits April 3, 2024 02:08
* add LlamaForCausalLM arch. parsing and 01-ai/Yi

* fix attn bias dim error

* fix attn dim error... again

* add chat models

* format

* add sentencepiece for yi-chat tokenizers

* update poetry.lock

* update gqa comment

* update poetry.lock

---------

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
* make cspell not mad

* add new init methods

Add in Kaiming, Xavier, and (incomplete) muP initializations (see the reference sketch after this commit group)

* Various small typo, comments, and bugfixes

* tests for inits

* more cspell edits so it's happy

* run black with default -l 88

* fix to make docs compile properly

* accidently is not a word, whoops
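For reference, this is what the named initializations do in plain PyTorch. The `torch.nn.init` calls are real API; how TransformerLens exposes them through its config (e.g. an init-mode flag) is not shown here, and the muP rule below is one commonly quoted version of it, hedged accordingly since the commit calls its muP support incomplete:

```python
import torch

W = torch.empty(768, 3072)

torch.nn.init.xavier_uniform_(W)                       # variance ~ 1 / (fan_in + fan_out)
torch.nn.init.kaiming_normal_(W, nonlinearity="relu")  # variance ~ 1 / fan_in

# muP-style readout init (assumption): std ~ 1/fan_in rather than
# 1/sqrt(fan_in), so tuned hyperparameters transfer across model widths.
W_out = torch.empty(768, 50257)
torch.nn.init.normal_(W_out, std=1.0 / 768)
```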
* chore: fixing type errors and enabling mypy

* updated pyproject

* fixing typing after merging updates

* fixed correct typing for float

---------

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
* add moe config options

* bump transformers version, needed for hf mixtral

* add architecture config

* add moe component, no hooks yet

* add convert_mixtral_weights

* formatting

* fix convert_mixtral_weights

* fixes

* rename moe state_dict names

* add multi-gpu fixes by @coolvision

* fix einsum

* fix moe forward pass

* cap mixtral context, model working

* disable ln folding for moe (for now)

* update HookedTransformerConfig docstring with moe options

* formatting

* add benchmarker to test_hooked_transformer

* add moe gate and chosen expert hooks (usage sketch after this commit group)

* formatting

* add moe dtype warning

* add special cases page to docs

* formatting

* fix missing .cfg

* fix doc heading level, add desc. to moe hook points

* fix formatting

* fix new mypy errors

* fix mypy issues for real this time

* rename moe gate hook names

---------

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
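A hedged sketch of using the new MoE support. The Mixtral model id is Hugging Face's; the low-precision dtype and disabled LayerNorm folding follow the "add moe dtype warning" and "disable ln folding for moe" commits; the gate hook name is an assumption, since the commits only say such hooks were added and later renamed:

```python
import torch

from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1",
    dtype=torch.bfloat16,  # MoE models are large; a low-precision dtype is assumed here
    fold_ln=False,         # LN folding is disabled for MoE "for now" per these commits
)

logits, cache = model.run_with_cache("The quick brown fox")
# Hypothetical hook name; inspect model.hook_dict for the real MoE hook points.
# gate_scores = cache["blocks.0.mlp.hook_expert_weights"]
```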
…ings (TransformerLensOrg#538)

* Update black line length to 100

* run black with -l 100

* edit contributing.md to include new line length

* add black -l 100 to .vscode for convenience

* fixed merge saving error

* fixed merge issue in params

* ran format

* ran format on tests

---------

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
* Refactor hook_points

* restored remaining refactor

* ran format

* added partial registering again

* restored prepend

* added type comment again

* fixed spacing

---------

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
* qkv initial fix

* add test and update BertBlock

* formatting changes

* fix flaky gqa test

* move helper function to utils

* ran reformat

---------

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
* fixed install version and key name

* fixed remaining issues with no position experiment

* removed extra key
* fixed othello in colab
* added optional token to transformers loading

* added secret for make docs command

* ran format

* added gated models instructions (see the token sketch after this commit group)

* rearranged env setting

* moved hf token

* added temporary log

* changed secret reference

* changed env variable reference

* changed token reference

* changed back to secrets reference

* removed microsoft models from remote code list

* updated token again
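A short sketch of the gated-model flow these commits document. Reading a token from the environment and `huggingface_hub.login` are standard Hub behavior; whether TransformerLens also needs the token passed explicitly is an assumption:

```python
import os

from huggingface_hub import login
from transformer_lens import HookedTransformer

# Authenticate once; access to the gated repo must already be granted on the Hub.
login(token=os.environ["HF_TOKEN"])  # or run `huggingface-cli login` interactively

model = HookedTransformer.from_pretrained("meta-llama/Llama-2-7b-hf")
```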
* Start work on adding llama.

* Remove v2 from arxiv URL.

* Remove llama special case (breaks because hf_config is not defined).

* Remove TODO.

llama-2-70b-hf and Llama 3 models all have n_key_value_heads set so
they'll use Grouped-Query Attention.

* Add back check for non-hf-hosted models.

* Hardcode Llama-3 configs.

See discussion on TransformerLensOrg#549
for why; an illustrative config shape follows this commit group.

---------

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
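Illustrative shape of a hardcoded config entry. The field names match HookedTransformerConfig; the Llama-3-8B values below are from memory and should be treated as assumptions, not the merged numbers:

```python
# Setting n_key_value_heads below n_heads is what switches the attention
# implementation to Grouped-Query Attention, per the commit message above.
llama_3_8b_cfg = {
    "d_model": 4096,
    "n_layers": 32,
    "n_heads": 32,
    "n_key_value_heads": 8,   # < n_heads, so GQA is used
    "d_head": 128,
    "d_vocab": 128256,
    "n_ctx": 8192,
    "rotary_base": 500000.0,
    "act_fn": "silu",
}
```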
* working demo of 4-bit quantized Llama (see the loading sketch after this commit group)

* add memory info to the demo

* cleanup, asserts for quantization

* hooks reading/writing

* test in colab; do not import Int8Params

* add some comments

* format; fix optional argument use

* merge with main

* format

* ran format

* locked attribution patching to 1.1.1

* fixed demo for current colab

* minor typing fixes for mypy

* fixing typing issue

* removing extra W_Q W_O

* ignored merge artifacts & push for proper CI run

---------

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
Co-authored-by: hannamw <mh2parker@gmail.com>
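A hedged sketch of the 4-bit flow: quantize through transformers with bitsandbytes, then hand the quantized model to TransformerLens via `hf_model`. The transformers/bitsandbytes calls are real API; whether the merged TransformerLens support takes exactly these arguments is an assumption:

```python
import torch

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from transformer_lens import HookedTransformer

# Quantize the HF model to 4 bits with bitsandbytes.
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
hf_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", quantization_config=bnb, device_map="auto"
)

# Wrap the already-quantized model so its hooks can be read and written.
model = HookedTransformer.from_pretrained(
    "meta-llama/Llama-2-7b-hf", hf_model=hf_model, dtype=torch.float16
)
```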
* removed duplicate rearrange block

* removed duplicate variables

* fixed param name
* revised demo testing to check all demos

* separated demos

* changed demo test order

* rearranged test order

* updated attribution patching to run different code on GitHub

* rearranged tests

* updated header

* updated grokking demo

* updated bert for testing

* updated bert demo

* ran cells

* removed github check

* removed cells to skip

* ignored output of loading cells

* removed other tests
* implement HookedSAETransformer (see the usage sketch after this commit group)

* clean up imports

* apply format

* only recompute error if use_error_term

* add tests

* run format

* fix import

* match to hooks API

* improve doc strings

* improve demo

* address Arthur feedback

* try to fix indent:

* try to fix indent again

* change doc code block
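A hedged usage sketch for HookedSAETransformer. The class name comes from these commits; the `run_with_saes` / `run_with_cache_with_saes` method names and the idea of passing pre-trained SAEs are assumptions based on the "match to hooks API" commit, so treat them as illustrative:

```python
from transformer_lens import HookedSAETransformer

model = HookedSAETransformer.from_pretrained("gpt2")

# Hypothetical: splice one or more trained SAEs (e.g. on blocks.8.hook_resid_pre)
# into the forward pass, optionally recomputing the error term as the
# "only recompute error if use_error_term" commit suggests.
# sae = HookedSAE(...)
# logits = model.run_with_saes("Hello world", saes=[sae])
# logits, cache = model.run_with_cache_with_saes("Hello world", saes=[sae])
```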
* reworked CI to publish code coverage report

* added coverage report to docs

* added support for python 3.12 and removed extra steps on legacy versions of python

* moved main check back to python 3.11

* removed coverage flag

* moved download command

* fixed name

* specified file name

* removed link