Conversation

@bryce13950
Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.

Fixes # (issue)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Screenshots

Please attach before and after screenshots of the change if applicable.

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

bryce13950 and others added 30 commits November 15, 2023 23:46
This reverts commit beb014e.
Remove the PyTorch versioning fix, as this has been solved in the latest PyTorch version. Also format with Even Better TOML so that the pyproject is easier to read.
…sformerLensOrg#477)

* Fixing numerical issues

* Added qwen lol

* setup local

* allclose

* Added qwen

* Cleaned up implementation

* removed untested models

* Cleaned up implementation

removed untested models

* commented untested models

* formatting

* fixed mem issues + trust_remote_code

* formatting

* merge

* Force rerun checks

---------

Co-authored-by: Andy Arditi <andyrdt@gmail.com>
* Add a function to convert nanogpt weights (see the loading sketch after this commit group)

* Remove need for bias parameter
* Add Support for CodeLlama-7b

* Reformat

---------

Co-authored-by: Neel Nanda <neelnanda27@gmail.com>
---------

Co-authored-by: Alan <41682961+alan-cooney@users.noreply.github.com>
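A minimal sketch of what the nanoGPT conversion enables. The module path and signature of `convert_nanogpt_weights`, and the `"model"` checkpoint key, are assumptions based on these commit messages and nanoGPT's conventions, not verified against the merged code:

```python
import torch

from transformer_lens import HookedTransformer, HookedTransformerConfig
# Assumed location of the converter added in these commits; check the loading
# utilities in your installed version for the real entry point.
from transformer_lens.loading_from_pretrained import convert_nanogpt_weights

# nanoGPT's train.py stores the weights under the "model" key (an assumption
# about your checkpoint layout; adjust as needed).
checkpoint = torch.load("ckpt.pt", map_location="cpu")

# GPT-2-small-shaped config as an example; match your nanoGPT training config.
cfg = HookedTransformerConfig(
    n_layers=12, d_model=768, n_ctx=1024, n_heads=12, d_head=64,
    d_vocab=50257, act_fn="gelu", normalization_type="LN",
)
model = HookedTransformer(cfg)

# Map nanoGPT parameter names onto TransformerLens names and load them.
state_dict = convert_nanogpt_weights(checkpoint["model"], cfg)
model.load_state_dict(state_dict, strict=False)
```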
collingray and others added 30 commits April 3, 2024 02:08
* add LlamaForCausalLM arch. parsing and 01-ai/Yi

* fix attn bias dim error

* fix attn dim error... again

* add chat models

* format

* add sentencepiece for yi-chat tokenizers

* update poetry.lock

* update gqa comment

* update poetry.lock

---------

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
* make cspell not mad

* add new init methods

Add in Kaiming, Xavier, and (incomplete) muP initializations (see the reference sketch after this commit group)

* Various small typo, comments, and bugfixes

* tests for inits

* more cspell edits so it's happy

* run black with default -l 88

* fix to make docs compile properly

* accidently is not a word, whoops
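For reference, this is what the named initializations do in plain PyTorch. The `torch.nn.init` calls are real API; how TransformerLens exposes them through its config (e.g. an init-mode flag) is not shown here, and the muP rule below is one commonly quoted version of it, hedged accordingly since the commit calls its muP support incomplete:

```python
import torch

W = torch.empty(768, 3072)

torch.nn.init.xavier_uniform_(W)                       # variance ~ 1 / (fan_in + fan_out)
torch.nn.init.kaiming_normal_(W, nonlinearity="relu")  # variance ~ 1 / fan_in

# muP-style readout init (assumption): std ~ 1/fan_in rather than
# 1/sqrt(fan_in), so tuned hyperparameters transfer across model widths.
W_out = torch.empty(768, 50257)
torch.nn.init.normal_(W_out, std=1.0 / 768)
```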
* chore: fixing type errors and enabling mypy

* updated pyproject

* fixing typing after merging updates

* fixed correct typing for float

---------

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
* add moe config options

* bump transformers version, needed for hf mixtral

* add architecture config

* add moe component, no hooks yet

* add convert_mixtral_weights

* formatting

* fix convert_mixtral_weights

* fixes

* rename moe state_dict names

* add multi-gpu fixes by @coolvision

* fix einsum

* fix moe forward pass

* cap mixtral context, model working

* disable ln folding for moe (for now)

* update HookedTransformerConfig docstring with moe options

* formatting

* add benchmarker to test_hooked_transformer

* add moe gate and chosen expert hooks (usage sketch after this commit group)

* formatting

* add moe dtype warning

* add special cases page to docs

* formatting

* fix missing .cfg

* fix doc heading level, add desc. to moe hook points

* fix formatting

* fix new mypy errors

* fix mypy issues for real this time

* rename moe gate hook names

---------

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
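A hedged sketch of using the new MoE support. The Mixtral model id is Hugging Face's; the low-precision dtype and disabled LayerNorm folding follow the "add moe dtype warning" and "disable ln folding for moe" commits; the gate hook name is an assumption, since the commits only say such hooks were added and later renamed:

```python
import torch

from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1",
    dtype=torch.bfloat16,  # MoE models are large; a low-precision dtype is assumed here
    fold_ln=False,         # LN folding is disabled for MoE "for now" per these commits
)

logits, cache = model.run_with_cache("The quick brown fox")
# Hypothetical hook name; inspect model.hook_dict for the real MoE hook points.
# gate_scores = cache["blocks.0.mlp.hook_expert_weights"]
```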
…ings (TransformerLensOrg#538)

* Update black line length to 100

* run black with -l 100

* edit contributing.md to include new line length

* add black -l 100 to .vscode for convenience

* fixed merge saving error

* fixed merge issue in params

* ran format

* ran format on tests

---------

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
* Refactor hook_points

* restored remaining refactor

* ran format

* added partial registering again

* restored prepend

* added type comment again

* fixed spacing

---------

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
* qkv initial fix

* add test and update BertBlock

* formatting changes

* fix flaky gqa test

* move helper function to utils

* ran reformat

---------

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
* fixed install version and key name

* fixed remaining issues with no position experiment

* removed extra key
* fixed othello in colab
* added optional token to transformers loading

* added secret for make docs command

* ran format

* added gated models instructions (see the token sketch after this commit group)

* rearranged env setting

* moved hf token

* added temporary log

* changed secret reference

* changed env variable reference

* changed token reference

* changed back to secrets reference

* removed microsoft models from remote code list

* updated token again
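A short sketch of the gated-model flow these commits document. Reading a token from the environment and `huggingface_hub.login` are standard Hub behavior; whether TransformerLens also needs the token passed explicitly is an assumption:

```python
import os

from huggingface_hub import login
from transformer_lens import HookedTransformer

# Authenticate once; access to the gated repo must already be granted on the Hub.
login(token=os.environ["HF_TOKEN"])  # or run `huggingface-cli login` interactively

model = HookedTransformer.from_pretrained("meta-llama/Llama-2-7b-hf")
```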
* Start work on adding llama.

* Remove v2 from arxiv URL.

* Remove llama special case (breaks because hf_config is not defined).

* Remove TODO.

llama-2-70b-hf and Llama 3 models all have n_key_value_heads set so
they'll use Grouped-Query Attention.

* Add back check for non-hf-hosted models.

* Hardcode Llama-3 configs.

See discussion on TransformerLensOrg#549
for why; an illustrative config shape follows this commit group.

---------

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
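Illustrative shape of a hardcoded config entry. The field names match HookedTransformerConfig; the Llama-3-8B values below are from memory and should be treated as assumptions, not the merged numbers:

```python
# Setting n_key_value_heads below n_heads is what switches the attention
# implementation to Grouped-Query Attention, per the commit message above.
llama_3_8b_cfg = {
    "d_model": 4096,
    "n_layers": 32,
    "n_heads": 32,
    "n_key_value_heads": 8,   # < n_heads, so GQA is used
    "d_head": 128,
    "d_vocab": 128256,
    "n_ctx": 8192,
    "rotary_base": 500000.0,
    "act_fn": "silu",
}
```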
* working demo of 4-bit quantized Llama (see the loading sketch after this commit group)

* add memory info to the demo

* cleanup, asserts for quantization

* hooks reading/writing

* test in colab; do not import Int8Params

* add some comments

* format; fix optional argument use

* merge with main

* format

* ran format

* locked attribution patching to 1.1.1

* fixed demo for current colab

* minor typing fixes for mypy

* fixing typing issue

* removing extra W_Q W_O

* ignored merge artifacts & push for proper CI run

---------

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
Co-authored-by: hannamw <mh2parker@gmail.com>
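A hedged sketch of the 4-bit flow: quantize through transformers with bitsandbytes, then hand the quantized model to TransformerLens via `hf_model`. The transformers/bitsandbytes calls are real API; whether the merged TransformerLens support takes exactly these arguments is an assumption:

```python
import torch

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from transformer_lens import HookedTransformer

# Quantize the HF model to 4 bits with bitsandbytes.
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
hf_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", quantization_config=bnb, device_map="auto"
)

# Wrap the already-quantized model so its hooks can be read and written.
model = HookedTransformer.from_pretrained(
    "meta-llama/Llama-2-7b-hf", hf_model=hf_model, dtype=torch.float16
)
```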
* removed duplicate rearrange block

* removed duplicate variables

* fixed param name
* revised demo testing to check all demos

* separated demos

* changed demo test order

* rearranged test order

* updated attribution patching to run different code on GitHub

* rearranged tests

* updated header

* updated grokking demo

* updated bert for testing

* updated bert demo

* ran cells

* removed github check

* removed cells to skip

* ignored output of loading cells

* removed other tests
* implement HookedSAETransformer (see the usage sketch after this commit group)

* clean up imports

* apply format

* only recompute error if use_error_term

* add tests

* run format

* fix import

* match to hooks API

* improve doc strings

* improve demo

* address Arthur feedback

* try to fix indent:

* try to fix indent again

* change doc code block
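A hedged usage sketch for HookedSAETransformer. The class name comes from these commits; the `run_with_saes` / `run_with_cache_with_saes` method names and the idea of passing pre-trained SAEs are assumptions based on the "match to hooks API" commit, so treat them as illustrative:

```python
from transformer_lens import HookedSAETransformer

model = HookedSAETransformer.from_pretrained("gpt2")

# Hypothetical: splice one or more trained SAEs (e.g. on blocks.8.hook_resid_pre)
# into the forward pass, optionally recomputing the error term as the
# "only recompute error if use_error_term" commit suggests.
# sae = HookedSAE(...)
# logits = model.run_with_saes("Hello world", saes=[sae])
# logits, cache = model.run_with_cache_with_saes("Hello world", saes=[sae])
```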
* reworked CI to publish code coverage report

* added coverage report to docs

* added support for python 3.12 and removed extra steps on legacy versions of python

* moved main check back to python 3.11

* removed coverage flag

* moved download command

* fixed name

* specified file name

* removed link