Docs/historical versions #31
Merged
fdalvi
merged 35 commits into
fdalvi:docs/historic_versions
from
yifan:docs/historical_versions
Aug 29, 2022
Fixed buggy test where evaluation tensor would have more classes than training tensor, resulting in an out of bounds failure. This only happened intermittently as the data was randomly generated.
return_predictions now returns actual predictions instead of the gold labels.
Loading a huggingface model assumed a tokenizer with the same name was present. This is not required anymore.
Activation saving code is now part of its own module in data.writer instead of being part of the extractor. Added tests for writer as well.
While creating tensors for use in downstream probes, the API now allows for automatically binarizing the dataset (so there are only two labels in the resulting tensors). Additionally, this commit implements multiclass and binary class datasets while creating the tensors.
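To illustrate what binarizing a label set can look like, here is a minimal sketch that assumes a one-vs-rest scheme. The function name `binarize_labels` and the `"OTHER"` placeholder are hypothetical and are not taken from the toolkit's API; the actual implementation may choose the two classes differently.

```python
def binarize_labels(labels, positive_class, negative_name="OTHER"):
    """Collapse every label except `positive_class` into one negative label.

    Hypothetical helper for illustration only; not the toolkit's API.
    """
    return [lab if lab == positive_class else negative_name for lab in labels]


labels = ["NN", "VB", "NN", "JJ"]
binary = binarize_labels(labels, positive_class="NN")
# Only two distinct labels remain after binarization.
distinct = sorted(set(binary))
```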
Added Contribution guidelines and minor documentation fixes.
This commit implements the Probeless method introduced by the following paper: Antverg, Omer and Belinkov, Yonatan. "On The Pitfalls Of Analyzing Individual Neurons in Language Models." ICLR'22
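As a rough sketch of the idea behind a probe-free ranking, the snippet below scores each neuron by summing, over all pairs of classes, the absolute difference between the class-mean activations. This is an assumption about the general technique written in plain Python; the function name `probeless_ranking` is hypothetical and the toolkit's implementation may differ in its details.

```python
from itertools import combinations


def probeless_ranking(activations, labels):
    """Rank neurons by how far apart their class-mean activations are.

    activations: list of per-token activation lists (tokens x neurons).
    labels: one class label per token.
    Hypothetical helper for illustration only.
    """
    num_neurons = len(activations[0])
    sums, counts = {}, {}
    for row, lab in zip(activations, labels):
        acc = sums.setdefault(lab, [0.0] * num_neurons)
        for i, value in enumerate(row):
            acc[i] += value
        counts[lab] = counts.get(lab, 0) + 1

    # Mean activation vector for each class.
    means = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

    # Sum of absolute mean differences over all class pairs, per neuron.
    scores = [0.0] * num_neurons
    for a, b in combinations(means, 2):
        for i in range(num_neurons):
            scores[i] += abs(means[a][i] - means[b][i])

    ranking = sorted(range(num_neurons), key=lambda i: scores[i], reverse=True)
    return ranking, scores


acts = [[1.0, 0.0], [1.0, 2.0], [1.0, 0.0], [1.0, 4.0]]
ranking, scores = probeless_ranking(acts, ["A", "B", "A", "B"])
```

No classifier is trained here: the ranking comes directly from statistics of the activations.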
Several options were being ignored when transformers_extractor was used as a script. This commit fixes this, and a bug which caused activations of the wrong shape to be saved during decomposition.
This commit adds support for annotating raw text with binary labels depending on presence of words from a given vocab, a regex filter or an arbitrary function that takes a word and returns a boolean label.
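The three kinds of criteria described above (a vocabulary, a regex filter, or an arbitrary word-to-boolean function) can be sketched as follows. The names `make_labeler` and `annotate` are hypothetical and whitespace tokenization is an assumption; the toolkit's actual function names and signatures are not shown in this commit message.

```python
import re


def make_labeler(criterion):
    """Turn a vocab, compiled regex, or callable into a word -> bool function.

    Hypothetical helper for illustration only.
    """
    if callable(criterion):
        return criterion
    if isinstance(criterion, re.Pattern):
        return lambda word: criterion.search(word) is not None
    vocab = set(criterion)
    return lambda word: word in vocab


def annotate(sentence, criterion):
    """Label each whitespace-separated word with a boolean."""
    labeler = make_labeler(criterion)
    return [(word, bool(labeler(word))) for word in sentence.split()]


by_vocab = annotate("the cat sat", {"cat"})
by_regex = annotate("run 42 times", re.compile(r"^\d+$"))
by_function = annotate("Hello world", str.istitle)
```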
Training a linear regression probe resulted in a SyntaxError because of an incorrect parameter name. This commit fixes this and adds some tests around the same functionality.
- Moved all scripts to `scripts` folder
- Dependencies are now defined in `setup.cfg` instead of conda/pip requirements.txt
- Dev dependencies are now defined in `setup.cfg`
- Updated contribution guidelines and installation instructions
- Switched test runner from `green` to `pytest`
- Updated GitHub actions runner
Data for control tasks can now be created using the functions in `neurox.data.control_task`.
Detailed commit history before squash:
* control task module
* control task example in notebook
* test class for ct prep
* seq labeling case sensitivity
* code formatting
* example description in NB
* typo
* moved ct to data package
* rename method, rm dead code
* adapt asserts in tests
* reorder/rename method params and return value
`get_top_words` hardcoded the threshold to deem a "word" to be relevant to a neuron. This commit makes the threshold an additional argument the user can specify when using the function.
This commit implements low-precision activation extraction, saving and loading, which helps with storage space as well as saving/loading times.
* optional dtypes: dtype can be explicitly specified in all extraction, writing and loading, probe training and probe eval. Also, probes are trained with mixed precision
* fix default x_dtype in create tensor - broken test, probably broken during merge
* more efficient dtype assignment during extraction
* rm dtype from write_activations in writer
* rename x_dtype to dtype
* adapt linear_probe
  - rm mixed precision
  - convert probe to float() in evaluate_probe if necessary. No inplace operation, creates copy of the object
* give dtype to JSON writer even if it is not needed
* no autocast and mixed prec. in training
* clarify method comment
* typo/format
* rm special case for different writers
* always evaluate in float32 regardless
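As a rough illustration of why a lower-precision dtype saves storage, the sketch below serializes the same values as 16-bit and 32-bit floats using only the standard library. The helper names are hypothetical and this is not how the toolkit writes activations; it only shows the size and precision trade-off.

```python
import struct


def pack_activations(values, fmt="e"):
    """Serialize floats: 'e' is float16, 'f' is float32, 'd' is float64."""
    return struct.pack(f"<{len(values)}{fmt}", *values)


def unpack_activations(blob, fmt="e"):
    """Inverse of pack_activations for the same format code."""
    count = len(blob) // struct.calcsize(f"<{fmt}")
    return list(struct.unpack(f"<{count}{fmt}", blob))


activations = [0.5, -1.25, 3.0, 0.1]
half = pack_activations(activations, "e")    # 2 bytes per value
single = pack_activations(activations, "f")  # 4 bytes per value
restored = unpack_activations(half, "e")
```

Values that are exactly representable in half precision round-trip unchanged, while others (such as 0.1) come back with a small rounding error. That is one reason to convert back to float32 for evaluation, as the last item in the history above describes.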
All toolkit and test code has been formatted with `ufmt` to enforce consistency in the codebase and future commits.
* Added GitHub action to check code formatting
* Fix action yaml
* Introduce formatting error to test GH action
* Revert "Introduce formatting error to test GH action"
  This reverts commit 96c5795.
This commit implements the Intersection over Union method to rank neurons against various target labels introduced by the following paper: Mu, J., & Andreas, J. (2020). Compositional explanations of neurons. Advances in Neural Information Processing Systems, 33, 17153-17163.
Detailed commit history:
* add iou probe
* add iou probe
* iou probing
* format
* modify testing
* add comments
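A minimal sketch of an Intersection over Union score for a single neuron is shown below. It assumes a neuron counts as "active" on a token when its activation exceeds a fixed threshold; the threshold choice and the names `iou_score` and `rank_neurons` are assumptions for illustration, not the toolkit's API.

```python
def iou_score(activations, labels, threshold=0.0):
    """IoU between tokens where the neuron fires and tokens with the label."""
    active = [a > threshold for a in activations]
    intersection = sum(1 for a, l in zip(active, labels) if a and l)
    union = sum(1 for a, l in zip(active, labels) if a or l)
    return intersection / union if union else 0.0


def rank_neurons(activation_matrix, labels, threshold=0.0):
    """Rank neurons (columns of a tokens x neurons matrix) by IoU score."""
    num_neurons = len(activation_matrix[0])
    scores = [
        iou_score([row[n] for row in activation_matrix], labels, threshold)
        for n in range(num_neurons)
    ]
    ranking = sorted(range(num_neurons), key=lambda n: scores[n], reverse=True)
    return ranking, scores


matrix = [[1.0, -1.0], [0.5, 2.0], [-0.2, 0.3], [-1.0, -0.5]]
target = [True, True, False, False]
ranking, scores = rank_neurons(matrix, target)
```

In this toy example, neuron 0 fires on exactly the labeled tokens, so it ranks first.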
This commit implements the Gaussian method for probing neurons against various target labels introduced by the following paper: Lucas Torroba Hennigen, Adina Williams, and Ryan Cotterell. Intrinsic probing through dimension selection. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 197–216, Online, 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.emnlp-main.15.
Detailed commit history:
* gaussian probe
* gaussian probe
* format code
* modift
* modify
* modify
* changes on Gaussian
This pull request injects readthedocs into historical commits, which enables readthedocs documentation for historical versions.