Docs/historical versions #31


Merged
merged 35 commits into from Aug 29, 2022
Conversation

yifan
Collaborator

@yifan yifan commented Aug 28, 2022

This pull request injects the readthedocs configuration into historical commits and enables readthedocs documentation for historical versions.

Yifan Zhang and others added 30 commits August 28, 2022 14:01
Fixed buggy test where evaluation tensor would have more classes than
training tensor, resulting in an out of bounds failure. This only
happened intermittently as the data was randomly generated.
return_predictions now returns actual predictions instead of the gold
labels.
Loading a huggingface model assumed a tokenizer with the same name was
present. This is not required anymore.
Activation saving code is now part of its own module in data.writer instead
of being part of the extractor. Added tests for writer as well.
While creating tensors for use in downstream probes, the API now allows automatically binarizing the dataset (so that the resulting tensors contain only two labels). Additionally, this commit implements multiclass and binary-class datasets during tensor creation.
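A minimal sketch of such binarization (`binarize_labels` is a hypothetical helper, not the toolkit's actual API): every label other than the chosen positive class collapses into a single negative class.

```python
def binarize_labels(labels, positive_class, negative_label="OTHER"):
    """Collapse a multi-class label list into two classes:
    the chosen positive class vs. everything else."""
    return [label if label == positive_class else negative_label for label in labels]

labels = ["NN", "VB", "NN", "JJ"]
print(binarize_labels(labels, "NN"))  # ['NN', 'OTHER', 'NN', 'OTHER']
```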
Added Contribution guidelines and minor documentation fixes.
This commit implements the Probeless method introduced by the following paper:
 Antverg, Omer and Belinkov, Yonatan. "On the Pitfalls of Analyzing Individual Neurons in Language Models." ICLR'22
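The core idea of the Probeless method is to rank neurons without training any probe, using only the spread of per-class mean activations. A sketch under that assumption (function name and exact aggregation are illustrative, not the toolkit's API):

```python
import numpy as np

def probeless_ranking(activations, labels):
    """Rank neurons without training a probe: for each neuron, sum the
    absolute differences between per-class mean activations. A neuron
    whose mean shifts strongly across classes ranks higher."""
    activations = np.asarray(activations, dtype=float)
    labels = np.asarray(labels)
    classes = sorted(set(labels.tolist()))
    means = np.stack([activations[labels == c].mean(axis=0) for c in classes])
    # Pairwise |mean difference| summed over all class pairs, per neuron
    score = np.zeros(activations.shape[1])
    for i in range(len(classes)):
        for j in range(i + 1, len(classes)):
            score += np.abs(means[i] - means[j])
    return np.argsort(-score)  # neuron indices, most informative first
```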
Several options were being ignored when transformers_extractor was used
as a script. This commit fixes this, and a bug which caused activations
of the wrong shape to be saved during decomposition.
This commit adds support for annotating raw text with binary labels depending on presence of words from a given vocab, a regex filter or an arbitrary function that takes a word and returns a boolean label.
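The three annotation modes described above can be unified behind a single dispatch, sketched here with a hypothetical `annotate_words` helper (the toolkit's real entry point may differ):

```python
import re

def annotate_words(words, criterion):
    """Assign a binary label to each word. `criterion` may be a set
    (vocabulary membership), a compiled regex, or any callable that
    maps a word to a boolean."""
    if isinstance(criterion, set):
        test = lambda w: w in criterion
    elif isinstance(criterion, re.Pattern):
        test = lambda w: criterion.search(w) is not None
    else:
        test = criterion
    return [(w, "positive" if test(w) else "negative") for w in words]

words = ["cat", "runs", "dog42"]
print(annotate_words(words, {"cat"}))  # [('cat', 'positive'), ('runs', 'negative'), ('dog42', 'negative')]
print(annotate_words(words, re.compile(r"\d")))      # only 'dog42' is positive
print(annotate_words(words, lambda w: len(w) > 3))   # 'runs' and 'dog42' are positive
```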
Training a linear regression probe resulted in a SyntaxError because of
an incorrect parameter name. This commit fixes this and adds some tests
around the same functionality.
- Moved all scripts to `scripts` folder
- Dependencies are now defined in `setup.cfg` instead of conda/pip
   requirements.txt
- Dev dependencies are now defined in `setup.cfg`
- Updated contribution guidelines and installation instructions
- Switched test runner from `green` to `pytest`
- Updated GitHub actions runner
Data for control tasks can now be created using the functions in `neurox.data.control_task`.

Detailed commit history before squash:
* control task module

* control task example in notebook

* test class for ct prep

* seq labeling case sensitivity

* code formatting

* example description in NB

* typo

* moved ct to data package

* rename method, rm dead code

* adapt asserts in tests

* reorder/rename method params and return value
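The control-task idea (in the style of Hewitt & Liang, 2019) assigns every word *type* a fixed random label, so a probe can only succeed by memorizing vocabulary rather than reading linguistic information from the activations. A minimal sketch, with a hypothetical function name and the case-sensitivity option the commit history mentions:

```python
import random

def create_control_labels(tokens, num_classes, seed=0, case_sensitive=True):
    """Assign each word type a fixed random class; repeated occurrences
    of the same type always receive the same label."""
    rng = random.Random(seed)
    type_label = {}
    labels = []
    for tok in tokens:
        key = tok if case_sensitive else tok.lower()
        if key not in type_label:
            type_label[key] = rng.randrange(num_classes)
        labels.append(type_label[key])
    return labels
```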
`get_top_words` hardcoded the threshold to deem a "word" to be relevant to a neuron. This commit makes the threshold an additional argument the user can specify when using the function.
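The change can be pictured as follows (this is a simplified stand-in, not the actual `get_top_words` signature): the relevance cut-off becomes a keyword argument with the old hard-coded value as its default.

```python
def get_top_words(word_scores, threshold=0.4):
    """Return words whose relevance score for a neuron exceeds
    `threshold` (previously hard-coded, now caller-supplied),
    most relevant first."""
    ranked = sorted(word_scores.items(), key=lambda kv: -kv[1])
    return [w for w, s in ranked if s > threshold]

scores = {"paris": 0.9, "london": 0.6, "and": 0.1}
print(get_top_words(scores))                 # ['paris', 'london']
print(get_top_words(scores, threshold=0.8))  # ['paris']
```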
This commit implements low-precision activation extraction, saving and loading, which helps with storage space as well as saving/loading times.

* optional dtypes

dtype can be explicitly specified in all extraction, writing, loading, probe training and probe evaluation.
Also, probes are trained with mixed precision

* fix default x_dtype in create tensor

- broken test, probably broken during merge

* more efficient dtype assignment during extraction

* rm dtype from write_activations in writer

* rename x_dtype to dtype

* adapt linear_probe

- rm mixed precision
- convert probe to float() in evaluate_probe if necessary. No inplace operation, creates copy of the object

* give dtype to JSON writer even if it is not needed

* no autocast and mixed prec. in training

* clarify method comment

* typo/format

* rm special case for different writers

* always evaluate in float32 regardless
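The storage-precision idea can be sketched with NumPy (hypothetical helper names; the toolkit's own writers use their own formats): store at `float16` to cut disk usage and I/O roughly in half, but always cast back to `float32` before evaluation, mirroring the final commit above.

```python
import numpy as np

def save_activations(path, activations, dtype="float16"):
    """Store activations at reduced precision to save disk space
    and loading time."""
    np.save(path, activations.astype(dtype))

def load_activations(path, eval_dtype="float32"):
    """Load activations and cast to float32 regardless of the
    storage dtype, so downstream evaluation is always full precision."""
    return np.load(path).astype(eval_dtype)
```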
fdalvi and others added 5 commits August 28, 2022 16:16
All toolkit and test code has been formatted with `ufmt` to enforce
consistency in the codebase and future commits.
* Added GitHub action to check code formatting

* Fix action yaml

* Introduce formatting error to test GH action

* Revert "Introduce formatting error to test GH action"

This reverts commit 96c5795.
This commit implements the Intersection over Union method to rank neurons against various target labels introduced by the following paper:
 Mu, J., & Andreas, J. (2020). Compositional explanations of neurons. Advances in Neural Information Processing Systems, 33, 17153-17163.

Detailed commit history

* add iou probe

* add iou probe

* iou probing

* format

* modify testing

* add comments
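The IoU ranking can be sketched as follows (an illustrative implementation, not the toolkit's API): binarize each neuron's activations at a percentile threshold, then measure Intersection-over-Union against a binary label mask, as in Mu & Andreas (2020).

```python
import numpy as np

def iou_scores(activations, labels, percentile=80):
    """Score each neuron by the Intersection-over-Union between its
    binarized activations (above a per-neuron percentile threshold)
    and a binary label mask. High IoU means the neuron fires on
    roughly the same tokens that carry the target label."""
    activations = np.asarray(activations, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    thresholds = np.percentile(activations, percentile, axis=0)
    fired = activations > thresholds          # (samples, neurons) boolean
    intersection = (fired & labels[:, None]).sum(axis=0)
    union = (fired | labels[:, None]).sum(axis=0)
    return intersection / np.maximum(union, 1)
```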
This commit implements the Gaussian method for probing neurons against various target labels introduced by the following paper:
 Lucas Torroba Hennigen, Adina Williams, and Ryan Cotterell. Intrinsic probing through dimension
 selection. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language
 Processing (EMNLP), pp. 197–216, Online, 2020. Association for Computational Linguistics. doi:
 10.18653/v1/2020.emnlp-main.15.

Detailed commit history:
* gaussian probe
* gaussian probe
* format code
* modify
* modify
* modify
* changes on Gaussian
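A minimal sketch of class-conditional Gaussian probing in the spirit of Torroba Hennigen et al. (2020), using a diagonal covariance for simplicity (the paper and the toolkit may use full covariances; names here are illustrative): fit one Gaussian per class over the selected neurons and classify by maximum log-likelihood plus log prior.

```python
import numpy as np

def gaussian_probe_accuracy(train_x, train_y, test_x, test_y):
    """Fit a diagonal Gaussian per class over the selected neurons,
    then classify test points by log-likelihood + log prior and
    report accuracy."""
    train_y = np.asarray(train_y)
    classes = sorted(set(train_y.tolist()))
    params = []
    for c in classes:
        xc = train_x[train_y == c]
        mu, var = xc.mean(axis=0), xc.var(axis=0) + 1e-6  # variance floor
        prior = np.log(len(xc) / len(train_x))
        params.append((mu, var, prior))
    # Log-likelihood of each test point under each class Gaussian
    scores = np.stack([
        prior - 0.5 * (np.log(2 * np.pi * var) + (test_x - mu) ** 2 / var).sum(axis=1)
        for mu, var, prior in params
    ])
    preds = np.array(classes)[scores.argmax(axis=0)]
    return (preds == np.asarray(test_y)).mean()
```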
@fdalvi fdalvi merged commit 291ab81 into fdalvi:docs/historic_versions Aug 29, 2022
@yifan yifan deleted the docs/historical_versions branch August 29, 2022 12:45