How to obtain vectors for activation capping when reproducing figures

Hello, authors.

Thank you for sharing your wonderful work.

I am attempting to reproduce your results and would like to apply **_activation capping_**.

First, I have a question regarding **how to extract the vectors required for** **_activation capping_** (e.g., `{'vector': 'layer_55/contrast_role_pos3_default1', 'cap': 117.0}`). I would like to compare steered and unsteered responses.
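To make the question concrete, here is a minimal sketch of what I currently understand activation capping to mean. This is my assumption, not your implementation: the component of a hidden state along a unit-normalised direction is clamped so that it does not exceed `cap`. The function name `cap_activation` and the toy values are hypothetical and only for illustration.

```python
import math


def cap_activation(hidden, direction, cap):
    """Clamp the projection of `hidden` onto `direction` to at most `cap`.

    Assumption: capping is an upper bound on the scalar projection along a
    unit-normalised direction. The paper's exact definition may differ.
    """
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    unit = [d / norm for d in direction]

    # Scalar projection of the hidden state onto the direction.
    proj = sum(h * u for h, u in zip(hidden, unit))

    # Only the part of the projection above the cap is removed.
    excess = max(proj - cap, 0.0)
    return [h - excess * u for h, u in zip(hidden, unit)]


# Toy 2-D example (hypothetical values, not taken from the paper).
hidden = [3.0, 4.0]
direction = [1.0, 0.0]
capped = cap_activation(hidden, direction, cap=2.0)
```

If this matches the intended operation, what I am missing is how the direction itself (the `vector` entry) and the `cap` value are obtained for a given layer.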

Second, I would like to know if there is a guide or notebook on the overall method for **reproducing Figures 11-13**. Due to resource constraints, I am using the Qwen3-0.6B model instead of the models used in the paper.

Thanks,
Jeewoo Sul
