Popular repositories
- refusal_direction (Public)
  Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction".
- CircuitsVis (Public, forked from TransformerLensOrg/CircuitsVis)
  Mechanistic Interpretability Visualizations using React
  Jupyter Notebook
- SycophancySteering (Public, forked from nrimsky/CAA)
  Modulating sycophancy in llama-2 via activation steering
  Python