[Comments Thread] Interfaces for Explaining Transformer Language Models - Part 1 #9

jalammar · 2020-12-16T13:55:11Z

jalammar
Dec 16, 2020
Maintainer

I would love your feedback on the article, the series, and on the library!

eamoore · 2020-12-25T15:28:41Z

eamoore
Dec 25, 2020

The visualization in "Factorizing Activations of a Single Layer" is stunning. Interpretation of the roles of activation factors makes sense. The layers are clusterings of neurons by activation value. One contribution of this work is to map neuron activation inside transformer-based language models; other work has looked at that for other deep neural networks.

As a reader, I am not familiar with the foundations of this work - how Transformers work, how BERT works - so understanding the references is essential to understanding how the system implemented here actually works. The references here look really good and I appreciate the way the are presented. The explanation is quite parsimonious for a reader like me, but it's fair to say I am not the intended reader for this particular paper. It's clear there's much more to come.

I am familiar with LIME and SHAP and the idea of an interpretable surrogate model. Is it fair to say that Ecco builds a surrogate view by interrogating a trained transformer model to build these clusters of activation values? How are input tokens mapped to to factors? You are looking at activation values for that specfic input? Are the clusters stable across many inputs?

Input saliency reminds me of statistical sensitivity analysis. I am interested in policy simulation through Structural Causal Models with stochastic inputs. SCMs can be implemented as trained NN. Naively, transformers look like a generative, multi stage models. So, thats's something for me to learn more about. I wonder, though, if you see any potential application of this work to explaining the actions of agents in an RL setting, eg what factors of a policy or a sensed environment are "salient" in terms generating an observed system state?

I could not render the notebooks via the web. Maybe that's a colab thing. It would be helpful if pdf renderings of the notebooks were available.

1 reply

jalammar Dec 26, 2020
Maintainer Author

Is it fair to say that Ecco builds a surrogate view by interrogating a trained transformer model to build these clusters of activation values?

Surrogate models approximate an underlying model. In our case, the activations are actually those of the model itself.

How are input tokens mapped to to factors?

Ecco collects the activations in matrix of dimensions (layer, neuron, position) -- where position is the sequence of tokens in both input and output treated as one sequence. Before factoring, we merge all the layers (or the ones specified by the user) so the dimensions are (neuron, position). We present that to the NMF model as such:

You are looking at activation values for that specfic input?

Yes.

Are the clusters stable across many inputs?

This remains to be investigated. I'm really curious and wondering about the best way to quantify this.

I wonder, though, if you see any potential application of this work to explaining the actions of agents in an RL setting, eg what factors of a policy or a sensed environment are "salient" in terms generating an observed system state?

Saliency is commonly used in computer vision and NLP classification. I didn't come across many examples of it being used for sequences of predictions -- natural language generation in this case, and potentially agents in an RL setting. I would think that saliency of this kind is a useful addition to the toolbox of a developer or researcher trying to understand the behavior of a model or RL agent. We're still not at a stage where they shed conclusive light on the behavior of the model, but they're one way to probe the blackbox that could be useful in certain cases.

I could not render the notebooks via the web. Maybe that's a colab thing. It would be helpful if pdf renderings of the notebooks were available.

Did you try opening the Colab links? The jupyter notebooks will not render the javascript, unfortunately, but clicking the colab links at the top should send you to a page that works (and where you can change the inputs).

Thank you so much for your feedback and great questions @eamoore!

LopezGG · 2021-08-03T15:13:38Z

LopezGG
Aug 3, 2021

Hi Jay,
I am big fan of your work on simplifying and understanding the techniques. Thank you for this amazing work. A quick question , is this library only for GPT based models. I see

lm = ecco.from_pretrained('bert-base-uncased', activations=True)

in one of your notebooks. Is there some documentation on how I can add my custom bert based model to ecco to visualize it.

2 replies

pbanach Aug 9, 2021

Hello @jalammar,
+1 on the question above, is there/will there be a way to visualize features of your own fine-tuned model ?
Many thanks already for the amazing work!
Kind regards

jalammar Aug 11, 2021
Maintainer Author

Hi @LopezGG! Thank you so much!

Ecco supports BERT and RoBERTa in addition to GPT. Some features (like input saliency), are GPT only, though.

I'll note your and @pbanach's request for loading custom models and put that on the top of the requested features list. I'd love to know which visualization feature you're each interested in seeing first. Neuron activations? Something else?

animeshseemendra · 2022-02-11T07:20:29Z

animeshseemendra
Feb 11, 2022

Hi, I want to experiment with Input Saliency Maps, but the notebook is not accessible now. Can you help?

0 replies

Victordmz · 2023-06-28T17:04:01Z

Victordmz
Jun 28, 2023

Do you have an idea for the reason why in NMF there always is a factor of neurons that intently focus on the first token in the sequence, and only on that token?

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Comments Thread] Interfaces for Explaining Transformer Language Models - Part 1 #9

{{title}}

Replies: 4 comments 3 replies

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

Select a reply

[Comments Thread] Interfaces for Explaining Transformer Language Models - Part 1 #9

jalammar Dec 16, 2020 Maintainer

Replies: 4 comments · 3 replies

eamoore Dec 25, 2020

jalammar Dec 26, 2020 Maintainer Author

LopezGG Aug 3, 2021

pbanach Aug 9, 2021

jalammar Aug 11, 2021 Maintainer Author

animeshseemendra Feb 11, 2022

Victordmz Jun 28, 2023

jalammar
Dec 16, 2020
Maintainer

Replies: 4 comments 3 replies

eamoore
Dec 25, 2020

jalammar Dec 26, 2020
Maintainer Author

LopezGG
Aug 3, 2021

jalammar Aug 11, 2021
Maintainer Author

animeshseemendra
Feb 11, 2022

Victordmz
Jun 28, 2023