Language models can explain neurons in language models
We use GPT-4 to automatically write explanations for the behavior of neurons in large language models and to score these explanations. We release a dataset of these (imperfect) explanations and scores for every neuron in GPT-2.
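This excerpt does not say how an explanation is scored. The minimal sketch below assumes one simple scheme. A model reads only the explanation and predicts ("simulates") the neuron's activation on each token. The score is then the Pearson correlation between those simulated activations and the neuron's real ones, so an explanation that predicts where the neuron fires earns a score near 1. The function name `correlation_score` and the sample activations are hypothetical and used for illustration only. This is not presented as the authors' exact code.

```python
import math
from typing import Sequence


def correlation_score(real: Sequence[float], simulated: Sequence[float]) -> float:
    """Hypothetical scorer: Pearson correlation between a neuron's real
    per-token activations and activations simulated from an explanation."""
    if len(real) != len(simulated) or len(real) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    n = len(real)
    mean_r = sum(real) / n
    mean_s = sum(simulated) / n
    # Covariance and variances (unnormalized; the 1/n factors cancel).
    cov = sum((r - mean_r) * (s - mean_s) for r, s in zip(real, simulated))
    var_r = sum((r - mean_r) ** 2 for r in real)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    if var_r == 0 or var_s == 0:
        # A constant sequence carries no signal to correlate against.
        return 0.0
    return cov / math.sqrt(var_r * var_s)


# Illustrative (made-up) activations for five tokens. The simulator may use
# its own scale (here 0-10); correlation ignores scale and offset.
real_acts = [0.0, 0.0, 2.0, 0.0, 1.0]
simulated_acts = [0, 0, 10, 0, 5]
print(round(correlation_score(real_acts, simulated_acts), 6))  # → 1.0
```

Correlation is a convenient choice here because the simulator's activation scale need not match the neuron's. Only the pattern of where the neuron fires, and how strongly relative to other tokens, affects the score.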