The expert AI interpretability lab

Empowering AI engineers to embed explanations into production AI models without retraining. Unprecedented accuracy in recovering the true drivers of predictions, in real time.

Accuracy of explainability

Benchmark using highly heterogeneous data (~32K samples and ~1K features)

OuterProduct: 89%
SHAP (current industry standard for accuracy): 20%
Integrated Gradients: 8%

OuterProduct sets the new standard for explainability of AI models.

We are the creators of the first accurate, real-time AI explainability engine.

Foundational research

Science · Feb 19, 2026

Steering and Monitoring of AI Models

Daniel Beaglehole, Adit Radhakrishnan, Enric Boix-Adserà, Mikhail Belkin

Artificial intelligence (AI) models contain much of human knowledge. Understanding the representation of this knowledge will lead to improvements in model capabilities and safeguards. Building on advances in feature learning, we developed an approach for extracting linear representations of semantic notions or concepts in AI models. We showed how these representations enabled model steering, through which we exposed vulnerabilities and improved model capabilities. We demonstrated that concept representations were transferable across languages and enabled multiconcept steering. Across hundreds of concepts, we found that larger models were more steerable and that steering improved model capabilities beyond prompting. We showed that concept representations were more effective than judge models for monitoring misaligned content. Our results illustrate the power of internal representations for advancing AI safety and model capabilities.
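
As a rough illustration of the idea, a linear concept representation can be estimated from a model's hidden activations on concept-positive versus concept-negative prompts and then added to a layer's activations at inference time to steer generation. The sketch below is not the exact procedure from the paper; the difference-of-means estimator and the hook-based injection are assumptions of this illustration.

```python
# Illustrative sketch of linear concept steering; the difference-of-means estimator and
# the hook-based injection are assumptions here, not necessarily the paper's procedure.
import torch

def concept_direction(acts_pos, acts_neg):
    """Hidden activations (n x d) from concept-positive / -negative prompts -> unit vector."""
    v = acts_pos.mean(dim=0) - acts_neg.mean(dim=0)
    return v / v.norm()

def add_steering_hook(layer, direction, strength=4.0):
    """Shift `layer`'s output along `direction` on every forward pass."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + strength * direction.to(hidden)  # match device and dtype
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden
    return layer.register_forward_hook(hook)

# Usage (hypothetical): record activations at a chosen transformer layer for the two prompt
# sets, compute the direction, attach the hook to that layer, then generate as usual; the
# returned handle's .remove() turns steering off again.
```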


Science · Mar 7, 2024

Feature Learning in Neural Networks

Adit Radhakrishnan, Daniel Beaglehole, Parthe Pandit, Mikhail Belkin

Understanding how neural networks learn features, or relevant patterns in data, for prediction is necessary for their reliable use in technological and scientific applications. In this work, we presented a unifying mathematical mechanism, known as average gradient outer product (AGOP), that characterized feature learning in neural networks. We provided empirical evidence that AGOP captured features learned by various neural network architectures, including transformer-based language models, convolutional networks, multilayer perceptrons, and recurrent neural networks. Moreover, we demonstrated that AGOP, which is backpropagation-free, enabled feature learning in machine learning models, such as kernel machines, that a priori could not identify task-specific features. Overall, we established a fundamental mechanism that captured feature learning in neural networks and enabled feature learning in general machine learning models.
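
For a concrete picture of the quantity involved, the AGOP of a trained predictor f over inputs x_1, ..., x_n is the average of the outer products of its input gradients, (1/n) * sum_i grad f(x_i) grad f(x_i)^T. The following is a minimal sketch of that computation; the toy model and data are placeholders, not the authors' code.

```python
# Minimal sketch (not the authors' implementation) of the average gradient outer product:
# AGOP = (1/n) * sum_i grad f(x_i) grad f(x_i)^T for a trained predictor f.
import torch

def average_gradient_outer_product(model, X):
    """Estimate the AGOP of `model` over a batch of inputs X (n_samples x n_features)."""
    n, d = X.shape
    agop = torch.zeros(d, d)
    for x in X:
        x = x.clone().requires_grad_(True)
        y = model(x).sum()                   # scalar output (sum if the model is vector-valued)
        (grad,) = torch.autograd.grad(y, x)  # input gradient, shape (d,)
        agop += torch.outer(grad, grad)      # accumulate the gradient outer product
    return agop / n

# Toy usage: the dominant eigenvectors of M indicate the input directions the model relies on.
model = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))
X = torch.randn(100, 10)
M = average_gradient_outer_product(model, X)
```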

Launch AI Reasoning Apps today

© OuterProduct 2026
