Verifiable Reasoning with Scenarios

OuterProduct Labs

Quick Summary


  • OuterProduct introduces scenarios to reason over scoring models.

  • Scenarios provide verifiable reasoning for model scores by giving intervention strategies that are guaranteed to flip scores.  

  • Scenarios give agents context for traceable decisioning and recommendations. 


Introduction

The key to deploying effective, trustworthy AI agents in any enterprise is giving them a means of verifying their reasoning. AI agents have been incredibly successful in domains like coding and math, where their reasoning can be automatically verified for correctness. For example, coding agents (Claude Code, GPT Codex) have revolutionized coding because they can iteratively test and update code until it runs successfully.

How do we bring verifiable reasoning to enterprise applications built on structured data and scoring models? 

In this post, we introduce scenarios, a new capability of our reasoning engine that unlocks verifiable reasoning for enterprise processes that use scoring models (underwriting, compliance, fraud, marketing, sales, experimentation, etc.).

Given any model score, scenarios give minimal intervention strategies on input attributes that are guaranteed to change the model score to a specified level.  As a result, scenarios enable verifiable reasoning for scoring models.  For example, consider modern underwriting systems, which contain scoring models for approval or denial.  Suppose a model score suggested denying an applicant a new credit card.  Scenarios would surface reasons for denial that are fully justifiable: intervening on corresponding attributes would flip the model score to the approval range.
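To make the idea of a minimal, score-flipping intervention concrete, here is a small sketch. It is not OuterProduct's algorithm: the scoring function, the threshold, the candidate values, and the brute-force search are all assumptions made up for illustration, and "minimal" here simply means "fewest attributes changed".

```python
from itertools import combinations, product

APPROVE_THRESHOLD = 600  # hypothetical: scores at or above this are approved

def score(applicant):
    """Hypothetical toy scoring model (higher is better); coefficients are made up."""
    return (
        700
        - 3 * applicant["utilization_pct"]   # percent of available credit in use
        + applicant["income_k"]              # annual income in thousands of dollars
        - 40 * applicant["late_payments"]
    )

# Hypothetical grid of values each attribute may be changed to.
CANDIDATES = {
    "utilization_pct": [10, 30, 50],
    "income_k": [80, 100, 120],
    "late_payments": [0],
}

def find_scenarios(applicant, max_attrs=2):
    """Return all score-flipping changes that use the fewest attributes."""
    for k in range(1, max_attrs + 1):
        found = []
        for attrs in combinations(CANDIDATES, k):
            for values in product(*(CANDIDATES[a] for a in attrs)):
                change = dict(zip(attrs, values))
                if any(applicant[a] == v for a, v in change.items()):
                    continue  # skip "changes" that leave an attribute as it was
                # Verification step: re-score the modified applicant.
                if score({**applicant, **change}) >= APPROVE_THRESHOLD:
                    found.append(change)
        if found:
            return found  # stop at the smallest number of attributes that works
    return []

applicant = {"utilization_pct": 80, "income_k": 60, "late_payments": 1}
scenarios = find_scenarios(applicant)
print(score(applicant), scenarios)
```

Each change is re-scored against the model before it is returned, so every reported scenario has been checked to move the score into the approval range. In this toy setup, the denied applicant is flipped by lowering utilization alone, which mirrors the kind of justification an underwriter would give.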

Scenarios enable intuitive and verifiable reasoning that closely mirrors human reasoning. For example, upon recommending denial for an applicant, an underwriter might justify the decision by pointing out that the applicant's utilization across credit cards was too high. The underwriter has implicitly defined a scenario, since the applicant likely would have been approved with lower utilization.

Scenarios give AI agents the ability to reason over scoring models and develop trustworthy, targeted recommendations. Returning to the underwriting example above, agents can reason over millions of model scores to surface emerging lending trends and suggest new rules and policies that improve coverage while mitigating risk.

Examples of verifiable reasoning from scenarios 

Scenarios enable verifiable reasoning across a number of enterprise applications beyond the underwriting example described above.

Take, for example, a fraud analyst who must manually review potentially fraudulent cases within minutes. Instead of receiving a score plus an overwhelming amount of information, the analyst would get concise, justifiable reasons for the alert from scenarios (e.g., the score would decrease to a non-alert range if the counterparty had at least one prior transaction with the account in the past 90 days).

As another example, consider a relationship manager who wants to know what counteroffer would keep a customer on board. Knowing that the customer is likely to churn is not enough to inform an intelligent retention strategy. It would be far more useful to know that either (a) waiving a maintenance fee for two cycles or (b) raising an overdraft limit is guaranteed to move the customer below the churn threshold.

Scenarios provide these direct, targeted strategies. 

OuterProduct scenarios in action

To make the above examples more concrete, we showcase what scenarios discover on real data.   

We started with two datasets for evaluating scenarios. The first dataset was from LendingClub, containing ~40,000 peer-to-peer (P2P) loans with over 100 attributes. The goal for this dataset was to predict whether a loan would be fully paid or not. The second dataset was the Give Me Some Credit (GMSC) dataset, containing over 100,000 real anonymized credit records and 10 attributes. The goal for this dataset was to predict whether the applicant would have a serious delinquency (over 90 days late on a payment) within 2 years.

For each dataset, we first built scoring models for evaluating the risk of default (both models achieved ~80% accuracy).  We then computed scenarios for each score from our models (which took a matter of seconds to compute across all the data).  

For the smaller LendingClub dataset, scenarios revealed that ~20% of model scores could be changed by modifying one attribute, typically the income, loan purpose, or debt-to-income ratio of the applicant.  On the other hand, for the more complex GMSC dataset, almost all scenarios (~92%) involved diverse combinations of two or more attributes.  As an example, for one applicant, scenarios revealed that the model score would flip if the applicant both increased their monthly income by $1.8k and decreased their debt burden by 39%. In both of these examples, scenarios provide direct, verifiable reasoning for why the model assigned a high risk score to a given applicant or loan.
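The two-attribute example above can be checked mechanically: apply the changes, re-score, and confirm that neither change is sufficient on its own. The sketch below does this on a made-up model. Only the +$1.8k income and -39% debt figures come from the example; the applicant's starting values, the risk function, and the threshold are hypothetical and are not taken from the GMSC model.

```python
from itertools import combinations

RISK_THRESHOLD = 0.5  # hypothetical: scores at or above this count as high risk

def risk_score(applicant):
    """Hypothetical toy risk model; coefficients are made up for illustration."""
    return (
        0.40
        + 0.5 * applicant["debt_ratio"]
        - 0.04 * (applicant["monthly_income"] / 1000)
    )

def flips(applicant, change):
    """True if applying `change` moves the applicant below the risk threshold."""
    return risk_score({**applicant, **change}) < RISK_THRESHOLD

def is_minimal_scenario(applicant, change):
    """True if the full change flips the score and no proper subset of it does."""
    if not flips(applicant, change):
        return False
    for k in range(1, len(change)):
        for attrs in combinations(change, k):
            if flips(applicant, {a: change[a] for a in attrs}):
                return False
    return True

# Hypothetical applicant; the deltas mirror the example in the text.
applicant = {"monthly_income": 3200, "debt_ratio": 0.80}
change = {
    "monthly_income": applicant["monthly_income"] + 1800,   # +$1.8k per month
    "debt_ratio": applicant["debt_ratio"] * (1 - 0.39),     # debt burden down 39%
}
print(is_minimal_scenario(applicant, change))
```

In this toy setup, raising income alone or cutting debt alone leaves the applicant above the threshold, while the combined change moves them below it. That is the sense in which a multi-attribute scenario is both verified and minimal.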

Constraining attributes to enable actionable scenarios

In addition to providing verifiable reasoning for model scores, scenarios can also be used to generate recommendations for improving decisioning.  For example, in sales, an agent could use scenarios on top of lead scoring models to build, evaluate, and recommend campaign strategies for improving lead conversion.

Yet to generate recommendations that are actually actionable, scenarios need to respect real-world constraints. For lending applications, a scenario that approves a loan by increasing an applicant's age by 70 years or by decreasing the requested loan amount to a negative value would be unrealistic. OuterProduct’s reasoning engine enables customization of attribute ranges, unchangeable attributes (such as age), and allowable directions for changing an attribute (for example, length of credit history can only increase). By enforcing these realistic constraints, scenarios power recommendations that are guaranteed to be both verifiable and actionable.
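The three kinds of constraints described above (ranges, unchangeable attributes, and allowed directions) can be pictured as a filter applied to candidate changes. The sketch below is purely illustrative: the `Constraint` class, its field names, and the specific limits are assumptions, not the configuration format of OuterProduct's engine.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Constraint:
    """Hypothetical per-attribute constraint; names and defaults are illustrative."""
    min_value: float = float("-inf")
    max_value: float = float("inf")
    immutable: bool = False
    direction: str = "any"  # "any", "increase", or "decrease"

# Hypothetical constraint set for a lending application.
CONSTRAINTS = {
    "age": Constraint(immutable=True),
    "loan_amount": Constraint(min_value=1_000, max_value=40_000),
    "credit_history_years": Constraint(min_value=0, direction="increase"),
}

def is_actionable(current, change, constraints=CONSTRAINTS):
    """Return True if every changed attribute respects its constraint."""
    for attr, new_value in change.items():
        rule = constraints.get(attr, Constraint())  # unconstrained by default
        old_value = current[attr]
        if new_value == old_value:
            continue  # nothing actually changes for this attribute
        if rule.immutable:
            return False
        if not (rule.min_value <= new_value <= rule.max_value):
            return False
        if rule.direction == "increase" and new_value < old_value:
            return False
        if rule.direction == "decrease" and new_value > old_value:
            return False
    return True

applicant = {"age": 25, "loan_amount": 15_000, "credit_history_years": 3}
print(is_actionable(applicant, {"age": 95}))              # age is immutable
print(is_actionable(applicant, {"loan_amount": -500}))    # outside allowed range
print(is_actionable(applicant, {"loan_amount": 10_000}))  # within range
```

A filter like this would be applied before or during the scenario search, so that only changes an applicant or business could realistically make are ever surfaced as recommendations.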

Scenarios enable verifiable agentic workflows

AI agents excel in environments where they are able to verify and iteratively refine their outputs. To date, verification has largely been limited to specific domains like coding and math (for example, agents can write, test, and refine code until it runs).
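The same write, test, and refine pattern carries over to scoring models when the model itself acts as the test. The sketch below is a simplified illustration under stated assumptions: `propose` is a hypothetical stand-in for an agent, and the scoring function and threshold are made up.

```python
APPROVE_THRESHOLD = 600  # hypothetical approval cutoff

def score(applicant):
    """Hypothetical toy scoring model (higher is better)."""
    return 700 - 3 * applicant["utilization_pct"] + applicant["income_k"]

def propose(applicant, step):
    """Stand-in for an agent: suggest lowering utilization by `step` points."""
    return {"utilization_pct": max(0, applicant["utilization_pct"] - step)}

def refine_until_verified(applicant, steps=(5, 10, 20, 40)):
    """Try progressively larger proposals; return the first the model confirms."""
    for step in steps:
        change = propose(applicant, step)
        # Verification: re-score the modified applicant, like running a test suite.
        if score({**applicant, **change}) >= APPROVE_THRESHOLD:
            return change
    return None  # no proposal could be verified

applicant = {"utilization_pct": 80, "income_k": 60}
recommendation = refine_until_verified(applicant)
print(recommendation)
```

A proposal is returned only after the model confirms it, so an unverified suggestion never reaches the user. This is the property that makes iterative refinement safe in coding workflows as well.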

Scenarios, the latest addition to the OuterProduct reasoning engine, unlock verifiable reasoning for any enterprise application running on scoring models and structured data.  

With scenarios, agents are able to reason over model scores and generate recommendations for continual product improvement, all in a verifiable, trusted manner.