Patent AI Insights is the expert resource for AI-powered patent prosecution, maintained by Roger Hahn, USPTO Registered Patent Attorney (Reg. No. 46,376) and founder of ABIGAIL. Topics include Office Action response strategies, prior art analysis, examiner intelligence, claim amendment techniques, and comparisons of AI patent tools.
The Patent AI Transparency Crisis: $100M in Funding, Zero Published Accuracy Data
The patent AI industry has raised over $100 million in venture funding. Not a single company has published reproducible accuracy benchmarks. Here is the vendor-by-vendor evidence.
$100M+ in Funding. Zero Published Benchmarks.
In the last three years, patent AI startups have collectively raised over $100 million in venture capital. Solve Intelligence closed a $55M Series B. Patlytics raised $21M. IPRally secured $35M. PatSnap reached IPO-level scale. These are not small bets -- investors are pricing in the assumption that AI will transform patent prosecution.
But there is a problem that no one in the industry wants to talk about: not one of these companies has published reproducible accuracy benchmarks for their core patent prosecution capabilities.
Not a single hallucination rate. Not a single error rate on claim amendments. Not a single verifiable accuracy metric on Office Action analysis. The entire industry is asking patent attorneys to trust AI with their clients' intellectual property -- and offering nothing but marketing copy as evidence that it works.
Patent attorneys have a duty of candor to the USPTO. They face malpractice liability for every statement in every filing. Yet the tools they are being sold have never been independently evaluated for accuracy. The vendors know their accuracy numbers. They have chosen not to share them.
Vendor-by-Vendor Analysis
We reviewed every major patent AI vendor's public materials -- websites, blog posts, press releases, academic publications, and customer case studies. Here is what we found.
| Vendor | Funding | Founded | Published Benchmarks | Accuracy Claims | What's Verified |
|---|---|---|---|---|---|
| Solve Intelligence | $55M (Series B, Dec 2025) | 2022 | None | "50% more productive" | Nothing. Productivity claim has no published methodology, sample size, or accuracy data. |
| Patlytics | $21M | 2023 | None | "18x customer growth" | Nothing. Customer growth is a sales metric, not an accuracy metric. |
| IPRally | $35M | 2018 | None | Blog post references search metrics | Nothing. The blog post discussed methodology but published no actual data. |
| PatSnap | IPO-level | 2007 | Single metric | "81% X Hit Rate" | One metric for prior art search only. No data on prosecution analysis, claim amendments, or hallucination rates. |
| DeepIP | Undisclosed | 2020 | None | Self-published feature comparisons | Nothing. Feature lists are not accuracy data. |
| Lexis+ AI | $650M acquisition (RELX) | N/A | None from the vendor | General AI assistant for legal research | Stanford study found a 17% hallucination rate on legal citations. |
| Westlaw AI | Thomson Reuters | N/A | None from the vendor | AI-assisted legal research | Stanford study found a 33% hallucination rate on legal citations. |
Data compiled from public sources as of March 2026. If any vendor believes this is inaccurate, we invite them to publish their benchmarks.
The Stanford Hallucination Study
In 2025, researchers at Stanford (Magesh et al.) published what is arguably the most important empirical study on AI accuracy in legal practice. They systematically tested major legal AI tools for hallucinated citations -- cases, statutes, and references that the AI presented as real but that do not exist.
One in three legal citations (33%) generated by Westlaw AI was fabricated. These are not paraphrasing errors or minor inaccuracies -- they are citations to cases and authorities that do not exist.
Roughly one in six citations (17%) generated by Lexis+ AI was fabricated. While better than Westlaw, this is still catastrophic for a tool marketed to practicing attorneys who face sanctions for citing nonexistent authorities.
GPT-4, Claude, and other general-purpose LLMs hallucinated legal citations at rates between 58% and 82%. This establishes the baseline that legal-specific tools are trying to improve on.
The Stanford study matters for patent AI because it demonstrates an uncomfortable truth: even the largest, best-funded legal AI tools hallucinate at alarming rates. And those are the tools that have actually been tested. The patent AI vendors listed above have never been independently tested at all.
If Westlaw and Lexis -- with billions of dollars in resources -- produce hallucination rates of 17-33%, what should we expect from patent AI startups that have never submitted to independent evaluation?
What "50% More Productive" Actually Means
Solve Intelligence's flagship claim is that their tool makes patent attorneys "50% more productive." This claim has been repeated in press releases, investor materials, and marketing content. Let us examine what it actually tells you.
Questions this claim does not answer
- What is the accuracy of the AI's Office Action analysis?
- How often does it hallucinate prior art citations?
- What percentage of suggested claim amendments introduce new matter?
- How was "productivity" measured? Time to completion? Output volume? Quality?
- What was the sample size? How were participants selected?
- Was the study conducted by an independent party or by the vendor?
A tool that produces Office Action responses 50% faster is worthless if 20% of those responses contain hallucinated citations. It is worse than worthless -- it is a malpractice liability accelerator. Speed without accuracy is not productivity. It is risk multiplication.
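To put numbers on "risk multiplication," here is a back-of-the-envelope sketch. Every input is an assumed placeholder except the vendor's 50% productivity claim and the hypothetical 20% hallucination rate from the paragraph above:

```python
# Toy model of speed vs. accuracy. All inputs are illustrative assumptions
# except the vendor's 50% productivity claim and the hypothetical 20%
# hallucination rate discussed above.
responses_per_year = 100      # assumed annual filing volume
hours_per_response = 10       # assumed baseline drafting time
speedup = 0.50                # "50% more productive"
hallucination_rate = 0.20     # hypothetical rate from the text
hours_per_incident = 40       # assumed cost of one fabricated citation
                              # (corrections, sanctions exposure, client fallout)

# "50% more productive" means 1.5x output per hour, so each response
# takes 1/1.5 of the baseline time.
hours_saved = responses_per_year * hours_per_response * (1 - 1 / (1 + speedup))
expected_cleanup = responses_per_year * hallucination_rate * hours_per_incident

print(f"Hours saved per year:            {hours_saved:.0f}")       # ~333
print(f"Expected cleanup hours per year: {expected_cleanup:.0f}")  # 800
```

Under these assumptions the expected cleanup cost dwarfs the time saved. Change the placeholders however you like; the structure of the tradeoff does not change.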
The same logic applies to every vendor on the list. Customer growth, feature comparisons, and productivity claims are not substitutes for accuracy data. The only metric that matters for a patent attorney is: how often does this tool produce incorrect output that could harm my client?
The Buyer's Checklist: 4 Questions for Every Patent AI Vendor
Before you sign a contract with any patent AI vendor, ask these four questions. If they cannot answer all four, you do not have enough information to evaluate the tool's safety for your practice.
Do you publish reproducible benchmarks?
Not marketing metrics. Not customer testimonials. Published accuracy data on a disclosed dataset using a disclosed methodology that an independent party could reproduce. If the answer is no, ask why.
What is your hallucination rate on MPEP and case law citations?
Any tool that generates legal citations must disclose how often those citations are fabricated. The Stanford study showed rates of 17-33% for major legal AI tools. If a vendor does not know their hallucination rate, they have not measured it. If they have measured it and will not share it, draw your own conclusions.
Have you disclosed your failure modes?
Every AI system has failure modes -- categories of input where it performs poorly. A vendor that claims their tool works equally well on all technology areas, all rejection types, and all prosecution scenarios is either lying or has not tested thoroughly enough to discover the failure modes.
Has an independent party evaluated your tool?
Self-reported accuracy is not accuracy. Vendor-selected case studies are not benchmarks. Ask whether any third party -- academic, journalistic, or institutional -- has independently evaluated the tool. If not, you are relying entirely on the vendor's self-assessment.
The Revealed Preference Argument
Economists use the term "revealed preference" to describe the idea that people's actual choices reveal their true beliefs, regardless of what they say. The same principle applies to patent AI vendors and accuracy data.
Consider the following logic:
- If a vendor's tool performed well on independent benchmarks, publishing those results would be a massive competitive advantage.
- Publishing benchmarks is relatively cheap. The datasets exist. The evaluation frameworks exist. A single engineer could run a benchmark suite in a week.
- Every major patent AI vendor has chosen not to publish benchmarks.
- Therefore, the most likely explanation is that they expect the results to be unfavorable.
This is not speculation. This is basic incentive analysis. A vendor sitting on great accuracy numbers would publish them immediately -- it would be the single most effective marketing asset they could produce. The silence is the data.
When a vendor tells you their tool is "accurate" or "reliable" but refuses to publish numbers, they are asking you to trust their marketing department over your own due diligence. For a profession built on evidence and precision, this should be unacceptable.
PatentBench: Open Benchmarks for Patent AI
We built PatentBench because we were tired of the silence. PatentBench is the first open, reproducible benchmark suite for patent prosecution AI. It is designed to do what no vendor has been willing to do: measure accuracy on real patent prosecution tasks with transparent methodology.
- Every evaluation metric, dataset, and scoring rubric is public. Anyone can reproduce our results.
- The benchmark dataset is derived from real USPTO Office Actions and prosecution histories. No synthetic data.
- Benchmarks cover Office Action analysis, rejection classification, claim amendment quality, and citation accuracy.
- Any patent AI tool can be evaluated against the same benchmarks. We challenge every competitor to submit their tool for evaluation.
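To make "reproducible" concrete, here is a minimal sketch of the kind of citation-accuracy check a benchmark like this can run. The file names, JSON schema, and scoring function are illustrative placeholders, not PatentBench's actual harness:

```python
# Minimal sketch of a reproducible citation-accuracy check.
# File names and the JSON schema are illustrative placeholders.
import json

def load_verified_citations(path: str) -> set[str]:
    """Load a fixed, published set of citations known to exist."""
    with open(path) as f:
        return set(json.load(f))

def hallucination_rate(cited: list[str], verified: set[str]) -> float:
    """Fraction of a tool's citations that are not in the verified set."""
    if not cited:
        return 0.0
    fabricated = [c for c in cited if c not in verified]
    return len(fabricated) / len(cited)

if __name__ == "__main__":
    verified = load_verified_citations("verified_citations.json")
    # One record per Office Action response generated by the tool under
    # test, e.g. {"application": "...", "citations": ["In re Fulton, ..."]}
    with open("tool_outputs.json") as f:
        outputs = json.load(f)
    per_response = [hallucination_rate(o["citations"], verified) for o in outputs]
    print(f"Mean per-response hallucination rate: "
          f"{sum(per_response) / len(per_response):.1%}")
```

Because both inputs are fixed, published files, anyone who downloads the dataset gets the same number -- which is the whole point of a reproducible benchmark.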
An Open Challenge to Every Patent AI Vendor
We have published our benchmarks, our methodology, and our results. We challenge Solve Intelligence, Patlytics, IPRally, PatSnap, and every other patent AI vendor to do the same. Run PatentBench against your tool and publish the results. If your tool is as good as you claim, you have nothing to lose.
Stay Updated on Patent AI Accountability
Get notified when we publish new vendor evaluations, benchmark results, and transparency reports.
See What Transparent Patent AI Looks Like
ABIGAIL publishes its accuracy data, discloses its failure modes, and subjects itself to independent evaluation via PatentBench. Try it free and see the difference.