Why don't patent AI vendors publish benchmarks?

There are several likely reasons. Until PatentBench, there was no standardized benchmark for patent prosecution tasks. Publishing numbers creates accountability. Many patent AI tools are thin wrappers around general-purpose LLMs, and the vendors may not have rigorous internal testing processes.

PatentBench is the first open benchmark suite for patent prosecution AI. It contains 7,200+ test cases across 5 domains: rejection parsing, claim-prior art mapping, argument generation, amendment drafting, and procedural compliance.

How does the Stanford legal AI hallucination study relate to patent AI?

Magesh et al. at Stanford found that Lexis+ AI hallucinated 17% of the time and Westlaw AI hallucinated 33% of the time on general legal research tasks. Patent prosecution is more technically demanding, making rigorous benchmarking even more critical.

What is LegalBench and why doesn't it cover patents?

LegalBench is a collaborative benchmark with 162 tasks for evaluating legal reasoning in AI systems. However, none of its 162 tasks address patent prosecution. Patent prosecution requires highly specialized domain knowledge that general legal benchmarks do not capture.

Can any vendor run PatentBench on their tool?

Yes. PatentBench is open source and available on GitHub. Any vendor can run the benchmark against their tool and publish the results. The test cases, evaluation rubrics, and scoring methodology are all publicly available.

Does Abigail publish its PatentBench scores?

Yes. Abigail publishes full PatentBench results, including per-domain breakdowns and failure analysis. Patent attorneys have the right to see exactly how well an AI tool performs before trusting it with their clients' intellectual property.


Industry Analysis | Mar 20, 2026 | 8 min read

Why No Patent AI Tool Publishes Benchmarks

Patent prosecution is a $7B+ market. Dozens of AI vendors compete for it. Zero of them publish accuracy benchmarks. Here is why that should concern every patent attorney evaluating these tools.

Roger Hahn, Patent Attorney (USPTO Reg. No. 46,376) | JD, MBA, MS | Founder, ABIGAIL

A $7 Billion Market With Zero Accountability

Patent prosecution -- the process of responding to USPTO Office Actions, drafting claims, and navigating examiner rejections -- generates over $7 billion in annual legal fees in the United States alone. A wave of AI startups and legal tech incumbents now promise to automate significant portions of this work. They promise faster responses, lower costs, and higher-quality arguments.

There is one thing none of them promise: measurable accuracy.

Not a single patent AI vendor publishes standardized benchmarks showing how their tool performs on real patent prosecution tasks. No accuracy scores. No error rates. No independent validation. Attorneys are being asked to trust these tools with their clients' intellectual property based entirely on marketing claims and demo videos.

The Benchmark Scorecard: Every Major Vendor

We surveyed every major AI tool used in patent prosecution. The result is a table of blanks.

Vendor	Funding / Scale	Published Benchmarks	Open Methodology
Solve Intelligence	$55M raised	None	None
Patlytics	$21M raised	None	None
IPRally	$35M raised	None	None
PatSnap	IPO-level ($300M+)	None	None
Lexis+ AI (LexisNexis)	$650M acquisition	None	None
Westlaw AI (Thomson Reuters)	Public company	None	None

Combined, these companies represent over $1 billion in venture capital and acquisitions. Not one dollar has produced a published accuracy benchmark for patent prosecution tasks.
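As a quick check on that total, the disclosed figures from the scorecard can be summed directly. This is a minimal sketch that assumes "$300M+" is a lower bound and leaves out Westlaw AI, for which the table gives no dollar figure; the variable names are illustrative only.

```python
# Figures copied from the scorecard table, in millions of USD.
# Assumption: "$300M+" is treated as a lower bound of 300.
# Westlaw AI (Thomson Reuters) is omitted because the table lists
# "Public company" rather than a dollar amount.
funding_musd = {
    "Solve Intelligence": 55,
    "Patlytics": 21,
    "IPRally": 35,
    "PatSnap": 300,
    "Lexis+ AI (LexisNexis)": 650,
}

total_musd = sum(funding_musd.values())
print(f"Lower-bound total: ${total_musd}M")  # prints "Lower-bound total: $1061M"
```

Even with one vendor excluded, the disclosed figures alone exceed $1 billion.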

The Stanford Precedent: What Happens When You Actually Test Legal AI

Magesh et al. (2024) -- Stanford Study on Legal AI Hallucination

Researchers at Stanford tested the two largest legal AI platforms -- Lexis+ AI and Westlaw AI -- on real legal research tasks. The results were sobering:

17% hallucination rate for Lexis+ AI
33% hallucination rate for Westlaw AI

These tools are designed for general legal research -- finding cases, summarizing holdings, answering legal questions. Patent prosecution is substantially harder. It requires parsing technical claim language, mapping claim elements to prior art disclosures, understanding 35 U.S.C. rejection bases, and generating amendments that must be supported by the original specification.

If the best-funded legal AI tools hallucinate 17-33% of the time on general legal tasks, what happens when you apply similar technology to patent prosecution without independent benchmarking? No one knows. Because no one has published the numbers.

The Benchmark Gap in Legal AI

Other domains of AI have rigorous benchmarking ecosystems. Medical AI has clinical trial requirements. Autonomous vehicles have standardized safety tests. Even general-purpose LLMs compete on published benchmarks like MMLU, HumanEval, and HellaSwag.

Legal AI has made some progress. LegalBench, a collaborative effort from legal NLP researchers, established 162 tasks for evaluating legal reasoning. But none of those 162 tasks address patent prosecution. Zero tasks for claim interpretation. Zero for Office Action analysis. Zero for rejection classification or amendment drafting.

Similarly, legalbenchmarks.ai -- one of the few sites attempting to track legal AI performance -- covers contract review only. No patent prosecution coverage at all.

LegalBench: 162 tasks, 0 patent prosecution tasks
legalbenchmarks.ai: contracts only, 0 patent prosecution tasks
Patent-specific benchmarks: none until PatentBench

Introducing PatentBench: The First Patent Prosecution Benchmark

We built what the industry would not. PatentBench is the first open benchmark suite designed specifically for patent prosecution AI. It tests the tasks that patent attorneys actually perform, using real Office Actions and expert-validated evaluation criteria.

7,200+ test cases
5 prosecution domains
Real USPTO Office Actions
Open methodology & rubrics

PatentBench covers the five core domains of patent prosecution AI:

1. Rejection Parsing: Can the AI correctly identify each rejection type, the rejected claims, and the statutory basis from an Office Action?
2. Claim-Prior Art Mapping: Can the AI accurately map which claim elements the examiner says are taught by which prior art references?
3. Argument Generation: Can the AI produce technically accurate arguments distinguishing the claims from the cited prior art?
4. Amendment Drafting: Can the AI suggest claim amendments that (a) overcome the rejection and (b) have support in the original specification?
5. Procedural Compliance: Does the AI correctly handle deadlines, response formatting, and USPTO procedural requirements?
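
To make the idea of a per-domain breakdown concrete, here is a minimal Python sketch of how results across the five domains might be tallied. It is not PatentBench's actual code or data format: the `CaseResult` type, the domain identifiers, the case IDs, and the pass/fail simplification are all assumptions made for illustration, and the published rubrics may grade responses differently.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical identifiers for the five domains listed above.
DOMAINS = (
    "rejection_parsing",
    "claim_prior_art_mapping",
    "argument_generation",
    "amendment_drafting",
    "procedural_compliance",
)


@dataclass(frozen=True)
class CaseResult:
    """One graded test case (assumed shape, not the real schema)."""
    case_id: str
    domain: str
    passed: bool  # assumption: each case is reduced to pass/fail


def per_domain_accuracy(results):
    """Return the fraction of passed cases for each domain that has results."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for r in results:
        if r.domain not in DOMAINS:
            raise ValueError(f"unknown domain: {r.domain}")
        totals[r.domain] += 1
        passes[r.domain] += int(r.passed)
    return {d: passes[d] / totals[d] for d in DOMAINS if totals[d]}


# Made-up results, purely to show the call.
results = [
    CaseResult("case-001", "rejection_parsing", True),
    CaseResult("case-002", "rejection_parsing", False),
    CaseResult("case-003", "amendment_drafting", True),
]
scores = per_domain_accuracy(results)
print(scores)  # {'rejection_parsing': 0.5, 'amendment_drafting': 1.0}
```

Reporting each domain separately, rather than a single blended score, keeps a weak area such as amendment drafting from being hidden by strong results elsewhere.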

Every test case uses real Office Actions from the USPTO. Every evaluation rubric was validated by registered patent attorneys. The methodology is fully open -- anyone can inspect the test cases, run the benchmarks, and verify the results.

The Challenge: Publish Your Numbers

We are publishing our PatentBench results. Every score. Every failure mode. Every domain breakdown. We are doing this because patent attorneys deserve to make informed decisions about the tools they use to prosecute their clients' patents.

We challenge every patent AI vendor to do the same. Run PatentBench against your tool. Publish the results. If your product is as good as your marketing claims, the numbers will show it.


