
Your MCP Server Is a QA Engineer You Haven't Hired Yet

PaperLink Team · 8 min read

Unit tests verify that your code does what you wrote. They do not verify that your data makes sense.

An invoice marked "Sent" for 47 days without transitioning to "Overdue" is not a code bug - the status transition logic works fine. It is a data bug - something upstream never triggered the transition. A bank account balance that disagrees with the sum of its transactions is not a failing test. It is a gap between two domains that no single module owns.

These bugs survive every CI pipeline. They live at the boundaries between your domains - accounting, invoicing, analytics, document sharing - where no test has jurisdiction. Finding them requires something that can read across all domains at once.

We built PaperLink's MCP server so AI assistants could manage business data through conversation. What we did not expect: the same server turns out to be a powerful tool for finding data problems we never knew we had.

How We Discovered This

PaperLink has 25 MCP tools across six domains: transactions, bank accounts, invoices, companies, clients, and products. We built them for data entry - "record this expense," "create this invoice," "add this client." Standard conversational accounting.

Then we started asking different questions. Not "create X" but "show me everything that looks wrong."

The shift happened when we asked Claude: "Show me all transactions without a category." The AI called list-transactions and returned a list of entries we had forgotten about - expenses that got recorded but never categorized. Not bugs in the code. Gaps in the data. Invisible to every automated test we run.

That was the moment we realized: an AI with read access to multiple domains is a QA engineer. Not the kind that writes Playwright tests - the kind that audits your actual data for consistency.

The Questions We Now Ask

Here is the set of cross-domain checks we run against PaperLink's production data. The uncategorized transactions check is the one that started this - we ran it and found real gaps. The others are checks we have since added to our routine. Not all of them have caught real issues yet, but each one targets a class of inconsistency that unit tests structurally cannot cover.

Accounting: Balance Reconciliation

"Does the bank account balance match the sum of its transactions?"

The AI calls list-accounts to get current balances, then list-transactions filtered by account to sum recorded entries. If the numbers diverge, one of two things happened: a transaction was missed during import, or a balance sync failed silently. Both systems pass their own tests. The inconsistency lives between them.
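In code, the check reduces to a join-and-compare. A minimal sketch, assuming the tool responses have already been fetched as plain dicts (the field names are illustrative stand-ins, not PaperLink's actual schema):

```python
# Flag accounts whose stored balance disagrees with the sum of their
# recorded transactions -- the reconciliation check described above.

def reconcile(accounts, transactions):
    mismatches = []
    for acct in accounts:
        recorded = sum(t["amount"] for t in transactions
                       if t["account_id"] == acct["id"])
        # Small tolerance absorbs float rounding on cent amounts.
        if abs(recorded - acct["balance"]) > 0.005:
            mismatches.append((acct["id"], acct["balance"], recorded))
    return mismatches

accounts = [{"id": "acc-1", "balance": 1200.00},
            {"id": "acc-2", "balance": 500.00}]
transactions = [{"account_id": "acc-1", "amount": 700.00},
                {"account_id": "acc-1", "amount": 500.00},
                {"account_id": "acc-2", "amount": 450.00}]

print(reconcile(accounts, transactions))  # -> [('acc-2', 500.0, 450.0)]
```

The point is not the code itself - the AI composes this logic from two tool calls - but that neither system alone could have produced the mismatch list.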

Invoicing: Stuck State Machines

"Which invoices have been in Sent status for more than 30 days?"

The AI calls list-invoices filtered by status, then checks dates. An invoice sitting in "Sent" for 47 days should have transitioned to "Overdue" automatically. If it did not, the scheduled job failed, the client record was deleted, or there is a timezone bug in the date comparison. The AI does not fix these - it surfaces them for a developer to investigate.
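The same check, sketched as code, assuming invoice dicts shaped like a list-invoices response (status and sent_at are illustrative field names):

```python
from datetime import date

# Invoices that have sat in "Sent" longer than max_days --
# candidates for a failed "Overdue" transition.
def stuck_in_sent(invoices, today, max_days=30):
    return [inv["id"] for inv in invoices
            if inv["status"] == "Sent"
            and (today - inv["sent_at"]).days > max_days]

invoices = [
    {"id": 245, "status": "Sent", "sent_at": date(2024, 5, 1)},   # 47 days: stuck
    {"id": 246, "status": "Sent", "sent_at": date(2024, 6, 10)},  # 7 days: fine
    {"id": 247, "status": "Paid", "sent_at": date(2024, 4, 1)},   # not Sent
]
print(stuck_in_sent(invoices, today=date(2024, 6, 17)))  # -> [245]
```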

"Are there invoices with a zero total?"

A zero-total invoice means broken line items. Quantities set to zero, prices missing, or a calculation that failed silently. These are valid records in the database - they pass schema validation - but they represent real business problems.
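A slightly stronger version of this check also catches totals that drifted from their line items, not just zeros. A sketch under the same assumption of illustrative field names:

```python
# Flag invoices whose stored total is zero or disagrees with the
# sum of their line items (qty * price).
def broken_totals(invoices):
    flagged = []
    for inv in invoices:
        computed = sum(li["qty"] * li["price"] for li in inv["line_items"])
        if inv["total"] == 0 or abs(inv["total"] - computed) > 0.005:
            flagged.append(inv["id"])
    return flagged

invoices = [
    {"id": 301, "total": 0.0,
     "line_items": [{"qty": 0, "price": 40.0}]},    # zero quantity
    {"id": 302, "total": 3200.0,
     "line_items": [{"qty": 4, "price": 800.0}]},   # consistent
    {"id": 303, "total": 150.0,
     "line_items": [{"qty": 2, "price": 100.0}]},   # drifted total
]
print(broken_totals(invoices))  # -> [301, 303]
```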

Cross-Domain: Revenue Consistency

"Compare total invoiced revenue with total recorded income in accounting."

This is the check that catches aggregation drift. Invoice totals are computed from line items. Accounting revenue comes from payment transactions. If these two numbers disagree, something in the pipeline is wrong - a payment was recorded without matching an invoice, or an invoice was paid but the income transaction was never created.

The most valuable checks compare data that should agree but is computed independently. Invoice totals vs. accounting revenue. Transaction sums vs. account balances. These independent computations create a natural audit trail - disagreement means a bug exists somewhere in the pipeline.
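The revenue cross-check, sketched the same way - two independently computed numbers that should agree (dict shapes are illustrative stand-ins for list-invoices and list-transactions responses):

```python
# Compare total paid-invoice revenue against total recorded income.
# A drift of 0 means the two domains agree.
def revenue_drift(invoices, transactions):
    invoiced = sum(i["total"] for i in invoices if i["status"] == "Paid")
    recorded = sum(t["amount"] for t in transactions if t["type"] == "income")
    return invoiced - recorded

invoices = [{"status": "Paid", "total": 3200.0},
            {"status": "Sent", "total": 900.0}]   # Sent is excluded
transactions = [{"type": "income", "amount": 3200.0},
                {"type": "expense", "amount": 120.0}]
print(revenue_drift(invoices, transactions))  # -> 0.0
```

A nonzero drift does not tell you which side is wrong - only that a bug exists somewhere between them, which is exactly the signal unit tests cannot produce.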

What MCP Gives You That SQL Does Not

You could run these checks with SQL. Many teams do - monitoring dashboards, scheduled queries, alerting rules. We have done it ourselves. The difference is not capability but practical friction. Here is what we found matters in practice.

Business authorization is built into every tool. In PaperLink's MCP server, every tool filters by team automatically. With raw SQL, you write WHERE team_id = X on every query and hope nobody forgets. One missed filter means reading another team's data. This is not a property of MCP as a protocol - another server might not enforce this. But a well-designed MCP server bakes authorization into the tool layer, so the person asking questions cannot accidentally cross boundaries.
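One way to bake this in is to make the team filter structurally impossible to forget. A minimal sketch of the pattern - an assumption about the approach, not PaperLink's actual code:

```python
# Wrap a raw query so every row is team-filtered before it leaves
# the tool layer. Callers cannot omit the filter: there is no code
# path that returns unfiltered rows.
def team_scoped(fetch):
    def tool(team_id, **filters):
        return [row for row in fetch(**filters)
                if row["team_id"] == team_id]
    return tool

ROWS = [{"team_id": "t1", "desc": "Office rent"},
        {"team_id": "t2", "desc": "Server costs"}]

@team_scoped
def list_transactions(**filters):
    return ROWS  # stand-in for a real database query

print(list_transactions("t1"))  # -> [{'team_id': 't1', 'desc': 'Office rent'}]
```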

No schema knowledge required. In our case, invoices live in a documents table, categories have polymorphic ownership, and Account means OAuth tokens while FinancialAccount means bank accounts. Nobody asking ad hoc questions should need to know this. Our MCP tools return business objects - "Invoice #247, Sent, $3,200" - not raw rows with 40 columns and cryptic foreign keys. Again, this depends on how you build your tools - but the abstraction is the point.

Checks evolve without code changes. This one is inherent to the pattern, not implementation-specific. When you think of a new consistency check, you type it in English. No SQL query to write, no dashboard widget to build, no deployment to schedule. Tomorrow you might ask "which clients have invoices but no recorded payments in the last 90 days?" That check did not exist before you asked - and it required no engineering work to create.

The first two properties depend on how you build your MCP server. The third is universal. Together, they make it practical to run dozens of ad hoc consistency checks where previously you would maintain two or three.

How to Try This With Your Own MCP Server

This approach works with any MCP server that exposes read access to multiple domains. You do not need PaperLink specifically - the pattern applies to any SaaS product with an MCP integration.

Identify boundary questions. Where should two domains agree? Revenue in invoices vs. revenue in accounting. Document access logs vs. NDA signatures. User counts vs. subscription seats. Write these as natural language questions.

Run them periodically. Ask your AI assistant these questions weekly, or after major data imports. Some teams automate this with Claude Code slash commands - a markdown file that runs a predefined set of cross-domain checks.
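A Claude Code slash command is just a markdown file of instructions checked into the repo; the filename and wording below are an illustrative sketch, not a PaperLink artifact:

```markdown
<!-- .claude/commands/data-audit.md, invoked in Claude Code as /data-audit -->
Run our cross-domain consistency checks and report anything that diverges:

1. Does each bank account balance match the sum of its transactions?
2. Which invoices have been in "Sent" status for more than 30 days?
3. Are there invoices with a zero total?
4. Does total invoiced revenue match total recorded income in accounting?

For each divergence, show the records involved and a one-line hypothesis.
```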

Investigate divergences. When numbers disagree, use the AI to drill in. "Why is this invoice at $0?" leads to "show me its line items" leads to "when was this product's price last updated?" Each follow-up is a new tool call, not a new SQL query to write.

Decide what to fix. Some inconsistencies need code changes (add a missing status transition). Some need data migrations (backfill empty categories). Some just need monitoring (alert when account balances drift beyond a threshold). The AI helps you find them - what you do with them is an engineering decision.

If you want to try this with PaperLink's data, the MCP server is on the official Anthropic MCP Registry. The setup guide takes under 60 seconds - one command, no installation.

What This Is and What It Is Not

This is an early practice, not an established methodology. The QA industry is investing heavily in AI-powered testing, but almost entirely focused on UI testing (Playwright, visual regression) and code testing (test generation, self-healing selectors). Data-level consistency checking through MCP is a different category.

When we researched this topic, every article we found about MCP and QA focused on UI testing or code testing - Playwright MCP for browser automation, Applitools on visual regression, OpenObserve's AI agent council for E2E test generation. All valuable work. None of it addresses using MCP to audit production data for cross-domain consistency. That does not mean nobody is doing it - but it does suggest the idea is underexplored.

We are not claiming this replaces unit tests, integration tests, or dedicated QA tools. Those catch code bugs reliably. What they do not catch - what they are not designed to catch - is the slow drift of production data across domain boundaries.

Your MCP server already has the tools. Your AI already has the reasoning capability. The missing piece is the question.

Try asking: "Does our data make sense?"

