thenumerix

The Story

From Invoice Chaos to Full Automation

A six-step transformation from an 8-person AP team to a serverless AI pipeline processing 15,000 invoices per month.

1

15,000

Invoices/Month, 8 People, 25 Minutes Each

The AP team received invoices by email, PDF, portal, and fax. Every invoice required manual keying into Acumatica — vendor lookup, GL code from memory, entity routing by gut instinct. At $25/invoice fully loaded, that is $375,000 per year in AP labor alone.

2

9 Entities

200+ Vendors, Zero Consistent Format

Ashford's 9 hospitality entities (Stirling, Remington, Premier, OpenKey…) each had different GL structures, approval thresholds, and vendor relationships. The same vendor invoiced different entities differently. No OCR tool alone could handle the variance without a judgment layer.

3

97%

Azure Document Intelligence Extracts in One Call

Azure Document Intelligence (formerly Form Recognizer) provides a pre-built invoice model that extracts vendor name, invoice number, amounts, and line items in one REST call at $0.01/page. Per-field confidence scores flag low-confidence extractions. Output: structured JSON ready for the judgment layer.

4

94%

Claude AI: The GL Code Judgment Layer

Rule-based GL coding works for 78% of invoices. It fails on new vendors, multi-line descriptions, and cross-entity allocations. A TF-IDF + logistic regression classifier trained on 2 years of GL history achieves 94% accuracy — 16 percentage points better than hand-coded rules.

5

73%

3-Way Match + Hierarchical Approval Routing

PO vs. Invoice vs. Goods Receipt with 2%/5% tolerance bands. Invoices under $5K that pass 3-way match auto-approve — covering 73% of all invoice volume. Only exceptions or high-value invoices require a human decision. Approval queue cut by 73%.

6

$0.013

Results: 95% Touchless, Full Audit Trail

OCR ($0.01) + Claude AI ($0.003) = $0.013/invoice vs. $15–$40 manual. 95% of invoices are processed with no human touch. Full immutable audit trail in Azure SQL: extraction confidence, GL reasoning, match result, approval chain, ERP response code. DSO dropped 30% on AR.

Live Demo

See It From Every Angle

ELI5 mode explains the value to each stakeholder. Engineer mode runs a live 6-stage AP pipeline simulation with realistic synthetic invoices.

💼

CFO Perspective

The Math That Matters

AP cost $25/invoice × 15,000/month = $375K/year. Now it is $0.013/invoice in compute cost. That is a $370K annual saving with tighter controls, faster month-end close, and a 30% DSO reduction on AR — freeing up working capital that was sitting in float.

📉 40-day DPO → same-day posting • cash flow optimization

📋

AP Clerk Perspective

Your Job Got Better

You no longer key 15,000 invoices a month. The system handles the 95% that are straightforward. You handle 750 complex, high-value, relationship-sensitive cases per month — the ones that actually require human expertise, negotiation, and judgment.

✅ 750 meaningful cases vs. 15,000 routine data entries

⚙️

Data Engineer Perspective

The Architecture is Serverless

Email / blob → Logic Apps trigger → Form Recognizer REST → Azure Function validates → GL classifier → Acumatica REST POST → SQL audit log. Zero VMs. The entire pipeline at 15K invoices/month costs roughly $195 in Azure compute.

⚡ $0.013/invoice • 3.2s avg latency • 0 servers to manage

🔍

Auditor Perspective

Every Decision Is Traceable

Every invoice carries an immutable record: OCR field confidence, GL assignment plus model reasoning, 3-way match result with variance details, approval chain with timestamps, and ERP HTTP response code. SOC 2-compatible. No more "who approved this?" investigations.

🔒 Immutable SQL audit trail • RBAC per entity • AES-256 at rest
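The record described above could be modeled roughly as follows. This is a minimal sketch, not the production schema: the class names, field names, and sample values are all hypothetical, chosen only to mirror the five items listed in the text. A frozen dataclass prevents attribute reassignment in application code; storage-level immutability would still have to be enforced in Azure SQL itself.

```python
from dataclasses import dataclass, asdict
import json

@dataclass(frozen=True)
class ApprovalStep:
    approver: str       # who (or what) made the decision
    decision: str       # e.g. "auto-approved", "approved", "rejected"
    decided_at: str     # ISO-8601 UTC timestamp

@dataclass(frozen=True)
class AuditRecord:
    invoice_id: str
    ocr_confidence: dict     # field name -> confidence (0.0-1.0)
    gl_code: str
    gl_reasoning: str        # model explanation for the GL assignment
    match_status: str        # "matched" | "exception"
    match_variances: tuple   # variance details from the 3-way match
    approval_chain: tuple    # ordered ApprovalStep entries
    erp_status_code: int     # HTTP status returned by the ERP

def to_json(record: AuditRecord) -> str:
    """Serialize the record for an append-only audit table."""
    return json.dumps(asdict(record), sort_keys=True)

# Hypothetical sample record
record = AuditRecord(
    invoice_id="INV-1001",
    ocr_confidence={"VendorName": 0.98, "InvoiceTotal": 0.93},
    gl_code="6300",
    gl_reasoning="Matched historical cloud-hosting descriptions",
    match_status="matched",
    match_variances=(),
    approval_chain=(ApprovalStep("system", "auto-approved",
                                 "2024-03-01T14:05:00+00:00"),),
    erp_status_code=200,
)
print(to_json(record))
```

Keeping every decision input in one serializable record is what lets an auditor answer "who approved this?" from a single row.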

Classroom

How It Works — Lesson by Lesson

Six slides building from the business problem to fraud prevention. Each slide adds one layer of the system.

Slide 1 of 6

The True Cost of Manual AP

Before automation, invoice processing was the most expensive back-office function per transaction. Every invoice required a human to receive it, read it, look up the vendor, recall the GL code, key it into the ERP, route it for approval by email, follow up, and finally post. That human loop cost $15–$40 per invoice fully loaded — and introduced a 1–3% error rate that compounded into rework, late payments, and vendor relationship damage.

$25

Avg cost per manual invoice

25 min

Processing time per invoice

40 days

Average DPO (Days Payable Outstanding)

1–3%

Manual entry error rate

Slide 2 of 6

Azure Document Intelligence: Structured Extraction

Azure Document Intelligence (formerly Form Recognizer) offers a pre-built invoice model, a fine-tuned transformer trained on millions of invoice documents. It extracts vendor name, invoice number, invoice date, due date, purchase order number, total amount, tax, and every line item in a single REST API call. Each field comes with a confidence score (0.0–1.0), and fields below 0.85 are flagged for human review. The output is structured JSON at $0.01/page, a 2,500× cost reduction vs. $25 manual keying.

97%

Field-level extraction accuracy

0.85

Confidence threshold for auto-processing

$0.01

Cost per page (vs. $25 manual)

180+

Supported languages
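The per-field review rule can be sketched in a few lines. This assumes the extraction step has already been reduced to a plain mapping of field name to confidence; the function name and the sample values are hypothetical.

```python
CONFIDENCE_THRESHOLD = 0.85  # threshold stated in the text

def fields_needing_review(field_confidences: dict,
                          threshold: float = CONFIDENCE_THRESHOLD) -> list:
    """Return the names of fields whose confidence is missing or below threshold."""
    flagged = []
    for name, conf in field_confidences.items():
        if conf is None or conf < threshold:
            flagged.append(name)
    return sorted(flagged)

# Hypothetical confidences for one invoice
sample = {"VendorName": 0.98, "InvoiceId": 0.91,
          "InvoiceTotal": 0.79, "DueDate": None}
print(fields_needing_review(sample))  # ['DueDate', 'InvoiceTotal']
```

Treating a missing confidence as a review trigger is a conservative choice: a field the model could not find is at least as risky as one it read poorly.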

Slide 3 of 6

Why AI Beats Rules for GL Code Classification

Rule-based GL coding uses static IF/THEN logic: "if vendor = AWS then GL = 6300." This works for 78% of invoices — the ones with known vendors and obvious categories. It fails on new vendors (no rule exists), multi-line invoices (line items span multiple GL accounts), and ambiguous descriptions ("Professional services" could be 6600 or 6800 depending on context). A TF-IDF vectorizer converts invoice text into a 5,000-feature sparse matrix. Logistic regression trained on 2 years of GL history learns the associations. Result: 94% accuracy — and it improves quarterly as you retrain on new data.

78%

Rule-based accuracy ceiling

94%

AI classifier accuracy

5,000

TF-IDF features (1-gram + 2-gram)

0.70

Confidence threshold for auto-posting
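To make the coverage gap concrete, here is a small sketch of static rules with a classifier fallback. The rule table holds only the AWS example from the text. The rules-first ordering, the function names, and the stub classifier are assumptions for illustration, not a description of the production pipeline.

```python
AUTO_POST_THRESHOLD = 0.70  # threshold stated in the text

# Static IF/THEN rules: vendor -> GL code (only the example given in the text)
RULES = {"aws": "6300"}

def rule_based_gl(vendor: str):
    """Return a GL code if a rule exists for this vendor, else None."""
    return RULES.get(vendor.strip().lower())

def assign_gl(vendor: str, description: str, classifier=None) -> dict:
    """Try rules first; fall back to a classifier callable when no rule applies.

    `classifier` is a hypothetical callable: description -> (gl_code, confidence).
    """
    code = rule_based_gl(vendor)
    if code is not None:
        return {"gl_code": code, "method": "rule"}
    if classifier is None:
        # New vendor and no model available: a human has to decide
        return {"gl_code": None, "method": "review"}
    code, conf = classifier(description)
    method = "auto" if conf >= AUTO_POST_THRESHOLD else "review"
    return {"gl_code": code, "confidence": conf, "method": method}

print(assign_gl("AWS", "EC2 compute, March"))
print(assign_gl("New Vendor LLC", "Professional services"))
```

The second call returns no GL code at all, which is the failure mode the paragraph describes: a rule table has nothing to say about a vendor it has never seen.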

Slide 4 of 6

The 3-Way Match Protocol

Two-way matching compares only the Purchase Order against the Invoice. It catches pricing errors but misses quantity fraud: a vendor ships 80 units, invoices for 100, and the PO matches on price. Three-way match adds the Goods Receipt Note (GRN) from the warehouse, checking that received quantity equals invoiced quantity. Real invoices almost never match exactly — shipping, tax rounding, and FX create small variances. Tolerance bands (2% per line, 5% total) separate genuine fraud from noise. Setting them correctly is the entire engineering challenge: too tight and you flood reviewers with false positives; too loose and fraud slips through.

2%

Per-line item tolerance band

5%

Invoice total tolerance band

40%

False exception rate at exact match

8%

False exception rate with tolerance bands
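A worked example of the tolerance check may help. The sketch below uses the 2% and 5% bands from the text; the helper names and the dollar amounts are hypothetical.

```python
TOLERANCE_LINE = 0.02    # 2% per line item
TOLERANCE_TOTAL = 0.05   # 5% on invoice total

def relative_variance(expected: float, actual: float) -> float:
    """Absolute difference as a fraction of the expected amount."""
    if expected == 0:
        return 0.0 if actual == 0 else float("inf")
    return abs(actual - expected) / abs(expected)

def within_tolerance(expected: float, actual: float, tolerance: float) -> bool:
    return relative_variance(expected, actual) <= tolerance

# PO line $1,000.00 vs invoice line $1,015.00: 1.5% variance, inside the 2% band
print(within_tolerance(1000.00, 1015.00, TOLERANCE_LINE))    # True
# PO line $1,000.00 vs invoice line $1,030.00: 3% variance, flagged
print(within_tolerance(1000.00, 1030.00, TOLERANCE_LINE))    # False
# PO total $10,000.00 vs invoice total $10,400.00: 4% variance, inside the 5% band
print(within_tolerance(10000.00, 10400.00, TOLERANCE_TOTAL)) # True
```

Under exact matching, all three of these comparisons would raise an exception; with bands, only the 3% line does.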

Slide 5 of 6

Hierarchical Approval Routing

Flat approval — sending everything to the department head — creates a bottleneck and trains reviewers to rubber-stamp because every invoice looks the same. Hierarchical routing assigns approval authority by value and risk: small routine invoices auto-approve, mid-range go to cost center managers who know the context, large invoices escalate to VP with full supporting detail. The velocity trigger is the key innovation: when a cost center's 30-day rolling spend exceeds 120% of budget, ALL invoices for that center escalate one level automatically — creating self-enforcing budget guardrails with no manual policy changes required.

<$5K

Auto-approve if 3-way matched (73% of volume)

$5K–$50K

Cost center manager approval

>$50K

VP approval required

120%

Budget velocity trigger for auto-escalation
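The routing tiers and the velocity trigger can be sketched as a small function. The dollar thresholds and the 120% trigger come from the text. Two details are assumptions: invoices under $5K that fail the 3-way match go to the cost center manager, and escalation is capped at the VP level.

```python
LEVELS = ["auto", "manager", "vp"]   # escalation order
VELOCITY_TRIGGER = 1.20              # 30-day spend above 120% of budget

def base_level(amount: float, three_way_matched: bool) -> int:
    """Approval tier by invoice value and match result."""
    if amount < 5_000 and three_way_matched:
        return 0        # auto-approve
    if amount <= 50_000:
        return 1        # cost center manager
    return 2            # VP

def route(amount: float, three_way_matched: bool,
          rolling_30d_spend: float, budget: float) -> str:
    """Return the approval level, escalating one tier on budget overrun."""
    level = base_level(amount, three_way_matched)
    if budget > 0 and rolling_30d_spend > VELOCITY_TRIGGER * budget:
        level = min(level + 1, len(LEVELS) - 1)
    return LEVELS[level]

print(route(1_200, True, 90_000, 100_000))    # auto
print(route(1_200, True, 130_000, 100_000))   # manager
print(route(20_000, True, 130_000, 100_000))  # vp
```

The second call shows the guardrail: the same $1,200 matched invoice that would normally auto-approve is pushed to a manager once the cost center is running at 130% of budget.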

Slide 6 of 6

Fraud Prevention: MinHash LSH

The #1 AP fraud vector is the resubmitted invoice: a vendor submits the same invoice with a slightly different number, date, or amount. Exact-match deduplication catches identical duplicates but misses near-duplicates. MinHash Locality-Sensitive Hashing (LSH) solves this. It converts each invoice's line items into a MinHash signature and hashes bands of that signature so that similar invoices land in the same bucket. Any invoice that shares a bucket with a recent invoice from the same vendor, within $50 and within 30 days, is held for human review. Because only invoices that share a bucket are compared, the technique scales to millions of invoices in roughly O(n) time instead of the O(n²) of comparing every pair. Combined with spend velocity tracking, the system provides fraud detection that tightens automatically when anomalies appear.

MinHash

LSH for near-duplicate invoice detection

30 days

Lookback window for duplicate check

$50

Amount variance threshold for LSH match

O(n)

Time complexity (vs. O(n²) brute force)
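A minimal, standard-library sketch of the idea follows. It is not the production detector. The signature length (64), the banding (16 bands of 4 rows), the word-level tokenization, and the sample invoices are all assumptions chosen to keep the example short; a real deployment would more likely use a tuned library implementation.

```python
import hashlib
from collections import defaultdict
from datetime import date

NUM_HASHES = 64               # signature length (assumed)
BANDS = 16                    # 16 bands x 4 rows (assumed)
ROWS = NUM_HASHES // BANDS

def tokens(line_items: str) -> set:
    """Word-level tokens of the line-item text (a simple stand-in for shingling)."""
    return set(line_items.lower().split()) or {""}

def _h(seed: int, token: str) -> int:
    """Seeded 64-bit hash; each seed plays the role of one MinHash permutation."""
    salt = seed.to_bytes(8, "little")
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little")

def minhash(token_set: set) -> tuple:
    """Signature = the minimum hash value under each seed."""
    return tuple(min(_h(seed, t) for t in token_set) for seed in range(NUM_HASHES))

class LSHIndex:
    """Buckets invoices by signature band; similar signatures share a bucket."""
    def __init__(self):
        self.buckets = defaultdict(set)

    def _keys(self, sig):
        for b in range(BANDS):
            yield (b, sig[b * ROWS:(b + 1) * ROWS])

    def add(self, invoice_id, sig):
        for key in self._keys(sig):
            self.buckets[key].add(invoice_id)

    def candidates(self, sig) -> set:
        found = set()
        for key in self._keys(sig):
            found |= self.buckets.get(key, set())
        return found

def is_suspect(new, old, max_amount_diff=50.0, max_days=30) -> bool:
    """Business filter from the text: same vendor, within $50, within 30 days."""
    return (new["vendor"] == old["vendor"]
            and abs(new["total"] - old["total"]) <= max_amount_diff
            and abs((new["date"] - old["date"]).days) <= max_days)

def check_and_add(index, store, inv) -> list:
    """Return IDs of earlier invoices this one may duplicate, then index it."""
    sig = minhash(tokens(inv["lines"]))
    hits = [i for i in index.candidates(sig) if is_suspect(inv, store[i])]
    index.add(inv["id"], sig)
    store[inv["id"]] = inv
    return sorted(hits)

# Hypothetical invoices: same line items, new number, slightly different amount
original = {"id": "INV-1001", "vendor": "Acme Linen", "total": 1840.00,
            "date": date(2024, 3, 1), "lines": "king sheet set x40 bath towel x120"}
resubmitted = {"id": "INV-1001A", "vendor": "Acme Linen", "total": 1845.00,
               "date": date(2024, 3, 12), "lines": "king sheet set x40 bath towel x120"}

index, store = LSHIndex(), {}
print(check_and_add(index, store, original))     # []
print(check_and_add(index, store, resubmitted))  # ['INV-1001']
```

Each new invoice is compared only against the invoices in its own buckets, so the cost per invoice depends on bucket size rather than on the total number of invoices already seen. Invoices whose line items differ slightly will collide with a probability that depends on the band and row settings, which is why those two parameters need tuning against real data.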


Key Points

Four Engineering Decisions That Made It Work

The automation works because of specific, deliberate engineering choices — each with a measurable impact.

+16 pts

AI vs. Rules for GL Coding

Rule-based GL classification plateaus at 78% accuracy — it cannot generalize to new vendors or ambiguous multi-line invoices. A TF-IDF + logistic regression classifier on 2 years of history achieves 94%. The 16-point gap widens further on hospitality-specific vendor categories where rules require constant manual maintenance.

Why it matters: 94% accuracy is the threshold where AP team trust in the system exceeds resistance to adoption

40% → 8%

Tolerance Bands Cut False Exceptions

Exact-match validation flags 40% of invoices as exceptions — creating a workload that overwhelms reviewers and destroys the ROI of automation. A 2% per-line and 5% total tolerance reduces false exceptions to 8%. Engineering the tolerance thresholds is the critical calibration: they must be loose enough to avoid noise but tight enough to catch the 2.3% genuine overpayment rate.

Why it matters: false exception rate drives reviewer workload and determines whether the system is trusted or bypassed

$375K → $195

Serverless Economics at Scale

The pipeline at 15,000 invoices/month: Form Recognizer OCR at $0.01/page plus the Claude API at $0.003/invoice comes to $0.013/invoice, or $195/month (about $2,340/year), down from $375,000/year in AP labor. Azure Functions runs at near-zero cost, with SQL writes and Logic Apps orchestration at about $0.001 each. The cost structure is almost entirely variable: it scales toward zero in slow months and to higher volumes without infrastructure investment.

Why it matters: variable cost structure eliminates the risk of over-provisioning for peak capacity
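The headline figures can be checked with a few lines of arithmetic. This sketch assumes one page per invoice (OCR is priced per page) and counts only the OCR and Claude API components that make up the $0.013 figure; the variable names are illustrative.

```python
from decimal import Decimal

INVOICES_PER_MONTH = 15_000
OCR_PER_PAGE = Decimal("0.010")     # Form Recognizer, assuming 1 page per invoice
LLM_PER_INVOICE = Decimal("0.003")  # Claude API

per_invoice = OCR_PER_PAGE + LLM_PER_INVOICE
monthly = per_invoice * INVOICES_PER_MONTH
annual = monthly * 12

print(f"${per_invoice}/invoice")   # $0.013/invoice
print(f"${monthly:.2f}/month")     # $195.00/month
print(f"${annual:.2f}/year")       # $2340.00/year
```

Multi-page invoices raise the OCR component proportionally, so the per-invoice figure is a floor rather than a fixed price.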

73%

Auto-Approval Covers the Majority

73% of invoices are under $5,000 and pass 3-way match validation. These auto-approve without any human touch. The approval queue drops from 15,000 to 4,050 invoices/month. Controllers now review only the invoices that genuinely benefit from human judgment — high-value, new-vendor, or exception-flagged. The spend velocity trigger adds automatic escalation when cost centers overspend, creating budget guardrails without policy changes.

Why it matters: routing 73% automatically lets approvers give full attention to the 27% that actually need them

Production Code

Production Implementation

The three core algorithms — extraction, matching, and classification — that power the pipeline.

Invoice Data Extraction — Azure Form Recognizer (Python)

from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.identity import DefaultAzureCredential
import os

CONFIDENCE_THRESHOLD = 0.85

def extract_invoice(blob_url: str) -> dict:
    """Extract structured fields from a PDF invoice
    using Azure AI Document Intelligence."""
    client = DocumentAnalysisClient(
        endpoint=os.environ["FORM_RECOGNIZER_ENDPOINT"],
        credential=DefaultAzureCredential()
    )
    poller = client.begin_analyze_document_from_url(
        "prebuilt-invoice", blob_url
    )
    result = poller.result()

    for invoice in result.documents:
        fields = invoice.fields
        extracted = {
            "vendor_name":  _safe(fields, "VendorName"),
            "invoice_num":  _safe(fields, "InvoiceId"),
            "invoice_date": _safe(fields, "InvoiceDate"),
            "due_date":     _safe(fields, "DueDate"),
            "total":        _safe(fields, "InvoiceTotal"),
            "line_items":   _extract_lines(fields),
            "confidence":   invoice.confidence,
        }
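        # Document-level gate only; each DocumentField also carries its own
        # .confidence if per-field review flags are needed.
        if invoice.confidence < CONFIDENCE_THRESHOLD:
            extracted["needs_review"] = True
            extracted["review_reason"] = (
                f"Low confidence ({invoice.confidence:.2f})"
            )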
        return extracted

    return {"error": "No invoice detected", "needs_review": True}

def _safe(fields, key):
    f = fields.get(key)
    return f.value if f else None

def _extract_lines(fields):
    items = fields.get("Items")
    if not items:
        return []
    lines = []
    for item in items.value:
        sf = item.value
        lines.append({
            "description": _safe(sf, "Description"),
            "quantity":    _safe(sf, "Quantity"),
            "unit_price":  _safe(sf, "UnitPrice"),
            "amount":      _safe(sf, "Amount"),
            "confidence":  item.confidence,
        })
    return lines

3-Way Match Engine — PO / Invoice / GRN (Python)

from dataclasses import dataclass
from Levenshtein import distance as lev_distance

TOLERANCE_LINE  = 0.02   # 2% per line item
TOLERANCE_TOTAL = 0.05   # 5% on invoice total

@dataclass
class MatchResult:
    status: str           # "matched" | "exception"
    score: float          # 0.0 - 1.0
    exceptions: list
    explanation: str

def three_way_match(po, invoice, grn) -> MatchResult:
    """Compare PO, Invoice, and Goods Receipt line-by-line."""
    exceptions = []

    # 1. GRN must exist before matching
    if grn is None:
        return MatchResult("exception", 0.0,
            [{"type": "MISSING_GRN",
              "detail": f"No GRN for PO {po['po_number']}"}],
            "Goods receipt not yet recorded.")

    # 2. Match line items (fuzzy on description)
    matched, total_po, total_inv = 0, 0.0, 0.0
    for po_line in po["lines"]:
        best = _fuzzy_find(po_line, invoice["lines"])
        if best is None:
            exceptions.append({"type": "MISSING_LINE",
                "detail": f"PO line '{po_line['desc']}' not on invoice"})
            continue

        # Price variance check
        diff = abs(best["amount"] - po_line["amount"])
        if diff / max(po_line["amount"], 0.01) > TOLERANCE_LINE:
            exceptions.append({"type": "PRICE_VARIANCE",
                "detail": f"{po_line['desc']}: PO ${po_line['amount']:.2f}"
                           f" vs INV ${best['amount']:.2f}"})

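        # Quantity check against GRN; a PO line with no receipt is an exception
        grn_line = _fuzzy_find(po_line, grn["lines"])
        if grn_line is None:
            exceptions.append({"type": "MISSING_GRN_LINE",
                "detail": f"{po_line['desc']}: no matching goods receipt line"})
        elif grn_line["qty"] != best.get("qty", 0):
            exceptions.append({"type": "QTY_VARIANCE",
                "detail": f"{po_line['desc']}: GRN {grn_line['qty']}"
                           f" vs INV {best.get('qty', '?')}"})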

        matched += 1
        total_po  += po_line["amount"]
        total_inv += best["amount"]

    # 3. Total tolerance check
    if total_po > 0:
        var = abs(total_inv - total_po) / total_po
        if var > TOLERANCE_TOTAL:
            exceptions.append({"type": "TOTAL_VARIANCE",
                "detail": f"PO ${total_po:.2f} vs INV ${total_inv:.2f}"
                           f" ({var:.1%} variance)"})

    score = matched / max(len(po["lines"]), 1)
    status = "matched" if not exceptions else "exception"
    return MatchResult(status, score, exceptions,
        f"{matched}/{len(po['lines'])} lines, {len(exceptions)} exceptions")

def _fuzzy_find(target, candidates):
    best, best_d = None, 999
    for c in candidates:
        d = lev_distance(target["desc"].lower(), c["desc"].lower())
        if d < best_d:
            best, best_d = c, d
    return best if best_d < 3 else None

GL Code Classifier — TF-IDF + Logistic Regression (Python)

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

CONFIDENCE_THRESHOLD = 0.70

# ── Training ──────────────────────────────────────────────────────
def train_gl_classifier(history_df):
    """Train on historical descriptions -> GL codes.
    Expects columns: 'description', 'gl_code'."""
    pipe = Pipeline([
        ("tfidf", TfidfVectorizer(
            max_features=5000, ngram_range=(1, 2),
            sublinear_tf=True, stop_words="english")),
        ("clf", LogisticRegression(
            max_iter=1000, C=5.0,
            class_weight="balanced", solver="lbfgs")),
    ])
    pipe.fit(history_df["description"], history_df["gl_code"])
    joblib.dump(pipe, "models/gl_classifier.pkl")
    return pipe

# ── Prediction ────────────────────────────────────────────────────
def predict_gl(description: str, pipe=None):
    """Return GL code with confidence; top-3 fallback
    when confidence < threshold."""
    if pipe is None:
        pipe = joblib.load("models/gl_classifier.pkl")

    probas  = pipe.predict_proba([description])[0]
    classes = pipe.classes_
    top_idx = probas.argsort()[::-1]
    best_code = classes[top_idx[0]]
    best_conf = probas[top_idx[0]]

    if best_conf >= CONFIDENCE_THRESHOLD:
        return {
            "gl_code":    best_code,
            "confidence": round(float(best_conf), 3),
            "method":     "auto",
        }

    # Low confidence: surface top 3 for human review
    return {
        "gl_code":    best_code,
        "confidence": round(float(best_conf), 3),
        "method":     "review",
        "suggestions": [
            {"gl_code":    classes[i],
             "confidence": round(float(probas[i]), 3)}
            for i in top_idx[:3]
        ],
    }

About This Demo

Technology Stack

Every component in the AP/AR automation pipeline — from email ingestion to ERP posting.

Finance Process Flows — AP/AR Automation

Production-grade invoice processing pipeline for Ashford Hospitality's 9 entities

Azure Logic Apps • Azure Document Intelligence • Claude 3.5 Sonnet • Azure Functions (Python) • Acumatica Cloud ERP • Azure Key Vault • Azure SQL Database • Power BI Embedded • Azure Communication Services • Azure Monitor • Azure AD / RBAC • scikit-learn TF-IDF • MinHash LSH • Python-Levenshtein • OpenAPI 3.0
