
Your DMS Has a New Job: From Document Store to Intelligence Layer


Please enjoy this blog post by Zane Harker, Director of Product, AI & Automation, NetDocuments.

User expectations are changing fast.
 
That was the through-line at a recent ILTA roundtable on AI in legal. The people on the call were not skeptics. They were knowledge management leaders, IT directors, and innovation partners invested in seeing AI work inside their firms. And the consistent observation across the table was that what users want from search is shifting under our feet. Lawyers increasingly expect the system to understand their goals, infer the intent behind a half-formed question, and come back with something synthesized rather than something to sift through. In short: “make it like Google.”

This certainly isn't the first time I've encountered that framing. Lawyers spend their day moving between Google, ChatGPT, and other consumer-grade tools that have trained them to expect personalized results, intent inferred as if by magic, and synthesized, grounded answers to questions asked in plain language. Then they open a document management system, and the experience feels like a different era.
 
Closing that gap is a core problem in legal AI right now. Perhaps less obvious is why the gap exists in the first place—why is it so hard to make legal search “like Google?”

Where the Google analogy breaks down
 
It's true that there are many valuable user experience lessons to be gleaned from Google and other internet behemoths. With billions of users and an ocean of behavioral data, hyperscale products are acting on the best quantitative signals of search success that money can buy. What’s more, their experiences have become so familiar and commonplace that they set the norms and expectations for how search should work online. In that light, distilling their design principles and applying them to search workflows makes good sense.
 
But a closer look at the dynamics behind public web search reveals deep structural differences that explain why consumer Googling and tools for the legal enterprise are different not just on the surface, but at their core.

Search marketplace optimization. Today, virtually any Google result you can see is a piece of content that wants desperately to be found—and has usually paid handsomely for the privilege. Web search is a public market, and publishers invest enormous resources in making their work discoverable in it via search engine optimization, paid placement, structured data, and link building. Legal is famously the most expensive vertical in paid search; industry benchmarks put the average legal cost-per-click around $8.58, with individual high-value terms reaching hundreds of dollars per click!

Contrast the economic pressure shaping a web search corpus with a typical legal organization—does your best content desperately want to be found? Who's incentivized to invest in its findability? For most firms, the reality is that a pleading is drafted for a matter, filed, and promptly abandoned. A research memo answers one partner's question on one Tuesday and then sits. Rarely is there investment after the fact in making these documents easier to surface for the next person who might need them. And the knowledge managers brave enough to try face a steep uphill battle, sifting for valuable signals in a steadily growing pile of work product. The system is not incentivized to improve the relevance of the corpus.
 
Surveillance of content and behavior. Google and other hyperscale consumer services work with public data and, in many jurisdictions, carte blanche license to observe and record the content of all user interactions. By tracking what people click on after they search, Google learns from user behavior at planetary scale: billions of queries, billions of clicks, every day. Over time, that's what teaches the system which results addressed the user's intent, as opposed to merely matching the words in the query.

Inside a firm, those signals aren’t available. Partly it's a volume problem, with legal search representing a tiny trickle compared to the mass-market traffic that can quickly drive statistical significance. Legal search also implies a breadth and depth not typical of consumer web searches, both because the needs of different practice areas vary so much, and because searches performed in the context of expert legal work carry fundamentally different expectations than the ones you use to choose a restaurant for dinner. But most intractably, it's a confidentiality problem. As custodians of privileged legal data, responsible vendors simply cannot track much of the data that could in principle improve the system. The contents of searches and results are truly private and remain opaque to the vendor as much as to any other party. Privacy-preserving techniques exist for evaluating the efficacy of algorithmic variants, but leveraging them in a way that yields explainable improvement requires deep, specialized investment. The feedback that makes web search good remains largely off-limits in the environments where legal professionals operate.
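
To make that constraint concrete, here is a minimal sketch of one such privacy-preserving approach: comparing two ranking variants by tallying only which variant's result the user clicked, never the query or the content. Every name here is a hypothetical illustration, not any vendor's actual telemetry.

import random
from collections import Counter

class VariantEvaluator:
    """Toy privacy-preserving comparison of two ranking variants:
    the only artifact retained is an aggregate win count per variant,
    never queries, results, or document content. Purely illustrative."""
    def __init__(self):
        self.wins = Counter()

    def assign(self):
        # Randomly route each search to ranking variant A or B.
        return random.choice(["A", "B"])

    def record_click(self, variant):
        self.wins[variant] += 1  # aggregate tally only

    def preferred(self, min_events=1000):
        if sum(self.wins.values()) < min_events:
            return None  # too little traffic to call: the legal-scale problem
        return self.wins.most_common(1)[0][0]

Even a design this conservative runs headlong into the volume problem above: at legal-search traffic levels, accumulating enough events to call a winner can take a very long time.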

Where the signal has to come from
 
If legal document collections don't self-optimize for findability and behavioral data is highly constrained by privacy requirements, effective relevance signals have to be constructed elsewhere, from the inside. In practice today that means leaning on three complementary techniques that each contribute something the others can't.
 
Keyword search remains essential and is sometimes underrated amid the current enthusiasm for Retrieval-Augmented Generation (RAG). There are many cases in legal specifically where lexical precision is the relevance signal, such as finding citations, defined terms, party names, statute numbers, and exact terms of art. A vector model trained to find semantically similar text will happily return near-matches when the user needs something more exact.

Vector search handles what keyword search can't: paraphrase, conceptual similarity, and the common case where the user knows what a document is about but not what it's called or how the original drafter phrased it. This complementarity with lexical signals closes a gap for natural language questions and is especially powerful when combined with Boolean in a hybrid retrieval model. In corpora full of very large documents, vector search also unlocks novel and efficient means for AI agents to locate and consult relevant extracts without overrunning their limited context windows.
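
To show that complementarity in miniature, here is a toy hybrid retrieval sketch using reciprocal rank fusion (RRF), one common way to combine lexical and vector rankings. The document IDs and both ranked lists are invented stand-ins for what a keyword index and a vector store would actually return.

from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: each ranked list contributes
    1 / (k + rank) per document; k=60 is the customary constant."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical ranked hits for one query, best first.
keyword_hits = ["pleading_017", "memo_203", "contract_044"]  # exact terms of art
vector_hits = ["memo_203", "brief_310", "pleading_017"]      # conceptual matches

print(rrf_fuse([keyword_hits, vector_hits]))
# memo_203 and pleading_017 rise to the top: found by both signals.

Documents surfaced by both signals float upward without either signal having to dominate, which is the practical appeal of rank fusion over hand-tuned score blending.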
 
Metadata supplies the structural context that neither keyword nor vector can infer from text alone: what kind of document this is, whose, when, in what state, related to what else. Using AI to profile individual documents makes this problem tractable at the scale of a DMS, automatically detecting document type, jurisdiction, practice area, client, project, matter, author, execution status, and more. The harder half—which I believe will distinguish the next-generation DMS from a well-organized file share—is understanding documents in the context of the other documents around them. Example applications include auto-generated case chronologies, auto-linked amendments and supersessions, and consistency checks between exhibits and testimony.
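
As a rough picture of what that profiling layer might produce, here is a sketch of a per-document profile. Every field name is an illustrative assumption rather than a real DMS schema, and the cross-document relationships ride alongside the flat attributes.

from dataclasses import dataclass, field

@dataclass
class DocumentProfile:
    """Hypothetical auto-generated profile for one document.
    All field names are illustrative, not an actual DMS schema."""
    doc_id: str
    doc_type: str                     # e.g. "settlement_agreement"
    client: str
    matter: str
    jurisdiction: str | None = None
    practice_area: str | None = None
    author: str | None = None
    execution_status: str = "draft"   # "draft" | "redline" | "fully_executed"
    executed_date: str | None = None  # ISO date, set once fully executed
    # The harder half: relationships a plain file share can't see.
    supersedes: list[str] = field(default_factory=list)  # earlier versions
    amends: list[str] = field(default_factory=list)      # linked amendments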

High-quality automated metadata allows AI agents with advanced reasoning capabilities to perform increasingly complex tasks in highly flexible ways. Consider the requirements behind even a simple request like "find the most recent fully executed settlement agreement on the Acme matter." The system needs to recognize that a document is a settlement agreement and not a draft brief that mentions one. It needs to identify which version is the final executed copy, distinguishing it from prior redlines, signature-page-only PDFs, and internal drafts that may share most of their text. It needs to scope to a single matter without bleeding into adjacent matters for the same client. None of those determinations are fully resolved by similarity scores or by exact phrase matching. They depend on the metadata layer and the relationships between documents.
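
Against a profile layer like the sketch above, that request decomposes into a structured filter plus a recency sort. This toy resolver assumes the hypothetical DocumentProfile objects from the earlier sketch:

def latest_executed_settlement(profiles, matter):
    """Toy resolution of 'the most recent fully executed settlement
    agreement on the Acme matter' over hypothetical profiles."""
    candidates = [
        p for p in profiles
        if p.matter == matter                       # scope: one matter only
        and p.doc_type == "settlement_agreement"    # is one, not a brief
                                                    # that merely mentions one
        and p.execution_status == "fully_executed"  # not redlines or drafts
    ]
    # Recency comes from metadata; ISO dates sort lexicographically.
    return max(candidates, key=lambda p: p.executed_date or "", default=None)

Each predicate corresponds to one of the determinations above, and none of them falls out of a similarity score or a phrase match.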
 
This is hard work. Cross-document, matter-aware understanding is an active problem. Auto-profiling accuracy is uneven across document types. Version lineage is genuinely difficult when documents move between systems, get renamed, and get partially copied into new work product.

But it is also the right problem. The ceiling on answer quality in any legal AI system is the quality of what gets retrieved. In the absence of conventional consumer signals, that ceiling is set by how well document system signals like keyword indexes, vector stores, and proactive metadata are composed. That's where the investment must be made.

The foundation, not the inference engine
 
For now, frontier-capability LLM inference is available only from a handful of the world's largest research labs. Legal organizations will make a variety of choices about where the user-facing experience for that inference lives, whether through an embedded first-party harness, a custom-built internal tool, or an enterprise ChatGPT or Claude chat connected securely to source material through the Model Context Protocol (MCP). We can expect user expectations in that arena to continue to evolve rapidly alongside the AI market. But whatever inference layer a firm chooses—whether built into the DMS, layered above it, or assembled across multiple tools—it will be fundamentally constrained by the findability of the content underneath it. Answer quality cannot exceed retrieval quality. Keyword, vector, and metadata, along with the governance model around them, must work in concert to surface the right content at the right moment, while ensuring the privacy of both user and firm.
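
In outline, that composed foundation might look like the following sketch, reusing rrf_fuse from the earlier example; the indexes, the profile store, and the ACL check are all placeholders for whatever governance model a firm actually runs.

def retrieve(query, user, keyword_index, vector_index, profiles, acl):
    """Sketch of a governed retrieval layer: fuse keyword and vector
    hits, attach metadata, and enforce permissions before anything
    reaches the inference layer. All components are placeholders."""
    fused = rrf_fuse([keyword_index.search(query),
                      vector_index.search(query)])
    permitted = []
    for doc_id in fused:
        if not acl.can_read(user, doc_id):  # governance gate: filter out,
            continue                        # never merely down-rank
        permitted.append((doc_id, profiles.get(doc_id)))
    return permitted[:10]  # the answer engine sees only permitted content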
 
That's the DMS's new job. Not to be the answer engine in every instance, but to be the governed retrieval foundation that any credible answer engine can be reliably built upon.

Sources & References
Legal vertical CPC benchmarks: The Ad Spend, "Google Ads Benchmarks for Legal Services 2025"; iLawyer Marketing, "Most Expensive Google Ads Keywords in the Legal Industry – 2025." High-CPC personal injury and accident keyword data drawn from publicly reported industry analyses.

DeepJudge founding background and quoted positioning: deepjudge.ai "About" page and "The Future of Legal AI is Here and It Lives in Your Knowledge" announcement.

ILTA roundtable observations cited from the author's own participation. Other technical claims reflect general literature on retrieval-augmented generation and permission-aware retrieval architectures.


