The marketing promise for premium legal RAG-based models was a hallucination-free experience. The empirical reality is different. Why?

It is a structural problem, created by the way Large Language Models are created. The process includes inputting large amounts of information. This typically includes all the publicly available information on the Internet.

The next step is Reinforcement Learning from Human Feedback (RLHF). Human trainers grade AI model ansers reward responses that are confident, complete, and responsive. This makes the model prefer to provide an answer rather than admit ignorance. It has been trained to be a “people pleaser,” even when the facts don’t support the conclusion.

 A Stanford study published in the Journal of Empirical Legal Studies found that Westlaw’s AI hallucinated 33% of the time. Lexis+ AI, 17%. The results are similar with other vendors.

As Michael Berman and others have pointed out, the Stanford study is not perfect. Some of its conclusions may not have aged well, and Berman’s critiques on specific points are fair. But the essence of the study is correct: no large language models are error-free. While premium legal research apps using (Retrieval Augmented Generation) models may have fewer hallucinations, none are hallucination-free.

Causes of Helpfulness Bias

Hallucinations are inevitable because of the way Large Language Models are created.

One of the culprits is what I call “Helpfulness Bias.” During Reinforcement Learning from Human Feedback (RLHF), human trainers reward responses that are confident, complete, and responsive. This makes the model prefer to provide an answer rather than admit ignorance. It has been trained to be a “people pleaser,” even when the facts don’t support the conclusion.

An article in Cornell University’s ArXiv repository, titled Towards Understanding Sycophancy in Language Models” found that five state-of-the-art AI assistants consistently exhibited sycophantic behavior across multiple tasks — and that the RLHF process itself is a likely driver. When a response matched a user’s existing views, human evaluators were more likely to prefer it, even over a more accurate alternative. The models learned the lesson: tell people what they want to hear.

These issues are not unique to lawyers. They also affect doctors, as explained in a recent research paper entitled “When helpfulness backfires: LLMs and the risk of false medical information due to sycophantic behavior.”  

Poor Prompts Can Make Hallucinations More Likely

Lawyers can inadvertently make hallucinations more likely. A prompt like “Summarize the main arguments in Judge Learned Hand’s opinion on artificial intelligence liability.” implies that a judge named Hand has written an opinion on AI.

This prompt suggests that there is a 1954 law on the topic of non-compete agreements: Summarize the main provisions of the 1954 federal law banning all non-compete agreements.

Because these models are optimized for “helpfulness,” they will often produce a “yes” or “no” response even if the underlying legal support is nonexistent. You are effectively asking the AI to pick a side rather than conduct an objective analysis. The journal Nature has some thoughts on this phenomenon.

Making Better Answers More Likely (“Discuss” and “Critique”)

There is no magic method to prevent all hallucinations, but there are things you can do to make them less likely. One promising approach is to frame your prompts so they don’t hint at a desired answer. For example:

Some argue that [insert proposition]. Discuss.

Paul Hankin provides some tips that are useful in implementing my approach in an excellent LinkedIn entitled “Removing Bias from Legal AI Through Smarter Prompts“:

  • Ask open-ended questions without hinting at a desired viewpoint or answer
  • If comparing options, don’t ask which one is “better” – ask for an objective rundown of pros and cons for each
  • Carefully review your prompts to detect any framing or language that betrays your personal stance on the issue

I have also gotten better results by a related technique requesting that the AI app critique something:

Some people assert [insert proposition]. What, if any, support for this assertion exists, and what are the strongest counterarguments?

Each of these techniques works for the same reason: they counteract the structural helpfulness bias by signaling to the model that an honest, qualified answer is more valuable than a confident, wrong one.

More Practical Tips

Rebecca Fordon offers some excellent practical advice in her AI Law Librarians article “RAG Systems Can Still Hallucinate“:

  • Ask your vendor which sources are included in the generative AI tool, and only ask questions that can be answered from that data. Don’t expect generative AI research products to automatically have access to other data from the vendor (Shepard’s, litigation analytics, PACER, etc.), as that may take some time to implement.
  • Always read the cases for yourself. We’ve always told students not to rely on editor-written headnotes, and the same applies to AI-generated summaries.
  • Be especially wary if the summary refers to a case not linked. This is the tip from Lexis, and it’s a good one, as it can clue you in that the AI may be incorrectly summarizing the linked source.
  • Ask your questions neutrally. Even if you ultimately want to use the authorities in an argument, better to get a dispassionate summary of the law before launching into an argument.

If you’ve developed other techniques for reducing RAG hallucinations, I’d love to hear about them via LinkedIn DM or comments on this LinkedIn post.

When did “write clearly and persuasively” go from being a goal to being evidence of robot writing?

A Wall Street Journal piece this morning discusses writers deliberately degrading their own work to dodge accusations of AI use.

  • They’re scattering typos like breadcrumbs. 
  • Swapping em dashes for double hyphens.
  • Stuffing in obscure sitcom quotes. 
  • Saying things like hey yo, for real.”

Wouldn’t we all be better off focusing on writing that’s worth reading?

Strunk and White told us to omit needless words. They didn’t say to add needless errors.

============

Question for Today:

How well did I hide the AI assistance?

The LinkedIn post above was written with help from AI. That’s why I was able to publish it in less than two hours (with graphic) from the Wall Street Journal article this morning. Several of the comments on that LinkedIn post added good ideas. Add your own thoughts there.

FWIW, here’s the history of my work with Claude Pro on this.

Some Other Observations

Some authors believe it increases audience confidence in their work if they include a disclaimer of AI use.

It does not increase my confidence in their work. It makes me question their competence and judgment. If you know how to use AI apps, it’s kind of nutty not to use them. Used well, they can lead to a higher quality, more accurate product.

One of the best ways to use AI is to ask it to critique your draft.

Grammarly provides many of the benefits of AI apps, without leaving artifacts. Use the Pro version. I used to hire a very smart part-time editor to review my most important written work products. I haven’t used her once since I started using Grammarly.

Has the Internet made books obsolete? Not so far as I’m concerned. I have 20+ titles in my personal library of books about presentations—and I’ve even read most of them. If I could keep only three, my choices would be:

  1. Public Speaking for Dummies
  2. PowerPoint for Dummies, and
  3. Presentations for Dummies

Since the publication many years ago of Dan Gookin’s DOS for Dummies, the first book in the successful  Dummies line of technical books, I’ve been ambivalent about the company’s naming and marketing strategy.  However, when a book’s content is good enough, who cares if it has a condescending title?

She begged “Do not do that,” then “STOP OPENCLAW.” Neither worked.

That’s what happened to Summer Yue, Meta’s Director of Alignment at their superintelligence safety lab. By the time she reached her desktop to kill the process manually, the AI agent she’d created had already deleted hundreds of emails. You would expect someone with Yue’s expertise to avoid a problem like this. You would be wrong.

Jennifer Ellis’s article lays out a practical checklist for lawyers considering agentic AI — minimum permissions, confirmation steps, real-world-scale testing, kill switches. Every item on her list is sound. But even if you follow all of them, you’re managing risk, not eliminating it. Yue had a confirmation step. The agent ignored it anyway.

This isn’t a fringe concern. Ziff Davis reports that enterprise AI agents may become the ultimate insider threat — autonomous systems with broad access, acting on stale or misunderstood instructions, with nobody watching in real time. The parallels to law practice are obvious. Lawyers grant agents access to client files, email, and case management systems. A rogue action doesn’t just embarrass you; it can breach confidentiality, spoliate evidence, or torpedo a case.

Anthropic, the vendor behind Claude, labels its own agentic product a “research preview with unique risks due to its agentic nature and internet access.” Read that carefully. This is a company telling you its own tool isn’t fully vetted.

As Rok Popov Ledinski has observed, the gap between what agentic AI can do and what lawyers understand about controlling it is widening, not narrowing. Ellis’s suggestions are the floor, not the ceiling. Most lawyers aren’t ready for that floor.

Hallucinations can hurt your reputation and maybe your wallet. Agentic AI can destroy your law practice.

Every year brings a new legal-technology miracle. In 2026, the most aggressively promoted one may be “AI for discovery.” If you have attended even a single conference lately, you have heard the pitch. AI will slash review costs. AI will eliminate drudgery. AI will—apparently any day now—fetch your coffee. That last claim remains unproven.

What tends to get lost in the enthusiasm surrounding AI for discovery is a basic but critical distinction: not all AI is the same. The market often groups two very different technologies under a single oversized umbrella labeled AI, and the difference between them matters enormously in discovery. Definitions are in order: Technology-assisted review (TAR) is the old, reliable workhorse. It is extractive. It finds what is already there based on mathematical patterns. As an article in the Richmond Journal of Law and Technology demonstrates, it has been in use for more than a decade, is well understood, and has enjoyed broad judicial acceptance.

TAR has earned respect from courts and practitioners who value measurable performance metrics, transparent workflows, and repeatable validation. The Sedona Conference TAR Primer remains the foundational explanation of why TAR works, how it can be audited, and how precision and recall can be evaluated.

Generative AI—large language models such as ChatGPT, Claude, and Gemini—is the new, charismatic intern. It is creative. It quickly generates new text based on probability. It is dazzling at first encounter, articulate, fast, and often helpful. It is also prone to making things up when under pressure.

Generative AI lacks TAR’s long judicial track record in discovery workflows. Chatbots are trained to produce plausible text, not to classify documents according to legal standards. They do not inherently understand responsiveness, confidentiality, privilege, or legal intent. Independent evaluations, including the Stanford HAI Index, consistently warn that while generative models are powerful, they remain unpredictable in risk-sensitive contexts.

MORE at LLRX.com

Let’s stop blaming the hallucinations and focus on the real problem:

Lawyers who don’t do their job because they are too busy, too lazy, or too incompetent.

The lawyer who cites a hallucinated AI case and the lawyer who cites a real case without reading it have committed the same ethical failure. Today, it’s usually one of them who gets disciplined.

AI didn’t invent the fake citation. It just automated it.

Long before ChatGPT, lawyers were citing cases they’d never actually read. I know, because as a law clerk to a U.S. District Court judge, I read the cases they didn’t. The citations were real enough — the cases existed — but they had nothing to do with the argument being made. Every time I found one, I discounted everything else in the brief.

This wasn’t evenly distributed. Large firms, with their layers of associates and research infrastructure, rarely had this problem. The less institutional support a lawyer had, the more likely I was to find phantom relevance in their citations. That’s not an indictment of any particular lawyers — it’s an indictment of a profession that has always tolerated sloppy research as long as no one checked.

AI didn’t create a malpractice problem. It just made the existing one impossible to ignore — because now the cases don’t even exist, which is harder to explain away than citing a real case you obviously never read.

The standard hasn’t changed: if you cite it, you’d better have read it and understood it. The only thing that’s changed is that the shortcuts are getting caught.

The hype machine is working overtime on Agentic AI. Don’t fall for it.

AI chatbots merely respond to prompts. They only give you information. AI agents like Claude Cowork or Openclaw go beyond this. They are built on large language models, but can take action on your behalf.

That sounds great, but there is a big problem: Way too many security risks. AI agents are just too risky, given the current state of the technology. This is true for any business use, but it applies doubly for lawyers, given their ethical duties of client confidentiality.

Prompt injection worries me the most. Bad actors can take control of your agent in surprisingly easy ways.  Other problems include:

 * Greater Hallucination Risks: Hallucinations are a problem with all large language models, but with conventional chatbots, it’s manageable. You can verify the bot’s output before relying on it for anything significant.

 * The “Black Box” Problem: Serious questions remain regarding where client data resides, what is retained, and whether the output can be audited with any degree of legal rigor.

These risks make agentic AI a no-go for the foreseeable future. How long? Until this new product has an extensive track record of safe use in the field. This will probably be at least a year, maybe five years, maybe never.

Pioneers get arrows. Settlers take land.

The promise has become a mantra: AI will free lawyers from drudgery so they can focus on higher-value work. Thomas Martin, writing for the Thomson Reuters Institute, points to research from UC-Berkeley that complicates that story considerably. The study tracked what actually happens when knowledge workers adopt generative AI. They don’t work less. They work more — faster, broader, longer — often without realizing it.

For a profession already deep in a burnout crisis, Martin argues, this should be a wake-up call.

I can confirm the finding from the inside. I use generative AI extensively — Claude, Gemini, ChatGPT — across research, drafting, and analysis. On routine tasks, yes, AI saves time. But on the projects that matter most, I consistently invest more time, not less. The reason is simple: AI has raised my ambition. When a power tool lets you chase a higher quality ceiling, you chase it. The scope of what feels achievable expands, and you expand with it.

The additional time is worth it. The output is genuinely better — more thorough, more polished, more carefully reasoned. But that’s precisely the dynamic the Berkeley researchers identified. The efficiency gains don’t translate into free hours. They get reinvested immediately, almost invisibly, into more demanding work.

This has implications the legal profession hasn’t seriously grappled with. If AI doesn’t reduce workload but intensifies it, then the firms and institutions selling AI adoption as a path to better work-life balance are telling an incomplete story. The real question — the one Martin rightly flags — is whether we’ll make deliberate choices about how AI reshapes legal work, or simply let the tools quietly raise the bar until the new pace feels normal.

Over the past several years, platforms such as Substack have become increasingly attractive to writers seeking to establish themselves as an independent voice. The appeal is obvious. They are easy to use and can turn a writer into a publisher overnight. No web developer is required. Payment systems are integrated, and distribution is built in.

This trend has accelerated as prominent writers have left legacy publishers including the Washington Post, the Wall Street JournalTime Magazine, CBS News, CNN, and NPR in search of stability or independence. Substack markets itself as a refuge for writers who prefer autonomy to corporate hierarchy.

There are good reasons to use Substack and similar businesses, but there are also risks. These platforms are not inherently malign, but they are fragile. Substack is currently the trendy platform, but the key ideas apply to many other platforms, many of which are analyzed in an article entitled Avoiding the Platform Trap: Alternatives to Substack.

There is a seductive simplicity to the modern newsletter platform. It promises to turn a writer into a publisher overnight, without the technical overhead. It is a brilliant bargain, provided one doesn’t look too closely at who owns the title to the land.

Much more on this topic in this LLRX article: “Don’t Build Your House on Rented Land: Why Writers Should Avoid Platform Dependency and How They Can Do So.

Message for My Liberal Friends:

Fact-checking? Good.
Name-calling? Strategic malpractice.

The Facebook post graphic reproduced below illustrates both name-calling and effective fact-checking. If your goal is to change minds, contempt is self-sabotage.

Calling people “stupid” because they disagree with you may feel satisfying. It may earn applause from your side. But it will not persuade a single person who matters.

It will harden them.

Contempt Backfires

Arthur Brooks put it plainly in The Atlantic: Contempt — not disagreement — is what poisons civic life. Treating opponents with disdain doesn’t weaken them. It strengthens their identity and their resolve.
People rarely abandon beliefs because someone mocked them. They defend themselves.

And often, they escalate.

Resentment Is Political Fuel

Many Trump supporters describe feeling culturally disrespected. Jonathan Haidt warned in The New York Times that dismissing people as ignorant or immoral deepens alienation rather than persuasion.

If someone already suspects that “liberals look down on people like me,” calling them stupid doesn’t weaken that belief.

It confirms it.

And resentment is a powerful motivator.

Even Politicians Learn This the Hard Way

When Hillary Clinton used the phrase “basket of deplorables,” it became a rallying cry for her opponents. President Obama later acknowledged that the remark was politically damaging.

Insults mobilize. They do not persuade.

Elections Are Margin Games

You don’t need to persuade everyone. You need to persuade a few.

The loudest voices online are rarely the swing votes. The people who matter most are often quieter — reachable but not yet locked in.

Public shaming is designed for applause.
Persuasion is designed for outcomes.

Those are different audiences.

What Works Better

If you genuinely want to make a difference:

  • Share a calm, well-sourced fact check.
  • Send it privately.
  • Choose someone you believe is persuadable.
  • Lead with respect instead of ridicule.

You don’t need fireworks.
You need one honest conversation that lowers the temperature.

Flip a few — just a few — and the math changes.

Fact-checking is constructive.
Humiliation is counterproductive.

Respect isn’t weakness. It’s strategy.

Found on Facebook:

Screenshot