AI Fundamentals for Finance

How LLMs actually behave — probabilistic outputs, hallucinations, context, intent, agents — and the one rule that keeps AI usable when numbers must be correct, not approximately correct.

Most people meet AI through a chat box. You type a question, you get an answer, and it feels like a calculator that talks. It is not. Working with AI in accounting — where numbers must be correct, not approximately correct — requires a different mental model.

This page covers the minimum you need: how Large Language Models behave, what to feed them, how to ask, how agents work, and the one rule that keeps AI usable for finance.

LLM Nature

A Large Language Model does not look up answers. It predicts a likely next token (a word or a piece of a word), one token at a time, with randomness baked in. Models differ in how they trade speed against depth of reasoning, but all share this nature.

The consequence is unintuitive: the same prompt can produce different answers. Ask an LLM to compute a tax three times and you may get three slightly different numbers. The fourth try might be wrong by a wider margin.

[Figure: the same prompt sent to an LLM produces Answer A, Answer B, and Answer C.]

LLMs are probabilistic, not deterministic. Treat their direct output as a draft, never as a verdict.
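
A toy sketch can make the sampling behaviour concrete. It is not how any real model is implemented: the candidate answers and their probabilities in `NEXT_TOKEN_PROBS` are invented for illustration, and `sample_answer` is a hypothetical helper. It only shows that drawing from a probability distribution returns the likely answer most of the time and a different one some of the time.

```python
import random

# Invented distribution for the prompt "VAT on 100.00 at 21% is".
# Real models choose among many thousands of tokens; these numbers are assumptions.
NEXT_TOKEN_PROBS = {"21.00": 0.90, "21.10": 0.06, "20.00": 0.04}

def sample_answer(rng: random.Random) -> str:
    """Draw one answer according to the toy probabilities."""
    tokens = list(NEXT_TOKEN_PROBS)
    weights = list(NEXT_TOKEN_PROBS.values())
    return rng.choices(tokens, weights=weights, k=1)[0]

rng = random.Random()  # unseeded on purpose: every run differs
answers = [sample_answer(rng) for _ in range(1000)]
counts = {token: answers.count(token) for token in NEXT_TOKEN_PROBS}
print(counts)
```

In this toy setup roughly one answer in ten is wrong, and nothing in the output marks which ones.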

Hallucinations

When an LLM does not know something, it does not stop. It guesses — fluently, confidently, and often wrongly. It will invent account names that don’t exist, cite tax rules that were never written, and reference invoices it has never seen. This is called hallucination, and it is not a bug to be patched away. It is a direct consequence of how the model works.

The lesson is simple: never trust raw LLM output for facts. Verify, ground, or — better — route the work through something deterministic.

Context

Hallucinations get worse when the model has nothing to ground itself on. That is where context comes in.

An LLM only knows what is in its context window right now: your current prompt, the files you attached, the recent conversation. It does not know your books. It does not remember last week. Each session starts blank.

You build context by handing it relevant pieces — a chart of accounts, a transaction list, a policy document, a project’s AGENTS.md, an installed skill. Persistent context (skills, project files) saves you from pasting the same information every time.

But context has a sweet spot. Too little, and the model invents. Too much, and it loses focus, mixes unrelated pieces, and slows down. The key is curation — give the model exactly what it needs to answer the question in front of it, nothing more.
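
As a rough illustration of curation, the sketch below picks only the documents that share words with the question and stops at a size budget. The word-overlap scoring, the `curate_context` helper, and the sample documents are all assumptions made for this example. Real tools use more capable retrieval, but the principle is the same: relevant pieces in, everything else out.

```python
def curate_context(question: str, documents: dict[str, str], max_chars: int) -> list[str]:
    """Return names of documents that overlap with the question, within a size budget."""
    question_words = set(question.lower().split())

    def overlap(name: str) -> int:
        # Crude relevance score: number of question words found in the document.
        return len(question_words & set(documents[name].lower().split()))

    selected, used = [], 0
    for name in sorted(documents, key=overlap, reverse=True):
        size = len(documents[name])
        if overlap(name) == 0 or used + size > max_chars:
            continue  # irrelevant, or it would blow the budget
        selected.append(name)
        used += size
    return selected

documents = {
    "chart_of_accounts": "accounts: sales, vat payable, bank, rent expense",
    "q1_transactions": "2025-01-14 sales 1000.00 vat 210.00\n2025-02-03 sales 500.00 vat 105.00",
    "travel_policy": "employees book economy flights and keep hotel receipts",
}
print(curate_context("total vat on sales for q1", documents, max_chars=200))
```

The travel policy shares no words with the question, so it stays out of the context.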

Intent

Most people instinctively talk to AI the way they would write instructions for a person: first do this, then do that, then check this. That habit comes from automation — scripts, macros, step-by-step recipes. With AI it works against you.

AI is not an automation engine you feed steps to. It is closer to a worker you hand a goal to. You describe what you need and why, and let the model figure out the how. That shift is often called intent.

It matters because the engine underneath is probabilistic (see LLM Nature). A vague goal gives a probabilistic model room to drift. A clear outcome gives it a target to aim at — and gives you something concrete to check at the end.

Compare:

  • Step-by-step (the old habit): “Open my book, filter transactions tagged #sales for Jan–Mar, sum the VAT column, convert to EUR, give me the total.” You are scripting the work. And unlike a real script, sending the right steps does not guarantee the right answer — the model can still misread a step, skip one, or quietly invent a number along the way. You inherit all the risk of the steps and all the risk of the model.
  • Intent (what done looks like): “I need the VAT I owe for Q1 2025, in EUR, ready to file. Use my Bkper book as the source of truth.” You describe the outcome. The agent decides the how — which transactions to pull, which tag to trust, which math to run, and often whether to answer directly or write a small script that computes it for you (more on that in AI in Accounting).

Pair intent with success criteria, a plain way of asking “how will I know the answer is right when I see it?” For an accountant that is usually concrete: an expected total you already estimated, a report shape that matches last quarter’s, a reconciliation that should come out to zero, a specific account whose closing balance you know. Without success criteria the model has no way to know when it is done, and neither do you.
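
Success criteria like these can be written down as checks. The sketch below is one possible shape, with hypothetical names (`check_success`, `expected_total`, `tolerance`) and made-up figures. The estimated total is compared within a band because it is only an estimate, while the reconciliation must be exactly zero.

```python
from decimal import Decimal

def check_success(vat_total: Decimal, reconciliation_difference: Decimal,
                  expected_total: Decimal, tolerance: Decimal) -> list[str]:
    """Return the failed criteria; an empty list means the answer passes."""
    failures = []
    # The expected total is an estimate, so it is compared within a band.
    if abs(vat_total - expected_total) > tolerance:
        failures.append(f"VAT total {vat_total} is not within {tolerance} of {expected_total}")
    # A reconciliation has no band: it must come out to exactly zero.
    if reconciliation_difference != 0:
        failures.append(f"reconciliation is off by {reconciliation_difference}")
    return failures

passing = check_success(Decimal("315.00"), Decimal("0.00"), Decimal("300"), Decimal("25"))
failing = check_success(Decimal("412.50"), Decimal("-3.10"), Decimal("300"), Decimal("25"))
print(passing)
print(failing)
```

The first call returns an empty list. The second returns two failures, one for each criterion.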

Agents

An agent is an LLM running in a loop with tools. At each step the model proposes an action, runs a tool (a CLI command, a script, an API call), and observes the result. That observation feeds the next step — which may be progress, a correction, a retry, or a completely different approach. The loop keeps turning until the success criteria are met.

[Figure: the agent loop. The LLM proposes and runs a tool, observes the result, and reaches Done once the success criteria are met.]

This is the shape behind the bkper CLI agent and every other AI assistant that does real work. The success criteria are what close the loop: without them, a probabilistic engine running freely produces drift, not progress. And a loop is only as trustworthy as the tools inside it, which is the problem accounting forces us to solve.
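
The loop described in this section can be sketched in a few lines. This is a minimal illustration, not the bkper agent's implementation: `run_agent`, `scripted_propose`, and `sum_vat` are hypothetical, and the "model" is a scripted stand-in that first tries a wrong tag and corrects itself after observing the result.

```python
from typing import Callable

def run_agent(propose: Callable[[list[str]], tuple[str, str]],
              tools: dict[str, Callable[[str], str]],
              is_done: Callable[[str], bool],
              max_steps: int = 5) -> str:
    """Propose an action, run the tool, observe, and repeat until the criteria are met."""
    observations: list[str] = []
    for _ in range(max_steps):
        tool_name, argument = propose(observations)  # the model picks the next action
        observation = tools[tool_name](argument)     # a tool does the actual work
        observations.append(observation)             # the result feeds the next step
        if is_done(observation):                     # success criteria close the loop
            return observation
    raise RuntimeError("success criteria not met within max_steps")

def scripted_propose(observations: list[str]) -> tuple[str, str]:
    # Stand-in for the LLM: a wrong tag on the first try, the right one after that.
    return ("sum_vat", "#sale") if not observations else ("sum_vat", "#sales")

def sum_vat(tag: str) -> str:
    # Stand-in for a deterministic tool with made-up data.
    vat_by_tag = {"#sales": "315.00"}
    return vat_by_tag.get(tag, "no transactions found")

result = run_agent(scripted_propose, {"sum_vat": sum_vat},
                   is_done=lambda obs: obs == "315.00")
print(result)  # 315.00
```

The `max_steps` cap is a design choice in this sketch: a loop that cannot meet its criteria fails loudly instead of drifting indefinitely.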

AI in Accounting

AI fundamentals apply across finance. The accounting layer is where they get strict — because accounting numbers don’t have a tolerance band.

Accounting cannot be 99% right. A balance sheet that is mostly correct is wrong. A tax filing that is approximately accurate is a problem. And no technique — better prompts, richer context, smarter agents — makes an LLM’s output guaranteed correct. Errors will happen, and inside an agent loop they compound silently between checks.

So the rule is not “make the AI correct.” Nothing makes the AI correct. The rule is:

Never let unverified LLM output be the final word on a number.

The practical question is how to keep verification cheap. That is what code is for.

When an LLM writes a script that computes the answer, you stop verifying outputs and start verifying the script. You read it once, test it, and trust it as long as it doesn’t change. From then on the same inputs give the same outputs, auditable line by line. Verification becomes a one-time cost instead of a per-result cost.

[Figure: LLM → Output yields $1,234.56, $1,238.10, and $1,221.00 across three runs; LLM → Code → Output yields $1,234.56 every time.]
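
As an example of the kind of script worth verifying once, the sketch below computes VAT on tagged sales with exact decimal arithmetic. The transaction layout, the `#sales` tag, the 21% rate, and half-up rounding to cents are assumptions for illustration; the real rate and rounding rule depend on the jurisdiction. Nothing here calls Bkper: in practice the transactions would come from the book.

```python
from decimal import Decimal, ROUND_HALF_UP

def vat_owed(transactions: list[dict[str, str]], rate: Decimal) -> Decimal:
    """Sum the net amount of #sales transactions and apply the VAT rate."""
    net_sales = sum(
        (Decimal(t["net"]) for t in transactions if t["tag"] == "#sales"),
        Decimal("0"),
    )
    # Assumed policy: round half up to two decimal places.
    return (net_sales * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# Made-up sample data.
transactions = [
    {"date": "2025-01-14", "tag": "#sales", "net": "1000.00"},
    {"date": "2025-02-03", "tag": "#sales", "net": "500.00"},
    {"date": "2025-02-10", "tag": "#rent", "net": "200.00"},
]

# Same inputs, same output, every run.
results = [vat_owed(transactions, Decimal("0.21")) for _ in range(3)]
print(results)
```

All three runs return 315.00. Using `Decimal` built from strings rather than floats keeps the arithmetic exact, which is what makes the script auditable line by line.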

That shifts the rule into a practical split:

  • Deterministic work — tax calculations, reports, reconciliations, financial statements, balance computations — has a single correct answer that must be reproducible. Have the LLM write code (or call a deterministic tool), and verify the code, not each output. The work becomes repeatable, auditable, and reusable.
  • Non-deterministic work — spotting suspicious transactions, surfacing business insights, bootstrapping a chart of accounts, summarizing a period — has no single correct answer. Direct LLM output is acceptable here, but only as a draft for a human to review and decide on.

In both cases the human stays in the loop. AI doesn’t remove the reviewer; it changes what arrives for review. With code carrying the deterministic load, the human is checking artifacts a human can actually check — a script, a report, a flagged item — instead of re-checking every number the model emits.

What’s next

  • AI Tooling — how to get Bkper docs and context into any AI
  • Coding Agents — build Bkper integrations with the bkper agent CLI command