AI Fundamentals

How LLMs actually work — and how to use them safely for accounting. Probabilistic outputs, hallucinations, context, intent, agents, and why deterministic tasks must run through code.

Most people meet AI through a chat box. You type a question, you get an answer, and it feels like a calculator that talks. It is not. Working with AI in accounting — where numbers must be correct, not approximately correct — requires a different mental model.

This page covers the minimum you need: how Large Language Models behave, what to feed them, how to ask, how agents work, and the one rule that keeps AI usable for finance.

LLM Nature

A Large Language Model does not look up answers. It predicts the most likely next word, one token at a time, with randomness baked in. Different models exist with different strengths — some optimized for speed, others for reasoning — but they all share this nature.

The consequence is unintuitive: the same prompt can produce different answers. Ask an LLM to compute a tax three times and you may get three slightly different numbers. The fourth try might be wrong by a wider margin.
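To make that concrete, here is a minimal sketch of next-token sampling. The tokens and probabilities are made up for illustration, not taken from any real model or tokenizer: the model weights candidate continuations, and a random draw picks one, so repeated runs can diverge.

```typescript
// Illustrative sketch: the model assigns probabilities to candidate next
// tokens, and sampling picks among them at random, so each run can differ.

type TokenProb = { token: string; p: number };

function sampleNextToken(candidates: TokenProb[]): string {
  const r = Math.random();
  let cumulative = 0;
  for (const c of candidates) {
    cumulative += c.p;
    if (r < cumulative) return c.token;
  }
  return candidates[candidates.length - 1].token;
}

// Hypothetical weights for the next digit of a computed tax amount:
const nextTokenCandidates: TokenProb[] = [
  { token: "4", p: 0.6 },
  { token: "8", p: 0.3 },
  { token: "1", p: 0.1 },
];

// Three runs, three possibly different continuations:
for (let run = 1; run <= 3; run++) {
  console.log(`run ${run}: next token "${sampleNextToken(nextTokenCandidates)}"`);
}
```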

[Diagram: the same prompt goes into the LLM and comes out as Answer A, Answer B, or Answer C]

LLMs are probabilistic, not deterministic. Treat their direct output as a draft, never as a verdict.

Hallucinations

When an LLM does not know something, it does not stop. It guesses — fluently, confidently, and often wrongly. It will invent account names that don’t exist, cite tax rules that were never written, and reference invoices it has never seen. This is called hallucination, and it is not a bug to be patched away. It is a direct consequence of how the model works.

The lesson is simple: never trust raw LLM output for facts. Verify, ground, or — better — route the work through something deterministic.

Context

An LLM only knows what is in its context window right now: your current prompt, the files you attached, the recent conversation. It does not know your books. It does not remember last week. Each session starts blank.

You build context by handing it relevant pieces — a chart of accounts, a transaction list, a policy document, a project’s AGENTS.md, an installed skill. Persistent context (skills, project files) saves you from pasting the same information every time.

But context has a sweet spot. Too little, and the model invents. Too much, and it loses focus, mixes unrelated pieces, and slows down. The skill is curating — give the model exactly what it needs to answer the question in front of it, nothing more.
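As a small illustration of that curation, here is a sketch of assembling a prompt from only the relevant pieces. The variables and the buildPrompt helper are hypothetical, not a Bkper API; the point is what goes in and what stays out.

```typescript
// Sketch of context curation: pass only what the question needs.
// Sample strings are placeholders for exported data from your books.

const chartOfAccounts = "Assets > Bank; Income > Sales; Expenses > Software";
const q1Transactions = "2024-01-10  invoice  1200 EUR  #sales; 2024-02-20  invoice  600 EUR  #sales";
// The full ledger history is large and mostly irrelevant here, so it stays out.

function buildPrompt(question: string, contextPieces: string[]): string {
  return ["Context:", ...contextPieces, "Question:", question].join("\n\n");
}

const prompt = buildPrompt(
  "Which Q1 sales transactions are missing a VAT tag?",
  [chartOfAccounts, q1Transactions] // exactly what this question needs, nothing more
);
console.log(prompt);
```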

Intent

Context tells the model what it has. Intent tells it what done looks like.

A vague request like “help me with my taxes” combined with a probabilistic model is a recipe for noise. A clear intent — “compute Q1 VAT owed using transactions tagged #sales, output a single number in EUR” — gives the model a target it can actually hit, and gives you something you can verify.

Always pair intent with success criteria: a number you expect, a report shape, a test that should pass, a screenshot of the result you want. Without success criteria, the model has no way to know when it is done — and neither do you.
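One way to make that concrete is to write the success criterion as a small check the output must pass. A minimal sketch, assuming a hypothetical expected value and helper name:

```typescript
// A success criterion written as a test. The intent is "compute Q1 VAT owed
// using transactions tagged #sales, output a single number in EUR"; this check
// is how you know the output is done. expectedVat is a hypothetical known-good
// value you bring from your own records.

const expectedVat = 1234.56;

function meetsSuccessCriteria(output: string): boolean {
  const value = Number.parseFloat(output.replace(/[^\d.-]/g, ""));
  return Number.isFinite(value) && Math.abs(value - expectedVat) < 0.01;
}

console.log(meetsSuccessCriteria("1234.56 EUR"));  // true: intent met
console.log(meetsSuccessCriteria("roughly 1.2k")); // false: not done yet
```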

Agents

An agent is an LLM running in a loop with tools. At each step the model proposes an action, runs a tool (a CLI command, a script, an API call, an MCP server), and observes the result. That observation feeds the next step — which may be progress, a correction, a retry, or a completely different approach. The loop keeps turning until the success criteria are met.

[Diagram: the LLM proposes and runs a tool, observes the result, and loops until the success criteria are met, then it is done]

This is the shape behind the bkper CLI agent and every other AI assistant that does real work. The success criteria are what close the loop — without them, the agent doesn’t know when to stop, and a probabilistic engine running freely produces drift, not progress.
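A minimal sketch of that loop in TypeScript. The types and function names are illustrative stand-ins, not the bkper agent’s actual internals; the callers supply the LLM call, the tool runner, and the success check.

```typescript
// Sketch of an agent loop: propose an action, run a tool, observe the result,
// and repeat until the success criteria are met (or a step budget runs out).

type Action = { tool: string; args: string[] };

interface AgentDeps {
  proposeAction(goal: string, history: string[]): Promise<Action>; // the LLM call
  runTool(action: Action): Promise<string>;                        // CLI, script, API, MCP
  meetsSuccessCriteria(observation: string): boolean;              // closes the loop
}

async function runAgent(goal: string, deps: AgentDeps, maxSteps = 20): Promise<string> {
  const history: string[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const action = await deps.proposeAction(goal, history); // propose
    const observation = await deps.runTool(action);         // run
    history.push(`${action.tool} -> ${observation}`);       // observe: feeds the next step
    if (deps.meetsSuccessCriteria(observation)) {
      return observation;                                   // success criteria met: done
    }
  }
  throw new Error("Stopped without meeting the success criteria");
}
```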

AI in Accounting

Accounting cannot be 99% right. A balance sheet that is mostly correct is wrong. A tax filing that is approximately accurate is a problem. And no matter how well you craft the prompt or curate the context, an LLM’s direct output is never guaranteed to be 100% correct — and errors compound when an agent loops on top of them.

The way out is to recognize what LLMs are genuinely excellent at: writing code. Code is deterministic. The same input always produces the same output. You write it once and it runs forever, identical every time.

[Diagram: LLM → Output yields $1,234.56 ✗, $1,238.10 ✗, $1,221.00 ✗; LLM → Code → Output yields $1,234.56 ✓ every time]

The rule:

  • Deterministic work — tax calculations, reports, reconciliations, financial statements, balance computations — never trust direct LLM output. Have the LLM write code, and run the code. The result is repeatable, auditable, and reusable.
  • Non-deterministic work — spotting suspicious transactions, surfacing business insights, bootstrapping a chart of accounts, summarizing a period — direct LLM output is fine, because there is no single “correct” answer to compare against.

The investment is one-time: you ask the LLM to build the script, the report, the rule. From then on, you run it. You get the leverage of AI’s intelligence with the certainty of code — which is the only combination that works for finance.
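As an example of that one-time investment, here is the kind of script you might ask the LLM to write for the Q1 VAT intent from earlier. The transaction shape and the 20% VAT rate are illustrative assumptions, not Bkper’s data model or your jurisdiction’s rate.

```typescript
// A deterministic script the LLM writes once and you run from then on:
// Q1 VAT owed from transactions tagged #sales. Same input, same output.

interface Transaction {
  date: string;     // ISO date, e.g. "2024-02-15"
  amount: number;   // gross (VAT-inclusive) amount in EUR
  tags: string[];
}

const VAT_RATE = 0.2; // illustrative rate

function q1VatOwed(transactions: Transaction[]): number {
  const q1Sales = transactions.filter(
    (t) => t.tags.includes("#sales") && new Date(t.date).getMonth() < 3
  );
  const grossSales = q1Sales.reduce((sum, t) => sum + t.amount, 0);
  // VAT portion of gross sales, rounded to cents
  return Math.round((grossSales - grossSales / (1 + VAT_RATE)) * 100) / 100;
}

// Example run with placeholder data: always the same result for the same books.
const sample: Transaction[] = [
  { date: "2024-01-10", amount: 1200, tags: ["#sales"] },
  { date: "2024-02-20", amount: 600, tags: ["#sales"] },
  { date: "2024-05-05", amount: 900, tags: ["#sales"] }, // not Q1, ignored
];
console.log(q1VatOwed(sample)); // 300
```

Once a script like this exists, every run over the same books returns the same number, which is what makes the result repeatable and auditable.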

What’s next

  • AI Tooling — how to get Bkper docs and context into any AI
  • Coding Agents — build Bkper integrations with the bkper agent CLI command