Field Notes & Resources is the writing arm of Applied AI North — an applied-AI practice in Toronto. Comparison guides, how-to playbooks, concept definitions, and field essays for teams turning AI from hype into measurable workflow change. Updated monthly. Written by practitioners who ship, for the people who have to make it stick.
Long, structured comparisons of the tools, models, and patterns we use daily. Each one ends with a recommendation, not a shrug. Updated whenever a new model release moves the verdict.
Which model wins which job, measured against four production workloads: structured extraction, multi-step agentic tool use, long-document Q&A, and cost-per-task at scale.
Read the comparison ↗
When to reach for which workflow tool when LLM steps are part of the equation. Self-hosting, branching logic, vendor lock-in, and the breakpoint where a workflow tool stops being the right answer.
Read the comparison ↗
Most AI “agents” are one of three things in a trench coat. Naming them correctly is the first decision that affects scope, price, and evaluation strategy. A field taxonomy with examples.
Read the taxonomy ↗
Three approaches to building a production agent loop. Each one optimizes for something different: control, ergonomics, or velocity. The one we reach for most often, and why.
Read the comparison ↗
Written as numbered steps with the assumption that you will actually try them. Pulled from real engagements, with the parts that don't work crossed out.
Short definitions of the terms we actually use in client conversations, written so a non-technical operator can use them in a meeting on the same day. Each entry links to a longer explainer.
Applied AI is the use of language models, retrieval, and agentic workflows to change how specific work actually gets done inside a business — as a production system that measurably saves time or unlocks throughput, not as a demo or a chatbot. The emphasis is on adoption by the people doing the work.
Context engineering is the deliberate design of what information a model sees at the moment of a request — system prompt, retrieved documents, tool descriptions, history, examples. It accounts for more of output quality than prompt wording does. Most production AI failures are context failures, not prompt failures.
An eval harness is the code and data that lets you measure an AI system's quality against a labeled test set, automatically, on every change. It is the closest thing AI engineering has to a unit-test suite. Without one, every change is a guess and every regression is a surprise.
Human-in-the-loop (HITL) is a system design where an AI produces a draft and a human reviews, edits, or approves it before the outcome is committed. The trigger is usually a confidence threshold: above the floor, ship; below, route to a person. HITL is how production AI stays both fast and trusted.
Anchor prompting is a pattern where the full instruction set lives in an uploaded reference document, and chat prompts invoke it by name. This stabilizes long agent sessions, reduces context rot, and makes instructions easier to version and update than inline prompts.
Context rot is the degradation of model output quality over a long conversation as the context window fills with irrelevant or contradictory information. Symptoms include rule-following drift, repeated answers, and ignored constraints. The fix is structured truncation, not a bigger window.
A confidence floor is the model-reported probability threshold below which a response is routed to a human reviewer instead of being committed automatically. Typical values are 0.80–0.95. The right floor depends on the cost of a wrong answer relative to the cost of a human review.
AI literacy is a working understanding of what current AI tools can and cannot do, how to use them effectively for a specific job, and how to recognize when their output is wrong. It is closer to a research skill than a technical one, and it is the highest-leverage training a non-technical team can do this year.
The essays we write when something repeats often enough across engagements to be worth naming. Roughly two a month. No newsletter funnel, no listicle quotas.
A small section because most of our clients are here. Practical, plain-English writing on Canadian AI policy, data residency, and the questions our procurement teams keep asking.
Short, direct answers to the questions that come up on the first call with new clients. Each answer is forty to sixty words because that is what survives extraction by an answer engine. Same questions, longer answers, live across the rest of this page.
Applied AI is the use of large language models, retrieval systems, and agentic workflows to change how specific work actually gets done inside a business — not as a demo or a chatbot, but as a system in production that measurably saves time, reduces errors, or unlocks new throughput.
Most production-grade applied-AI engagements run four to eight weeks end-to-end. A one-week assessment scopes the work; a two-to-four-week build delivers a working pass; a final pilot and adoption phase hardens the system and trains the team. Projects that run longer usually have an organizational problem, not a technical one.
Published price bands at Applied AI North range from CA$5,500 for a one-week assessment to CA$72,000 for a full agent build with monitoring and adoption support. Most clients combine two or three engagements over six to nine months, averaging CA$25,000 to CA$85,000 in total. Workshops start at CA$3,500.
A chatbot responds to messages. An agent decides what to do, executes tool calls, observes the results, and loops until a task is complete. Agents have memory, planning steps, and side-effects: they send emails, write to databases, or call APIs. Chatbots produce text; agents produce outcomes.
Measure the workflow the AI replaces, not the AI itself. Track time-per-task before and after, error rates against a labeled ground-truth set, human-review percentage, and adoption rate (daily active users among the intended audience). A 92% accurate system used by the whole team beats a 99% accurate system nobody trusts.
Context engineering is the deliberate design of what information an AI model sees at the moment of a request — the system prompt, retrieved documents, tool descriptions, conversation history, and structured examples. It accounts for more of the output quality than prompt wording does. Most production AI failures are context failures, not prompt failures.
An eval set is a labeled collection of input-output examples used to measure how well an AI system performs against your real data, before and after every change. For a document-extraction agent, that means 200 to 2,000 historical documents with the correct answers attached. The eval set is the deliverable; the agent is the side-effect of building it.
Yes. Canadian businesses must comply with PIPEDA on data handling, watch for the AIDA framework under Bill C-27, and consider data-residency preferences for client data. Practically: pick model providers with Canadian or US-only data-residency options, log consent, and avoid sending personally identifiable information to public LLMs without a data-processing agreement.
Human-in-the-loop (HITL) is a system design where an AI generates a draft or decision but a human reviews, approves, or edits it before the outcome is committed. The trigger is usually a confidence threshold: if the model is above 0.86, ship; below, route to a person. HITL is how production AI stays accurate and trusted.
Three signals: leaders can describe the workflow they want to change in plain English; one named person on the team is willing to own the system after launch; and you can produce twenty real examples of the work the AI will do. If any of these is missing, start with a literacy workshop before a build.
RSS or email, your call. Plain text. We will never sell your address, and we will not send anything that isn't writing.
or grab the RSS: /feed.xml ↗