Field Notes & Resources

Practical AI writing for people who have to use it.

Field Notes & Resources is the writing arm of Applied AI North — an applied-AI practice in Toronto. Comparison guides, how-to playbooks, concept definitions, and field essays for teams turning AI from hype into measurable workflow change. Updated monthly. Written by practitioners who ship, for the people who have to make it stick.

▎ FEATURED DEEP READ
// project lifecycle month 1 ─── build month 2 ─── pilot month 3 ─── live month 4 ─── usage ● month 5 ─── usage ● month 6 ─── usage ◐ month 7 ─── usage ○ ← fails here month 8 ─── usage ○ month 9 ─── shelved month 12 ─── replaced // 73% of AI rollouts fail by month 7 // the predictor is at month 0, not month 6
Fig. 1 · A typical post-launch decay curve. The fix happens before the build, not after the failure.
Field essay · 9 min read · Updated May 2026 · By the Applied AI North team

The Six-Month Rule.

Most AI projects fail at month seven. Not because the model regressed, the API moved, or the team lost interest — but because the question of who pays the cost of the system working was never named at month zero. A short diagnostic for AI projects in production.

Adoption Project scoping Change management

Read the essay


§ 01Comparison Guides

Side-by-side, no fence-sitting.

Long, structured comparisons of the tools, models, and patterns we use daily. Each one ends with a recommendation, not a shrug. Updated whenever a new model release moves the verdict.

Comparison · 14 min · Updated Apr 2026

Claude Sonnet 4.5 vs GPT-5 vs Gemini 2.5 Pro

Which model wins which job, measured against four production workloads: structured extraction, multi-step agentic tool use, long-document Q&A, and cost-per-task at scale.

TaskSonnet 4.5GPT-5Gemini 2.5
Extraction●●●●●○●●○
Agentic tools●●●●●●●●○
Long-doc Q&A●●○●●○●●●
Cost / 1M tok$3.00$5.00$1.25
BenchmarksCost analysisProduction

Read the comparison

Comparison · 11 min · Updated Mar 2026

n8n vs Make vs Zapier for AI workflows

When to reach for which workflow tool when LLM steps are part of the equation. Self-hosting, branching logic, vendor lock-in, and the breakpoint where a workflow tool stops being the right answer.

Best forn8nMakeZapier
Self-host
Branching logic●●●●●●●●○
App library●●○●●○●●●
Cost @ scale$$$$$$
Workflow toolsAutomation

Read the comparison

Taxonomy · 9 min · Updated Feb 2026

Concierge, reference, operator: the three agent shapes

Most AI “agents” are one of three things in a trench coat. Naming them correctly is the first decision that affects scope, price, and evaluation strategy. A field taxonomy with examples.

  • CONCIERGECustomer-facing chat. Optimized for time-to-resolution.
  • REFERENCEInternal Q&A over docs. Optimized for citation accuracy.
  • OPERATORMulti-step task agent. Optimized for tool-call reliability.
TaxonomyAgents

Read the taxonomy

Comparison · 10 min · Updated Jan 2026

LangGraph vs Vercel AI SDK vs raw OpenAI Agents

Three approaches to building a production agent loop. Each one optimizes for something different: control, ergonomics, or velocity. The one we reach for most often, and why.

Optimized forLangGraphVercelOpenAI
Control●●●●●○●●○
DX●●○●●●●●○
Time to first run●○○●●○●●●
FrameworksEngineering

Read the comparison


§ 02How-tos & Playbooks

Steps you can follow, not theory.

Written as numbered steps with the assumption that you will actually try them. Pulled from real engagements, with the parts that don't work crossed out.

PLAYBOOK

How to write your first eval set

A nine-step guide to building the labeled test set that should exist before you write a single prompt. Includes the templates we use, the trap most teams fall into at step four, and the size you actually need (smaller than you think).
EvalsStep-by-stepTemplates
How-to · 12 min · Updated Apr 2026
CURRICULUM

A six-hour AI literacy curriculum for non-technical teams

The exact agenda we run for ops, sales, legal, and finance groups: what these tools actually do in twenty minutes, where they fail, three workflows you'll use on Monday, and the privacy material nobody asks about until lunch. Slides included.
LiteracyTrainingCurriculum
Curriculum · 18 min · Updated Mar 2026
PLAYBOOK

Twelve cost-control levers for production AI agents

Caching, prompt compression, model routing, batch inference, context truncation, semantic dedup, fallback ladders, and five more. Each lever with the typical savings, the trade-off, and the line of code that turns it on. Saves a typical client CA$28k a year.
CostProductionEngineering
Playbook · 14 min · Updated Apr 2026
HOW-TO

Confidence floors: the math, the bug, and the fix

When to set the floor at 0.80, when to set it at 1.00, and the subtle bug that bit us on a production extraction agent for half a week. Worked example with code. The right floor depends on the cost of a wrong answer versus the cost of a human review.
HITLMathDebugging
How-to · 8 min · Updated Apr 2026
PLAYBOOK

A two-week build playbook for AI-aware websites

The day-by-day schedule we use to ship custom websites at a quarter of typical agency cost: where AI compresses the work, where a senior practitioner stays in the loop, and the three places where speed should never be the priority.
WebProcess
Playbook · 10 min · Updated Mar 2026

§ 03Concepts & Definitions

A working glossary.

Short definitions of the terms we actually use in client conversations, written so a non-technical operator can use them in a meeting on the same day. Each entry links to a longer explainer.

DEFINITION

What is applied AI?

Applied AI is the use of language models, retrieval, and agentic workflows to change how specific work actually gets done inside a business — as a production system that measurably saves time or unlocks throughput, not as a demo or a chatbot. The emphasis is on adoption by the people doing the work.

Longer explainer

DEFINITION

What is context engineering?

Context engineering is the deliberate design of what information a model sees at the moment of a request — system prompt, retrieved documents, tool descriptions, history, examples. It accounts for more of output quality than prompt wording does. Most production AI failures are context failures, not prompt failures.

Longer explainer

DEFINITION

What is an eval harness?

An eval harness is the code and data that lets you measure an AI system's quality against a labeled test set, automatically, on every change. It is the closest thing AI engineering has to a unit-test suite. Without one, every change is a guess and every regression is a surprise.

Longer explainer

DEFINITION

What is human-in-the-loop?

Human-in-the-loop (HITL) is a system design where an AI produces a draft and a human reviews, edits, or approves it before the outcome is committed. The trigger is usually a confidence threshold: above the floor, ship; below, route to a person. HITL is how production AI stays both fast and trusted.

Longer explainer

DEFINITION

What is anchor prompting?

Anchor prompting is a pattern where the full instruction set lives in an uploaded reference document, and chat prompts invoke it by name. This stabilizes long agent sessions, reduces context rot, and makes instructions easier to version and update than inline prompts.

Longer explainer

DEFINITION

What is context rot?

Context rot is the degradation of model output quality over a long conversation as the context window fills with irrelevant or contradictory information. Symptoms include rule-following drift, repeated answers, and ignored constraints. The fix is structured truncation, not a bigger window.

Longer explainer

DEFINITION

What is a confidence floor?

A confidence floor is the model-reported probability threshold below which a response is routed to a human reviewer instead of being committed automatically. Typical values are 0.80–0.95. The right floor depends on the cost of a wrong answer relative to the cost of a human review.

Longer explainer

DEFINITION

What is AI literacy?

AI literacy is a working understanding of what current AI tools can and cannot do, how to use them effectively for a specific job, and how to recognize when their output is wrong. It is closer to a research skill than a technical one, and it is the highest-leverage training a non-technical team can do this year.

Longer explainer


§ 04Field Essays

Longer, opinionated, from inside the work.

The essays we write when something repeats often enough across engagements to be worth naming. Roughly two a month. No newsletter funnel, no listicle quotas.

May 2026

The Six-Month Rule

Most AI projects fail at month seven because the question of who pays the cost of the system working was never named at month zero. A short diagnostic for the project brief.
Essay · 9 min
Apr 2026

Evals are the deliverable

The reason we write the test set before the agent. A field guide to building eval harnesses that actually steer the project, plus the three failure modes of eval sets that look correct on paper.
Essay · 12 min
Mar 2026

Adoption is the project

A taxonomy of three failure modes that kill AI rollouts after launch: the pilot purgatory, the leadership-line gap, and the trust deficit. What to instrument to catch each one early.
Essay · 15 min
Mar 2026

The case against discovery phases

Why every fourteen-week discovery we've seen could have been a one-week assessment and a courageous “no.” What clients actually need at the start of a project, and what they're being sold instead.
Essay · 7 min
Feb 2026

Stop calling it a copilot

The word has lost any specific meaning. A more useful three-way taxonomy — concierge, reference, operator — that changes how scope, evals, and pricing land in the same conversation.
Essay · 6 min
Dec 2025

What a real handoff looks like

The five artifacts every AI engagement should produce, the test for whether the handoff actually transferred ownership, and the one question that exposes a fake one in thirty seconds.
Essay · 8 min
Nov 2025

Context is the moat, not the prompt

Why the durable advantage in applied AI is the quality of context you can assemble at the moment of a request — not the cleverness of the wording. Three exercises to find where your context is leaking quality.
Essay · 10 min

§ 05Canada & Compliance

For Canadian readers.

A small section because most of our clients are here. Practical, plain-English writing on Canadian AI policy, data residency, and the questions our procurement teams keep asking.


§ 06FAQ

Questions we keep answering.

Short, direct answers to the questions that come up on the first call with new clients. Each answer is forty to sixty words because that is what survives extraction by an answer engine. Same questions, longer answers, live across the rest of this page.

What is applied AI?

Applied AI is the use of large language models, retrieval systems, and agentic workflows to change how specific work actually gets done inside a business — not as a demo or a chatbot, but as a system in production that measurably saves time, reduces errors, or unlocks new throughput.

How long does a typical AI project take?

Most production-grade applied-AI engagements run four to eight weeks end-to-end. A one-week assessment scopes the work; a two-to-four-week build delivers a working pass; a final pilot and adoption phase hardens the system and trains the team. Projects that run longer usually have an organizational problem, not a technical one.

How much does an AI engagement cost?

Published price bands at Applied AI North range from CA$5,500 for a one-week assessment to CA$72,000 for a full agent build with monitoring and adoption support. Most clients combine two or three engagements over six to nine months, averaging CA$25,000 to CA$85,000 in total. Workshops start at CA$3,500.

What is the difference between an agent and a chatbot?

A chatbot responds to messages. An agent decides what to do, executes tool calls, observes the results, and loops until a task is complete. Agents have memory, planning steps, and side-effects: they send emails, write to databases, or call APIs. Chatbots produce text; agents produce outcomes.

How do you measure AI return on investment?

Measure the workflow the AI replaces, not the AI itself. Track time-per-task before and after, error rates against a labeled ground-truth set, human-review percentage, and adoption rate (daily active users among the intended audience). A 92% accurate system used by the whole team beats a 99% accurate system nobody trusts.

What is context engineering?

Context engineering is the deliberate design of what information an AI model sees at the moment of a request — the system prompt, retrieved documents, tool descriptions, conversation history, and structured examples. It accounts for more of the output quality than prompt wording does. Most production AI failures are context failures, not prompt failures.

What is an eval set?

An eval set is a labeled collection of input-output examples used to measure how well an AI system performs against your real data, before and after every change. For a document-extraction agent, that means 200 to 2,000 historical documents with the correct answers attached. The eval set is the deliverable; the agent is the side-effect of building it.

Are AI projects different for Canadian businesses?

Yes. Canadian businesses must comply with PIPEDA on data handling, watch for the AIDA framework under Bill C-27, and consider data-residency preferences for client data. Practically: pick model providers with Canadian or US-only data-residency options, log consent, and avoid sending personally identifiable information to public LLMs without a data-processing agreement.

What is human-in-the-loop?

Human-in-the-loop (HITL) is a system design where an AI generates a draft or decision but a human reviews, approves, or edits it before the outcome is committed. The trigger is usually a confidence threshold: if the model is above 0.86, ship; below, route to a person. HITL is how production AI stays accurate and trusted.

How do I know if my team is ready to adopt AI?

Three signals: leaders can describe the workflow they want to change in plain English; one named person on the team is willing to own the system after launch; and you can produce twenty real examples of the work the AI will do. If any of these is missing, start with a literacy workshop before a build.

Subscribe

Two pieces a month. No tracking, no funnel.

RSS or email, your call. Plain text. We will never sell your address, and we will not send anything that isn't writing.

or grab the RSS: /feed.xml ↗