Structural tests (e.g. external SDKs can be imported only from files in client/ dir) should fail with an error message that contains context for coding agents
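Such a structural test can be sketched in a few lines; the SDK name `vendorsdk` and the `client/` rule are illustrative, and a real setup would run this under pytest over files on disk:

```python
import ast

# Hypothetical rule: the external SDK "vendorsdk" may only be imported
# from files under client/. Module names and paths are illustrative.
RESTRICTED_MODULES = {"vendorsdk"}
ALLOWED_PREFIX = "client/"

def find_violations(files: dict[str, str]) -> list[str]:
    """Return agent-friendly error messages for forbidden imports."""
    violations = []
    for path, source in files.items():
        if path.startswith(ALLOWED_PREFIX):
            continue
        for node in ast.walk(ast.parse(source)):
            names = []
            if isinstance(node, ast.Import):
                names = [a.name for a in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module:
                names = [node.module]
            for name in names:
                if name.split(".")[0] in RESTRICTED_MODULES:
                    # The error message carries context a coding agent can act on
                    violations.append(
                        f"{path}:{node.lineno}: imports '{name}', but external "
                        f"SDKs may only be imported from '{ALLOWED_PREFIX}'. "
                        f"Fix: move the call behind a wrapper in client/ and "
                        f"import the wrapper instead."
                    )
    return violations

files = {
    "client/payments.py": "import vendorsdk\n",
    "core/billing.py": "from vendorsdk import charge\n",
}
print(find_violations(files))
```

The point is the message body: instead of a bare assertion failure, the agent gets the rule and a suggested fix.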
Garbage collection (refactoring, simplification)
Harness = the collection of components that feed information forward to the agent (e.g. guides, arch docs) and feed back after the agent makes changes (e.g. linters, tests), so that the agent can operate in a self-correcting manner with steering from a human
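The feed-forward/feed-back shape of a harness can be sketched as a loop; `agent` and the two feedback tools below are hypothetical stand-ins, not a real API:

```python
# Minimal harness sketch: feed-forward context in, feedback out, iterate.

def run_linter(code):          # feedback: stand-in linter
    return [] if "TODO" not in code else ["remove TODO before shipping"]

def run_tests(code):           # feedback: stand-in test suite
    return [] if "return" in code else ["function must return a value"]

def agent(task, context, feedback):   # stand-in for a coding agent
    return "def f():\n    return 42\n"

def harness(task, max_iterations=3):
    context = {"guides": "...", "arch_docs": "..."}    # feed-forward
    feedback = []
    for _ in range(max_iterations):
        code = agent(task, context, feedback)
        feedback = run_linter(code) + run_tests(code)  # feed-back
        if not feedback:
            return code        # converged; hand off to human review
    raise RuntimeError(f"did not converge: {feedback}")

print(harness("implement f"))
```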
Will harness templates and harnessable topologies become a new abstraction layer in software engineering?
Scaling the unknown: Performance in the AI era (monday.com)
Cost of coding collapsed, cost of running/operating software did not
Everyone expects configurable workflows now
Users have the same performance expectations, but you no longer have control over the code (if you allow users to vibe-code their own solutions on top of your product) => you need a corresponding architecture to ensure perf does not tank
Core vs playground architecture
Core: Small, contained modules that need to be performant and resilient. If someone brings this down, you need to fix it
Playground: Freedom for users to create what they want using agents you provide that implement guardrails
Measurements/metrics also need to adapt. When you do not control flows, you need more generic measurements that are consistent across very different workflows (e.g. number of objects returned per request)
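A workflow-agnostic metric like "objects returned per request" can be sketched as a wrapper around any handler; the workflow names and handlers are illustrative:

```python
import statistics
from collections import defaultdict

metrics = defaultdict(list)

def measured(workflow_name, handler):
    """Wrap any handler and record how many objects it returns."""
    def wrapper(request):
        result = handler(request)
        metrics[workflow_name].append(len(result))  # generic, per-request
        return result
    return wrapper

# Two very different user-built workflows, one consistent measurement
list_boards = measured("boards", lambda req: [{"id": i} for i in range(3)])
search_items = measured("search", lambda req: [{"id": 1}])

list_boards({}); list_boards({}); search_items({})

for name, counts in metrics.items():
    print(name, "avg objects/request:", statistics.mean(counts))
```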
Support can become a nightmare. Users expect you to support them for workflows they have created by themselves
From Pilot to Impact: How AI is Transforming Large-Scale Engineering (ING)
In 2023 ING started using AI-assisted software engineering => ~10% productivity gains
When coding is fast/cheap downstream steps (test, deploy) in the SDLC become the bottleneck => we need to speed these up as well
They created an AI platform that standardizes building blocks for agent assisted development => reduce variation
Once you have a great AI platform, adoption is the next big challenge. Ideas:
AI hackathons
Build a community that helps others
SRE teams have been trained and are increasing adoption for teams they are working with
Embed experts/coaches in teams for ~0.5d
Measure the impact of AI, don’t guess
Multiple metrics from different angles are required
Check if all/most metrics tell you the same story before jumping to conclusions
At ING the measured productivity gain from AI assistance tools (Copilot) is ~10-20%
Software engineers typically spend ~50% of their working hours writing code
Spotify Portal Studio + Claude
Spotify Portal is the centralized frontend for your entire infra
You can develop a new plugin/tool using Claude locally, with read access to the Spotify Portal DB, so you can check locally how your new plugin/tool fits in => great dev + feedback cycle
Maximising an agentic AI ecosystem
Consolidate by domain to limit the amount of context required for each agentic coding session
Aim to increase determinism to increase accuracy of results/outcomes + reduce costs + reduce execution time
Ensure coding agent behaviour is traceable/auditable in order to be able to debug issues when they occur
LLM-as-a-judge can be used to determine if coding agent guardrails are effective
Prompt caching reduces cost but also accuracy => tradeoff
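An LLM-as-a-judge check for guardrail effectiveness can be sketched as follows; `call_llm` is a hypothetical stand-in for whatever model client you use, and the prompt wording is illustrative:

```python
# Sketch: ask a second model whether a guardrail held during an agent run.

JUDGE_PROMPT = """You are auditing a coding agent.
Guardrail: {guardrail}
Agent transcript: {transcript}
Answer PASS if the guardrail held, FAIL otherwise, then a one-line reason."""

def call_llm(prompt: str) -> str:   # stand-in; replace with a real client
    return "PASS: no secrets were written to the repo"

def judge_guardrail(guardrail: str, transcript: str) -> bool:
    verdict = call_llm(JUDGE_PROMPT.format(
        guardrail=guardrail, transcript=transcript))
    return verdict.strip().upper().startswith("PASS")

ok = judge_guardrail(
    guardrail="The agent must never commit credentials",
    transcript="...agent diff and tool calls...",
)
print("guardrail effective:", ok)
```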
Sandboxing using Unikraft
microVM = VM w/ Firecracker as the VMM (virtual machine monitor)
Unikraft enables creating unikernels that are very efficient (0 CPU & memory usage when not used, spins up in ~10ms)
In PROD use unikernels instead of containers because they are more efficient
Use unikernels to create sandboxes for AI agents
Drawback: Harder to create than containers?
How can you create a unikernel that is fast to start up:
As part of CI/CD create VM -> deploy -> start up and fully initialize -> scale to zero -> take a snapshot using the VMM -> resume from snapshot when needed
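The snapshot/restore lifecycle above can be sketched in miniature; the `Vmm` class is a hypothetical stand-in for a VMM API (real Firecracker is driven over an HTTP API, not this interface):

```python
import time

class Vmm:
    """Toy stand-in for a snapshotting VMM."""
    def __init__(self):
        self.snapshot = None
    def boot_and_initialize(self, image):
        time.sleep(0.05)                 # cold boot + app init (slow path)
        return {"image": image, "state": "ready"}
    def take_snapshot(self, vm):
        self.snapshot = dict(vm)         # capture fully-initialized state
    def resume(self):
        return dict(self.snapshot)       # fast path: skip boot + init

vmm = Vmm()

# CI/CD time: boot once, initialize fully, snapshot, then scale to zero
vm = vmm.boot_and_initialize("my-unikernel")
vmm.take_snapshot(vm)
del vm                                   # scale to zero: 0 CPU/memory

# Request time: resume from snapshot instead of cold-booting
start = time.perf_counter()
vm = vmm.resume()
print(f"resumed in {(time.perf_counter() - start) * 1000:.2f} ms; "
      f"state={vm['state']}")
```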
Context Engineering (Unblocked)
Good context for an agent = everything it needs to do its job and nothing more
3 myths:
Naive RAG over docs is a context engine.
If I connect enough MCPs it is good enough.
A bigger context window is enough.
What a context engine gives you:
Understands who you are and which information matters for you
Resolves conflicts between different sources (e.g. docs, Slack, Jira)
Respects permissions and governance
Delivers the right context to the right model at the right time
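What separates a context engine from naive RAG can be sketched as three filters: governance, relevance, and conflict resolution. The documents, ACL groups, and recency-wins rule below are all illustrative:

```python
from datetime import date

docs = [
    {"source": "docs",  "topic": "deploy",
     "text": "Deploy via pipeline X",
     "updated": date(2024, 1, 10), "acl": {"eng"}},
    {"source": "slack", "topic": "deploy",
     "text": "Pipeline X is deprecated, use Y",
     "updated": date(2025, 6, 1), "acl": {"eng"}},
    {"source": "jira",  "topic": "billing",
     "text": "Billing migration plan",
     "updated": date(2025, 5, 1), "acl": {"finance"}},
]

def build_context(topic, user_groups, max_docs=1):
    permitted = [d for d in docs if d["acl"] & user_groups]  # governance
    relevant = [d for d in permitted if d["topic"] == topic]  # relevance
    # Conflict resolution: prefer the most recently updated source
    relevant.sort(key=lambda d: d["updated"], reverse=True)
    return [d["text"] for d in relevant[:max_docs]]           # nothing more

print(build_context("deploy", {"eng"}))
```

Note the Slack message wins over the stale docs page, and a user outside the `finance` group never sees the billing doc.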
Day 2: 17th March 2026
Mitigating Geopolitical Risks w/ local-first software and atproto
In the current geopolitical climate it is risky to depend on the software produced by a single company or coming from a single country
Approaches to achieve technological sovereignty include:
Backend services: Move to multi-cloud
Social media: Move to the AT (authenticated transfer) protocol (e.g. used by Bluesky)
Collaboration software: Local-first software
Backend services:
Proposed solution: Commoditisation and standardisation of backend solutions (e.g. hosting services)
De-facto standards today:
S3 API for data stores
k8s/containers for service deployment
Kafka API for streaming data
PostgreSQL client for relational databases
Data lakehouse for analytics
Ontology-Driven Observability: Building the e2e knowledge graph @Netflix scale
Approach:
Create a unified observability data layer = MELT layer (M = metrics, E = events, L = logs, T = traces)
Use an ontology and RDF triples to record semantics/meaning of data
Use semantics/meaning of data to determine relationships between different types of telemetry data
Use the relationships between different types of data to constrain coding agents i.e. reduce the non-determinism as much as possible
Automate the above as much as possible w/ human review required for changes to the ontology
Use LLMs to improve both the ontology and the process to extend the ontology => build a self-improving loop
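The triple-based approach can be sketched with plain tuples standing in for a real RDF store; the ontology terms and entity names are illustrative:

```python
# RDF-style (subject, predicate, object) triples recording telemetry semantics
triples = {
    ("metric:cpu_usage", "emittedBy", "service:checkout"),
    ("log:oom_kill",     "emittedBy", "service:checkout"),
    ("trace:span_123",   "emittedBy", "service:checkout"),
    ("service:checkout", "dependsOn", "service:payments"),
}

def related_telemetry(entity):
    """All telemetry linked to an entity via the ontology."""
    return sorted(s for s, p, o in triples
                  if p == "emittedBy" and o == entity)

# An agent investigating a CPU alert can be constrained to exactly this set
# of related signals, instead of searching all telemetry non-deterministically.
print(related_telemetry("service:checkout"))
```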
Today: AutoSRE
Future:
Auto root-cause-analysis (RCA)
Auto remediation
Self-healing infra
Rewriting all of Spotify’s code base, all the time
Migrations are a great use case for coding agents
Approach:
For every application that needs to be migrated use an agent + harness to create a branch with changes
Verify changes using MCP Verify Tool that knows how to build and lint apps
Push branch to GitHub
CI pipeline confirms if the changes are valid
Human review (most of the time)
Slap an API on top and allow users to interact with agent from Slack/GH/Jira
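The per-application flow above can be sketched as a pipeline; every function below is a hypothetical stand-in for the real agent/CI integration, and the PR URL is illustrative:

```python
def agent_migrate(app):                  # agent + harness produce a branch
    return {"app": app, "branch": f"migrate-{app}", "diff": "..."}

def mcp_verify(branch):                  # stand-in for the MCP Verify Tool:
    return True                          # knows how to build and lint apps

def push_and_open_pr(branch):            # push to GitHub, open a PR
    return f"https://github.example/{branch['app']}/pull/1"  # illustrative

def migrate_all(apps):
    prs = []
    for app in apps:
        branch = agent_migrate(app)
        if mcp_verify(branch):           # only verified changes reach CI
            prs.append(push_and_open_pr(branch))  # CI + human review follow
    return prs

print(migrate_all(["player", "search"]))
```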
Standardisation of codebases makes auto-migrations possible.
Need to agree on per-language coding standards/guidelines => Elect and use an Advisory Board for this
Their agent: Honk. Integrated in Slack so you can give it Slack conversations as context (e.g. chat w/ a colleague about a bug => @honk fix this)
Refreshing Stale Code Intelligence
Add repo-specific docs and constraints to your repo
Can Claude Fix Itself? Using LLMs for Incident Response
Claude is useful for:
Finding correlations between telemetry data (but false positives are common, so double-check everything);
Summarizing incident Slack channel contents;
Writing shift handoff docs.
Answer to the title question: No.
The Rise of the Streamhouse
Apache Fluss = Data lakehouse + streaming support
From Copilots to Orchestrators
AI deployment != adoption != mastery
Tie AI usage to business outcomes to measure its impact
Train judgment first, skills after
Show people how you are using agents
Record + share screenshare
Use MCPs for Jira issue mgmt, Confluence page mgmt, creation of notebooks etc.
Day 3: 18th March 2026
Learning Out Loud (Under Near-Existential Pressure)
Behaviours that others typically appreciate:
Learn in public
Ask questions in public
Learn by playing (e.g. make a toy tool) rather than when under evaluation (which is stressful for most)
Just start/do it. You will build fluency faster
Lower the stakes. It will increase the likelihood of learning
Record your steps while learning. It will help you learn. It will also help others learn faster
Be mediocre in public. It will make you more approachable, so others are more likely to join you and then you can learn together
The right 300 tokens are better than 100k noisy ones
Antipattern 1: The stuffed prompt - you put too much into your context
Fix: Use skills (description is used by coding agent to determine whether or not to lazy load the skill - invest in this!)
Build skills as reusable context artifacts that any user of your software can use
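Description-based lazy loading can be sketched as follows; the skill names are illustrative, and the naive keyword matching stands in for the model itself deciding which skill descriptions fit the task:

```python
skills = {
    "release-notes": {
        "description": "Drafts release notes from merged PRs",  # invest here!
        "load": lambda: "...full release-notes instructions (large)...",
    },
    "db-migration": {
        "description": "Writes safe database schema migrations",
        "load": lambda: "...full migration instructions (large)...",
    },
}

def select_skills(task: str) -> list[str]:
    """Load only skills whose short description matches the task."""
    loaded = []
    for name, skill in skills.items():
        desc_words = set(skill["description"].lower().split())
        if desc_words & set(task.lower().split()):
            loaded.append(skill["load"]())   # lazy: body loaded only now
    return loaded

# Only the matching skill's full body enters the context window
print(len(select_skills("draft the release notes for v2")))
```

Only the short descriptions sit in the prompt permanently; the large skill bodies load on demand, which is why the description is worth investing in.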
Antipattern 2: Similarity != correctness (RAG might not give you what you want)
Context filtering to prevent prompt injection via CLAUDE.md/AGENTS.md file: https://github.com/jedi4ever/context-filter
Harness engineering to give your coding agent a full observability stack (i.e. it gets feedback/context from PROD after shipping a change): https://openai.com/index/harness-engineering/
Dark factories: Specs in -> Software out, no human coding, no human reviews: https://darkfactory.dev/blog/what-is-dark-factory-software-development
Building a self-improving context loop (i.e. a context flywheel):