Structural tests (e.g. external SDKs can be imported only from files in client/ dir) should fail with an error message that contains context for coding agents
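Such a structural test can be sketched in a few lines; the SDK name `vendorsdk` and the `client/` rule are illustrative, and a real setup would run this under pytest over files on disk:

```python
import ast

# Hypothetical rule: the external SDK "vendorsdk" may only be imported
# from files under client/. Module names and paths are illustrative.
RESTRICTED_MODULES = {"vendorsdk"}
ALLOWED_PREFIX = "client/"

def find_violations(files: dict[str, str]) -> list[str]:
    """Return agent-friendly error messages for forbidden imports."""
    violations = []
    for path, source in files.items():
        if path.startswith(ALLOWED_PREFIX):
            continue
        for node in ast.walk(ast.parse(source)):
            names = []
            if isinstance(node, ast.Import):
                names = [a.name for a in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module:
                names = [node.module]
            for name in names:
                if name.split(".")[0] in RESTRICTED_MODULES:
                    # The error message carries context a coding agent can act on
                    violations.append(
                        f"{path}:{node.lineno}: imports '{name}', but external "
                        f"SDKs may only be imported from '{ALLOWED_PREFIX}'. "
                        f"Fix: move the call behind a wrapper in client/ and "
                        f"import the wrapper instead."
                    )
    return violations

files = {
    "client/payments.py": "import vendorsdk\n",
    "core/billing.py": "from vendorsdk import charge\n",
}
print(find_violations(files))
```

The point is the message body: instead of a bare assertion failure, the agent gets the rule and a suggested fix.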
Garbage collection (refactoring, simplification)
Harness = the collection of components that feed information forward to the agent (e.g. guides, arch docs) and feed back after the agent makes changes (e.g. linters, tests), so that the agent can operate in a self-correcting manner with steering from a human
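The feed-forward/feed-back shape of a harness can be sketched as a loop; `agent` and the two feedback tools below are hypothetical stand-ins, not a real API:

```python
# Minimal harness sketch: feed-forward context in, feedback out, iterate.

def run_linter(code):          # feedback: stand-in linter
    return [] if "TODO" not in code else ["remove TODO before shipping"]

def run_tests(code):           # feedback: stand-in test suite
    return [] if "return" in code else ["function must return a value"]

def agent(task, context, feedback):   # stand-in for a coding agent
    return "def f():\n    return 42\n"

def harness(task, max_iterations=3):
    context = {"guides": "...", "arch_docs": "..."}    # feed-forward
    feedback = []
    for _ in range(max_iterations):
        code = agent(task, context, feedback)
        feedback = run_linter(code) + run_tests(code)  # feed-back
        if not feedback:
            return code        # converged; hand off to human review
    raise RuntimeError(f"did not converge: {feedback}")

print(harness("implement f"))
```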
Will harness templates and harnessable topologies become a new abstraction layer in software engineering?
Scaling the unknown: Performance in the AI era (monday.com)
Cost of coding collapsed, cost of running/operating software did not
Everyone expects configurable workflows now
Users have the same performance expectations, but you no longer have control over the code (if you allow users to vibe-code their own solutions on top of your product) => you need a corresponding architecture to ensure perf does not tank
Core vs playground architecture
Core: Small, contained modules that need to be performant and resilient. If someone brings this down, you need to fix it
Playground: Freedom for users to create what they want using agents you provide that implement guardrails
Measurements/metrics also need to adapt. When you do not control flows, you need more generic measurements that are consistent across very different workflows (e.g. number of objects returned per request)
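A workflow-agnostic metric like "objects returned per request" can be sketched as a wrapper around any handler; the workflow names and handlers are illustrative:

```python
import statistics
from collections import defaultdict

metrics = defaultdict(list)

def measured(workflow_name, handler):
    """Wrap any handler and record how many objects it returns."""
    def wrapper(request):
        result = handler(request)
        metrics[workflow_name].append(len(result))  # generic, per-request
        return result
    return wrapper

# Two very different user-built workflows, one consistent measurement
list_boards = measured("boards", lambda req: [{"id": i} for i in range(3)])
search_items = measured("search", lambda req: [{"id": 1}])

list_boards({}); list_boards({}); search_items({})

for name, counts in metrics.items():
    print(name, "avg objects/request:", statistics.mean(counts))
```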
Support can become a nightmare. Users expect you to support them for workflows they have created by themselves
From Pilot to Impact: How AI is Transforming Large-Scale Engineering (ING)
In 2023 ING started using AI-assisted software engineering => ~10% productivity gains
When coding is fast/cheap downstream steps (test, deploy) in the SDLC become the bottleneck => we need to speed these up as well
They created an AI platform that standardizes building blocks for agent assisted development => reduce variation
Once you have a great AI platform, adoption is the next big challenge. Ideas:
AI hackathons
Build a community that helps others
SRE teams have been trained and are increasing adoption for teams they are working with
Embed experts/coaches in teams for ~0.5d
Measure the impact of AI, don’t guess
Multiple metrics from different angles are required
Check if all/most metrics tell you the same story before jumping to conclusions
At ING the measured productivity gain from AI assistance tools (Copilot) is ~10-20%
Software engineers typically spend ~50% of their working hours writing code
Spotify Portal Studio + Claude
Spotify Portal is the centralized frontend for your entire infra
You can develop a new plugin/tool using Claude locally, with read access to the Spotify Portal DB, so you can check locally how your new plugin/tool fits in => great dev + feedback cycle
Maximising an agentic AI ecosystem
Consolidate by domain to limit the amount of context required for each agentic coding session
Aim to increase determinism to increase accuracy of results/outcomes + reduce costs + reduce execution time
Ensure coding agent behaviour is traceable/auditable in order to be able to debug issues when they occur
LLM-as-a-judge can be used to determine if coding agent guardrails are effective
Prompt caching reduces cost but also accuracy => tradeoff
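An LLM-as-a-judge check for guardrail effectiveness can be sketched as follows; `call_llm` is a hypothetical stand-in for whatever model client you use, and the prompt wording is illustrative:

```python
# Sketch: ask a second model whether a guardrail held during an agent run.

JUDGE_PROMPT = """You are auditing a coding agent.
Guardrail: {guardrail}
Agent transcript: {transcript}
Answer PASS if the guardrail held, FAIL otherwise, then a one-line reason."""

def call_llm(prompt: str) -> str:   # stand-in; replace with a real client
    return "PASS: no secrets were written to the repo"

def judge_guardrail(guardrail: str, transcript: str) -> bool:
    verdict = call_llm(JUDGE_PROMPT.format(
        guardrail=guardrail, transcript=transcript))
    return verdict.strip().upper().startswith("PASS")

ok = judge_guardrail(
    guardrail="The agent must never commit credentials",
    transcript="...agent diff and tool calls...",
)
print("guardrail effective:", ok)
```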
Sandboxing using Unikraft
microVM = VM w/ Firecracker as the VMM (virtual machine monitor)
Unikraft enables creating unikernels that are very efficient (0 CPU & memory usage when not used, spins up in ~10ms)
In PROD use unikernels instead of containers because they are more efficient
Use unikernels to create sandboxes for AI agents
Drawback: Harder to create than containers?
How can you create a unikernel that is fast to start up:
As part of CI/CD create VM -> deploy -> start up and fully initialize -> scale to zero -> take a snapshot using the VMM -> resume from snapshot when needed
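The snapshot/restore lifecycle above can be sketched in miniature; the `Vmm` class is a hypothetical stand-in for a VMM API (real Firecracker is driven over an HTTP API, not this interface):

```python
import time

class Vmm:
    """Toy stand-in for a snapshotting VMM."""
    def __init__(self):
        self.snapshot = None
    def boot_and_initialize(self, image):
        time.sleep(0.05)                 # cold boot + app init (slow path)
        return {"image": image, "state": "ready"}
    def take_snapshot(self, vm):
        self.snapshot = dict(vm)         # capture fully-initialized state
    def resume(self):
        return dict(self.snapshot)       # fast path: skip boot + init

vmm = Vmm()

# CI/CD time: boot once, initialize fully, snapshot, then scale to zero
vm = vmm.boot_and_initialize("my-unikernel")
vmm.take_snapshot(vm)
del vm                                   # scale to zero: 0 CPU/memory

# Request time: resume from snapshot instead of cold-booting
start = time.perf_counter()
vm = vmm.resume()
print(f"resumed in {(time.perf_counter() - start) * 1000:.2f} ms; "
      f"state={vm['state']}")
```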
Context Engineering (Unblocked)
Good context for an agent = everything it needs to do its job and nothing more
3 myths:
Naive RAG over docs is a context engine.
If I connect enough MCPs it is good enough.
A bigger context window is enough.
What a context engine gives you:
Understands who you are and which information matters for you
Resolves conflicts between different sources (e.g. docs, Slack, Jira)
Respects permissions and governance
Delivers the right context to the right model at the right time
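What separates a context engine from naive RAG can be sketched as three filters: governance, relevance, and conflict resolution. The documents, ACL groups, and recency-wins rule below are all illustrative:

```python
from datetime import date

docs = [
    {"source": "docs",  "topic": "deploy",
     "text": "Deploy via pipeline X",
     "updated": date(2024, 1, 10), "acl": {"eng"}},
    {"source": "slack", "topic": "deploy",
     "text": "Pipeline X is deprecated, use Y",
     "updated": date(2025, 6, 1), "acl": {"eng"}},
    {"source": "jira",  "topic": "billing",
     "text": "Billing migration plan",
     "updated": date(2025, 5, 1), "acl": {"finance"}},
]

def build_context(topic, user_groups, max_docs=1):
    permitted = [d for d in docs if d["acl"] & user_groups]  # governance
    relevant = [d for d in permitted if d["topic"] == topic]  # relevance
    # Conflict resolution: prefer the most recently updated source
    relevant.sort(key=lambda d: d["updated"], reverse=True)
    return [d["text"] for d in relevant[:max_docs]]           # nothing more

print(build_context("deploy", {"eng"}))
```

Note the Slack message wins over the stale docs page, and a user outside the `finance` group never sees the billing doc.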
Day 2: 17th March 2026
Mitigating Geopolitical Risks w/ local-first software and atproto
In the current geopolitical climate it is risky to depend on the software produced by a single company or coming from a single country
Approaches to achieve technological sovereignty include:
Backend services: Move to multi-cloud
Social media: Move to the AT (authenticated transfer) protocol (e.g. used by Bluesky)
Collaboration software: Local-first software
Backend services:
Proposed solution: Commoditisation and standardisation of backend solutions (e.g. hosting services)
De-facto standards today:
S3 API for data stores
k8s/containers for service deployment
Kafka API for streaming data
PostgreSQL client for relational databases
Data lakehouse for analytics
Ontology-Driven Observability: Building the e2e knowledge graph @Netflix scale
Approach:
Create a unified observability data layer = MELT layer (M = metrics, E = events, L = logs, T = traces)
Use an ontology and RDF triples to record semantics/meaning of data
Use semantics/meaning of data to determine relationships between different types of telemetry data
Use the relationships between different types of data to constrain coding agents i.e. reduce the non-determinism as much as possible
Automate the above as much as possible w/ human review required for changes to the ontology
Use LLMs to improve both the ontology and the process to extend the ontology => build a self-improving loop
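The triple-based approach can be sketched with plain tuples standing in for a real RDF store; the ontology terms and entity names are illustrative:

```python
# RDF-style (subject, predicate, object) triples recording telemetry semantics
triples = {
    ("metric:cpu_usage", "emittedBy", "service:checkout"),
    ("log:oom_kill",     "emittedBy", "service:checkout"),
    ("trace:span_123",   "emittedBy", "service:checkout"),
    ("service:checkout", "dependsOn", "service:payments"),
}

def related_telemetry(entity):
    """All telemetry linked to an entity via the ontology."""
    return sorted(s for s, p, o in triples
                  if p == "emittedBy" and o == entity)

# An agent investigating a CPU alert can be constrained to exactly this set
# of related signals, instead of searching all telemetry non-deterministically.
print(related_telemetry("service:checkout"))
```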
Today: AutoSRE
Future:
Auto root-cause-analysis (RCA)
Auto remediation
Self-healing infra
Rewriting all of Spotify’s code base, all the time
Migrations are a great use case for coding agents
Approach:
For every application that needs to be migrated use an agent + harness to create a branch with changes
Verify changes using MCP Verify Tool that knows how to build and lint apps
Push branch to GitHub
CI pipeline confirms if the changes are valid
Human review (most of the time)
Slap an API on top and allow users to interact with agent from Slack/GH/Jira
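The per-application flow above can be sketched as a pipeline; every function below is a hypothetical stand-in for the real agent/CI integration, and the PR URL is illustrative:

```python
def agent_migrate(app):                  # agent + harness produce a branch
    return {"app": app, "branch": f"migrate-{app}", "diff": "..."}

def mcp_verify(branch):                  # stand-in for the MCP Verify Tool:
    return True                          # knows how to build and lint apps

def push_and_open_pr(branch):            # push to GitHub, open a PR
    return f"https://github.example/{branch['app']}/pull/1"  # illustrative

def migrate_all(apps):
    prs = []
    for app in apps:
        branch = agent_migrate(app)
        if mcp_verify(branch):           # only verified changes reach CI
            prs.append(push_and_open_pr(branch))  # CI + human review follow
    return prs

print(migrate_all(["player", "search"]))
```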
Standardisation of codebases makes auto-migrations possible.
Need to agree on per-language coding standards/guidelines => Elect and use an Advisory Board for this
Their agent: Honk. Integrated in Slack so you can give it Slack conversations as context (e.g. chat w/ a colleague about a bug => @honk fix this)
Refreshing Stale Code Intelligence
Add repo-specific docs and constraints to your repo
Can Claude Fix Itself? Using LLMs for Incident Response
Claude is useful for:
Finding correlations between telemetry data (but false positives are common, so double-check everything);
Summarizing incident Slack channel contents;
Writing shift handoff docs.
Answer to the title question: No.
The Rise of the Streamhouse
Apache Fluss = Data lakehouse + streaming support
From Copilots to Orchestrators
AI deployment != adoption != mastery
Tie AI usage to business outcomes to measure its impact
Train judgment first, skills after
Show people how you are using agents
Record + share screenshare
Use MCPs for Jira issue mgmt, Confluence page mgmt, creation of notebooks etc.
Day 3: 18th March 2026
Learning Out Loud (Under Near-Existential Pressure)
Behaviours that others typically appreciate:
Learn in public
Ask questions in public
Learn by playing (e.g. make a toy tool) rather than when under evaluation (which is stressful for most)
Just start/do it. You will build fluency faster
Lower the stakes. It will increase the likelihood of learning
Record your steps while learning. It will help you learn. It will also help others learn faster
Be mediocre in public. It will make you more approachable, so others are more likely to join you and then you can learn together
The right 300 tokens are better than 100k noisy ones
Antipattern 1: The stuffed prompt - you put too much into your context
Fix: Use skills (description is used by coding agent to determine whether or not to lazy load the skill - invest in this!)
Build skills as reusable context artifacts that any user of your software can use
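Description-based lazy loading can be sketched as follows; the skill names are illustrative, and the naive keyword matching stands in for the model itself deciding which skill descriptions fit the task:

```python
skills = {
    "release-notes": {
        "description": "Drafts release notes from merged PRs",  # invest here!
        "load": lambda: "...full release-notes instructions (large)...",
    },
    "db-migration": {
        "description": "Writes safe database schema migrations",
        "load": lambda: "...full migration instructions (large)...",
    },
}

def select_skills(task: str) -> list[str]:
    """Load only skills whose short description matches the task."""
    loaded = []
    for name, skill in skills.items():
        desc_words = set(skill["description"].lower().split())
        if desc_words & set(task.lower().split()):
            loaded.append(skill["load"]())   # lazy: body loaded only now
    return loaded

# Only the matching skill's full body enters the context window
print(len(select_skills("draft the release notes for v2")))
```

Only the short descriptions sit in the prompt permanently; the large skill bodies load on demand, which is why the description is worth investing in.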
Antipattern 2: Similarity != correctness (RAG might not give you what you want)
Context filtering to prevent prompt injection via CLAUDE.md/AGENTS.md file: https://github.com/jedi4ever/context-filter
Harness engineering to give your coding agent a full observability stack (i.e. it gets feedback/context from PROD after shipping a change): https://openai.com/index/harness-engineering/
Dark factories: Specs in -> Software out, no human coding, no human reviews: https://darkfactory.dev/blog/what-is-dark-factory-software-development
Building a self-improving context loop (i.e. a context flywheel):