PyData London 2026 conference notes

Day 1: 5th June 2026

Making Databases LLM-Ready: Building Production Semantic Layers with Semantido

Challenges for Text-to-SQL:
- Making application- and business-specific context available to the model
- Understanding user intent
- Limits of what can be expressed using SQL
- Tokenomics (i.e. token cost)
Failure modes for showcased finance app:
- Bridge fan-out (e.g. choosing the wrong bridge table)
- Sign conventions (e.g. amount is always positive, sign depends on transaction type)
- Balance ambiguity: in the presence of ambiguity, how will an agent make the right choice?
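A minimal sketch of the sign-convention failure mode, assuming a hypothetical `transactions` table in which `amount` is always positive and an invented `kind` column carries the direction. It uses the standard-library sqlite3 module rather than the tools shown in the talk.

```python
import sqlite3

# Hypothetical schema: amount is always positive, kind gives the direction.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (id INTEGER PRIMARY KEY, kind TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO transactions (kind, amount) VALUES (?, ?)",
    [("credit", 100.0), ("debit", 30.0), ("debit", 20.0)],
)

# What an agent unaware of the convention might generate.
naive_balance = conn.execute(
    "SELECT SUM(amount) FROM transactions"
).fetchone()[0]

# What the business rule requires: debits reduce the balance.
signed_balance = conn.execute(
    "SELECT SUM(CASE kind WHEN 'debit' THEN -amount ELSE amount END) "
    "FROM transactions"
).fetchone()[0]

print(naive_balance, signed_balance)
```

Both queries are valid SQL, which is why this kind of rule has to reach the agent as explicit business context: here the naive query reports 150.0 and the signed query reports 50.0.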
Tools used: SQLAlchemy, alembic (DB migrations), FastAPI, pydantic AI
Library built for this: semantido
- The workflow used by the analytics agent is deterministic
- The individual steps in the workflow are non-deterministic and are executed by sub-agents
- Business knowledge needs to be made available to sub-agents by including relevant descriptions in annotations
neon.new can be used to get a DB for a few hours for free for testing

GPU Algorithm Authoring w/ CUDA Tile (cuTile)

CUDA programming can be done, from coarse to fine grained granularity, at the grid vs tile (cuTile) vs thread level
- grid: system splits work into blocks, divides data into tiles & maps both onto threads
- tile: user splits work into blocks & divides data into tiles; system maps both onto threads
- thread: user splits work into blocks, divides data into tiles & maps both onto threads
Nsight suite can be used for debugging
- System - e2e performance
- Compute - kernel level debugging
cupy or dlpack can be used to move data from CPU to GPU
TileGym = collection of examples and tutorials for tile-based GPU programming: https://github.com/NVIDIA/TileGym
CUDA programming tutorials: https://github.com/NVIDIA/accelerated-computing-hub/tree/main/tutorials

Pydantic Monty & Logfire

Codemode is similar to what Cloudflare offers, namely an MCP server that exposes only two tools: search and execute
- Enables programmatic tool calling
pydantic Monty = minimal Python interpreter to run code written by AI (agent)
- Rust implementation, leverages ruff to parse Python code -> AST
Several people at the conference are using ghostty as a terminal emulator. Finally try it

Building a browser agent, from scratch

Agent = LM + tools + a loop
Five steps that most agents perform:
1. Observe: Capture the current state of the world
2. Decide: Pick one action that moves the task forward
3. Act: Invoke the tool that performs that action
4. Read result: Add the outcome back to the next observation
5. Repeat: …or stop, if the task is complete
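The loop (observe, decide, act, read result, repeat or stop) can be sketched in a few lines. This is only an illustration: `fake_model` is a hypothetical rule-based stand-in for the LM, and `increment` is a toy tool invented for the example.

```python
from typing import Any

State = dict[str, Any]


def increment(state: State) -> int:
    """Toy tool: bump a counter and return its new value."""
    state["counter"] += 1
    return state["counter"]


def fake_model(observation: State) -> dict[str, Any]:
    """Hypothetical stand-in for the LM: decides on the next action."""
    if observation["counter"] < 3:
        return {"tool": "increment", "args": {}}
    return {"tool": "stop", "args": {}}


def run_agent(decide, tools, state: State, max_steps: int = 10) -> list[Any]:
    history: list[Any] = []
    for _ in range(max_steps):
        # Observe: current state plus the result of the previous action.
        observation = {**state, "last_result": history[-1] if history else None}
        # Decide: pick exactly one action.
        action = decide(observation)
        if action["tool"] == "stop":
            break
        # Act: invoke the chosen tool.
        result = tools[action["tool"]](state, **action["args"])
        # Read result: it feeds into the next observation.
        history.append(result)
    return history


state = {"counter": 0}
history = run_agent(fake_model, {"increment": increment}, state)
print(history)
```

The `max_steps` cap is a design choice of this sketch: it guarantees termination even if the model never chooses to stop.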
Architecture:
- Planner: No tools, only strategy
- Executor: All required tools, only executes, no planning
Using pydantic for passing data between planner and executor
When the task is to read, give the agent markdown, not html

Creating your own evals

Tool: lighteval
Exercises: https://github.com/Cheukting/lighteval-exercises

Day 2: 6th June 2026

The Community is the Boat

Keep a list of people you would like to recommend for roles
Half an hour every month can change the course of someone’s career

From Noisy Sensors to Events

Two approaches that can be used for state detection in noisy data include:
- Kalman Filters:
  - Continuous data
  - Does not require training a model
- Hidden Markov Models:
  - Discrete data
  - Requires training a model
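To make the Kalman filter option concrete, here is a minimal scalar sketch for a roughly constant signal. The variances, the initial state and the synthetic readings are assumptions chosen for illustration, not values from the talk.

```python
def kalman_1d(measurements, process_var=1e-4, meas_var=0.25, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a (nearly) constant signal."""
    x, p = x0, p0  # state estimate and its variance
    estimates = []
    for z in measurements:
        p = p + process_var      # predict: uncertainty grows a little
        k = p / (p + meas_var)   # Kalman gain: how much to trust the measurement
        x = x + k * (z - x)      # update the estimate towards the measurement
        p = (1.0 - k) * p        # uncertainty shrinks after the update
        estimates.append(x)
    return estimates


# Synthetic readings: true value 5.0 with deterministic +/-0.4 "noise".
readings = [5.0 + (0.4 if i % 2 else -0.4) for i in range(50)]
estimates = kalman_1d(readings)
print(round(estimates[-1], 2))
```

No training step is involved: the behaviour is controlled entirely by the two variances, which is the practical difference from the Hidden Markov Model approach noted above.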
For deduplication discuss w/ business owners what to do
Consider ARIMA as well

Columnar Thinking

Use pyarrow as a data representation format because the ecosystem for serializing and deserializing from arrow is powerful and translation might not be needed
Benefit from features of modern CPUs:
- Cache locality
- SIMD
- Consider memory bandwidth
The idea:
- Layout: Arrangement of data matters a lot
- Locality: Keep data as close as possible
- Measure: Profile before you optimize
The latency gap:
- L1: ~1ns
- L2: ~4ns
- L3: ~12ns
- Main memory: ~100ns
- NVMe SSD: ~25us
- Network (DC): ~500us
Data in columns compresses well, data in rows typically does not. Reason: data in the same column has the same type; data in the same row and across different columns might not.
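A small experiment illustrating the compression point, using synthetic data and the standard-library zlib module (both are assumptions of this sketch, not something shown in the talk). The same values are serialized once record by record and once column by column.

```python
import random
import struct
import zlib

rng = random.Random(0)
n = 10_000
status = [7] * n                                   # low-variety column
reading = [rng.getrandbits(32) for _ in range(n)]  # high-variety column

# Row layout: values of different columns are interleaved record by record.
row_bytes = b"".join(struct.pack("<II", s, r) for s, r in zip(status, reading))

# Column layout: all values of one column are stored contiguously.
col_bytes = struct.pack(f"<{n}I", *status) + struct.pack(f"<{n}I", *reading)

row_size = len(zlib.compress(row_bytes))
col_size = len(zlib.compress(col_bytes))
print(f"raw={len(row_bytes)} row={row_size} column={col_size}")
```

In the column layout the repetitive column forms one long run that the compressor can encode very cheaply, while in the row layout that run is broken up by the neighbouring column after every record.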
Use lazy evaluation whenever possible. Prefer expressions over loops
Use streaming to reduce the peak memory usage
Use Parquet on disk and Arrow in-flight

JupyterLite

JupyterLite = like JupyterLab but in your browser, Jupyter server + kernel compiled to WebAssembly and run in the browser
Advantages:
- Easy to deploy
- Scales well
Most popular Python kernels:
- pyodide
- xeus-python, compiled
Compilation to WebAssembly performed using emscripten
Tutorial for deploying to GitHub pages available online

Building reasonable software

What is Pythonic?
- atomic
- composable
- simple
Speaker is author of prefect: https://www.prefect.io/
FastMCP 3.0 - atomic, composable, simple
- Leverages pydantic Monty
CodeMode (from cloudflare)
MCP Apps (shipped in Jan26)
Directed agentic graph
- Nodes are outcomes
- Make graph + state durable => resumable
- Prefab (is the name of the product?)
/goal is a more refined version of a Ralph loop

Documenting your open source projects for machines

All examples are about Python projects, but concepts apply to any project
Agents typically open __init__.py first. Use it to guide the agent
- __all__ tells it what your public API is
- imports show where everything is defined

Add to your __init__.py file guidance to nudge the agent in the right direction. Example:

"""parquetviewer: A library for viewing Parquet files

Docs:     https://parquetviewer.me/llms.txt
Source:   https://github.com/parquet/viewer
Examples: see the bundled docs/samples directory
"""

Use progressive disclosure as much as possible. Reveal more information only when needed
- Ties in with Pocock’s concept of deep modules
Additionally publish your docs in md format for LLMs; see https://llmstxt.org
- Can use sphinx-llm for this
Ship your docs w/ your package such that agents get the docs version matching the project version
- Reference the embedded docs in __init__.py
Framework for what to include in docs: Diataxis (diataxis.fr)
NVIDIA verified skills: github.com/NVIDIA/skills
Use the concept of deep modules for docs as well, not only code
Make your error messages agent-ready, namely include a reference to docs on how to fix the error
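One possible shape for an agent-ready error, as a sketch only: `ParquetViewerError`, `open_file` and the doc path are hypothetical names built on the parquetviewer example above, not part of any real package.

```python
class ParquetViewerError(Exception):
    """Base error that always says where the fix is documented."""

    def __init__(self, message: str, docs_ref: str) -> None:
        super().__init__(f"{message}\nHow to fix: see {docs_ref}")
        self.docs_ref = docs_ref


def open_file(path: str) -> None:
    # Hypothetical entry point, used only to show the error in context.
    if not path.endswith(".parquet"):
        raise ParquetViewerError(
            f"Unsupported file extension: {path!r}",
            docs_ref="docs/samples/supported-formats.md",  # hypothetical bundled doc
        )


try:
    open_file("data.csv")
except ParquetViewerError as err:
    message = str(err)
    print(message)
```

Making `docs_ref` a required argument means no error of this type can be raised without a pointer to the docs.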
Evaluate the quality of your docs by QA-ing them with an agent: ask the agent to do something, follow what it does, find inefficiencies in how it locates relevant information, improve, iterate

Day 3: 7th June 2026

Lightning talks

Data visualisation + simple model trainer: probaviz.streamlit.app
LiteLLM supply chain attack
Library for property based testing: hypothesis

From SQL to Python: Building Data Context for Agents and People

To make it possible for your agents to effectively access data from your data warehouse you can use datachain to:
1. Make your data (i.e. databases and tables) discoverable using pydantic models;
2. Type your data using pydantic models and schemas;
3. Capture lineage data automatically when writing/persisting data;
4. Make the data available to agents via skills and/or MCP servers.
Datachain: https://github.com/datachain-ai/datachain

Querying the queries: SQL metaprogramming in Python

Use sqloxing to parse SQL queries in Python => AST
Use SQL meta-programming to:
1. Lint SQL queries;
2. Rewrite SQL queries to improve them (SQL query => AST => improved AST => improved SQL query).
Linting rules that can be supported:
- Limit SQL query depth to N
- Auto-alias aggregates when an alias was not defined
- Wrap denominators in NULLIF
- Detect unused CTEs (common table expressions)
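As an illustration of the unused-CTE rule, the sketch below deliberately swaps the AST-based approach from the talk for a naive regular-expression scan, so that it runs with the standard library only. `unused_ctes` is a hypothetical helper; a real linter would walk the parsed AST, since a regex can be fooled by comments, string literals and nested queries.

```python
import re

# Matches "<name> AS (" right after WITH or after a comma (naive CTE detection).
CTE_DEF = re.compile(r"(?:\bWITH\b|,)\s*([A-Za-z_]\w*)\s+AS\s*\(", re.IGNORECASE)


def unused_ctes(sql: str) -> list[str]:
    """Return the names of CTEs that are defined but never referenced."""
    unused = []
    for name in CTE_DEF.findall(sql):
        # Whole-word occurrences; exactly one means only the definition exists.
        uses = len(re.findall(rf"\b{re.escape(name)}\b", sql, re.IGNORECASE))
        if uses < 2:
            unused.append(name)
    return unused


query = """
WITH orders_2026 AS (SELECT * FROM orders WHERE year = 2026),
     refunds AS (SELECT * FROM payments WHERE amount < 0)
SELECT COUNT(*) FROM orders_2026
"""
print(unused_ctes(query))
```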

Making tech boring to keep data exciting

Team from climate policy radar
Data pipelines should be idempotent as much as possible
Have e2e tests run against a subset of the PROD data to validate the data pipeline
Publish a new version of a dataset only after validating it using contract tests
- To publish a new dataset: First back up current version (to enable quick and easy rollbacks) then promote new version
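The validate, back up, promote sequence could look roughly like this. The directory layout, the `publish` function and the `has_header` contract test are all assumptions made for the sketch; the team's actual tooling was not shown in these notes.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def publish(candidate: Path, current: Path, backup: Path,
            validate: Callable[[Path], bool]) -> bool:
    """Promote candidate to current only if it passes validation."""
    if not validate(candidate):
        return False                       # contract tests failed: nothing is touched
    if current.exists():
        shutil.rmtree(backup, ignore_errors=True)
        shutil.copytree(current, backup)   # 1. back up the current version
        shutil.rmtree(current)
    shutil.copytree(candidate, current)    # 2. promote the new version
    return True


def has_header(dataset: Path) -> bool:
    """Hypothetical contract test: the data file must start with the expected header."""
    return (dataset / "data.csv").read_text().startswith("id,value")


root = Path(tempfile.mkdtemp())
current, backup = root / "current", root / "backup"
for name, content in [("current", "id,value\n1,10\n"),
                      ("good", "id,value\n1,11\n"),
                      ("bad", "oops\n")]:
    (root / name).mkdir()
    (root / name / "data.csv").write_text(content)

rejected = publish(root / "bad", current, backup, has_header)
accepted = publish(root / "good", current, backup, has_header)
print(rejected, accepted)
```

Publishing the same candidate a second time leaves `current` with identical content, which keeps the promote step idempotent.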
Shared views are what make a team successful

LLMs and AI agents demystified

Speaker: Martin O’Reilly, Director of Research Engineering at the Alan Turing Institute
LLMs predict the next word very, very well
- They are also very big
- They are an example of a Foundation Model
Word embeddings: Numerical representation of words in multi-dimensional space such that:
- Words w/ similar meanings cluster together
- Similar modifications in word meaning result in similar changes to embeddings
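A toy illustration of the second property, using made-up 2-D vectors (real embeddings are learned and have hundreds or thousands of dimensions). The vectors are constructed so that the man-to-woman shift equals the king-to-queen shift.

```python
# Made-up vectors: axis 0 loosely encodes "royalty", axis 1 "gender".
embeddings = {
    "man": (1.0, 0.0),
    "woman": (1.0, 1.0),
    "king": (3.0, 0.0),
    "queen": (3.0, 1.0),
}


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def nearest(vec):
    """Word whose embedding has the smallest squared distance to vec."""
    return min(
        embeddings,
        key=lambda w: sum((x - y) ** 2 for x, y in zip(embeddings[w], vec)),
    )


# The same change in meaning produces the same change in the embedding.
gender_shift = sub(embeddings["woman"], embeddings["man"])
royal_shift = sub(embeddings["queen"], embeddings["king"])

# king - man + woman lands on queen in this toy space.
analogy = add(sub(embeddings["king"], embeddings["man"]), embeddings["woman"])
print(nearest(analogy))
```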
LLMs are trained on Internet data:
- Key idea: Hide some words from the model to create text completion examples w/ known answers (masking)
Technique that makes LLMs perform very well: attention
- Compute how important each preceding word is to the current word
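A pure-Python sketch of causal scaled dot-product attention weights, assuming toy 2-D query and key vectors invented for the example. Real models learn these vectors and also apply the weights to value vectors, which is omitted here.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention_weights(queries, keys):
    """For each position i, weight positions 0..i by query-key similarity."""
    dim = len(keys[0])
    weights = []
    for i, q in enumerate(queries):
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(dim)
            for j in range(i + 1)  # only the current and preceding words
        ]
        weights.append(softmax(scores))
    return weights


# Toy vectors for a three-word sequence.
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = causal_attention_weights(queries, keys)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row holds the weights one word assigns to itself and to the words before it, so row lengths grow from 1 to 3.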
LLMs are deep neural networks
- Key innovation: Transformers
  - When generating text the currently considered word sequence is fully provided as input to the LLM => input sequence provided to the LLM grows over time
- Fine-tuning a model: Adjust parameters of a pre-trained model in order to make it better suited for a given task
  - Fine-tuning from examples that are human-generated (and therefore expensive to create)
  - Preference learning by using LLMs to generate responses and using humans to rate answers
- Improving reasoning of chatbots:
  - Chain-of-thought prompting
    - Zero-shot
    - Few-shot
Coding fine-tuning:
- Supervised fine-tuning on curated high-quality human-generated examples
- Preference learning from human ranking of LLM-generated examples
- Reinforcement learning from LLM examples that have been automatically verified as correct (e.g. code compiles)
LLMs to AI agents:
- LLMs just generate text
- Agents interact w/ the world
Modern agent harness (think, act, observe):
- Use LLMs fine-tuned for reasoning, code generation and tool use
- Manage context (inject skills and tools definitions, context compaction, memory)
- Execute local or remote tools (and send their output to LLMs)
- Access local files
- Delegate sub-tasks to dedicated sub-agents

The Human-in-the-Loop is tired

Update your AGENTS.md from review comments. Or create corresponding skills
Work w/ brain-friendly outputs. Respect your cognitive limit (~4-7 chunks)
Use short paragraphs when interacting w/ agents (no more than 2-3 sentences per paragraph)
Protect the brain and body:
- Take a break every 90 min
- Talk to a human
- Four best hours then stop
- Regularly look away from your screen

From Chat-with-PDF to Quiz-Master: Live-Grading RAG w/ LLM-as-a-judge in Python

In a traditional chat-with-PDF solution that uses RAG an LLM is used to produce answers to questions
- Flip it: Use LLMs to evaluate understanding, not to generate answers
Author created “The Knowledge Arena”:
- Use LLMs to generate MCQs from PDFs
- Use LLMs to evaluate free-form answers of humans to questions about PDFs
Pipeline:
1. PDF of ~130 pages
2. Use docling to parse the PDF and split it into chunks by section
3. DeepEval: two-stage synthesis
4. Write JSON file containing questions and evidence
5. Marimo to serve questions and to grade answers
docling reads the structure of the document, not only text
- Chunk by section, not character count
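Section-based chunking can be sketched as follows, assuming the parsed document is already available as Markdown-like text with `#` headings. `chunk_by_section` is a hypothetical helper written for this note; it does not use docling.

```python
def chunk_by_section(markdown: str) -> list[dict]:
    """Split text into one chunk per heading, keeping the heading as the title."""
    chunks = []
    title, lines = "preamble", []

    def flush():
        # Emit the section collected so far, skipping empty ones.
        text = "\n".join(lines).strip()
        if text:
            chunks.append({"title": title, "text": text})

    for line in markdown.splitlines():
        if line.startswith("#"):
            flush()
            title, lines = line.lstrip("#").strip(), []
        else:
            lines.append(line)
    flush()
    return chunks


document = "# Intro\nHello world.\n\n## Method\nStep one.\nStep two.\n"
chunks = chunk_by_section(document)
for chunk in chunks:
    print(chunk["title"], "->", repr(chunk["text"]))
```

Unlike fixed-size character windows, each chunk keeps a complete section together with its title.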
Embed evidence at question generation time
Use a Jinja template for the prompt used to do the grading
Repo: https://github.com/Cadarn/PyData-AI-Generated-Quiz

The future of notebooks in a Claude Code world

Workflow:
- Claude interacts w/ an MCP server
- MCP server writes expressions to a catalog
- The buckaroo server reads expressions from the catalog, evaluates them and caches results as needed
- User interacts via a browser w/ catalog server and buckaroo server
Leverages xorq and ibis. Apache arrow is used for representing data
Use non-deterministic LLMs to generate functions for you. A human validates the generated function such that it can then be safely reused in other workflows.

June 5, 2026 · pydata, conference