PyData London 2026 conference notes
Day 1: 5th June 2026
Making Databases LLM-Ready: Building Production Semantic Layers with Semantido
- Challenges for Text-to-SQL:
- Making application- and business-specific context available
- Understanding user intent
- Limits of what can be expressed using SQL
- Tokenomics (i.e. token cost)
- Failure modes for the showcased finance app:
- Bridge fan-out (e.g. choosing the wrong bridge table)
- Sign conventions (e.g. amount is always positive, sign depends on transaction type)
- Balance ambiguity: in the presence of ambiguity, how will an agent make the right choice?
- Tools used: SQLAlchemy, alembic (DB migrations), FastAPI, pydantic AI
- Library built for this: semantido
- The workflow used by the analytics agent is deterministic
- The steps in the workflow are non-deterministic and executed by sub-agents
- Business knowledge needs to be made available to sub-agents by including relevant descriptions in annotations
- neon.new can be used to get a free DB for a few hours for testing
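A minimal sketch of the annotation idea using only the standard library: each column's business description (including the sign convention) lives in typing.Annotated metadata and is rendered into context text for a sub-agent. The Transaction table, its columns and the describe/signed_amount helpers are hypothetical; this is not semantido's API.

```python
from dataclasses import dataclass, fields
from typing import Annotated, get_args, get_type_hints


@dataclass
class Transaction:
    # Hypothetical table; each column carries a business description
    transaction_id: Annotated[int, "Unique transaction identifier"]
    amount: Annotated[float, "Always positive; the sign depends on transaction_type"]
    transaction_type: Annotated[str, "'credit' increases the balance, 'debit' decreases it"]


def describe(model: type) -> str:
    """Render the column descriptions as context text for a sub-agent prompt."""
    hints = get_type_hints(model, include_extras=True)  # keep the Annotated metadata
    lines = [f"Table {model.__name__}:"]
    for f in fields(model):
        base_type, description = get_args(hints[f.name])
        lines.append(f"- {f.name} ({base_type.__name__}): {description}")
    return "\n".join(lines)


def signed_amount(tx: Transaction) -> float:
    """Apply the sign convention in code instead of leaving it to the agent."""
    if tx.transaction_type == "credit":
        return tx.amount
    if tx.transaction_type == "debit":
        return -tx.amount
    raise ValueError(f"unknown transaction_type: {tx.transaction_type!r}")
```

Encoding the sign convention once, both as a description and as a helper, is one way to reduce the "sign conventions" failure mode listed above.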
GPU Algorithm Authoring w/ CUDA Tile (cuTile)
- CUDA programming can be done at three levels of granularity, from coarse to fine: grid, tile (cuTile) and thread
- grid: system splits work into blocks, divides data into tiles & maps both onto threads
- tile: user splits work into blocks & divides data into tiles; system maps both onto threads
- thread: user splits work into blocks, divides data into tiles & maps both onto threads
- The Nsight suite can be used for profiling and debugging:
- Nsight Systems: e2e performance
- Nsight Compute: kernel-level profiling
- cupy or dlpack can be used to move data from CPU to GPU
- TileGym = collection of examples and tutorials for tile-based GPU programming: https://github.com/NVIDIA/TileGym
- CUDA programming tutorials: https://github.com/NVIDIA/accelerated-computing-hub/tree/main/tutorials
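A CPU-only Python emulation of the tile-level split, for intuition only: the programmer chooses the tile size and writes a per-tile function, and a stand-in launcher invokes it once per tile. TILE_SIZE, vector_add_tile and launch are hypothetical names; real cuTile kernels use cuTile's own kernel API and run on the GPU, where the system (not the programmer) maps each tile's elements onto threads.

```python
import math

TILE_SIZE = 4  # hypothetical tile size, chosen by the programmer at the tile level


def vector_add_tile(tile_id, a, b, out):
    """Per-tile 'kernel': it operates on whole tiles, never on individual threads."""
    start = tile_id * TILE_SIZE
    stop = min(start + TILE_SIZE, len(a))  # the last tile may be partial
    out[start:stop] = [x + y for x, y in zip(a[start:stop], b[start:stop])]


def launch(kernel, n, *args):
    """Stand-in for the runtime: one kernel invocation per tile (the grid).
    On a GPU, the system would additionally map each tile onto threads."""
    num_tiles = math.ceil(n / TILE_SIZE)
    for tile_id in range(num_tiles):
        kernel(tile_id, *args)


a, b, out = list(range(10)), [10] * 10, [0] * 10
launch(vector_add_tile, len(a), a, b, out)
print(out)
```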
Pydantic Monty & Logfire
- CodeMode is similar to what Cloudflare offers, namely an MCP server that exposes only two tools: search and execute
- Enables programmatic tool calling
- pydantic Monty = minimal Python interpreter to run code written by AI (agent)
- Rust implementation, leverages ruff to parse Python code -> AST
- Several people at the conference are using ghostty as a terminal emulator. Finally try it
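A minimal sketch of the two-tool code-mode pattern, assuming a plain in-process registry rather than a real MCP server: search returns tool signatures and docs, and execute runs model-written code that calls several tools programmatically in one go. The tools (get_weather, convert_celsius) are hypothetical, and exec() with empty builtins is not a real sandbox; that is the gap a minimal interpreter such as Monty is meant to fill.

```python
import inspect


# Hypothetical tools that an MCP server would normally expose one by one
def get_weather(city: str) -> str:
    """Return a (fake) weather report for a city."""
    return f"Sunny in {city}"


def convert_celsius(celsius: float) -> float:
    """Convert a temperature from Celsius to Fahrenheit."""
    return celsius * 9 / 5 + 32


TOOLS = {fn.__name__: fn for fn in (get_weather, convert_celsius)}


def search(query: str) -> list:
    """Tool 1: return 'name(signature): doc' for tools matching the query."""
    q = query.lower()
    return [
        f"{name}{inspect.signature(fn)}: {inspect.getdoc(fn)}"
        for name, fn in TOOLS.items()
        if q in name.lower() or q in (inspect.getdoc(fn) or "").lower()
    ]


def execute(code: str) -> object:
    """Tool 2: run model-written code that calls the tools; it must set `result`.
    NOTE: exec() is NOT a sandbox; a safe interpreter would be used here instead."""
    env = {"__builtins__": {}, **TOOLS}
    exec(code, env)
    return env.get("result")
```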
Building a browser agent, from scratch
- Agent = LM + tools + a loop
- Five steps that most agents perform:
- Observe: Capture the current state of the world
- Decide: Pick one action that moves the task forward
- Act: Invoke the tool that performs that action
- Read result: Add the outcome back to the next observation
- Repeat: …or stop, if the task is complete
- Architecture:
- Planner: No tools, only strategy
- Executor: All required tools, only executes, no planning
- Using pydantic for passing data between planner and executor
- When the task is to read, give the agent Markdown, not HTML
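A minimal sketch of the observe/decide/act/read/repeat loop with a separate planner and executor. Assumptions: standard-library dataclasses stand in for the pydantic models mentioned in the talk, the planner is a hard-coded rule instead of an LLM call, and the tools (open_page, read_markdown) are hypothetical stubs.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """Message passed from planner to executor."""
    action: str          # name of the tool to invoke
    argument: str = ""


@dataclass
class AgentState:
    goal: str
    history: list = field(default_factory=list)  # tool results, fed into later observations
    done: bool = False


def planner(state: AgentState) -> Step:
    """Strategy only, no tools. A real agent would call an LLM here."""
    if not state.history:
        return Step("open_page", "https://example.com")
    if len(state.history) == 1:
        return Step("read_markdown")
    return Step("finish")


def executor(step: Step) -> str:
    """Tools only, no planning. Hypothetical stubs return canned results."""
    tools = {
        "open_page": lambda url: f"opened {url}",
        "read_markdown": lambda _: "# Example Domain",  # Markdown, not HTML
    }
    return tools[step.action](step.argument)


def run(goal: str, max_steps: int = 10) -> AgentState:
    state = AgentState(goal)
    for _ in range(max_steps):          # Repeat
        step = planner(state)           # Observe the state, Decide on one action
        if step.action == "finish":     # ...or stop, if the task is complete
            state.done = True
            break
        result = executor(step)         # Act
        state.history.append(result)    # Read result into the next observation
    return state
```

Keeping the step schema explicit (here a dataclass) is what lets the planner and executor stay decoupled.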
Creating your own evals
- Tool: lighteval
- Exercises: https://github.com/Cheukting/lighteval-exercises
Day 2: 6th June 2026
The Community is the Boat
- Keep a list of people you would like to recommend for roles
- Half an hour every month can change the course of someone’s career
From Noisy Sensors to Events
- Two approaches that can be used for state detection in noisy data include:
- Kalman Filters:
- Continuous data
- Does not require training a model
- Hidden Markov Models:
- Discrete data
- Requires training a model
- For deduplication, discuss w/ business owners what to do
- Consider ARIMA as well
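A minimal scalar Kalman filter (random-walk state model) plus threshold-based event detection, as a sketch of turning noisy continuous readings into events without training a model. The noise parameters and the threshold are hypothetical and would need tuning per sensor.

```python
def kalman_1d(measurements, process_var=1e-3, measurement_var=0.25,
              initial_estimate=0.0, initial_var=1.0):
    """Scalar Kalman filter for a slowly varying signal."""
    estimate, variance = initial_estimate, initial_var
    filtered = []
    for z in measurements:
        # Predict: state assumed unchanged, uncertainty grows by the process noise
        variance += process_var
        # Update: blend prediction and measurement using the Kalman gain
        gain = variance / (variance + measurement_var)
        estimate += gain * (z - estimate)
        variance *= 1 - gain
        filtered.append(estimate)
    return filtered


def detect_events(signal, threshold):
    """Emit (index, new_state) whenever the signal crosses the threshold."""
    events, state = [], None
    for i, value in enumerate(signal):
        new_state = "high" if value > threshold else "low"
        if new_state != state:
            events.append((i, new_state))
            state = new_state
    return events
```

In use, the filtered series is what gets thresholded, e.g. detect_events(kalman_1d(readings), threshold), so that measurement noise is less likely to trigger spurious state changes.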
Columnar Thinking
- Use pyarrow as the data representation format: the ecosystem for serializing to and deserializing from Arrow is powerful, and translation between formats might not be needed
- Benefit from features of modern CPUs:
- Cache locality
- SIMD
- Consider memory bandwidth
- The idea:
- Layout: Arrangement of data matters a lot
- Locality: Keep data as close as possible
- Measure: Profile before you optimize
- The latency gap:
- L1: ~1 ns
- L2: ~4 ns
- L3: ~12 ns
- Main memory: ~100 ns
- NVMe SSD: ~25 µs
- Network (DC): ~500 µs
- Data stored in columns compresses well; data stored in rows typically does not. Reason: values in the same column share a type, while values in the same row but different columns might not.
- Use lazy evaluation whenever possible. Prefer expressions over loops
- Use streaming to reduce the peak memory usage
- Use Parquet on disk and Arrow in-flight
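A small standard-library experiment illustrating why columns compress better: the same hypothetical three-column table is serialized row by row and column by column, and both byte strings are compressed with zlib. Grouping each column's values puts similar bytes next to each other, which a general-purpose compressor can exploit; columnar formats such as Parquet go further with per-column encodings.

```python
import random
import struct
import zlib

N = 10_000
rng = random.Random(0)
# Hypothetical table with three columns
status = [rng.randrange(2) for _ in range(N)]       # 1 byte, 2 distinct values
region = [rng.randrange(4) for _ in range(N)]       # 1 byte, 4 distinct values
reading = [rng.getrandbits(32) for _ in range(N)]   # 4 bytes, high entropy

# Row layout: the fields of each record stored next to each other (6 bytes per row)
rows = b"".join(struct.pack("<BIB", s, r, g) for s, r, g in zip(status, reading, region))

# Column layout: each column stored contiguously
cols = bytes(status) + struct.pack(f"<{N}I", *reading) + bytes(region)

row_size = len(zlib.compress(rows, 9))
col_size = len(zlib.compress(cols, 9))
print(f"uncompressed: {len(rows)} bytes in both layouts")
print(f"row-wise compressed:    {row_size} bytes")
print(f"column-wise compressed: {col_size} bytes")
```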
JupyterLite
- JupyterLite = like JupyterLab but running entirely in your browser: the Jupyter server and kernel run in the browser, with the kernel compiled to WebAssembly
- Advantages:
- Easy to deploy
- Scales well
- Most popular Python kernels:
- pyodide
- xeus-python, compiled
- Compilation to WebAssembly is performed using Emscripten
- Tutorial for deploying to GitHub pages available online
Building reasonable software
- What is Pythonic?
- atomic
- composable
- simple
- Speaker is the author of Prefect: https://www.prefect.io/
- FastMCP 3.0 - atomic, composable, simple
- Leverages pydantic Monty
- CodeMode (from cloudflare)
- MCP Apps (shipped in January 2026)
- Directed agentic graph
- Nodes are outcomes
- Make graph + state durable => resumable
- Prefab (is the name of the product?)
- /goal is a more refined version of a Ralph loop
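A minimal sketch of a durable, resumable graph, assuming a linear chain of nodes (the simplest directed graph) and a JSON checkpoint file. Node names describe the outcome each node produces; after every node the completed outcomes and the state are persisted, so a rerun after a crash skips work already done. This is not FastMCP's or Prefect's API; all names are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


# Hypothetical nodes, each named after the outcome it produces
def fetched_data(state):
    state["rows"] = [3, 1, 2]


def sorted_data(state):
    state["rows"] = sorted(state["rows"])


def report_written(state):
    state["report"] = f"min={state['rows'][0]}, max={state['rows'][-1]}"


GRAPH = [fetched_data, sorted_data, report_written]  # edge: each node -> the next


def run_graph(checkpoint: Path, fail_at: Optional[str] = None) -> dict:
    """Run the nodes in order, persisting state after each one so reruns resume."""
    if checkpoint.exists():
        saved = json.loads(checkpoint.read_text())
    else:
        saved = {"completed": [], "state": {}}
    for node in GRAPH:
        if node.__name__ in saved["completed"]:
            continue  # outcome already reached in an earlier run
        if node.__name__ == fail_at:
            raise RuntimeError(f"simulated crash before {node.__name__}")
        node(saved["state"])
        saved["completed"].append(node.__name__)
        checkpoint.write_text(json.dumps(saved))  # durable after every node
    return saved["state"]
```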
Documenting your open source projects for machines
- All examples are about Python projects, but concepts apply to any project
- Agents typically open __init__.py first. Use it to guide the agent:
- __all__ tells it what your public API is
- imports show where everything is defined
- Add guidance to your __init__.py file to nudge the agent in the right direction. Example:
    """parquetviewer: A library for viewing Parquet files
    Docs: https://parquetviewer.me/llms.txt
    Source: https://github.com/parquet/viewer
    Examples: see the bundled docs/samples directory
    """
- Use progressive disclosure as much as possible. Reveal more information only when needed
- Ties in with the concept of deep modules, as discussed by Pocock (the term comes from John Ousterhout's A Philosophy of Software Design)
- Additionally, publish your docs in Markdown format for LLMs; see https://llmstxt.org
- sphinx-llm can be used for this
- Ship your docs w/ your package such that agents get the docs version matching the project version
- Reference the embedded docs in __init__.py
- Framework for what to include in docs: Diataxis (diataxis.fr)
- NVIDIA verified skills: github.com/NVIDIA/skills
- Use the concept of deep modules for docs as well, not only code
- Make your error messages agent-ready, i.e. include a reference to the docs explaining how to fix the error
- Evaluate the quality of your docs by QA-ing them with an agent: ask an agent to do something, follow what it does, find inefficiencies in how it finds relevant information, improve, iterate
Day 3: 7th June 2026
Lightning talks
- Data visualisation + simple model trainer: probaviz.streamlit.app
- LiteLLM supply chain attack
- Library for property-based testing: hypothesis
From SQL to Python: Building Data Context for Agents and People
- To make it possible for your agents to effectively access data from your data warehouse you can use datachain to:
- Make your data (i.e. databases and tables) discoverable using pydantic models;
- Type your data using pydantic models and schemas;
- Capture lineage data automatically when writing/persisting data;
- Make the data available to agents via skills and/or MCP servers.
- Datachain: https://github.com/datachain-ai/datachain
Querying the queries: SQL metaprogramming in Python
- Use sqloxing to parse SQL queries in Python => AST
- Use SQL meta-programming to:
- Lint SQL queries;
- Rewrite SQL queries to improve them (SQL query => AST => improved AST => improved SQL query).
- Linting rules that can be supported:
- Limit SQL query depth to N
- Auto-alias aggregates when an alias was not defined
- Wrap denominators in NULLIF
- Detect unused CTEs (common table expressions)
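A minimal sketch of the SQL query => AST => improved AST => SQL query idea for the NULLIF rule, using a tiny hypothetical expression AST built by hand rather than the output of a real SQL parser. The rewrite is idempotent: an already-wrapped denominator is left alone.

```python
from dataclasses import dataclass
from typing import Union


# Tiny hypothetical SQL expression AST (real parsers produce much richer trees)
@dataclass
class Column:
    name: str


@dataclass
class Literal:
    value: object


@dataclass
class Func:
    name: str
    args: list


@dataclass
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Column, Literal, Func, BinOp]


def wrap_denominators(node: Expr) -> Expr:
    """Rewrite rule: a / b  ->  a / NULLIF(b, 0), applied recursively."""
    if isinstance(node, BinOp):
        left = wrap_denominators(node.left)
        right = wrap_denominators(node.right)
        already_wrapped = isinstance(right, Func) and right.name == "NULLIF"
        if node.op == "/" and not already_wrapped:
            right = Func("NULLIF", [right, Literal(0)])
        return BinOp(node.op, left, right)
    if isinstance(node, Func):
        return Func(node.name, [wrap_denominators(a) for a in node.args])
    return node


def to_sql(node: Expr) -> str:
    """Render the AST back to SQL text."""
    if isinstance(node, Column):
        return node.name
    if isinstance(node, Literal):
        return repr(node.value)
    if isinstance(node, Func):
        return f"{node.name}({', '.join(to_sql(a) for a in node.args)})"
    return f"({to_sql(node.left)} {node.op} {to_sql(node.right)})"
```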
Making tech boring to keep data exciting
- Team from Climate Policy Radar
- Data pipelines should be idempotent as much as possible
- Have e2e tests run against a subset of the PROD data to validate the data pipeline
- Publish a new version of a dataset only after validating it using contract tests
- To publish a new dataset: First back up current version (to enable quick and easy rollbacks) then promote new version
- Shared views are what make a team successful
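A minimal sketch of "validate, back up, then promote" for a dataset stored as a single JSON file. The contract test, the file naming (.staged, .backup) and the rollback helper are hypothetical; a rename within one filesystem makes the promote step atomic on POSIX systems, and republishing the same records yields the same result.

```python
import json
import shutil
from pathlib import Path


def contract_test(records: list) -> None:
    """Hypothetical data contract: non-empty, every record has an id and a title."""
    if not records:
        raise ValueError("dataset is empty")
    for r in records:
        if r.get("id") is None or "title" not in r:
            raise ValueError(f"contract violated by record {r!r}")


def publish(records: list, live: Path) -> None:
    contract_test(records)                                # 1. validate first
    staged = live.with_suffix(".staged")
    staged.write_text(json.dumps(records))                # 2. write the candidate version
    if live.exists():
        shutil.copy2(live, live.with_suffix(".backup"))   # 3. back up the current version
    staged.replace(live)                                  # 4. promote via rename


def rollback(live: Path) -> None:
    """Restore the previously published version."""
    live.with_suffix(".backup").replace(live)
```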
LLMs and AI agents demystified
- Speaker: Martin O’Reilly, Director of Research Engineering at The Alan Turing Institute
- LLMs predict the next word very, very well
- They are also very big
- They are an example of a Foundation Model
- Word embeddings: Numerical representation of words in multi-dimensional space such that:
- Words w/ similar meanings cluster together
- Similar modifications in word meaning result in similar changes to embeddings
- LLMs are trained on Internet data:
- Key idea: Hide some words from the model to create text completion examples w/ known answers (masking)
- Technique that makes LLMs perform very well: attention
- Compute the importance of the current word relative to preceding words
- LLMs are deep neural networks
- Key innovation: Transformers
- When generating text, the whole word sequence so far is provided as input to the LLM => the input sequence grows with each generated word
- Fine-tuning a model: Adjust parameters of a pre-trained model in order to make it better suited for a given task
- Fine-tuning from examples that are human-generated (and therefore expensive to create)
- Preference learning by using LLMs to generate responses and using humans to rate answers
- Improving reasoning of chatbots:
- Chain-of-thought prompting
- Zero-shot
- Few-shot
- Coding fine-tuning:
- Supervised fine-tuning on curated high-quality human-generated examples
- Preference learning from human ranking of LLM-generated examples
- Reinforcement learning from LLM examples that have been automatically verified as correct (e.g. code compiles)
- LLMs to AI agents:
- LLMs just generate text
- Agents interact w/ the world
- Modern agent harness (think, act, observe):
- Use LLMs fine-tuned for reasoning, code generation and tool use
- Manage context (inject skill and tool definitions, context compaction, memory)
- Execute local or remote tools (and send their output to LLMs)
- Access local files
- Delegate sub-tasks to dedicated sub-agents
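A toy, pure-Python sketch of causal scaled dot-product attention: each position computes softmax-normalized importance weights over itself and the preceding words, then mixes their value vectors. Real transformers learn separate query, key and value projections and use many attention heads; here the same vectors are reused for all three purely for illustration.

```python
import math


def causal_attention(queries, keys, values):
    """Each position attends only to itself and preceding positions."""
    d = len(queries[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        # Similarity of the current word to itself and every preceding word
        scores = [sum(qk * kk for qk, kk in zip(q, keys[j])) / math.sqrt(d)
                  for j in range(i + 1)]
        # Softmax (shifted by the max for numerical stability): weights sum to 1
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output: importance-weighted mix of the value vectors
        out = [sum(w * values[j][k] for j, w in enumerate(weights))
               for k in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical 2-d embeddings of 3 words
outs, ws = causal_attention(x, x, x)
```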
The Human-in-the-Loop is tired
- Update your AGENTS.md from review comments. Or create corresponding skills
- Work w/ brain-friendly outputs. Respect your cognitive limit (~4-7 chunks)
- Use short paragraphs when interacting w/ agents (no more than 2-3 sentences per paragraph)
- Protect the brain and body:
- 90-min breaks
- Talk to a human
- Four best hours then stop
- Regularly look away from your screen
From Chat-with-PDF to Quiz-Master: Live-Grading RAG w/ LLM-as-a-judge in Python
- In a traditional chat-with-PDF solution that uses RAG, an LLM is used to produce answers to questions
- Flip it: Use LLMs to evaluate understanding, not to generate answers
- Author created “The Knowledge Arena”:
- Use LLMs to generate multiple-choice questions (MCQs) from PDFs
- Use LLMs to evaluate free-form answers of humans to questions about PDFs
- Pipeline:
- PDF of ~130 pages
- Use docling to parse and section PDF document chunks
- DeepEval: two-stage synthesis
- Write JSON file containing questions and evidence
- Marimo to serve questions and to grade answers
- docling reads the structure of the document, not only text
- Chunk by section, not character count
- Embed evidence at question generation time
- Use a Jinja template for the prompt used to do the grading
- Repo: https://github.com/Cadarn/PyData-AI-Generated-Quiz
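A sketch of two of the pipeline ideas above, under stated assumptions: Markdown-style headings stand in for the document structure docling extracts, and the standard library's string.Template stands in for the Jinja template used for the grading prompt. Function names and prompt wording are hypothetical.

```python
import re
from string import Template


def chunk_by_section(markdown_text: str) -> dict:
    """Split a document into chunks at headings, not every N characters."""
    chunks, title, lines = {}, "Preamble", []
    for line in markdown_text.splitlines():
        heading = re.match(r"#+\s+(.*)", line)
        if heading:
            if lines:
                chunks[title] = "\n".join(lines).strip()
            title, lines = heading.group(1), []
        else:
            lines.append(line)
    if lines:
        chunks[title] = "\n".join(lines).strip()
    return chunks


GRADING_PROMPT = Template(
    "You are grading a free-form answer.\n"
    "Question: $question\n"
    "Evidence from the document:\n$evidence\n"
    "Answer: $answer\n"
    "Grade from 0 to 5 using only the evidence, and explain briefly."
)


def build_grading_prompt(question: str, evidence: str, answer: str) -> str:
    """The evidence is stored with the question when it is generated."""
    return GRADING_PROMPT.substitute(question=question, evidence=evidence, answer=answer)
```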
The future of notebooks in a Claude Code world
- Workflow:
- Claude interacts w/ an MCP server
- MCP server writes expressions to a catalog
- The buckaroo server reads expressions from the catalog, evaluates them and caches results as needed
- User interacts via a browser w/ catalog server and buckaroo server
- Leverages xorq and Ibis. Apache Arrow is used for representing data
- Use non-deterministic LLMs to generate functions for you. A human validates the generated function such that it can then be safely reused in other workflows.