Eight years of wanting, three months of building with AI

Apr 5, 2026 · Essay

For eight years, I’ve wanted a high-quality set of devtools for working with SQLite. Given how important SQLite is to the industry¹, I’ve long been puzzled that no one has invested in building a really good developer experience for it².

A couple of weeks ago, after ~250 hours of effort over three months³ on evenings, weekends, and vacation days, I finally released syntaqlite (GitHub), fulfilling this long-held wish. And I believe the main reason this happened was AI coding agents⁴.

Of course, there’s no shortage of posts claiming that AI one-shot their project or pushing back and declaring that AI is all slop. I’m going to take a very different approach and, instead, systematically break down my experience building syntaqlite with AI, both where it helped and where it was detrimental.

I’ll do this while contextualizing the project and my background so you can independently assess how generalizable this experience was. And whenever I make a claim, I’ll try to back it up with evidence from my project journal, coding transcripts, or commit history⁵.

Why I wanted it

In my work on Perfetto, I maintain a SQLite-based language for querying performance traces called PerfettoSQL. It’s basically the same as SQLite but with a few extensions to make the trace querying experience better. There are ~100K lines of PerfettoSQL internally in Google and it’s used by a wide range of teams.

Having a language which gets traction means your users also start expecting things like formatters, linters, and editor extensions. I’d hoped that we could adapt some SQLite tools from open source but the more I looked into it, the more disappointed I was. What I found wasn’t reliable enough, fast enough⁶, or flexible enough to adapt to PerfettoSQL. There was clearly an opportunity to build something from scratch, but it was never the “most important thing we could work on”. We’ve been reluctantly making do with the tools out there but always wishing for better.

On the other hand, there was the option to do something in my spare time. I had built lots of open source projects in my teens⁷ but this had faded away during university when I felt that I just didn’t have the motivation anymore. Being a maintainer is much more than just “throwing the code out there” and seeing what happens. It’s triaging bugs, investigating crashes, writing documentation, building a community, and, most importantly, having a direction for the project.

But the itch of open source (specifically freedom to work on what I wanted while helping others) had never gone away. The SQLite devtools project was eternally in my mind as “something I’d like to work on”. But there was another reason why I kept putting it off: it sits at the intersection of being both hard and tedious.

What makes it hard and tedious

If I was going to invest my personal time working on this project, I didn’t want to build something that only helped Perfetto: I wanted to make it work for any SQLite user out there⁸. And this means parsing SQL exactly like SQLite.

The heart of any language-oriented devtool is the parser. This is responsible for turning the source code into a “parse tree” which acts as the central data structure anything else is built on top of. If your parser isn’t accurate, then your formatters and linters will inevitably inherit those inaccuracies; many of the tools I found suffered from having parsers which approximated the SQLite language rather than representing it precisely.

Unfortunately, unlike many other languages, SQLite has no formal specification describing how it should be parsed. It doesn’t expose a stable API for its parser either. In fact, quite uniquely, in its implementation it doesn’t even build a parse tree at all⁹! The only reasonable approach left in my opinion is to carefully extract the relevant parts of SQLite’s source code and adapt it to build the parser I wanted¹⁰.

This means getting into the weeds of SQLite source code, a fiendishly difficult codebase to understand. The whole project is written in C in an incredibly dense style; I’ve spent days just understanding the virtual table API¹¹ and implementation. Trying to grasp the full parser stack was daunting.

There’s also the fact that there are >400 rules in SQLite which capture the full surface area of its language. I’d have to specify in each of these “grammar rules” how that part of the syntax maps to the matching node in the parse tree. It’s extremely repetitive work; each rule is similar to all the ones around it but also, by definition, different.

And it’s not just the rules but also coming up with and writing tests to make sure it’s correct, debugging if something is wrong, triaging and fixing the inevitable bugs people would file when I got something wrong…

For years, this was where the idea died. Too hard for a side project¹², too tedious to sustain motivation, too risky to invest months into something that might not work.

How it happened

I’ve been using coding agents since early 2025 (Aider, Roo Code, then Claude Code since July) and they’d definitely been useful but never something I felt I could trust a serious project to. But towards the end of 2025, the models seemed to make a significant step forward in quality¹³. At the same time, I kept hitting problems in Perfetto which would have been trivially solved by having a reliable parser. Each workaround left the same thought in the back of my mind: maybe it’s finally time to build it for real.

I got some space to think and reflect over Christmas and decided to really stress test the most maximalist version of AI: could I vibe-code the whole thing using just Claude Code on the Max plan (£200/month)?

Through most of January, I iterated, acting as a semi-technical manager and delegating almost all the design and all the implementation to Claude. Functionally, I ended up in a reasonable place: a parser in C extracted from SQLite sources using a bunch of Python scripts, a formatter built on top, support for both the SQLite language and the PerfettoSQL extensions, all exposed in a web playground.

But when I reviewed the codebase in detail in late January, the downside was obvious: the codebase was complete spaghetti¹⁴. I didn’t understand large parts of the Python source extraction pipeline, functions were scattered in random files without a clear shape, and a few files had grown to several thousand lines. It was extremely fragile; it solved the immediate problem but it was never going to cope with my larger vision, never mind integrating it into the Perfetto tools. The saving grace was that it had proved the approach was viable and generated more than 500 tests, many of which I felt I could reuse.

I decided to throw away everything and start from scratch while also switching most of the codebase to Rust¹⁵. I could see that C was going to make it difficult to build the higher level components like the validator and the language server implementation. And as a bonus, it would also let me use the same language for both the extraction and runtime instead of splitting it across C and Python.

More importantly, I completely changed my role in the project. I took ownership of all decisions¹⁶ and used it more as “autocomplete on steroids” inside a much tighter process: opinionated design upfront, reviewing every change thoroughly, fixing problems eagerly as I spotted them, and investing in scaffolding (like linting, validation, and non-trivial testing¹⁷) to check AI output automatically.

The core features came together through February and the final stretch (upstream test validation, editor extensions, packaging, docs) led to a 0.1 launch in mid-March.

But in my opinion, this timeline is the least interesting part of this story. What I really want to talk about is what wouldn’t have happened without AI and also the toll it took on me as I used it.

AI is why this project exists, and why it’s as complete as it is

Overcoming inertia

I’ve written in the past about how one of my biggest weaknesses as a software engineer is my tendency to procrastinate when facing a big new project. Though I didn’t realize it at the time, it could not have applied more perfectly to building syntaqlite.

AI basically let me put aside all my doubts on technical calls, my uncertainty about building the right thing and my reluctance to get started by giving me very concrete problems to work on. Instead of “I need to understand how SQLite’s parsing works”, it was “I need to get AI to suggest an approach for me so I can tear it up and build something better”¹⁸. I work so much better with concrete prototypes to play with and code to look at than endlessly thinking about designs in my head, and AI lets me get to that point at a pace I could not have dreamed about before. Once I took the first step, every step after that was so much easier.

Faster at churning code

AI turned out to be better than me at the act of writing code itself, assuming that code is obvious. If I can break a problem down to “write a function with this behaviour and parameters” or “write a class matching this interface,” AI will build it faster than I would and, crucially, in a style that might well be more intuitive to a future reader. It documents things I’d skip, lays out code consistently with the rest of the project, and sticks to what you might call the “standard dialect” of whatever language you’re working in¹⁹.

That standardness is a double-edged sword. For the vast majority of code in any project, standard is exactly what you want: predictable, readable, unsurprising. But every project has pieces that are its edge, the parts where the value comes from doing something non-obvious. For syntaqlite, that was the extraction pipeline and the parser architecture. AI’s instinct to normalize was actively harmful there, and those were the parts I had to design in depth and often resorted to just writing myself.

But here’s the flip side: the same speed that makes AI great at obvious code also makes it great at refactoring. If you’re using AI to generate code at industrial scale, you have to refactor constantly and continuously²⁰. If you don’t, things immediately get out of hand. This was the central lesson of the vibe-coding month: I didn’t refactor enough, the codebase became something I couldn’t reason about, and I had to throw it all away. In the rewrite, refactoring became the core of my workflow. After every large batch of generated code, I’d step back and ask “is this ugly?” Sometimes AI could clean it up. Other times there was a large-scale abstraction that AI couldn’t see but I could; I’d give it the direction and let it execute²¹. If you have taste, the cost of a wrong approach drops dramatically because you can restructure quickly²².

Teaching assistant

Of all the ways I used AI, research had by far the highest ratio of value delivered to time spent.

I’ve worked with interpreters and parsers before but I had never heard of Wadler-Lindig pretty printing²³. When I needed to build the formatter, AI gave me a concrete and actionable lesson from a point of view I could understand and pointed me to the papers to learn more. I could have found this myself eventually, but AI compressed what might have been a day or two of reading into a focused conversation where I could ask “but why does this work?” until I actually got it.

This extended to entire domains I’d never worked in. I have deep C++ and Android performance expertise but had barely touched Rust tooling or editor extension APIs. With AI, it wasn’t a problem: the fundamentals are the same, the terminology is similar, and AI bridges the gap²⁴. The VS Code extension would have taken me a day or two of learning the API before I could even start. With AI, I had a working extension within an hour.

It was also invaluable for reacquainting myself with parts of the project I hadn’t looked at for a few days²⁵. I could control how deep to go: “tell me about this component” for a surface-level refresher, “give me a detailed linear walkthrough” for a deeper dive, “audit unsafe usages in this repo” to go hunting for problems. When you’re context switching a lot, you lose context fast. AI let me reacquire it on demand.

More than I’d have built alone

Beyond making the project exist at all, AI is also the reason it shipped as complete as it did. Every open source project has a long tail of features that are important but not critical: the things you know theoretically how to do but keep deprioritizing because the core work is more pressing. For syntaqlite, that list was long: editor extensions, Python bindings, a WASM playground, a docs site, packaging for multiple ecosystems²⁶. AI made these cheap enough that skipping them felt like the wrong trade-off.

It also freed up mental energy for UX²⁷. Instead of spending all my time on implementation, I could think about what a user’s first experience should feel like: what error messages would actually help them fix their SQL, how the formatter output should look by default, whether the CLI flags were intuitive. These are the things that separate a tool people try once from one they keep using, and AI gave me the headroom to care about them. Without AI, I would have built something much smaller, probably with no editor extensions or docs site. AI didn’t just make the same project faster. It changed what the project was.

Where AI had its costs

The addiction

There’s an uncomfortable parallel between using AI coding tools and playing slot machines²⁸. You send a prompt, wait, and either get something great or something useless. I found myself up late at night wanting to do “just one more prompt,” constantly trying AI just to see what would happen even when I knew it probably wouldn’t work. The sunk cost fallacy kicked in too: I’d keep at it even in tasks it was clearly ill-suited for, telling myself “maybe if I phrase it differently this time.”

The tiredness feedback loop made it worse²⁹. When I had energy, I could write precise, well-scoped prompts and be genuinely productive. But when I was tired, my prompts became vague, the output got worse, and I’d try again, getting more tired in the process. In these cases, AI was probably slower than just implementing something myself, but it was too hard to break out of the loop³⁰.

Losing touch

Several times during the project, I lost my mental model of the codebase³¹. Not the overall architecture or how things fitted together. But the day-to-day details of what lived where, which functions called which, the small decisions that accumulate into a working system. When that happened, surprising issues would appear and I’d find myself at a total loss to understand what was going wrong. I hated that feeling.

The deeper problem was that losing touch created a communication breakdown³². When you don’t have the mental thread of what’s going on, it becomes impossible to communicate meaningfully with the agent. Every exchange gets longer and more verbose. Instead of “change FooClass to do X,” you end up saying “change the thing which does Bar to do X”. Then the agent has to figure out what Bar is, how that maps to FooClass, and sometimes it gets it wrong³³. It’s exactly the same complaint engineers have always had about managers who don’t understand the code asking for fanciful or impossible things. Except now you’ve become that manager.

The fix was deliberate: I made it a habit to read through the code immediately after it was implemented and actively engage to see “how would I have done this differently?”.

Of course, in some sense all of the above is also true of code I wrote a few months ago (hence the sentiment that AI code is legacy code), but AI makes the drift happen faster because you’re not building the same muscle memory that comes from originally typing it out.

The slow corrosion

There were some other problems I only discovered incrementally over the three months.

I found that AI made me procrastinate on key design decisions³⁴. Because refactoring was cheap, I could always say “I’ll deal with this later.” And because AI could refactor at the same industrial scale it generated code, the cost of deferring felt low. But it wasn’t: deferring decisions corroded my ability to think clearly because the codebase stayed confusing in the meantime. The vibe-coding month was the most extreme version of this. Yes, I understood the problem, but if I had been more disciplined about making hard design calls earlier, I could have converged on the right architecture much faster.

Tests created a similar false comfort³⁵. Having 500+ tests felt reassuring, and AI made it easy to generate more. But neither humans nor AI are creative enough to foresee every edge case you’ll hit in the future; there were several times in the vibe-coding phase where I’d come up with a test case and realise the design of some component was completely wrong and needed to be totally reworked. This was a significant contributor to my lack of trust and the decision to scrap everything and start from scratch.

Basically, I learned that the “normal rules” of software still apply in the AI age: if you don’t have a fundamental foundation (clear architecture, well-defined boundaries) you’ll be left eternally chasing bugs as they appear.

No sense of time

Something I kept coming back to was how little AI understood about the passage of time³⁶. It sees a codebase in a certain state but doesn’t feel time the way humans do. I can tell you what it feels like to use an API, how it evolved over months or years, why certain decisions were made and later reversed.

The natural problem from this lack of understanding is that you either make the same mistakes you made in the past and have to relearn the lessons or you fall into new traps which were successfully avoided the first time, slowing you down in the long run. In my opinion, this is a similar problem to why losing a high-quality senior engineer hurts a team so much: they carry history and context that doesn’t exist anywhere else and act as a guide for others around them.

In theory, you can try to preserve this context by keeping specs and docs up to date. But there’s a reason we didn’t do this before AI: exhaustively writing down implicit design decisions is incredibly expensive and time-consuming. AI can help draft these docs, but because there’s no way to automatically verify that it accurately captured what matters, a human still has to manually audit the result. And that’s still time-consuming.

There’s also the context pollution problem. You never know when a design note about API A will echo in API B. Consistency is a huge part of what makes codebases work, and for that you don’t just need context about what you’re working on right now but also about other things which were designed in a similar way. Deciding what’s relevant requires exactly the kind of judgement that institutional knowledge provides in the first place.

Relativity

Reflecting on the above, the pattern of when AI helped and when it hurt was fairly consistent.

When I was working on something I already understood deeply, AI was excellent. I could review its output instantly, catch mistakes before they landed and move at a pace I’d never have managed alone. The parser rule generation is the clearest example³⁷: I knew exactly what each rule should produce, so I could review AI’s output within a minute or two and iterate fast.

When I was working on something I could describe but didn’t yet know, AI was good but required more care. Learning Wadler-Lindig for the formatter was like this: I could articulate what I wanted, evaluate whether the output was heading in the right direction, and learn from what AI explained. But I had to stay engaged and couldn’t just accept what it gave me.

When I was working on something where I didn’t even know what I wanted, AI was somewhere between unhelpful and harmful. The architecture of the project was the clearest case: I spent weeks in the early days following AI down dead ends, exploring designs that felt productive in the moment but collapsed under scrutiny. In hindsight, I have to wonder if it would have been faster to just think it through without AI in the loop at all.

But expertise alone isn’t enough. Even when I understood a problem deeply, AI still struggled if the task had no objectively checkable answer³⁸. Implementation has a right answer, at least at a local level: the code compiles, the tests pass, the output matches what you asked for. Design doesn’t. We’re still arguing about OOP decades after it first took off.

Concretely, I found that designing the public API of syntaqlite was where this hit home the hardest. I spent several days in early March doing nothing but API refactoring, manually fixing problems that any experienced engineer would have instinctively avoided but that AI had made a total mess of. There’s no test or objective metric for “is this API pleasant to use” or “will this API help users solve the problems they have”, and that’s exactly why the coding agents did so badly at it.

This takes me back to the days I was obsessed with physics and, specifically, relativity. The laws of physics look simple and Newtonian in any small local area, but zoom out and spacetime curves in ways you can’t predict from the local picture alone. Code is the same: at the level of a function or a class, there’s usually a clear right answer, and AI is excellent there. But architecture is what happens when all those local pieces interact, and you can’t get good global behaviour by stitching together locally correct components.

Knowing where you are on these axes at any given moment is, I think, the core skill of working with AI effectively.

Wrap-up

Eight years is a long time to carry a project in your head. Seeing these SQLite tools actually exist and function after only three months of work is a massive win, and I’m fully aware they wouldn’t be here without AI.

But the process wasn’t the clean, linear success story people usually post. I lost an entire month to vibe-coding. I fell into the trap of managing a codebase I didn’t actually understand, and I paid for that with a total rewrite.

The takeaway for me is simple: AI is an incredible force multiplier for implementation, but it’s a dangerous substitute for design. It’s brilliant at giving you the right answer to a specific technical question, but it has no sense of history, taste, or how a human will actually feel using your API. If you rely on it for the “soul” of your software, you’ll just end up hitting a wall faster than you ever have before.

What I’d like to see more of from others is exactly what I’ve tried to do here: honest, detailed accounts of building real software with these tools; not weekend toys or one-off scripts but the kind of software that has to survive contact with users, bug reports, and your own changing mind.

# 13:00 / #sqlite #ai #claude-code #software-engineering

1. Note SQLite ships in every smartphone, every major browser and countless embedded systems. See Most Widely Deployed. ↩

2. Explanation Devtools: a formatter, linter and language server (LSP). High-quality: tools I can trust to work on all SQLite SQL, e.g. a formatter which doesn’t “eat” comments, a linter supporting SQLite-specific features, a language server featureful enough to give a similar experience to TypeScript in VS Code. ↩

3. Note 36 days with commits between Jan 14 and Mar 18, plus the very early period (Dec 29-Jan 12) where I didn’t even bother committing code. ↩

4. Qualification Justified in “AI is why this project exists, and why it’s as complete as it is”. ↩

5. Note The journal runs to ~4,000 words with dated entries throughout the project. ↩

6. Note See the comparison page for benchmarks against other SQLite tooling. ↩

7. Note E.g. a video converter wrapping ffmpeg, and a semi-successful IRC client for Android. ↩

8. Note A recurring tendency of mine: I’m rarely satisfied with the immediate problem I’m facing and can’t help but think “but what if I did something even more ambitious”. ↩

9. Explanation SQLite goes straight from SQL text to bytecode without building an intermediate tree. ↩

10. Note

In more detail:

Extract the tokenizer directly from SQLite’s sources.
Extract the grammar rules and use SQLite’s Lemon parser generator to create our own parser.
Go through each of the ~400 extracted grammar rules and decide how they should be represented in a parse tree.

↩

11. Note Even after using the virtual table API for 8 years in Perfetto, I still don’t feel I have a perfect handle on all its nuances. ↩

12. Note Another big challenge is supporting dialects like PerfettoSQL without needing to fork. I didn’t want to get into it because there’s a whole heap of complexity to explain around how to design the parser to be extensible but also fast. ↩

13. Note Andrej Karpathy put it well: “coding agents basically didn’t work before December […] models have significantly higher quality, long-term coherence and tenacity that can power through large and long tasks, making them extremely disruptive to the default programming workflow.” ↩

14. Journal From the journal, reflecting on the vibe-coding prototypes: “I quickly got exhausted and started accepting too much random code and once you mess up, it becomes very hard to get it back without throwing it away.” ↩

15. Note The tokenizer and parser remained in C, extracted from SQLite’s sources. Everything above that (formatter, linter, validator, language server) is Rust. ↩

16. Transcript Quoting from the transcript at the time, the opening prompt was: “How I want to do this is that I want to be incharge of all decisions and direction and I want to tell you what to do. I don’t want you to plan, I don’t want you to be independent. Is that clear?” ↩

17. Note One example: I wrote a TCL driver that hooks into SQLite’s own ~1,390 upstream test files. Every SQL statement is run through both real SQLite (sqlite3_prepare_v2) and syntaqlite’s parser side-by-side: if SQLite accepts a statement, we must accept it too; if SQLite rejects it, we must reject it too. This catches classes of bugs that hand-written tests never would. ↩
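
The accept/reject rule in note 17 can be sketched in a few lines. This is not the Tcl driver itself: it swaps in Python's built-in `sqlite3` module and uses `EXPLAIN` as a stand-in for a prepare-only call. The error-message markers, `toy_parser_accepts`, and the tiny corpus are all assumptions made for illustration.

```python
import sqlite3

# Substrings SQLite uses in parse-time error messages. Treating every other
# failure (e.g. "no such table") as "the syntax was accepted" is an assumption
# of this sketch.
SYNTAX_ERROR_MARKERS = ("syntax error", "incomplete input", "unrecognized token")


def sqlite_accepts(sql: str) -> bool:
    """Ask real SQLite whether a single statement parses."""
    conn = sqlite3.connect(":memory:")
    try:
        # EXPLAIN compiles the statement without running it.
        conn.execute("EXPLAIN " + sql)
        return True
    except sqlite3.Error as exc:
        message = str(exc)
        return not any(marker in message for marker in SYNTAX_ERROR_MARKERS)
    finally:
        conn.close()


def find_disagreements(statements, candidate_accepts):
    """Return the statements where the candidate parser and SQLite disagree."""
    return [sql for sql in statements if candidate_accepts(sql) != sqlite_accepts(sql)]


def toy_parser_accepts(sql: str) -> bool:
    # Hypothetical stand-in for the parser under test: far too permissive.
    return sql.strip().upper().startswith("SELECT")


corpus = ["SELECT 1", "SELECT * FROM missing_table", "SELECT FROM", "SELEC 1"]
disagreements = find_disagreements(corpus, toy_parser_accepts)
print(disagreements)
```

The useful property is that the oracle is SQLite itself, so the corpus can grow to any size without anyone having to decide by hand what the right answer is.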

18. Journal From the journal: “Use it to prototype changes really fast and then come back and delete all the code and then rebuild everything again in a more structured way properly.” AI turned abstract uncertainty into concrete artefacts to react to: the prototypes from vibe-coding proved the approach was viable even though the code itself was thrown away. ↩

19. Journal From the journal: “I’ve both largely let AI code whole modules while also being picky and hand writing SIMD and looking at compiler explorer asm/machine code depending on the problem I’m solving.” The standard code is the vast majority; the hand-written pieces are the exceptions. ↩

20. Journal From the journal: “If you’re using AI to generate code at industrial scale you have to refactor constantly and continuously. If you don’t, you immediately get out of hand.” Also: “After every large amount of code, it’s worth taking the time as a human to ask ‘is this ugly’ and if so, do something about it.” ↩

21. Journal Verbatim from the project journal. The journal adds: “audits + refactors work amazingly well together.” ↩

22. Journal From the journal: “Thankfully the cost of mistakes also goes down a lot if you have taste as AI can refactor also at industrial scale.” ↩

23. Transcript Wadler-Lindig is an algorithm for pretty-printing that lets you describe document layout declaratively and have the printer decide where to break lines. In the formatter design transcript, Claude proposed “width-aware formatting from the start (Wadler-Lindig document model)” and I could immediately engage with the trade-offs rather than spending days discovering the approach. ↩
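
To make "describe the layout declaratively and let the printer decide where to break" concrete, here is a minimal sketch in the spirit of Lindig's strict variant of Wadler's printer. It is not syntaqlite's formatter: the node names (`Text`, `Line`, `Nest`, `Group`, `Concat`) and the example query document are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Text:
    value: str


@dataclass(frozen=True)
class Line:
    """A space if the enclosing group fits on one line, a newline otherwise."""


@dataclass(frozen=True)
class Concat:
    parts: tuple


@dataclass(frozen=True)
class Nest:
    indent: int
    doc: object


@dataclass(frozen=True)
class Group:
    doc: object


FLAT, BREAK = "flat", "break"


def fits(remaining, doc):
    """Would `doc`, laid out flat, fit in `remaining` columns?"""
    stack = [doc]
    while stack:
        if remaining < 0:
            return False
        node = stack.pop()
        if isinstance(node, Text):
            remaining -= len(node.value)
        elif isinstance(node, Line):
            remaining -= 1  # a flat Line is a single space
        elif isinstance(node, Concat):
            stack.extend(reversed(node.parts))
        else:  # Nest or Group: indentation is irrelevant when flat
            stack.append(node.doc)
    return remaining >= 0


def render(doc, width):
    out, column = [], 0
    stack = [(0, BREAK, doc)]
    while stack:
        indent, mode, node = stack.pop()
        if isinstance(node, Text):
            out.append(node.value)
            column += len(node.value)
        elif isinstance(node, Line):
            if mode == FLAT:
                out.append(" ")
                column += 1
            else:
                out.append("\n" + " " * indent)
                column = indent
        elif isinstance(node, Concat):
            stack.extend((indent, mode, part) for part in reversed(node.parts))
        elif isinstance(node, Nest):
            stack.append((indent + node.indent, mode, node.doc))
        elif isinstance(node, Group):
            # The one decision point: flat if the whole group fits, else break.
            group_mode = FLAT if fits(width - column, node.doc) else BREAK
            stack.append((indent, group_mode, node.doc))
    return "".join(out)


query = Group(Concat((
    Text("SELECT"),
    Nest(2, Concat((Line(), Text("id,"), Line(), Text("name")))),
    Line(),
    Text("FROM users"),
)))

print(render(query, 80))
print(render(query, 20))
```

The same `query` document produces a one-line layout at width 80 and a broken, indented layout at width 20; the caller never says where the line breaks go, only where they are allowed.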

24. Journal From the journal, on switching between domains: “I know performance on C++/Android really well. But I don’t know Python/Go/Rust tools at all. With AI, not a problem: the fundamentals are the same, the terminology is similar. I can now switch between domains so fast.” ↩

25. Journal From the journal, on codebase audits: “You can control how in-depth you want to go: surface level overview for a quick refresher, detailed walkthrough for a deeper dive, targeted audit to hunt for problems.” ↩

26. Note The launch prep phase covered VS Code extension, docs site, crates.io, PyPI, npm, Homebrew publishing, and a Zed extension. Each of these is a “weekend project” on its own. ↩

27. Journal “Not just ‘what problem should I solve’ but ‘what error messages are most useful’ and ‘how do I make this really simple to use’.” Examples: rustc-style multi-error diagnostics with “did you mean” suggestions, quick-fix code actions, and a syntaqlite.toml config file so users don’t need CLI flags. ↩

28. Note “I felt addicted at points to the ‘slots’ nature of it, draining time and (over the long term) health.” ↩

29. Journal From the journal: “My specificity with AI is directly proportional to how tired I am. When I have lots of energy, I can be really precise and productive. But when I’m tired, I start saying ‘do X thing’ without much detail and the AI output gets much much much worse.” ↩

30. Note “AI often was slower than if I had implemented something myself but it was too hard to break out of the ‘AI loop’” ↩

31. Note “Several times I ‘lost touch’ with the codebase and there were surprising issues where I would just have to say ‘AI, please debug’, and I hated that feeling.” The fix: “I made it a habit to read code myself regularly to stay in ‘touch’ with the system.” ↩

32. Journal From the journal: “It becomes very difficult to communicate your intent of a change with the agent. Because you lose the mental thread of ‘what is going on’, it becomes impossible to communicate meaningfully and clearly and every exchange becomes longer and more verbose requiring the agent to do more work.” ↩

33. Journal From the journal: “Instead of ‘change FooClass to do X’, you have to be like ‘change the thing which does Bar to do X’. And then the agent has to figure out Bar, how that maps to FooClass, sometimes it will get it wrong. Exactly the same complaint we’ve had forever with software engineering managers who don’t understand the code asking for fanciful things.” ↩

34. Note “AI made me procrastinate about actually making key design decisions — because it was easy to refactor, several times I was able to just say ‘I can deal with this later’. But it corroded my ability to think clearly in the meantime because the codebase was confusing.” ↩

35. Journal “Neither humans nor AI are creative enough to foresee the sort of crazy things you might hit in the future. If you don’t have some fundamental foundation, you will be left eternally chasing bugs as they happen.” The vibe-coded prototype had 500+ tests and still fell apart. The rewrite invested in idempotency tests and upstream SQLite validation instead of just adding more unit tests. ↩
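
An idempotency test of the kind mentioned in note 35 checks that formatting already-formatted output changes nothing. Below is a minimal sketch of the idea; `toy_format`, `buggy_format`, and the samples are hypothetical stand-ins, not syntaqlite's formatter or test suite.

```python
KEYWORDS = {"select", "from", "where"}


def toy_format(sql: str) -> str:
    # Hypothetical formatter: collapse whitespace and uppercase a few keywords.
    tokens = sql.split()
    return " ".join(t.upper() if t.lower() in KEYWORDS else t for t in tokens)


def buggy_format(sql: str) -> str:
    # Deliberately broken: appends another semicolon on every pass.
    return toy_format(sql) + ";"


def check_idempotent(formatter, samples):
    """Return the inputs for which format(format(x)) != format(x)."""
    failures = []
    for sql in samples:
        once = formatter(sql)
        twice = formatter(once)
        if once != twice:
            failures.append(sql)
    return failures


samples = ["select  id from t", "SELECT 1"]
print(check_idempotent(toy_format, samples))
print(check_idempotent(buggy_format, samples))
```

Like the upstream validation, this checks a property rather than a hand-picked expected output, so it applies to any input, including ones nobody thought to write a test for.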

36. Journal From the journal: “Models don’t have a sense of time. They see the codebase in a certain state and yes they can with context/docs/memory get a sense of it. But they don’t feel time in the same way as humans do. For example, I can tell you what it feels like to use an API and the progression over time and why things are the way they are. Models can only get this with explicit capture and that’s very expensive to do constantly.” ↩

37. Journal From the journal: “Agent team was successful beyond my wildest dreams. Was able to build everything in one evening.” But I had to engineer the scaffolding first: restructuring the project so agents could work on different files, and building a diffing script that grouped errors into actionable feedback. Then the unavoidable manual pass: “need to go through every single one of the tests. Found a bunch of problems (flags not being correctly formatted, missing field names etc).” ↩

38. Journal From the journal: “The more trivial a property it is for a human or AI to verify correctness, the better AI is at dealing with those tasks.” Also: “concrete codegen fast, abstract codegen really slow and inconsistent.” ↩