<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.2.0">Jekyll</generator><link href="https://frank.sauerburger.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://frank.sauerburger.io/" rel="alternate" type="text/html" /><updated>2026-03-10T01:10:25+01:00</updated><id>https://frank.sauerburger.io/feed.xml</id><title type="html">Frank Sauerburger</title><entry><title type="html">Claude Code as member of the engineering team</title><link href="https://frank.sauerburger.io/2026/03/03/claude-as-team-member.html" rel="alternate" type="text/html" title="Claude Code as member of the engineering team" /><published>2026-03-03T00:00:00+01:00</published><updated>2026-03-03T00:00:00+01:00</updated><id>https://frank.sauerburger.io/2026/03/03/claude-as-team-member</id><content type="html" xml:base="https://frank.sauerburger.io/2026/03/03/claude-as-team-member.html">&lt;p&gt;I ran an experiment: Claude Code as a full-fledged engineer on GitLab, working as part of the development team for a mini game. I create tickets and review merge requests, while Claude Code submits code as merge requests and iterates on them when required. It worked surprisingly well.&lt;/p&gt;

&lt;p&gt;What’s the project? HTTP is the default protocol we use to connect with people and look up information on the web. For a long time, I had wanted to offer a service accessible via SSH, where the terminal replaces the browser and SSH replaces HTTPS. To achieve that, I wanted to build a small terminal-based game. Voilà: AsciiMoria, an SSH-based game implemented in Rust, in which players navigate their way through deep and dangerous mines.&lt;/p&gt;

&lt;h2 id=&quot;the-development-workflow&quot;&gt;The development workflow&lt;/h2&gt;
&lt;p&gt;Claude Code was wired into a project on GitLab, waiting to be assigned to issues with instructions on what to implement. Once assigned, Claude would work on a feature branch and interact with the ticket, for example by asking for clarification when needed. Each ticket eventually resulted in a merge request for me to review. If I requested changes, or if the CI/CD pipeline failed, Claude would give it another shot and improve on the previous implementation.&lt;/p&gt;

&lt;p&gt;What I liked most about this workflow was remaining in charge. I create tickets, write specs, review merge requests, request changes, and make decisions about the direction of the project. This meant I could focus on thinking about the game’s direction, corner cases, and new features without getting lost in the nitty-gritty details of building a game or an SSH server in Rust. The separation of concerns felt natural: I owned the vision, Claude owned the implementation.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/elv.png&quot; alt=&quot;Workflow of Claude Code as an engineer and part of the development team&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;the-outcome&quot;&gt;The outcome&lt;/h2&gt;

&lt;p&gt;Coming straight to the point: the outcome of my experiment is a fun little game, free for everyone to play or fork.&lt;/p&gt;

&lt;p&gt;To play it, you need a terminal and an SSH client with a local private key for authentication (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ssh-keygen -t ed25519&lt;/code&gt; if you don’t have a key already).&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;ssh asciimoria.com
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;img src=&quot;/assets/asciimoria.png&quot; alt=&quot;Screenshots from the game&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The code is open source and available at &lt;a href=&quot;https://gitlab.sauerburger.com/frank/asciimoria&quot;&gt;https://gitlab.sauerburger.com/frank/asciimoria&lt;/a&gt;.
The repository contains all interactions and conversations with Claude Code.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/asciimoria_issues_mr.png&quot; alt=&quot;Entire backlog for the project and the corresponding merge requests opened by Claude&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Is the game bug-free? I don’t think so. Would it be bug-free if I had written it myself? Also no. I made some contributions of my own to prove to myself that I understand the code.&lt;/p&gt;

&lt;h2 id=&quot;what-did-i-learn&quot;&gt;What did I learn?&lt;/h2&gt;
&lt;p&gt;A lot. I don’t want to sound foolish claiming that AI taught me things, but I freely admit I picked up many tricks along the way: how to approach certain patterns in Rust, the landscape of open source Rust libraries, new(-ish) features in Docker Compose files, new(-ish) features in GitLab CI/CD configs. Claude is extremely versatile and proficient across a remarkable breadth of technologies.&lt;/p&gt;

&lt;p&gt;I also learned just how good AI is at getting things right on the first try.
We have all pushed endless chains of commits to fix a CI problem caused by stupid mistakes, syntax errors, typos, or a general lack of understanding of how to configure the pipeline. Claude did none of that. In nearly every case, it got things right on the first or second attempt.&lt;/p&gt;

&lt;p&gt;Perhaps the most surprising lesson was about the level of detail in tickets. When I was thorough in the issue description, Claude followed it to the letter: it had all the specification it needed and delivered exactly what I asked for. At the other extreme, when tickets were brief and superficial, Claude found sensible ways to fill in the gaps, and I had very few bad surprises. However, there is a dangerous middle ground. When I tried to be clever in the ticket description without fully understanding the underlying complexity and tricky details, Claude would take me at my word and implement something rather nonsensical. To be clear, Claude did see the hidden complexity; it simply trusted that I had made a deliberate choice when writing the specs. The lesson: either be very detailed, and understand the underlying complexity, or be very brief. Vague specificity is the worst of both worlds.&lt;/p&gt;

&lt;h2 id=&quot;how-was-the-experience&quot;&gt;How was the experience?&lt;/h2&gt;
&lt;p&gt;Creating this game with AI feels a bit like cheating.&lt;/p&gt;

&lt;p&gt;On the other hand, it was also very rewarding. Closing tickets and burning through your backlog in very little time releases a lot of dopamine. Going back to writing code the hard way afterwards feels a bit like going through detox.&lt;/p&gt;

&lt;p&gt;I was incredibly fascinated by how skilled my new engineer was. And I am not going to lie: it was very convenient to write tickets from my phone on the train and have a merge request ready by the time I got home, with passing unit tests and the desired feature materialized in code.&lt;/p&gt;

&lt;p&gt;However, I can see people trying to keep an AI agent working 24/7, which is not going to be healthy for our work-life balance. Just because you can create tickets (or maybe even review code) on your phone does not mean you should.&lt;/p&gt;

&lt;h2 id=&quot;where-to-go-from-here&quot;&gt;Where to go from here&lt;/h2&gt;
&lt;p&gt;For hobby projects, there is one important question to ask: do you prefer the activity of coding, or do you prefer having the finished project? To code vs the code.&lt;/p&gt;

&lt;p&gt;Code generation has become very cheap, but the cost of ownership remains. You still have to understand the code you ship, maintain it, and make architectural decisions that go beyond what any AI can infer from a ticket. Even before good code generation was available, a key skill in software engineering was to focus on outcome rather than output. That principle is more relevant than ever, now that generating code is just a ticket away. The real work, deciding what to build and why, has not changed.&lt;/p&gt;</content><author><name>frank</name></author><category term="ai" /><category term="rust" /><summary type="html">I made the experiment: Claude Code as a full engineer on GitLab, as part of the development team for a mini game. I create tickets and review merge requests, while Claude Code submits code as merge requests and iterates on them if required. It worked surprisingly well.</summary></entry><entry><title type="html">Async database access in Rust with Diesel 101</title><link href="https://frank.sauerburger.io/2026/02/08/async-diesel-101.html" rel="alternate" type="text/html" title="Async database access in Rust with Diesel 101" /><published>2026-02-08T00:00:00+01:00</published><updated>2026-02-08T00:00:00+01:00</updated><id>https://frank.sauerburger.io/2026/02/08/async-diesel-101</id><content type="html" xml:base="https://frank.sauerburger.io/2026/02/08/async-diesel-101.html">&lt;p&gt;&lt;a href=&quot;https://diesel.rs/&quot;&gt;Diesel&lt;/a&gt;,
a database ORM in Rust, provides compile-time type checks for its database operations, seamlessly integrating with blazingly fast, robust API applications built with
&lt;a href=&quot;https://github.com/tokio-rs/axum&quot;&gt;Axum&lt;/a&gt;.
As always, when working with a new technology, some of the details are difficult to remember (assuming you’re not vibe coding the entire app). For example, is it &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;load(&amp;amp;mut conn)&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;execute(&amp;amp;mut conn)&lt;/code&gt; for a select query? This article showcases a few simple queries; it doesn’t try to be a comprehensive tutorial. In the following, upper-case text is a placeholder. The code assumes an auto-generated &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;schema.rs&lt;/code&gt; file and a hand-written &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;models.rs&lt;/code&gt; file with Rust representations of the database data structures. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.await&lt;/code&gt; calls rely on the async query methods of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;diesel-async&lt;/code&gt; crate (its &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RunQueryDsl&lt;/code&gt; trait) rather than on Diesel’s synchronous ones.&lt;/p&gt;

&lt;h2 id=&quot;select&quot;&gt;Select&lt;/h2&gt;

&lt;div class=&quot;language-rust highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;values&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;Vec&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SELECTED_TYPE&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;schema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;table&lt;/span&gt;
  &lt;span class=&quot;c&quot;&gt;// .inner_join(schema::OTHER_TABLE::table)&lt;/span&gt;
  &lt;span class=&quot;nf&quot;&gt;.filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;schema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FIELD&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.eq&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;VALUE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
  &lt;span class=&quot;nf&quot;&gt;.select&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;schema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FIELD&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;c&quot;&gt;// or   .select((schema::TABLE::FIELD, …) )&lt;/span&gt;
  &lt;span class=&quot;c&quot;&gt;// or   .select(models::MODEL::as_select())&lt;/span&gt;
  &lt;span class=&quot;c&quot;&gt;// .offset(OFFSET)&lt;/span&gt;
  &lt;span class=&quot;c&quot;&gt;// .limit(LIMIT)&lt;/span&gt;
  &lt;span class=&quot;nf&quot;&gt;.load&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;mut&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;conn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;c&quot;&gt;// .first(&amp;amp;mut conn)  // a single SELECTED_TYPE instead of a Vec&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;.await&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;?&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;insert&quot;&gt;Insert&lt;/h2&gt;

&lt;div class=&quot;language-rust highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;values&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SELECTED_TYPE&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;VALUES_OBJECT&lt;/span&gt;
  &lt;span class=&quot;nf&quot;&gt;.insert_into&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;schema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;table&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;nf&quot;&gt;.returning&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;schema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FIELD&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;nf&quot;&gt;.get_result&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;mut&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;conn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;c&quot;&gt;// or, without .returning(), get the number of inserted rows&lt;/span&gt;
  &lt;span class=&quot;c&quot;&gt;// .execute(&amp;amp;mut conn)&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;.await&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;?&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;update&quot;&gt;Update&lt;/h2&gt;

&lt;div class=&quot;language-rust highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nn&quot;&gt;diesel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;update&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;schema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;table&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;nf&quot;&gt;.filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;schema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FIELD&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.eq&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;VALUE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;nf&quot;&gt;.set&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;schema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;OTHER_FIELD&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.eq&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;NEW_VALUE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;nf&quot;&gt;.execute&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;mut&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;conn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;.await&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;?&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;</content><author><name>frank</name></author><category term="rust" /><category term="internet" /><summary type="html">Diesel, a database ORM in Rust, provides compile-time type checks for its database operations, seamlessly integrating with blazingly fast, robust API applications built with Axum. As always, when working with a new technology, some of the details are difficult to remember (assuming you’re not vibe coding the entire app). For example, is it load(&amp;amp;mut conn) or execute(&amp;amp;mut conn) for a select query? This article showcases a few simple queries. It doesn’t try to be a comprehensive tutorial. In the following, upper-case text is meant as a placeholder. The code assumes an auto-generated schema.rs file and a hand-written models.rs file with Rust representations of database data structures.</summary></entry><entry><title type="html">Virt-install a Ubuntu VM from the terminal</title><link href="https://frank.sauerburger.io/2026/01/03/virt-install-ubuntu-on-terminal.html" rel="alternate" type="text/html" title="Virt-install a Ubuntu VM from the terminal" /><published>2026-01-03T00:00:00+01:00</published><updated>2026-01-03T00:00:00+01:00</updated><id>https://frank.sauerburger.io/2026/01/03/virt-install-ubuntu-on-terminal</id><content type="html" xml:base="https://frank.sauerburger.io/2026/01/03/virt-install-ubuntu-on-terminal.html">&lt;p&gt;In recent releases of Ubuntu, the installation experience has changed significantly. Starting with Ubuntu 24.04, the default server ISO uses the new Subiquity-based live installer, and the traditional text-based “mini.iso” workflow is no longer provided in the same way.&lt;/p&gt;

&lt;p&gt;If you’re provisioning virtual machines via automation, especially on a headless host using KVM and virt-install, this can be frustrating. By default, virt-install may attempt to open a graphical console, pushing you toward VNC or SPICE even when you prefer a pure terminal-based workflow.
If you boot the ISO with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--cdrom&lt;/code&gt; and ask for a text console instead, virt-install prints a warning like this:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;WARNING CDROM media does not print to the text console by default, so you likely will not see text install output. You might want to use --location. See the man page for examples of using --location with CDROM media
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This guide shows how to install Ubuntu 24.04 LTS using the live server ISO entirely from the terminal, launching the installer kernel and initrd directly from the ISO.&lt;/p&gt;

&lt;h2 id=&quot;define-virtual-machine-parameters&quot;&gt;Define Virtual Machine Parameters&lt;/h2&gt;

&lt;p&gt;Start by defining the VM configuration. The block below is fully editable — adjust CPU, RAM, disk size, and paths to fit your environment before running it.&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot; contenteditable=&quot;&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nb&quot;&gt;export &lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;VM_NAME&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;ubuntu-vm&quot;&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;export &lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;VM_ISO&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;https://releases.ubuntu.com/24.04.3/ubuntu-24.04.3-live-server-amd64.iso&quot;&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;export &lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;VM_OS&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;ubuntu-stable-latest&quot;&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;export &lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;VM_IMG&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;/var/lib/libvirt/images/&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;${&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;VM_NAME&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;.qcow2&quot;&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;export &lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;VM_CORES&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;2
&lt;span class=&quot;nb&quot;&gt;export &lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;VM_DISKSIZE&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;50
&lt;span class=&quot;nb&quot;&gt;export &lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;VM_RAMSIZE&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;4096
&lt;span class=&quot;nb&quot;&gt;export &lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;VM_LOCAL_ISO&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;/tmp/ubuntu.iso
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Variable&lt;/th&gt;
      &lt;th&gt;Purpose&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VM_NAME&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;Name of the virtual machine in libvirt&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VM_ISO&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;URL of the official Ubuntu 24.04 live server ISO&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VM_OS&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;OS variant (libosinfo ID) used by virt-install to optimize the guest configuration&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VM_IMG&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;Path to the VM disk image&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VM_CORES&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;Number of virtual CPUs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VM_DISKSIZE&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;Disk size in GiB&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VM_RAMSIZE&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;RAM in MiB&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VM_LOCAL_ISO&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;Local ISO download path&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;


&lt;h2 id=&quot;download-the-ubuntu-2404-iso&quot;&gt;Download the Ubuntu 24.04 ISO&lt;/h2&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot; contenteditable=&quot;&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;curl -o &quot;$VM_LOCAL_ISO&quot; -L &quot;$VM_ISO&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;install-ubuntu-24-using-terminal-installer&quot;&gt;Install Ubuntu 24 Using Terminal Installer&lt;/h2&gt;

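&lt;p&gt;Here’s the key part: we instruct virt-install to boot directly from the installer kernel and initrd inside the ISO. With &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--location&lt;/code&gt;, virt-install extracts &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;casper/vmlinuz&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;casper/initrd&lt;/code&gt; from the ISO and boots them itself, which is also why &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--extra-args&lt;/code&gt; works here: it only applies to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--location&lt;/code&gt; installs. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;console=ttyS0,115200n8&lt;/code&gt; sends the kernel and installer output to the serial console that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--graphics none&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--console pty,target_type=serial&lt;/code&gt; attach to your terminal. Parameters after the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;---&lt;/code&gt; separator follow the Debian/Ubuntu installer convention of being carried over to the installed system, so the serial console should keep working after the first reboot.&lt;/p&gt;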

&lt;div class=&quot;language-bash highlighter-rouge&quot; contenteditable=&quot;&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;virt-install &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;--virt-type&lt;/span&gt; kvm &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;--name&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$VM_NAME&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;--os-variant&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$VM_OS&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;--disk&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;path&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;${&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;VM_IMG&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;}&lt;/span&gt;,size&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;${&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;VM_DISKSIZE&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;}&lt;/span&gt;,bus&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;virtio,format&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;qcow2 &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;--memory&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$VM_RAMSIZE&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;--vcpus&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$VM_CORES&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;--graphics&lt;/span&gt; none &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;--console&lt;/span&gt; pty,target_type&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;serial &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;--location&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;${&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;VM_LOCAL_ISO&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;}&lt;/span&gt;,kernel&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;casper/vmlinuz,initrd&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;casper/initrd &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;--extra-args&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;console=ttyS0,115200n8 --- console=ttyS0,115200n8&quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
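
&lt;p&gt;If you drive the provisioning from a Rust tool rather than from the shell, the same invocation can be assembled with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;std::process::Command&lt;/code&gt;. This is only a sketch under assumptions: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VmConfig&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;virt_install_command&lt;/code&gt; are hypothetical names mirroring the exported variables, and the command is built but not started. It makes explicit that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--location&lt;/code&gt; receives a single argument combining the ISO path with the kernel and initrd paths, and that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--extra-args&lt;/code&gt; carries both console settings as one string.&lt;/p&gt;

&lt;div class=&quot;language-rust highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;
```rust
use std::process::Command;

/// Hypothetical mirror of the exported shell variables.
pub struct VmConfig {
    pub name: String,       // VM_NAME
    pub os_variant: String, // VM_OS
    pub image: String,      // VM_IMG
    pub disk_gib: u32,      // VM_DISKSIZE
    pub ram_mib: u32,       // VM_RAMSIZE
    pub cores: u32,         // VM_CORES
    pub local_iso: String,  // VM_LOCAL_ISO
}

/// Builds, but does not run, the virt-install invocation from the shell example.
pub fn virt_install_command(cfg: VmConfig) -> Command {
    let disk = format!(
        "path={},size={},bus=virtio,format=qcow2",
        cfg.image, cfg.disk_gib
    );
    // One argument: the ISO plus the kernel and initrd paths inside it.
    let location = format!(
        "{},kernel=casper/vmlinuz,initrd=casper/initrd",
        cfg.local_iso
    );
    let mut cmd = Command::new("virt-install");
    cmd.arg("--virt-type")
        .arg("kvm")
        .arg("--name")
        .arg(cfg.name)
        .arg("--os-variant")
        .arg(cfg.os_variant)
        .arg("--disk")
        .arg(disk)
        .arg("--memory")
        .arg(cfg.ram_mib.to_string())
        .arg("--vcpus")
        .arg(cfg.cores.to_string())
        .arg("--graphics")
        .arg("none")
        .arg("--console")
        .arg("pty,target_type=serial")
        .arg("--location")
        .arg(location)
        // Kept as a single argument, like the quoted string in the shell version.
        .arg("--extra-args=console=ttyS0,115200n8 --- console=ttyS0,115200n8");
    cmd
}
```
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Calling &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.status()&lt;/code&gt; on the returned value would actually launch virt-install; keeping construction separate from execution makes it easy to log or inspect the arguments first.&lt;/p&gt;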

&lt;h2 id=&quot;ubuntu-vm&quot;&gt;Ubuntu VM&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;/assets/screenshot-ubuntu-vm-1.png&quot; alt=&quot;Screenshot from the Ubuntu installer&quot; /&gt;
&lt;img src=&quot;/assets/screenshot-ubuntu-vm-2.png&quot; alt=&quot;Screenshot from the Ubuntu installer&quot; /&gt;&lt;/p&gt;</content><author><name>frank</name></author><category term="sysadmin" /><category term="vm" /><summary type="html">In recent releases of Ubuntu, the installation experience has changed significantly. Starting with Ubuntu 24.04, the default server ISO uses the new Subiquity-based live installer, and the traditional text-based “mini.iso” workflow is no longer provided in the same way.</summary></entry><entry><title type="html">Creating an Elasticsearch API token for another user</title><link href="https://frank.sauerburger.io/2025/11/01/create-elastic-api-token-for-other-user.html" rel="alternate" type="text/html" title="Creating an Elasticsearch API token for another user" /><published>2025-11-01T00:00:00+01:00</published><updated>2025-11-01T00:00:00+01:00</updated><id>https://frank.sauerburger.io/2025/11/01/create-elastic-api-token-for-other-user</id><content type="html" xml:base="https://frank.sauerburger.io/2025/11/01/create-elastic-api-token-for-other-user.html">&lt;p&gt;Using the superuser &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;elastic&lt;/code&gt; API keys to access the Elasticsearch API is not recommended.
Such keys are often used by clients running on remote systems, and any attacker who gets hold of one
can compromise the entire Elasticsearch instance.&lt;/p&gt;

&lt;p&gt;The solution is to use API keys for unprivileged roles and users. Creating these API keys, however, is not
straightforward, because unprivileged users usually don’t have permission to log in and create API keys for themselves.
Instead, run the following request as superuser, for example in the Kibana Dev Tools console at &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/app/dev_tools#/console&lt;/code&gt;. It authenticates with the superuser’s credentials and uses &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;run_as&lt;/code&gt; to create the key on behalf of the unprivileged user.&lt;/p&gt;

&lt;div class=&quot;language-shell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;POST /_security/api_key/grant
&lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;s2&quot;&gt;&quot;grant_type&quot;&lt;/span&gt;: &lt;span class=&quot;s2&quot;&gt;&quot;password&quot;&lt;/span&gt;,
    &lt;span class=&quot;s2&quot;&gt;&quot;username&quot;&lt;/span&gt;: &lt;span class=&quot;s2&quot;&gt;&quot;elastic&quot;&lt;/span&gt;,
    &lt;span class=&quot;s2&quot;&gt;&quot;password&quot;&lt;/span&gt;: &lt;span class=&quot;s2&quot;&gt;&quot;YOUR SUPERUSER PASSWORD&quot;&lt;/span&gt;,
    &lt;span class=&quot;s2&quot;&gt;&quot;run_as&quot;&lt;/span&gt;: &lt;span class=&quot;s2&quot;&gt;&quot;UNPRIVILEGED USERNAME&quot;&lt;/span&gt;,
    &lt;span class=&quot;s2&quot;&gt;&quot;api_key&quot;&lt;/span&gt;: &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;s2&quot;&gt;&quot;name&quot;&lt;/span&gt;: &lt;span class=&quot;s2&quot;&gt;&quot;NEW NAME FOR THAT API KEY&quot;&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;More details can be found in the &lt;a href=&quot;https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-security-grant-api-key&quot;&gt;API docs&lt;/a&gt;.&lt;/p&gt;
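
&lt;p&gt;The response to the grant request contains, among other fields, the key’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;id&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;api_key&lt;/code&gt;. A client calling the REST API with it sends the Base64 encoding of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;id:api_key&lt;/code&gt; in an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Authorization: ApiKey&lt;/code&gt; header; depending on your Elasticsearch version, the response may already include that value as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;encoded&lt;/code&gt;. The following minimal sketch, using only the Rust standard library, shows the encoding. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;base64_encode&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;api_key_header&lt;/code&gt; are hypothetical helper names, and in real code a crate such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;base64&lt;/code&gt; is the more practical choice.&lt;/p&gt;

&lt;div class=&quot;language-rust highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;
```rust
/// Encodes `input` as standard Base64 (RFC 4648 alphabet, `=` padding).
pub fn base64_encode(input: String) -> String {
    let alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/".as_bytes();
    let mut out = String::new();
    for chunk in input.as_bytes().chunks(3) {
        // Pack up to three bytes into the low 24 bits, zero-filling missing ones.
        let b1 = chunk.get(1).copied().unwrap_or(0);
        let b2 = chunk.get(2).copied().unwrap_or(0);
        let n = u32::from_be_bytes([0, chunk[0], b1, b2]);
        // Map one 6-bit group, selected by its bit offset, to its character.
        let sextet = |shift: u32| alphabet[(n >> shift) as usize % 64] as char;
        out.push(sextet(18));
        out.push(sextet(12));
        // A short final chunk is padded with '=' instead of zero-derived characters.
        out.push(if chunk.len() > 1 { sextet(6) } else { '=' });
        out.push(if chunk.len() > 2 { sextet(0) } else { '=' });
    }
    out
}

/// Hypothetical helper: the `Authorization` header value for an Elasticsearch
/// API key, built from the `id` and `api_key` fields of the response.
pub fn api_key_header(id: String, api_key: String) -> String {
    format!("ApiKey {}", base64_encode(format!("{}:{}", id, api_key)))
}
```
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For example, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;api_key_header(&quot;user&quot;.to_string(), &quot;pass&quot;.to_string())&lt;/code&gt; returns &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ApiKey dXNlcjpwYXNz&lt;/code&gt;.&lt;/p&gt;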

&lt;h2 id=&quot;dedicated-api-keys-with-restricted-permissions&quot;&gt;Dedicated API keys with restricted permissions&lt;/h2&gt;

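&lt;p&gt;API keys for dedicated tasks, for example for
&lt;a href=&quot;https://www.elastic.co/beats/filebeat&quot;&gt;Filebeat&lt;/a&gt; or &lt;a href=&quot;https://www.elastic.co/beats/metricbeat&quot;&gt;Metricbeat&lt;/a&gt;
clients, can also be created through the create API key endpoint, which lets you restrict the key’s permissions in the request itself. Such a key belongs to the user who creates it, and its effective permissions are the intersection of the listed role descriptors and that user’s own privileges.&lt;/p&gt;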

&lt;p&gt;Create an API key for Metricbeat clients with the following request. See the &lt;a href=&quot;https://www.elastic.co/docs/reference/beats/metricbeat/beats-api-keys&quot;&gt;API docs&lt;/a&gt; for more details.&lt;/p&gt;

&lt;div class=&quot;language-shell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;POST /_security/api_key
&lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;s2&quot;&gt;&quot;name&quot;&lt;/span&gt;: &lt;span class=&quot;s2&quot;&gt;&quot;API KEY NAME&quot;&lt;/span&gt;, 
  &lt;span class=&quot;s2&quot;&gt;&quot;role_descriptors&quot;&lt;/span&gt;: &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;s2&quot;&gt;&quot;metricbeat_writer&quot;&lt;/span&gt;: &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt; 
      &lt;span class=&quot;s2&quot;&gt;&quot;cluster&quot;&lt;/span&gt;: &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;monitor&quot;&lt;/span&gt;, &lt;span class=&quot;s2&quot;&gt;&quot;read_ilm&quot;&lt;/span&gt;, &lt;span class=&quot;s2&quot;&gt;&quot;read_pipeline&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt;,
      &lt;span class=&quot;s2&quot;&gt;&quot;index&quot;&lt;/span&gt;: &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
          &lt;span class=&quot;s2&quot;&gt;&quot;names&quot;&lt;/span&gt;: &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;metricbeat-*&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt;,
          &lt;span class=&quot;s2&quot;&gt;&quot;privileges&quot;&lt;/span&gt;: &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;view_index_metadata&quot;&lt;/span&gt;, &lt;span class=&quot;s2&quot;&gt;&quot;create_doc&quot;&lt;/span&gt;, &lt;span class=&quot;s2&quot;&gt;&quot;auto_configure&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
      &lt;span class=&quot;o&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
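&lt;p&gt;When keys are needed for many hosts, the request body can also be generated by a script. The following Python sketch is an illustration, not part of the Elastic tooling: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;beats_writer_key_request&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;beats_api_key_setting&lt;/code&gt; are hypothetical helpers, and the sketch assumes the Beat writes to indices matching &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BEAT-*&lt;/code&gt;, as in the request above. The response of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;POST /_security/api_key&lt;/code&gt; contains an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;id&lt;/code&gt; and an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;api_key&lt;/code&gt; field, and Beats accept the key in the form &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;id:api_key&lt;/code&gt; in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;api_key&lt;/code&gt; setting of their Elasticsearch output.&lt;/p&gt;

```python
import json


def beats_writer_key_request(key_name: str, beat: str) -> dict:
    """Body for POST /_security/api_key that only allows writing to BEAT-* indices."""
    return {
        "name": key_name,
        "role_descriptors": {
            f"{beat}_writer": {
                "cluster": ["monitor", "read_ilm", "read_pipeline"],
                "index": [
                    {
                        "names": [f"{beat}-*"],
                        "privileges": ["view_index_metadata", "create_doc", "auto_configure"],
                    }
                ],
            }
        },
    }


def beats_api_key_setting(response: dict) -> str:
    """Turn the create API key response into the id:api_key form used by Beats."""
    return f"{response['id']}:{response['api_key']}"


# Prints the body of the Metricbeat request shown above.
print(json.dumps(beats_writer_key_request("API KEY NAME", "metricbeat"), indent=2))
```

&lt;p&gt;Passing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&quot;filebeat&quot;&lt;/code&gt; instead changes only the role name and the index pattern.&lt;/p&gt;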

&lt;p&gt;Create an API key for Filebeat clients with the following request. See the &lt;a href=&quot;https://www.elastic.co/docs/reference/beats/filebeat/beats-api-keys&quot;&gt;API docs&lt;/a&gt; for more details.&lt;/p&gt;

&lt;div class=&quot;language-shell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;POST /_security/api_key
&lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;s2&quot;&gt;&quot;name&quot;&lt;/span&gt;: &lt;span class=&quot;s2&quot;&gt;&quot;API KEY NAME&quot;&lt;/span&gt;,
  &lt;span class=&quot;s2&quot;&gt;&quot;role_descriptors&quot;&lt;/span&gt;: &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;s2&quot;&gt;&quot;filebeat_writer&quot;&lt;/span&gt;: &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;s2&quot;&gt;&quot;cluster&quot;&lt;/span&gt;: &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;monitor&quot;&lt;/span&gt;, &lt;span class=&quot;s2&quot;&gt;&quot;read_ilm&quot;&lt;/span&gt;, &lt;span class=&quot;s2&quot;&gt;&quot;read_pipeline&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt;,
      &lt;span class=&quot;s2&quot;&gt;&quot;index&quot;&lt;/span&gt;: &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
          &lt;span class=&quot;s2&quot;&gt;&quot;names&quot;&lt;/span&gt;: &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;filebeat-*&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt;,
          &lt;span class=&quot;s2&quot;&gt;&quot;privileges&quot;&lt;/span&gt;: &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;view_index_metadata&quot;&lt;/span&gt;, &lt;span class=&quot;s2&quot;&gt;&quot;create_doc&quot;&lt;/span&gt;, &lt;span class=&quot;s2&quot;&gt;&quot;auto_configure&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
      &lt;span class=&quot;o&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;</content><author><name>frank</name></author><category term="internet" /><category term="sysadmin" /><summary type="html">Using the superuser elastic API keys to access the Elasticsearch API is not recommended. The API key is often used by remote clients on remote systems. Any attacker who gains access to the token can compromise the entire Elasticsearch instance.</summary></entry><entry><title type="html">llms.txt adoption</title><link href="https://frank.sauerburger.io/2025/09/03/llms-txt-adoption.html" rel="alternate" type="text/html" title="llms.txt adoption" /><published>2025-09-03T00:00:00+02:00</published><updated>2025-09-03T00:00:00+02:00</updated><id>https://frank.sauerburger.io/2025/09/03/llms-txt-adoption</id><content type="html" xml:base="https://frank.sauerburger.io/2025/09/03/llms-txt-adoption.html">&lt;p&gt;Exactly one year ago, Jeremy Howard published a &lt;a href=&quot;https://llmstxt.org&quot;&gt;proposal&lt;/a&gt; to make the web more accessible to AI and, in particular, to LLMs. How many of the top one million websites adopt this approach?&lt;/p&gt;

&lt;p&gt;The proposed standard suggests placing a file at the root of a website, &lt;a href=&quot;https://llmstxt.org/&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/llms.txt&lt;/code&gt;&lt;/a&gt;,
intended to be consumed by LLMs and AI tools, loosely inspired by &lt;a href=&quot;https://www.robotstxt.org/&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/robots.txt&lt;/code&gt;&lt;/a&gt;.
The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/llms.txt&lt;/code&gt; file serves as an entry point or site map of the website, potentially linking to other pages.
As the source code of a webpage is often very verbose and its content is mingled with style sheets, JavaScript, and HTML markup,
parsing the source with an LLM might exceed the LLM’s context window or consume too many tokens.
Therefore, the idea is to use Markdown for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/llms.txt&lt;/code&gt; entry point and to link to Markdown versions of each page.&lt;/p&gt;

&lt;p&gt;How many websites adopted this approach?&lt;/p&gt;

&lt;!-- snip --&gt;

&lt;h1 id=&quot;lets-measure&quot;&gt;Let’s measure.&lt;/h1&gt;

&lt;p&gt;Starting with a dataset of the &lt;a href=&quot;https://www.domcop.com/top-10-million-websites&quot;&gt;10 million highest-ranked domains&lt;/a&gt;
from &lt;a href=&quot;https://www.domcop.com/openpagerank/what-is-openpagerank&quot;&gt;Open Page Rank&lt;/a&gt;,
we can send a GET request to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/llms.txt&lt;/code&gt; for each of them and see how many web servers respond with an HTTP success code.
It turns out that many web servers respond with a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;200&lt;/code&gt; status code but actually send a
page informing the client that the page doesn’t exist.
For example, the top-ranked domain, facebook.com, behaves this way.
A GET request to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;https://facebook.com/llms.txt&lt;/code&gt; returns a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;200&lt;/code&gt; status code, but the page says the content is not available.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/facebook-llmstxt.png&quot; alt=&quot;Unusual behavior of facebook.com&quot; /&gt;&lt;/p&gt;

&lt;p&gt;A typical feature of these fake-success responses is that the response page is an HTML document.
However, sometimes the Content-Type header field is not a reliable discriminator to detect fake-success pages.
A good way to distinguish HTML content from Markdown is to compare the number of occurrences of the left angle bracket (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;&lt;/code&gt;) character to the
left square bracket (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[&lt;/code&gt;).
The first is ubiquitous in HTML, while the latter is common in Markdown.
I came up with the following somewhat arbitrary rules and count a response as a valid &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/llms.txt&lt;/code&gt; only if it satisfies all of them.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;HTTP response code must be less than 400&lt;/li&gt;
  &lt;li&gt;Content-Type header must start with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;text/plain&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;The web server must accept the connection within 3 seconds&lt;/li&gt;
  &lt;li&gt;The response must arrive within 10 seconds&lt;/li&gt;
  &lt;li&gt;The site must support HTTPS&lt;/li&gt;
  &lt;li&gt;The response content must be longer than 500 chars&lt;/li&gt;
  &lt;li&gt;There must be more left square brackets than left angle brackets&lt;/li&gt;
&lt;/ul&gt;
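&lt;p&gt;The rules can be condensed into a short check. This is a minimal sketch under my own assumptions, not the crawler used for this analysis: it relies on the third-party &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;requests&lt;/code&gt; library, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;passes_rules&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;has_llms_txt&lt;/code&gt; are hypothetical names. The read timeout of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;requests&lt;/code&gt; limits the wait between received bytes rather than the total response time, so the 10-second rule is only approximated here.&lt;/p&gt;

```python
LEFT_ANGLE = "\x3c"  # the left angle bracket that starts every HTML tag


def passes_rules(status_code: int, content_type: str, text: str) -> bool:
    """Content checks from the list above, applied to a single response."""
    if status_code >= 400:
        return False  # HTTP error status
    if not content_type.startswith("text/plain"):
        return False  # e.g. an HTML page announced as text/html
    if not len(text) > 500:
        return False  # too short
    # Markdown links use square brackets, HTML markup uses angle brackets.
    return text.count("[") > text.count(LEFT_ANGLE)


def has_llms_txt(domain: str) -> bool:
    """Fetch https://DOMAIN/llms.txt and apply all rules; network errors count as failure."""
    import requests  # third-party; only needed for the network part

    try:
        # HTTPS only; 3 s to establish the connection, 10 s read timeout.
        response = requests.get(f"https://{domain}/llms.txt", timeout=(3, 10))
    except requests.RequestException:
        return False  # refused connection, TLS error, timeout, ...
    return passes_rules(
        response.status_code,
        response.headers.get("Content-Type", ""),
        response.text,
    )
```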

&lt;p&gt;To speed up the analysis, I limited it to the top one million domains.
I ran the analysis on September 2, 2025.&lt;/p&gt;

&lt;h1 id=&quot;results&quot;&gt;Results&lt;/h1&gt;

&lt;p&gt;Highly ranked domains might be quicker to adopt new technological ideas.
To test this, I counted the fraction of domains with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/llms.txt&lt;/code&gt; among the top n domains.
The result is shown in the following chart.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/llmstxt.png&quot; alt=&quot;Barchart of the fraction of domains with llms.txt among the top n domains&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Based on these results, the highest adoption rate, 4 %, is found among the top 300 domains.
The fraction continuously decreases further down the ranking. Looking at the top one million domains,
we see that the overall adoption rate drops to around 1.2 %.
In total, that corresponds to 12174 domains.&lt;/p&gt;
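&lt;p&gt;The quantity in the chart is a cumulative mean: for each cut-off n, the number of domains with a valid &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/llms.txt&lt;/code&gt; among the first n ranks, divided by n. A minimal numpy sketch, assuming the per-domain results are available as a rank-ordered list of booleans; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;adoption_among_top_n&lt;/code&gt; is a hypothetical helper, not taken from the analysis repository.&lt;/p&gt;

```python
import numpy as np


def adoption_among_top_n(flags, ns):
    """Fraction of domains with a valid /llms.txt among the n top-ranked domains.

    flags: one boolean per domain, ordered by rank (index 0 is rank 1).
    ns: the cut-offs n to evaluate.
    """
    hits = np.cumsum(np.asarray(flags, dtype=int))  # hits[k]: count among ranks 1..k+1
    return {n: float(hits[n - 1]) / n for n in ns}


print(adoption_among_top_n([True, False, True, False], [1, 2, 4]))  # {1: 1.0, 2: 0.5, 4: 0.5}
```

&lt;p&gt;For the full dataset, the last cut-off gives 12174 / 1,000,000 ≈ 1.2 %.&lt;/p&gt;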

&lt;p&gt;The &lt;a href=&quot;https://gitlab.sauerburger.com/frank/llmstxt-adoption&quot;&gt;crawler, the analysis code, and the result dataset&lt;/a&gt;
are available in a Git repository.&lt;/p&gt;</content><author><name>frank</name></author><category term="internet" /><category term="ai" /><summary type="html">Exactly one year ago, Jeremy Howard published a proposal to make the web more accessible to AI and, in particular, to LLMs. How many of the top one million websites adopt this approach? The proposed standard suggests creating a file at the root of a website, e.g., /llms.txt, intended to be consumed by LLMs and AI tools, loosely taking inspiration from /robots.txt. The /llms.txt serves as an entry point or site map of the website, potentially linking to other pages. As the source code of a webpage is often very verbose and its content is mingled with style sheets, JavaScript, and HTML markup, parsing the source with an LLM might exceed the LLM’s context window or consume too many tokens. Therefore, the idea is to use Markdown for the /llms.txt entry point and to link to Markdown versions of each page. How many websites adopted this approach?</summary></entry><entry><title type="html">Magic floating-point numbers: NaNs</title><link href="https://frank.sauerburger.io/2025/08/18/magic-floating-point-numbers-nans.markdown.html" rel="alternate" type="text/html" title="Magic floating-point numbers: NaNs" /><published>2025-08-18T00:00:00+02:00</published><updated>2025-08-18T00:00:00+02:00</updated><id>https://frank.sauerburger.io/2025/08/18/magic-floating-point-numbers-nans.markdown</id><content type="html" xml:base="https://frank.sauerburger.io/2025/08/18/magic-floating-point-numbers-nans.markdown.html">&lt;p&gt;After following &lt;a href=&quot;https://www.youtube.com/watch?v=y-NOz94ZEOA&quot;&gt;Laurie Kirk down a rabbit hole on subnormal numbers in the IEEE 754 float specification&lt;/a&gt;,
I stumbled upon other interesting properties of floating-point numbers, specifically how NaNs (Not a Number) are represented in binary.
After more than 10 years of scientific computing and data science, I thought there was nothing about floats that could surprise me, but oh, was I wrong.
Let’s see if I can surprise you.
I’ve built the computer-science equivalent of a magic trick to showcase these properties.&lt;/p&gt;

&lt;h2 id=&quot;the-magic-trick&quot;&gt;The magic trick&lt;/h2&gt;
&lt;p&gt;The trick works in two stages:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;You choose a phrase you like. With a special Python function, you convert it into a numpy array of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NaN&lt;/code&gt;s. It’s a normal array, and they’re normal NaNs. Your phrase is nowhere to be seen.&lt;/li&gt;
  &lt;li&gt;You send the numpy array to an API endpoint at &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;https://magicfloat.sauerburger.io/unravel&lt;/code&gt;. Using advanced magic (knowledge of &lt;a href=&quot;https://en.wikipedia.org/wiki/IEEE_754&quot;&gt;IEEE 754&lt;/a&gt;), I can unravel your secrets by looking at the array of NaNs.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3 id=&quot;step-one-enchanting-your-phrase&quot;&gt;Step one: Enchanting your phrase&lt;/h3&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;numpy&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;enchant&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;phrase&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ndarray&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;frombuffer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sa&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;join&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;
        &lt;span class=&quot;nb&quot;&gt;bytes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;sa&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\xff\x80\x7f&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;phrase&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;encode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;utf-8&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;]),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dtype&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;float32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If you call that with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&quot;Computers are fun!&quot;&lt;/code&gt;, you get a numpy array of floats with no trace of the phrase. It seems the message is gone.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;box&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;enchant&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Computers are fun!&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;box&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;array&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
       &lt;span class=&quot;n&quot;&gt;nan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dtype&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;float32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;box&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shape&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;18&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,)&lt;/span&gt;

&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;box&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dtype&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;dtype&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;float32&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;step-two-open-the-magic-nan-array&quot;&gt;Step two: Open the magic NaN array&lt;/h3&gt;

&lt;p&gt;I’m providing an API endpoint at &lt;a href=&quot;https://magicfloat.sauerburger.io/unravel&quot;&gt;https://magicfloat.sauerburger.io/unravel&lt;/a&gt; that takes the raw bytes of the numpy array and responds with your original phrase.
The following function does the necessary encoding and request handling.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;requests&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;unravel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;box&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ndarray&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;box&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dtype&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;float32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;raise&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;ValueError&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Magic box must be float32.&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;response&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;requests&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;post&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;https://magicfloat.sauerburger.io/unravel&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;box&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tobytes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;response&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ok&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;raise&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;RuntimeError&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;The planets don&apos;t seem to align: %s&quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;response&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;text&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;response&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;text&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If we continue the example from above, we get: &lt;em&gt;drum roll&lt;/em&gt;&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;unravel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;box&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt;&apos;Computers are fun!&apos;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;how-does-it-work&quot;&gt;How does it work?&lt;/h2&gt;

&lt;p&gt;Floating-point numbers are represented using three components:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;the sign of the number,&lt;/li&gt;
  &lt;li&gt;the exponent used with base 2, and&lt;/li&gt;
  &lt;li&gt;the fractional part of the number, the mantissa.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a 32-bit float, the bits are arranged as follows. The order of the bytes in memory depends on the &lt;a href=&quot;https://frank.sauerburger.io/2022/01/26/big-and-little-endian.html&quot;&gt;endianness&lt;/a&gt; of your platform.&lt;/p&gt;

&lt;table&gt;
&lt;tr style=&quot;color: #fff; text-align: center&quot;&gt;
&lt;td style=&quot;width: 1em;&quot;&gt; &lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9C165B&quot;&gt;x&lt;/td&gt;

&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #16539C&quot;&gt;1&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #16539C&quot;&gt;1&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #16539C&quot;&gt;1&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #16539C&quot;&gt;1&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #16539C&quot;&gt;1&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #16539C&quot;&gt;1&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #16539C&quot;&gt;1&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #16539C&quot;&gt;1&lt;/td&gt;

&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;

&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;

&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;

&lt;/tr&gt;
&lt;tr style=&quot;color: #000; background-color: #fff&quot;&gt;
&lt;td style=&quot;text-align: center&quot; colspan=&quot;2&quot;&gt;Sign&lt;/td&gt;
&lt;td style=&quot;text-align: center&quot; colspan=&quot;8&quot;&gt;Biased exponent (8 bits)&lt;/td&gt;
&lt;td style=&quot;text-align: center&quot; colspan=&quot;23&quot;&gt;Mantissa (23 bits)&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
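&lt;p&gt;The three fields of a concrete value can be inspected by reinterpreting the four bytes of a float32 as an unsigned 32-bit integer and cutting out the bit ranges. A minimal numpy sketch; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;float32_fields&lt;/code&gt; is a hypothetical helper, not a numpy function. Since both views of the bytes use the platform’s native byte order, the integer holds the bits in the logical order shown above, regardless of endianness.&lt;/p&gt;

```python
import numpy as np


def float32_fields(value: float) -> tuple:
    """Return (sign, biased exponent, mantissa) of value stored as float32."""
    # The same four bytes, read as one unsigned integer instead of a float.
    bits = int(np.array([value], dtype=np.float32).view(np.uint32)[0])
    sign = bits >> 31                # top bit
    exponent = (bits >> 23) % 2**8   # next 8 bits, biased by 127
    mantissa = bits % 2**23          # lowest 23 bits
    return sign, exponent, mantissa


print(float32_fields(1.0))           # (0, 127, 0)
print(float32_fields(-2.0))          # (1, 128, 0)
print(float32_fields(float("inf")))  # (0, 255, 0)
```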

&lt;p&gt;A few combinations of bits have a special meaning, such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;+inf&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-inf&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NaN&lt;/code&gt;.
When the exponent is all-ones (as shown in the chart), it represents one of the aforementioned three cases.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Sign&lt;/th&gt;
      &lt;th&gt;Biased exponent&lt;/th&gt;
      &lt;th&gt;Mantissa&lt;/th&gt;
      &lt;th&gt;Special meaning&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;all ones: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1111 1111&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;all zero&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;+inf&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;1&lt;/td&gt;
      &lt;td&gt;all ones: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1111 1111&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;all zero&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-inf&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;any&lt;/td&gt;
      &lt;td&gt;all ones: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1111 1111&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;at least one bit not zero&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NaN&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
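&lt;p&gt;The table can be checked with a few hand-picked bit patterns. The following numpy sketch (the patterns are arbitrary examples) sets the exponent to all ones and reinterprets the raw integers as float32 without converting them: only the two patterns with a zero mantissa become infinities, every other mantissa yields a NaN, and reading the bytes back shows that each NaN keeps its own bit pattern. Arithmetic is a different story, since operations on NaNs are not guaranteed to preserve these bits.&lt;/p&gt;

```python
import numpy as np

# Raw 32-bit patterns, all with the exponent set to all ones.
patterns = np.array(
    [
        0x7F800000,  # sign 0, mantissa zero
        0xFF800000,  # sign 1, mantissa zero
        0x7FC00000,  # sign 0, only the top mantissa bit set
        0x7F800001,  # sign 0, only the lowest mantissa bit set
        0xFFFFFFFF,  # sign 1, all mantissa bits set
        0x7F812345,  # sign 0, an arbitrary non-zero mantissa
    ],
    dtype=np.uint32,
)

# Reinterpret the same bytes as float32; no value conversion takes place.
values = patterns.view(np.float32)
print(values)            # +inf, -inf, then four NaNs
print(np.isnan(values))  # False, False, then True four times

# Viewing does not touch the bits, so each NaN keeps its individual pattern.
print([hex(v) for v in values.view(np.uint32)])
```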

&lt;p&gt;We observe that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;+inf&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-inf&lt;/code&gt; each have a unique binary representation.
However, for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NaN&lt;/code&gt;, there are 2 × (2^23 - 1) = 2^24 - 2 possible binary representations: either sign combined with any of the 2^23 - 1 non-zero mantissas.
For my little magic trick, I pack one UTF-8 encoded byte in each 32-bit float number.
I invite you to &lt;a href=&quot;https://gitlab.sauerburger.com/frank/magicfloat&quot;&gt;discover&lt;/a&gt; the details yourself.&lt;/p&gt;</content><author><name>frank</name></author><category term="python" /><summary type="html">After following Laurie Kirk down a rabbit hole on subnormal numbers in the IEEE 754 float specification, I stumbled upon other interesting properties of floating-point numbers, specifically how NaNs (Not a Number) are represented in binary. After more than 10 years of scientific computing and data science, I thought there was nothing about floats that could surprise me, but oh, was I wrong. Let’s see if I can surprise you. I’ve built the computer-science equivalent of a magic trick to showcase these properties.</summary></entry><entry><title type="html">LLM basics and its applications</title><link href="https://frank.sauerburger.io/2025/07/16/llm-basic-and-its-applications.html" rel="alternate" type="text/html" title="LLM basics and its applications" /><published>2025-07-16T00:00:00+02:00</published><updated>2025-07-16T00:00:00+02:00</updated><id>https://frank.sauerburger.io/2025/07/16/llm-basic-and-its-applications</id><content type="html" xml:base="https://frank.sauerburger.io/2025/07/16/llm-basic-and-its-applications.html">&lt;p&gt;Technical innovations often offer numerous applications with tremendous added value, making work easier. However, this also increases the potential for misuse, with the opposite effect—whether deliberately or through improper use. This also applies to advances in the field of Artificial Intelligence (AI). The aim of this article is to shed light on how current text-based AI applications work, so that they can be used meaningfully and appropriately in the field of safety science. The focus here is not to discourage their use, but to encourage an active discussion about sensible applications and their adoption.&lt;/p&gt;

&lt;!-- snip --&gt;

&lt;div class=&quot;alert alert-primary&quot; role=&quot;alert&quot;&gt;
&lt;i class=&quot;fa fa-info-circle&quot;&gt;&lt;/i&gt; &lt;b&gt;Note:&lt;/b&gt;
This is the written version of my presentation titled &lt;i&gt;Challenge &quot;Artificial Intelligence&quot; in relation to the assessment of safety&lt;/i&gt; at the &lt;a href=&quot;https://auva.at/veranstaltungen/forum-praevention-international-2025/programm-program/#MIE&quot;&gt;XXXIX. International GfS Symposium&lt;/a&gt;, Vienna, on May 21, 2025.
The text was translated from German to English using a Large Language Model (LLM) and is not a verbatim transcript of the presentation.
&lt;/div&gt;

&lt;p&gt;From a user’s perspective, a lack of understanding of how AI applications work creates several obstacles. In the following, I will focus on text-based applications powered by Large Language Models (LLMs). After the launch of ChatGPT in November 2022, it took only a few months before a misunderstanding of the tool made headlines. As various media reported, a lawyer in New York used ChatGPT to research legal precedents, which he then submitted to the court. It later emerged that most of the cases presented by ChatGPT were either incorrectly cited or completely made up. This behavior, known among experts as “hallucination,” is a characteristic of LLMs. The lawyer claimed to have acted under the assumption that ChatGPT was a search engine. He was ultimately fined. Numerous reports of similar cases have emerged over the past two years.&lt;/p&gt;

&lt;p&gt;The example above illustrates how important it is to understand how AI applications work. The lesson is not limited to ChatGPT; it applies to other AI tools as well. Below, I describe how language models function and how modern AI tools are built on top of them.&lt;/p&gt;

&lt;h2 id=&quot;large-language-models&quot;&gt;Large Language Models&lt;/h2&gt;
&lt;p&gt;Large Language Models are neural networks, that is, in a broad sense, nonlinear mathematical functions that compute an output from an input. In the case of language models, both the input and output are text. LLMs are used in almost all generative, text-based AI services. However, they are often hidden behind several layers of application-specific logic. The foundation for today’s models was laid by Google researchers in 2017 with the Transformer architecture, which is built around the so-called attention mechanism.&lt;/p&gt;

&lt;p&gt;The term “Large” in Large Language Models refers to the number of parameters and the associated memory requirements. There is no clear cutoff that defines the term, and the development of ever larger language models is expected to continue. If the language model is viewed as a mathematical function, the number of parameters offers a way to compare model sizes: a first-degree polynomial, i.e. a function that describes a straight line in a plane, has two parameters; a second-degree polynomial (parabola) has three. GPT-1, with 117 million parameters, is generally considered the first LLM. Today’s models have up to 700 billion parameters and require specialized hardware to run. Further examples of large language models include OpenAI’s GPT-4.1 and o3, as well as openly available models such as BLOOM, Llama 3, and Mixtral.&lt;/p&gt;

&lt;h3 id=&quot;functionality&quot;&gt;Functionality&lt;/h3&gt;
&lt;p&gt;LLMs do not work directly with words or characters, but with “tokens”, essentially the alphabet of the language model. A token typically corresponds to a whole word or a frequently occurring part of a word and is the smallest unit the model processes. In English, a token corresponds to roughly ¾ of a word on average, so about 1⅓ tokens are needed per word. For a language model to process text, the text is first broken down into a sequence of tokens, and each token is identified by an integer. The original text is thus translated into a chain of numbers. The vocabulary of modern language models typically contains about 20,000 to 200,000 different tokens.&lt;/p&gt;
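&lt;p&gt;As a rough illustration of how text becomes a chain of numbers, the following Python sketch uses a tiny, made-up vocabulary and a greedy longest-match rule. Both are assumptions chosen for demonstration: real tokenizers (for example, those based on byte-pair encoding) learn their vocabularies from data, so actual token boundaries and IDs differ.&lt;/p&gt;

```python
# Hypothetical toy vocabulary: each token string maps to an integer ID.
# Real LLM vocabularies are learned from data and are far larger.
VOCAB = {
    "safe": 0, "ty": 1, " sci": 2, "ence": 3, " is": 4,
    " fun": 5, "s": 6, " ": 7, "i": 8, "e": 9, "n": 10, "c": 11,
}


def tokenize(text):
    """Split text into tokens by greedily taking the longest vocabulary match."""
    tokens = []
    pos = 0
    longest = max(len(t) for t in VOCAB)
    while pos != len(text):
        # Try the longest candidate first, then shrink until a match is found.
        for size in range(min(longest, len(text) - pos), 0, -1):
            piece = text[pos:pos + size]
            if piece in VOCAB:
                tokens.append(piece)
                pos += size
                break
        else:
            raise ValueError(f"no token covers {text[pos]!r}")
    return tokens


def encode(text):
    """Translate text into the chain of integer IDs the model actually sees."""
    return [VOCAB[t] for t in tokenize(text)]


print(tokenize("safety science is fun"))  # ['safe', 'ty', ' sci', 'ence', ' is', ' fun']
print(encode("safety science is fun"))    # [0, 1, 2, 3, 4, 5]
```

&lt;p&gt;Note how “safety” is split into two tokens, while “ is” (including its leading space) is a single token. This is one reason why token counts and word counts differ.&lt;/p&gt;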

&lt;p&gt;Large language models for generative applications are mostly trained to predict the next token based on a given sequence of tokens—so-called Next Token Prediction. Put simply, the goal of training is to optimize the vast number of parameters in the language model so that it can predict the next token for texts from the training corpus as accurately as possible. To train such a large number of parameters, a correspondingly large dataset of texts is required. For high precision, the language model must be capable of understanding context both within sentences and across entire texts. Earlier approaches in computational linguistics, such as Markov chains or neural networks with Long Short-Term Memory (LSTM), do not achieve comparable results.&lt;/p&gt;

&lt;p&gt;Viewed as a mathematical function, the model computes the next token from a sequence of input tokens. By applying the function repeatedly and appending each predicted token to the input sequence, it is possible to continue or complete a started text. This is illustrated in the following figure. To enhance readability, input and output texts are shown as plain text rather than tokenized. The model completes the started sentence. Although this appears to work well at first glance, the phenomenon of hallucination also appears here: the “FINTURBO Cup” described by the model does not exist.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/vienna1.png&quot; alt=&quot;Example of text completion by language model Qwen2-72B. Words generated by the model are shown in blue.
&quot; /&gt;&lt;/p&gt;
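&lt;p&gt;The feedback loop of repeatedly applying the model can be sketched in a few lines of Python. A hypothetical lookup table stands in for the neural network. It only looks at the last token and always picks the highest-scoring successor (greedy decoding). A real LLM takes the entire input sequence into account and often samples from a probability distribution, but the principle of appending each prediction to the input is the same.&lt;/p&gt;

```python
# Hypothetical stand-in for a language model: made-up scores for possible
# next tokens, keyed by the previous token. A real LLM computes such scores
# with a neural network from the entire preceding sequence.
NEXT_TOKEN_SCORES = {
    "The": {" mine": 0.6, " game": 0.4},
    " mine": {" is": 0.7, ".": 0.3},
    " is": {" deep": 0.8, " dark": 0.2},
    " deep": {".": 1.0},
}


def predict_next(tokens):
    """Return the highest-scoring next token, or None if there is no guess."""
    scores = NEXT_TOKEN_SCORES.get(tokens[-1])
    if not scores:
        return None
    return max(scores, key=scores.get)


def generate(prompt_tokens, max_new_tokens=10, stop="."):
    """Complete a text by repeatedly appending the predicted token to the input."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        nxt = predict_next(tokens)
        if nxt is None:
            break
        tokens.append(nxt)  # the prediction becomes part of the next input
        if nxt == stop:
            break
    return "".join(tokens)


print(generate(["The"]))  # The mine is deep.
```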

&lt;h3 id=&quot;mental-model-for-the-model&quot;&gt;Mental Model for the Model&lt;/h3&gt;
&lt;p&gt;According to a 2022 survey, computational linguists were divided on whether LLMs truly “understand” natural language in a non-trivial sense, or merely remix and parrot back training texts. It is unclear how language models solve linguistic tasks, and they are often described as black boxes.&lt;/p&gt;

&lt;p&gt;I believe this question is not productive, as it leads deep into philosophical territory. For instance, to answer it, we would first need to define what is meant by “understanding” and to what extent a machine can actually understand anything.&lt;/p&gt;

&lt;p&gt;From a practical perspective, it makes sense to use the following mental model for LLMs: Since LLMs are above all trained to predict the next word or token, it is reasonable to say that LLMs merely imitate human language and conversation. It shouldn’t be surprising that such imitation can still be extremely useful for solving tasks. Throughout this article, I will refer back to this perspective to explain the behavior of LLMs in specific examples.&lt;/p&gt;

&lt;h2 id=&quot;techniques&quot;&gt;Techniques&lt;/h2&gt;
&lt;p&gt;The following sections highlight various techniques, all of which are applied in one form or another in today’s AI tools.&lt;/p&gt;

&lt;h3 id=&quot;paradigm-shift-prompt-engineering&quot;&gt;Paradigm Shift: Prompt Engineering&lt;/h3&gt;
&lt;p&gt;Programming has traditionally meant that a computer executes the instructions in program code exactly as written. This can be surprising when, for example, an “obviously” correct algorithm is carried out by the machine differently than a human would expect. Translating abstract ideas into concrete, specific computer instructions is a core part of programming.&lt;/p&gt;

&lt;p&gt;Prompt Engineering marks a radical departure from this principle. When language models are used, instructions are no longer executed with mathematical precision. Prompt Engineering refers to the practice of writing instructions, called prompts, for a language model so that it completes a task as reliably and accurately as possible. The model processes a prompt in the context of a simulated conversation. Small, seemingly insignificant changes to the prompt’s wording can lead to radically different answers. For instance, if a language model’s output must adhere to a specific format to work with traditional software, it is not uncommon to remind the model several times within the prompt about the desired output format, as shown in the following figure.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/vienna2.png&quot; alt=&quot;Instruction to a language model to analyze product reviews.
&quot; /&gt;&lt;/p&gt;
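&lt;p&gt;Because the model is not guaranteed to honor the requested format, the surrounding software should check the output rather than trust it. The sketch below assumes a hypothetical sentiment-classification prompt that asks for a small JSON object. It is similar in spirit to the product-review instruction above, but it is not the prompt from the figure. The code rejects any reply that does not match the expected format.&lt;/p&gt;

```python
import json

# Hypothetical prompt. The format requirement is stated twice, a common
# prompt-engineering habit. Double braces are literal braces for str.format.
PROMPT_TEMPLATE = (
    "Classify the sentiment of the product review below.\n"
    'Answer ONLY with JSON of the form {{"sentiment": "positive"}} or '
    '{{"sentiment": "negative"}}.\n\n'
    "Review: {review}\n\n"
    "Remember: reply with the JSON object only, no explanations."
)

ALLOWED = {"positive", "negative"}


def build_prompt(review):
    return PROMPT_TEMPLATE.format(review=review)


def parse_reply(reply):
    """Return the sentiment if the model kept to the format, else None."""
    try:
        data = json.loads(reply.strip())
    except json.JSONDecodeError:
        return None  # e.g. the model wrapped the answer in prose
    if not isinstance(data, dict):
        return None
    sentiment = data.get("sentiment")
    return sentiment if sentiment in ALLOWED else None


print(parse_reply('{"sentiment": "positive"}'))      # positive
print(parse_reply("Sure! The review is positive."))  # None
```

&lt;p&gt;How to react to a rejected reply, for example by retrying or falling back to manual review, is a design decision made outside the model.&lt;/p&gt;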

&lt;p&gt;LLMs tend to invent or guess answers when they do not know the correct response; this is the phenomenon of hallucination described above. Providers of AI services attempt to avoid or reduce hallucination in their products. Because of this tendency, pure language models are not suitable as factual knowledge bases or search engines. Being unaware of this property is what led to the lawyer’s mistake described at the beginning.&lt;/p&gt;

&lt;p&gt;As radical and paradoxical as the departure from mathematical precision in programming may seem, this paradigm shift dramatically increases the range of possible applications. Writing a short textual instruction to an LLM can solve problems that could not be addressed with conventional programming, or only through great effort using traditional computational linguistics techniques. Applications can more quickly adapt to changing requirements simply by changing prompts, whereas retraining a classic computational linguistics model takes significant time.&lt;/p&gt;

&lt;h3 id=&quot;in-context-learning&quot;&gt;In-Context Learning&lt;/h3&gt;
&lt;p&gt;The knowledge of a language model is limited. Training uses a large corpus of texts, and the model cannot know anything about events not covered by it, for example because they occurred after the training data was collected. This point in time is known as the knowledge cut-off. Moreover, language models generally cannot retrieve all information from the training dataset the way a knowledge base would. The purpose of the dataset is to teach the model language and conversation, not to memorize facts and events.&lt;/p&gt;

&lt;p&gt;Since training LLMs is time- and resource-intensive, there are other ways to convey new information to them. A fundamental method is so-called in-context learning, in which, at the start of a conversation with the language model—i.e. after training is finished—the model is given all necessary information for the task as part of the input text. This can include background knowledge required for a task, or examples that demonstrate how the task should be solved (“few-shot learning”). The following figure shows a conversation in which the language model is taught, using examples, how to rephrase sentences.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/vienna3.png&quot; alt=&quot;Conversation using in-context learning, where Qwen2-72B-Instruct is given example instructions. Words generated by the model are shown in blue.&quot; /&gt;&lt;/p&gt;
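&lt;p&gt;From the application’s point of view, in-context learning boils down to assembling the input text. Below is a minimal sketch. It assumes a plain-text prompt layout with “Input:”/“Output:” labels and uses made-up examples. Chat-based services typically use structured message lists instead.&lt;/p&gt;

```python
# Hypothetical examples demonstrating the task. The model is not retrained;
# it only sees these examples as part of its input text.
EXAMPLES = [
    ("The meeting got moved to Monday.", "The meeting has been rescheduled to Monday."),
    ("We can't ship it yet.", "The product is not yet ready for delivery."),
]


def few_shot_prompt(instruction, examples, query):
    """Assemble instruction, worked examples and the new input into one prompt."""
    parts = [instruction, ""]
    for source, target in examples:
        parts.append(f"Input: {source}")
        parts.append(f"Output: {target}")
        parts.append("")
    parts.append(f"Input: {query}")
    parts.append("Output:")  # the model continues the text from here
    return "\n".join(parts)


prompt = few_shot_prompt(
    "Rephrase each sentence in a formal tone.", EXAMPLES, "The server is down again."
)
print(prompt)
```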

&lt;h3 id=&quot;external-knowledge-and-logical-reasoning&quot;&gt;External Knowledge and Logical Reasoning&lt;/h3&gt;
&lt;p&gt;The concept of in-context learning can be extended and combined with classic knowledge bases. Information required for answering a question or carrying out a task is added to the conversational context by extracting it from a knowledge base. This technique is called Retrieval Augmented Generation (RAG). RAG applications differ greatly depending on the data source. A RAG application can consult a single document, an encyclopedia like Wikipedia, or all freely available internet content.&lt;/p&gt;

&lt;p&gt;Developing the retrieval aspect is essential for a successful RAG system. If the extracted content does not include the information needed to answer the question, the language model cannot provide a meaningful response. RAG is often seen as another method to reduce LLM hallucinations. The following illustrates the workings of a RAG system with an example.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/vienna4.png&quot; alt=&quot;Example of a RAG system using the language model Qwen2-72B-Instruct. Relevant background info is inserted into the conversation based on the user&apos;s question. Words generated by the model are shown in blue.
&quot; /&gt;&lt;/p&gt;
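&lt;p&gt;The retrieval step can be illustrated with a deliberately simple Python sketch. It ranks a few made-up passages by word overlap with the question and inserts the best match into the prompt. Word overlap is a stand-in chosen for brevity; practical RAG systems usually rank passages with vector embeddings, often stored in a vector database.&lt;/p&gt;

```python
import re

# Hypothetical knowledge base; in practice this could be a single document,
# an encyclopedia or web search results.
PASSAGES = [
    "The Eurovision Song Contest 2025 took place in Basel, Switzerland.",
    "Helmets must be worn at all times on the construction site.",
    "The canteen is open from 11:30 to 14:00.",
]


def words(text):
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question, passages, k=1):
    """Return the k passages sharing the most words with the question."""
    q = words(question)
    ranked = sorted(passages, key=lambda p: len(q.intersection(words(p))), reverse=True)
    return ranked[:k]


def rag_prompt(question, passages):
    """Insert the retrieved context into the prompt before the question."""
    context = "\n".join(retrieve(question, passages))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


print(rag_prompt("When is the canteen open?", PASSAGES))
```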

&lt;p&gt;The analogy—that language models merely imitate conversation—points to another limitation that was humorously noted online in the early days of ChatGPT: the lack of logical reasoning ability.&lt;/p&gt;

&lt;p&gt;The first of the two figures below illustrates that the language model did not reach the correct logical or mathematical conclusion. The example uses in-context learning to demonstrate what the model is being asked to do. It remains unclear how the model arrived at the answer “27”; the correct answer is 9.&lt;/p&gt;

&lt;p&gt;A common technique to guide language models toward logical reasoning is called Chain-of-Thought prompting (CoT prompting), in which the model is encouraged to write down and explain intermediate steps. Because of the way the model works, each subsequent step is generated only after the preceding intermediate results have been “put on paper” and thus become part of the input. The second of the two figures below shows the same task, this time solved correctly with CoT prompting.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/vienna5.png&quot; alt=&quot;Example of conversation with a language model, in which the model incorrectly solves a mathematical riddle. Words generated by the model are shown in blue.
&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/vienna6.png&quot; alt=&quot;Example of a conversation with a language model using chain-of-thought prompting to correctly solve a mathematical riddle. Words generated by the model are shown in blue.
&quot; /&gt;&lt;/p&gt;
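&lt;p&gt;On the application side, chain-of-thought prompting is mostly a change to the prompt, plus some parsing of the reply. In the sketch below, the wording of the instruction and the “Answer:” marker are assumed conventions, not a standard. The reply is a placeholder string used only to exercise the parser.&lt;/p&gt;

```python
import re

# Assumed wording; any instruction that asks for intermediate steps works.
COT_SUFFIX = (
    "\n\nLet's think step by step. Write down each intermediate result, "
    "then give the final result on a separate line starting with 'Answer:'."
)


def cot_prompt(question):
    """Ask for intermediate steps so each one becomes part of the model's input."""
    return question + COT_SUFFIX


def final_answer(reply):
    """Extract the value after the last 'Answer:' line, or None if missing."""
    matches = re.findall(r"^Answer:\s*(.+)$", reply, flags=re.MULTILINE)
    return matches[-1].strip() if matches else None


# Placeholder reply, not real model output.
reply = "Step 1: ...\nStep 2: ...\nAnswer: 9"
print(final_answer(reply))  # 9
```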

&lt;h3 id=&quot;agents-and-fine-tuning&quot;&gt;Agents and Fine-Tuning&lt;/h3&gt;
&lt;p&gt;Innovations in prompt engineering and related techniques allow a language model to be used for a wide range of applications. However, its capabilities are limited to producing text (or audio/image) output. So-called agents overcome this limitation: they enable complex computations and allow language models to interact with external systems.&lt;/p&gt;

&lt;p&gt;Through prompt engineering, a language model is told, in its instruction, what external tools are available for solving a task. This could include a calculator, an external API (to manage emails, appointments, or contacts), a RAG-based search, a free web search, or a coding environment.&lt;/p&gt;

&lt;p&gt;Users can assign a task to the language model. The model is instructed to select, from the available tools, the one required to solve the problem. The model communicates the selection and use of tools via a special answer format. Complex systems allow for the use of multiple tools to solve the task step by step.&lt;/p&gt;

&lt;p&gt;A language model with internet search and route planner abilities could, for example, handle the request “How long does it take to drive from Vienna to the host venue of the Eurovision Song Contest 2025?” by first using a web search to find where Eurovision will be held in 2025, then using the route planner to calculate the travel time from Vienna to Basel. The answer is about eight hours.&lt;/p&gt;
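&lt;p&gt;The control flow of such an agent can be sketched in Python. Everything here is hypothetical: the JSON format for tool calls, the two tool functions with canned results, and the scripted replies that stand in for the language model. The essential part is the loop. The application executes each requested tool and appends the result to the conversation until the model returns a final answer.&lt;/p&gt;

```python
import json


# Hypothetical tools with canned results; real tools would call a search
# engine or a routing service.
def web_search(query):
    return "Eurovision Song Contest 2025 host city: Basel"


def route_planner(origin, destination):
    return f"{origin} to {destination}: about 8 hours by car"


TOOLS = {"web_search": web_search, "route_planner": route_planner}

# Scripted replies that stand in for the language model. The JSON tool-call
# format is an assumption; real services define their own formats.
SCRIPTED_REPLIES = [
    '{"tool": "web_search", "args": {"query": "Eurovision 2025 host city"}}',
    '{"tool": "route_planner", "args": {"origin": "Vienna", "destination": "Basel"}}',
    '{"answer": "The drive from Vienna to Basel takes about eight hours."}',
]


def scripted_model(conversation):
    """Stand-in for an LLM call: returns the next scripted reply."""
    return SCRIPTED_REPLIES[len(conversation) - 1]


def run_agent(task, model, max_steps=5):
    """Let the model pick tools until it returns a final answer."""
    conversation = [f"User: {task}"]
    for _ in range(max_steps):
        message = json.loads(model(conversation))
        if "answer" in message:
            return message["answer"], conversation
        result = TOOLS[message["tool"]](**message["args"])
        # The tool result becomes part of the model's input for the next step.
        conversation.append(f"Tool {message['tool']}: {result}")
    raise RuntimeError("no final answer within the step limit")


answer, log = run_agent(
    "How long does it take to drive from Vienna to the host venue "
    "of the Eurovision Song Contest 2025?",
    scripted_model,
)
print(answer)  # The drive from Vienna to Basel takes about eight hours.
```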

&lt;p&gt;The potential of agent-based applications is virtually limitless in our connected digital world. Today’s language models are often trained on both natural and programming languages, allowing them to generate functional code. In combination with a programming environment, language models can write complex algorithms needed to solve a task and then execute them. The limits of such systems in the future are currently unforeseeable.&lt;/p&gt;

&lt;p&gt;All the techniques discussed so far use prompt engineering to influence the work of the language model. To conclude, there is another technique that does not rely on prompt engineering: through fine-tuning, a pre-trained language model can be customized so that it automatically follows a fixed, predefined instruction. In a way, the model loses its ability to react generically to all prompts. On the other hand, the model doesn’t have to be told its task at the start of every conversation. Fine-tuning is suitable when a language model will be used for a large number of identical tasks. Depending on the fine-tuning method, all parameters of the model may be adjusted, but this is often too computationally expensive. Parameter-efficient fine-tuning (PEFT) methods offer an alternative: they train only a small number of added parameters or a small subset of the existing ones.&lt;/p&gt;
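&lt;p&gt;A back-of-the-envelope calculation shows why parameter-efficient methods are attractive. The sketch assumes LoRA (low-rank adaptation), one widely used PEFT technique. The pre-trained weight matrix W (d × k) stays frozen, and only two small matrices B (d × r) and A (r × k) are trained, so the adapted layer uses W + BA. The layer size of 4096 × 4096 and the rank r = 8 are illustrative values.&lt;/p&gt;

```python
def lora_parameter_counts(d, k, r):
    """Trainable parameters for one d x k weight matrix.

    Full fine-tuning updates all d * k entries of W. LoRA freezes W and
    trains B (d x r) and A (r x k) instead; the adapted layer computes W + B A.
    """
    full = d * k
    lora = d * r + r * k
    return full, lora


full, lora = lora_parameter_counts(d=4096, k=4096, r=8)
print(full)                  # 16777216
print(lora)                  # 65536
print(f"{lora / full:.2%}")  # 0.39%
```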

&lt;h2 id=&quot;final-thoughts&quot;&gt;Final Thoughts&lt;/h2&gt;
&lt;p&gt;Meaningful use of AI in safety sciences requires understanding both the underlying technology and its limitations. By presenting some essential techniques used in today’s AI tools, this article aims to make a contribution here.&lt;/p&gt;

&lt;p&gt;The landscape of commercial and open-source language models, and their abilities and methods, is changing rapidly. It is difficult to make predictions about the future of AI. But AI will certainly change the way many industries work, and companies that fail to use AI tools will have a hard time competing. It is therefore all the more important to seize the opportunities that AI provides. To make this easier, this article has highlighted some technical boundaries, to help avoid mistakes in application.&lt;/p&gt;</content><author><name>frank</name></author><category term="ai" /><summary type="html">Technical innovations often offer numerous applications with tremendous added value, making work easier. However, this also increases the potential for misuse, with the opposite effect—whether deliberately or through improper use. This also applies to advances in the field of Artificial Intelligence (AI). The aim of this article is to shed light on how current text-based AI applications work, so that they can be used meaningfully and appropriately in the field of safety science. The focus here is not to discourage their use, but to encourage an active discussion about sensible applications and their adoption.</summary></entry><entry><title type="html">Restore a collection from a Qdrant snapshot stored in S3</title><link href="https://frank.sauerburger.io/2025/02/26/restore-qdrant-snapshot-from-s3.html" rel="alternate" type="text/html" title="Restore a collection from a Qdrant snapshot stored in S3" /><published>2025-02-26T00:00:00+01:00</published><updated>2025-02-26T00:00:00+01:00</updated><id>https://frank.sauerburger.io/2025/02/26/restore-qdrant-snapshot-from-s3</id><content type="html" xml:base="https://frank.sauerburger.io/2025/02/26/restore-qdrant-snapshot-from-s3.html">&lt;p&gt;The lightning-fast vector database &lt;a href=&quot;https://qdrant.tech&quot;&gt;Qdrant&lt;/a&gt; has supported 
creating snapshots of its collections on S3 since version v1.10. That’s convenient and simplifies storing
snapshots as backups on multiple machines. However, as of 2025, there is no way to restore
a collection from a snapshot on S3. This article describes a workaround using pre-signed URLs in S3.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/qdrant-s3@2x.png&quot; alt=&quot;Illustration of a backup and restore workflow in Qdrant via an S3 bucket&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;creating-the-snapshot&quot;&gt;Creating the snapshot&lt;/h2&gt;

&lt;p&gt;A Qdrant instance can be &lt;a href=&quot;https://qdrant.tech/documentation/concepts/snapshots/#s3&quot;&gt;configured to use S3 as its storage for snapshots&lt;/a&gt;.&lt;/p&gt;

&lt;div class=&quot;language-yaml highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;na&quot;&gt;storage&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;snapshots_config&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;snapshots_storage&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;s3&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;s3_config&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;bucket&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;your_bucket_here&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;region&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;your_bucket_region_here&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;access_key&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;your_access_key_here&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;secret_key&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;your_secret_key_here&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;endpoint_url&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;your_url_here&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;From then on, requested snapshots are written to the S3 bucket.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;qdrant_client&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;QdrantClient&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;client&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;QdrantClient&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;url&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;http://localhost:6333&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;snapshot_info&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;create_snapshot&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;collection_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;my-collection&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;creating-a-pre-signed-url&quot;&gt;Creating a pre-signed URL&lt;/h2&gt;

&lt;p&gt;From the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;snapshot_info&lt;/code&gt; object, we can figure out the name of the snapshot within the bucket.
Alternatively, inspecting the bucket directly works as well. Let’s assume the snapshot is called&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;snapshots/my-collection/my-collection_123456789.snapshot
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
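&lt;p&gt;If &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;snapshot_info&lt;/code&gt; is no longer at hand, the newest snapshot could also be picked from a bucket listing. The helper below is a hypothetical sketch. It assumes the key layout shown above and a response shaped like the one returned by boto3’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;list_objects_v2&lt;/code&gt;, that is, a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Contents&lt;/code&gt; list whose entries carry &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Key&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LastModified&lt;/code&gt;. Pagination is ignored, so only the first page (at most 1,000 objects) is considered.&lt;/p&gt;

```python
from datetime import datetime, timezone


def latest_snapshot_key(listing, collection):
    """Return the key of the most recently modified snapshot of a collection.

    `listing` is a dict shaped like the response of boto3's
    s3.list_objects_v2(Bucket=..., Prefix=...); only one page is used.
    """
    prefix = f"snapshots/{collection}/"
    candidates = [
        obj for obj in listing.get("Contents", [])
        if obj["Key"].startswith(prefix) and obj["Key"].endswith(".snapshot")
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda obj: obj["LastModified"])["Key"]


# Made-up listing for illustration.
listing = {
    "Contents": [
        {"Key": "snapshots/my-collection/my-collection_1.snapshot",
         "LastModified": datetime(2025, 2, 1, tzinfo=timezone.utc)},
        {"Key": "snapshots/my-collection/my-collection_123456789.snapshot",
         "LastModified": datetime(2025, 2, 12, tzinfo=timezone.utc)},
    ]
}
print(latest_snapshot_key(listing, "my-collection"))
```

&lt;p&gt;With a real bucket, the listing would come from a call like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s3.list_objects_v2(Bucket=&apos;...&apos;, Prefix=&apos;snapshots/my-collection/&apos;)&lt;/code&gt;, using a boto3 client such as the one created below.&lt;/p&gt;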

&lt;p&gt;Since Qdrant cannot read and restore a snapshot from S3 directly, we can use S3’s pre-signed URLs instead.
A pre-signed URL contains a signature that authorizes anyone who has the URL to access a specific object in the
bucket until the URL expires, assuming that the API and the bucket are not configured to be entirely private.&lt;/p&gt;

&lt;p&gt;AWS’s Python client &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;boto3&lt;/code&gt; is an easy way to create pre-signed URLs.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;boto3&lt;/span&gt;


&lt;span class=&quot;n&quot;&gt;session&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;boto3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;session&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Session&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;s3&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;session&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;service_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;s3&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;aws_access_key_id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;...&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;aws_secret_access_key&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;...&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;endpoint_url&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;...&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;url&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;generate_presigned_url&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;s&quot;&gt;&apos;get_object&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;Params&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;s&quot;&gt;&apos;Bucket&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&apos;...&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;s&quot;&gt;&apos;Key&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&apos;snapshots/my-collection/my-collection_123456789.snapshot&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;ExpiresIn&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3600&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;24&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;7&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# Expires in 7 days
&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Creating pre-signed URLs doesn’t require internet access. The access key ID and secret access key are sufficient
to create the signature offline. The S3 server validates the signature and evaluates permissions only once
the resource is requested.&lt;/p&gt;
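&lt;p&gt;To illustrate why no network round trip is needed, here is a sketch of the final stage of AWS Signature Version 4. This is the scheme named by the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;X-Amz-Algorithm=AWS4-HMAC-SHA256&lt;/code&gt; parameter in the example URL below. A signing key is derived from the secret key through a chain of HMAC-SHA256 operations and is then used to sign a so-called string to sign. Building the canonical request and the string to sign is omitted, and the credentials are placeholders. In practice, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;generate_presigned_url&lt;/code&gt; takes care of all of this.&lt;/p&gt;

```python
import hashlib
import hmac


def _hmac(key, msg):
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def signing_key(secret_key, date, region, service="s3"):
    """Derive the SigV4 signing key; date is YYYYMMDD, e.g. '20250212'."""
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    return _hmac(k_service, "aws4_request")


def sign(secret_key, date, region, string_to_sign):
    """Hex signature, as it appears in the X-Amz-Signature query parameter."""
    key = signing_key(secret_key, date, region)
    return hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


# Placeholder values; everything is computed locally, nothing is sent.
sig = sign("placeholder-secret", "20250212", "eu-central-1", "example string to sign")
print(sig)
```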

&lt;p&gt;The HTTP URL points to the snapshot file. This could look like&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;url&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt;&apos;https://s3.sauerburger.com/&apos;&lt;/span&gt; \
    &lt;span class=&quot;s&quot;&gt;&apos;qdrant-backup/snapshots/my-collection/my-collection_123456789.snapshot?&apos;&lt;/span&gt; \
    &lt;span class=&quot;s&quot;&gt;&apos;X-Amz-Algorithm=AWS4-HMAC-SHA256&amp;amp;&apos;&lt;/span&gt; \
    &lt;span class=&quot;s&quot;&gt;&apos;X-Amz-Credential=a4tw715etr74zq2dhxdp70s1%2F20250212%2Feu-central-1%2Fs3%2Faws4_request&amp;amp;&apos;&lt;/span&gt; \
    &lt;span class=&quot;s&quot;&gt;&apos;X-Amz-Date=20250212T194041Z&amp;amp;&apos;&lt;/span&gt; \
    &lt;span class=&quot;s&quot;&gt;&apos;X-Amz-Expires=604800&amp;amp;&apos;&lt;/span&gt; \
    &lt;span class=&quot;s&quot;&gt;&apos;X-Amz-SignedHeaders=host&amp;amp;&apos;&lt;/span&gt; \
    &lt;span class=&quot;s&quot;&gt;&apos;X-Amz-Signature=23ea0fcfa91bb5e3ebea847e79cc69f08224e58275890097e28bed9e1de018df&apos;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
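&lt;p&gt;The query string also reveals when the link stops working. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;X-Amz-Date&lt;/code&gt; is the signing time, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;X-Amz-Expires&lt;/code&gt; is the validity in seconds. A small, hypothetical helper based only on the standard library can compute the expiry time, which may be worth checking before handing the URL to Qdrant:&lt;/p&gt;

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlencode, urlsplit


def presigned_url_expiry(url):
    """Return the UTC time at which a SigV4 pre-signed URL stops being valid."""
    params = parse_qs(urlsplit(url).query)
    signed_at = datetime.strptime(params["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
    valid_for = timedelta(seconds=int(params["X-Amz-Expires"][0]))
    return signed_at.replace(tzinfo=timezone.utc) + valid_for


# Shortened version of the example URL above (only the two relevant parameters).
example = (
    "https://s3.sauerburger.com/qdrant-backup/snapshots/my-collection/"
    "my-collection_123456789.snapshot?"
    + urlencode({"X-Amz-Date": "20250212T194041Z", "X-Amz-Expires": "604800"})
)
print(presigned_url_expiry(example))  # 2025-02-19 19:40:41+00:00
```

&lt;p&gt;Comparing the result with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;datetime.now(timezone.utc)&lt;/code&gt; tells whether a fresh URL is needed. On AWS, SigV4 pre-signed URLs can be valid for at most seven days (604,800 seconds), matching the value used above.&lt;/p&gt;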

&lt;h2 id=&quot;restoring-a-collection&quot;&gt;Restoring a collection&lt;/h2&gt;

&lt;p&gt;The final step is to instruct Qdrant to restore a collection from a snapshot via HTTP.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;qdrant_client&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;QdrantClient&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;client&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;QdrantClient&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;url&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;http://localhost:6333&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;# url is the pre-signed URL created in the previous step&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;success&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;recover_snapshot&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;s&quot;&gt;&quot;my-collection&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;url&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;wait&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;priority&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;snapshot&apos;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Done.&lt;/p&gt;

&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;/h2&gt;

&lt;p&gt;Qdrant → S3 → Pre-signed HTTP URL → Qdrant&lt;/p&gt;</content><author><name>frank</name></author><category term="ai" /><category term="python" /><summary type="html">The lightning-fast vector database Qdrant has supported creating snapshots of its collections on S3 since version v1.10. That’s convenient and simplifies storing snapshots as backups on multiple machines. However, as of 2025, there is no way to restore a collection from a snapshot on S3. This article describes a workaround using pre-signed URLs in S3.</summary></entry><entry><title type="html">Network analysis with Scapy and Polars</title><link href="https://frank.sauerburger.io/2025/01/29/network-analysis-scapy-polars.html" rel="alternate" type="text/html" title="Network analysis with Scapy and Polars" /><published>2025-01-29T00:00:00+01:00</published><updated>2025-01-29T00:00:00+01:00</updated><id>https://frank.sauerburger.io/2025/01/29/network-analysis-scapy-polars</id><content type="html" xml:base="https://frank.sauerburger.io/2025/01/29/network-analysis-scapy-polars.html">&lt;p&gt;Sometimes, debugging state-of-the-art AI applications in an on-premise Kubernetes cluster requires capturing network packets and performing complex statistical traffic exploration and analysis.
Traffic is easily captured with&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;sudo tcpdump -i any -s 65535 -w /tmp/capture.pcap
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;and decoded with &lt;a href=&quot;https://wireshark.org&quot;&gt;Wireshark&lt;/a&gt;.
However, complex analyses require other tools.
Let’s open the data scientists’ toolbox: &lt;a href=&quot;https://pola.rs&quot;&gt;Polars&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;loading&quot;&gt;Loading&lt;/h2&gt;

&lt;p&gt;The basic idea is to use &lt;a href=&quot;https://scapy.net/&quot;&gt;Scapy&lt;/a&gt; to read the capture file, decode the packets and various protocols, and organize the data in a Polars dataframe.
In this example, let’s extract the source and destination IP addresses, the packet length,
and, for DNS packets, the opcode and the queried domain names.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;polars&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pl&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;plt&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;scapy.all&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sa&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;scapy.all&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PcapReader&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;seaborn&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sns&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;tqdm&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tqdm&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PcapReader&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;/tmp/capture.pcap&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;reader&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pl&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataFrame&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
            &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
                &lt;span class=&quot;s&quot;&gt;&quot;IP:src&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;     &lt;span class=&quot;n&quot;&gt;packet&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;IP&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;src&lt;/span&gt;                    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;IP&quot;&lt;/span&gt;  &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;packet&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                &lt;span class=&quot;s&quot;&gt;&quot;IP:dst&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;     &lt;span class=&quot;n&quot;&gt;packet&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;IP&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dst&lt;/span&gt;                    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;IP&quot;&lt;/span&gt;  &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;packet&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                &lt;span class=&quot;s&quot;&gt;&quot;IP:len&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;     &lt;span class=&quot;n&quot;&gt;packet&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;IP&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;len&lt;/span&gt;                    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;IP&quot;&lt;/span&gt;  &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;packet&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                &lt;span class=&quot;s&quot;&gt;&quot;DNS:opcode&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;packet&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sprintf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;%DNS.opcode%&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;      &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;DNS&quot;&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;packet&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                &lt;span class=&quot;s&quot;&gt;&quot;DNS:qnames&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;qname&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;decode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;packet&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;DNS&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;qd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;DNS&quot;&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;packet&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[],&lt;/span&gt;
            &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;packet&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tqdm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;reader&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;schema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pl&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Schema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;({&lt;/span&gt;
            &lt;span class=&quot;s&quot;&gt;&quot;IP:src&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pl&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt;
            &lt;span class=&quot;s&quot;&gt;&quot;IP:dst&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pl&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt;
            &lt;span class=&quot;s&quot;&gt;&quot;IP:len&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pl&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Int32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt;
            &lt;span class=&quot;s&quot;&gt;&quot;DNS:opcode&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pl&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt;
            &lt;span class=&quot;s&quot;&gt;&quot;DNS:qnames&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pl&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;List&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pl&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()),&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;})&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;# Derive additional columns
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;with_columns&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;internal&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pl&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;col&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;IP:src&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;starts_with&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;10.&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pl&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;col&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;IP:dst&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;starts_with&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;10.&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The last statement derives an additional column that flags whether a packet is internal, i.e., whether both its source and destination addresses lie in the 10.0.0.0/8 range. A more robust analysis could convert the packet’s
IP addresses to 32-bit integers and apply bitwise operations to determine membership in a network subnet.&lt;/p&gt;

&lt;p&gt;The resulting, redacted dataframe looks something like:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;IP:src&lt;/th&gt;
      &lt;th&gt;IP:dst&lt;/th&gt;
      &lt;th&gt;IP:len&lt;/th&gt;
      &lt;th&gt;DNS:opcode&lt;/th&gt;
      &lt;th&gt;DNS:qnames&lt;/th&gt;
      &lt;th&gt;internal&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;“10.x.x.x”&lt;/td&gt;
      &lt;td&gt;“10.x.x.x”&lt;/td&gt;
      &lt;td&gt;128&lt;/td&gt;
      &lt;td&gt;null&lt;/td&gt;
      &lt;td&gt;[]&lt;/td&gt;
      &lt;td&gt;true&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“162.55.242.49”&lt;/td&gt;
      &lt;td&gt;“91.59.x.x”&lt;/td&gt;
      &lt;td&gt;188&lt;/td&gt;
      &lt;td&gt;null&lt;/td&gt;
      &lt;td&gt;[]&lt;/td&gt;
      &lt;td&gt;false&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“10.x.x.x”&lt;/td&gt;
      &lt;td&gt;“10.x.x.x”&lt;/td&gt;
      &lt;td&gt;93&lt;/td&gt;
      &lt;td&gt;null&lt;/td&gt;
      &lt;td&gt;[]&lt;/td&gt;
      &lt;td&gt;true&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“10.x.x.x”&lt;/td&gt;
      &lt;td&gt;“10.x.x.x”&lt;/td&gt;
      &lt;td&gt;93&lt;/td&gt;
      &lt;td&gt;null&lt;/td&gt;
      &lt;td&gt;[]&lt;/td&gt;
      &lt;td&gt;true&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“10.x.x.x”&lt;/td&gt;
      &lt;td&gt;“10.x.x.x”&lt;/td&gt;
      &lt;td&gt;843&lt;/td&gt;
      &lt;td&gt;null&lt;/td&gt;
      &lt;td&gt;[]&lt;/td&gt;
      &lt;td&gt;true&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“10.x.x.x”&lt;/td&gt;
      &lt;td&gt;“10.x.x.x”&lt;/td&gt;
      &lt;td&gt;139&lt;/td&gt;
      &lt;td&gt;“QUERY”&lt;/td&gt;
      &lt;td&gt;[“ns-2.sit-servers.net.”]&lt;/td&gt;
      &lt;td&gt;true&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
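&lt;p&gt;The prefix test behind the &lt;code&gt;internal&lt;/code&gt; column only covers the 10.0.0.0/8 range. The integer-based approach mentioned above could look roughly like the following sketch, which uses only the standard-library &lt;code&gt;ipaddress&lt;/code&gt; module. The helper names &lt;code&gt;ip_to_int&lt;/code&gt;, &lt;code&gt;in_subnet&lt;/code&gt;, and &lt;code&gt;is_private&lt;/code&gt; are hypothetical, and treating the three RFC 1918 ranges as “internal” is an assumption for this example. Right-shifting away the host bits is one of several equivalent ways to apply the network mask.&lt;/p&gt;

```python
import ipaddress

# Assumption: "internal" means one of the RFC 1918 private IPv4 ranges.
PRIVATE_NETWORKS = ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")


def ip_to_int(addr: str) -> int:
    """Convert a dotted-quad IPv4 address into its 32-bit integer value."""
    return int(ipaddress.IPv4Address(addr))


def in_subnet(addr: str, cidr: str) -> bool:
    """Check whether addr lies inside the IPv4 network given in CIDR notation."""
    network = ipaddress.IPv4Network(cidr)
    host_bits = 32 - network.prefixlen
    # Shifting right by the number of host bits drops them, which has the same
    # effect as applying the netmask; the remaining network prefixes must match.
    return ip_to_int(addr) >> host_bits == int(network.network_address) >> host_bits


def is_private(addr: str) -> bool:
    """Check whether addr belongs to any of the assumed private ranges."""
    return any(in_subnet(addr, cidr) for cidr in PRIVATE_NETWORKS)


print(ip_to_int("10.0.0.1"))        # 167772161
print(is_private("172.20.1.2"))     # True
print(is_private("162.55.242.49"))  # False
```

&lt;p&gt;In the Polars pipeline, a function like &lt;code&gt;is_private&lt;/code&gt; could be applied per column with &lt;code&gt;map_elements(is_private, return_dtype=pl.Boolean)&lt;/code&gt;, at the cost of one Python call per row. The standard library also offers &lt;code&gt;IPv4Address.is_private&lt;/code&gt;, which covers a broader set of reserved ranges.&lt;/p&gt;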

&lt;h2 id=&quot;packet-length-analysis&quot;&gt;Packet length analysis&lt;/h2&gt;

&lt;p&gt;So far, so good. Suppose we want to investigate elevated retransmission rates. A first step might be to look at
the distribution of packet lengths for internal and external traffic. With the current setup,
we can hand the dataframe directly to &lt;a href=&quot;https://seaborn.pydata.org&quot;&gt;seaborn&lt;/a&gt; for visualization.&lt;/p&gt;

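&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;plt&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;seaborn&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sns&lt;/span&gt;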

&lt;span class=&quot;n&quot;&gt;sns&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;histplot&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;IP:len&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bins&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;40&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;internal&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;element&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;step&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;plt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;yscale&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;log&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;plt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;xlabel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Packet size / Bytes&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;img src=&quot;/assets/packet-length.png&quot; alt=&quot;Distribution of packet lengths&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;dns-server-analysis&quot;&gt;DNS server analysis&lt;/h2&gt;

&lt;p&gt;Next, we might want to investigate the DNS queries, starting with the frequency of query names.
Since we captured traffic on all interfaces, we want to filter out reverse lookups (names ending in in-addr.arpa.) and queries for local names.
That’s easily done with Polars. Furthermore, since we don’t filter by the direction of the query,
incoming or outgoing, we capture both: incoming DNS queries to the authoritative server where tcpdump was running, as well as name lookups originating from that server.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;dns_stats&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;explode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;DNS:qnames&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;DNS:qnames&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;drop_nulls&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;value_counts&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;pl&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;col&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;DNS:qnames&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ends_with&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;in-addr.arpa.&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;not_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;pl&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;col&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;DNS:qnames&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ends_with&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;local.&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;not_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;dns_stats&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sort&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;count&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;descending&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;DNS:qnames&lt;/th&gt;
      &lt;th&gt;count&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;“gitlab.sauerburger.com.”&lt;/td&gt;
      &lt;td&gt;91&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;em&gt;(redacted)&lt;/em&gt;&lt;/td&gt;
      &lt;td&gt;55&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“ns-1.sit-servers.net.”&lt;/td&gt;
      &lt;td&gt;30&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“ns-2.sit-servers.net.”&lt;/td&gt;
      &lt;td&gt;30&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“frank.sauerburger.io.”&lt;/td&gt;
      &lt;td&gt;12&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“fjell.ai.”&lt;/td&gt;
      &lt;td&gt;8&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“neodns.io.”&lt;/td&gt;
      &lt;td&gt;8&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“sauerburger.io.”&lt;/td&gt;
      &lt;td&gt;8&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“debugci.dev.”&lt;/td&gt;
      &lt;td&gt;6&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“www.fjellai.cloud.”&lt;/td&gt;
      &lt;td&gt;6&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“sAUeRbuRgEr.DeV.”&lt;/td&gt;
      &lt;td&gt;6&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“ds.sit-servers.net.”&lt;/td&gt;
      &lt;td&gt;6&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“nEodns.teCH.”&lt;/td&gt;
      &lt;td&gt;6&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“Ns-1.sIT-servErs.neT.”&lt;/td&gt;
      &lt;td&gt;6&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“NS-2.SiT-SErVerS.NeT.”&lt;/td&gt;
      &lt;td&gt;6&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“ns-2.sit-SeRveRS.nET.”&lt;/td&gt;
      &lt;td&gt;6&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“.uhepp.org.”&lt;/td&gt;
      &lt;td&gt;6&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“net.stratus.sit-servers.net.”&lt;/td&gt;
      &lt;td&gt;6&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“NS-1.sIT-SERVErs.NEt.”&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“neODNS.TecH.”&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“nS-1.siT-SerVERS.NEt.”&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
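&lt;p&gt;Several names in the table differ only in their capitalization. DNS names are compared case-insensitively, so normalizing them to lower case before counting merges these rows; in the Polars pipeline, one way to do that would be to insert &lt;code&gt;.str.to_lowercase()&lt;/code&gt; before &lt;code&gt;.value_counts()&lt;/code&gt;. The following dependency-free sketch mirrors the same explode, filter, and count logic with &lt;code&gt;collections.Counter&lt;/code&gt;. The sample query lists and the function name &lt;code&gt;count_qnames&lt;/code&gt; are made up for illustration.&lt;/p&gt;

```python
from collections import Counter

# Hypothetical sample data: one list of query names per packet, mirroring
# the "DNS:qnames" column (empty for non-DNS packets).
qnames_per_packet = [
    [],
    ["ns-1.sit-servers.net."],
    ["Ns-1.sIT-servErs.neT."],
    ["1.0.0.10.in-addr.arpa."],
    ["printer.local."],
    ["frank.sauerburger.io."],
]

# Suffixes excluded in the Polars filter: reverse lookups and local names.
EXCLUDED_SUFFIXES = ("in-addr.arpa.", "local.")


def count_qnames(rows):
    """Flatten the per-packet lists, drop excluded names, count case-insensitively."""
    counter = Counter()
    for names in rows:  # equivalent of explode(): one entry per query name
        for name in names:
            normalized = name.lower()  # DNS names are case-insensitive
            if not normalized.endswith(EXCLUDED_SUFFIXES):
                counter[normalized] += 1
    return counter


for name, count in count_qnames(qnames_per_packet).most_common():
    print(name, count)
```

&lt;p&gt;Because the suffix filter runs after lowercasing, it also catches reverse lookups with randomized capitalization.&lt;/p&gt;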

&lt;p&gt;If you’re wondering why some of the DNS entries have random capitalization,
that’s just &lt;a href=&quot;https://xkcd.com/1361/&quot;&gt;Google focussing on its core business&lt;/a&gt;.&lt;/p&gt;</content><author><name>frank</name></author><category term="python" /><category term="internet" /><category term="sysadmin" /><summary type="html">Sometimes, debugging state-of-the-art AI applications in an on-premise Kubernetes cluster requires capturing network packets and performing complex statistical traffic exploration and analysis. Traffic is easily captured with</summary></entry><entry><title type="html">Python datetime’s difficult relationship to timezones</title><link href="https://frank.sauerburger.io/2024/12/04/Python-datetime-timezone.html" rel="alternate" type="text/html" title="Python datetime’s difficult relationship to timezones" /><published>2024-12-04T00:00:00+01:00</published><updated>2024-12-04T00:00:00+01:00</updated><id>https://frank.sauerburger.io/2024/12/04/Python-datetime-timezone</id><content type="html" xml:base="https://frank.sauerburger.io/2024/12/04/Python-datetime-timezone.html">&lt;p&gt;Python has two modes of dealing with dates and times: Timezone-naive and timezone-aware.
The former is simpler; the latter is more powerful but has some pitfalls in store.
This article summarizes a few edge cases involving daylight saving shifts where timezone-aware
datetime objects behave in unexpected ways. At the time of writing,
according to a non-representative &lt;a href=&quot;https://quiz.sauerburger.com/dxi7m/&quot;&gt;quiz&lt;/a&gt; I launched online, only 20 % of all 354 responses 
were correct. Let’s see what the problem is.&lt;/p&gt;

&lt;h2 id=&quot;the-problem&quot;&gt;The problem&lt;/h2&gt;

&lt;p&gt;Let’s say we have two datetime objects, one at 9 pm the evening before daylight saving time ends, and one at 9 am the next day.
A person with a stopwatch would measure 13 hours between the two events.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/python-dst@2x.png&quot; alt=&quot;Daylight saving time switch&quot; /&gt;&lt;/p&gt;

&lt;p&gt;However, Python will compute 12 hours between the two events.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;zoneinfo&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ZoneInfo&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;datetime&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;datetime&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;berlin&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ZoneInfo&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Europe/Berlin&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;sat&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;datetime&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2024&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;26&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;21&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tzinfo&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;berlin&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;sun&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;datetime&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2024&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;27&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;9&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tzinfo&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;berlin&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;diff&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sun&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sat&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;assert&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;diff&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;total_seconds&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;12&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3600&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;the-explanation&quot;&gt;The explanation&lt;/h2&gt;

&lt;p&gt;The explanation is buried in a footnote in the &lt;a href=&quot;https://docs.python.org/3.12/library/datetime.html#datetime.datetime.fold&quot;&gt;Python documentation&lt;/a&gt;
detailing arithmetic operations on datetime objects.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Subtraction of a datetime from a datetime is defined only if both operands are naive, or if both are aware. […]
If both are naive, or both are aware and have the same tzinfo attribute, the tzinfo attributes are ignored, […].&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So subtracting two timezone-aware datetime objects that share the same tzinfo ignores the timezone information.
The problem arises if&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Two datetime objects have the &lt;strong&gt;same tzinfo&lt;/strong&gt; but&lt;/li&gt;
  &lt;li&gt;Due to the daylight saving shift, have &lt;strong&gt;different offsets to UTC&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
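&lt;p&gt;Both conditions can be checked directly on the two datetime objects from the example above: they share the very same &lt;code&gt;tzinfo&lt;/code&gt; object, yet &lt;code&gt;utcoffset()&lt;/code&gt; reports two hours for Saturday evening and one hour for Sunday morning. The snippet repeats the definitions so that it runs on its own and assumes the IANA time zone database is available to &lt;code&gt;zoneinfo&lt;/code&gt;.&lt;/p&gt;

```python
from datetime import datetime
from zoneinfo import ZoneInfo

berlin = ZoneInfo("Europe/Berlin")
sat = datetime(2024, 10, 26, 21, 0, tzinfo=berlin)  # CEST, before the switch
sun = datetime(2024, 10, 27, 9, 0, tzinfo=berlin)   # CET, after the switch

# Condition 1: both objects carry the identical tzinfo instance ...
print(sat.tzinfo is sun.tzinfo)  # True
# ... condition 2: but their offsets to UTC differ by one hour.
print(sat.utcoffset(), sun.utcoffset())  # 2:00:00 1:00:00
```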

&lt;p&gt;The &lt;a href=&quot;https://github.com/python/cpython/issues/116111#issuecomment-2336746070&quot;&gt;rationale behind this&lt;/a&gt; is that if you want to schedule a task every day at 10 am in the local timezone, you can do so by
taking one event at 10 am and repeatedly adding 24 hours. The resulting time will always be 10 am, even if the clock is shifted
due to daylight saving time.&lt;/p&gt;

&lt;p&gt;In my view, this is a fundamental design flaw in Python’s datetime module.
The module should differentiate between adding a full day and adding 24 hours.
Adding a full day would ignore daylight saving shifts, whereas adding 24 hours would not. Repeatedly adding a full day would keep tasks at the same local time,
while repeatedly adding 24 hours would keep the time measured with a stopwatch consistent.
The worst part is that the problem only shows up in edge cases and is mentioned only in a footnote of the documentation.&lt;/p&gt;
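&lt;p&gt;The difference between the two notions can be sketched with the standard library alone. Adding &lt;code&gt;timedelta(hours=24)&lt;/code&gt; to an aware datetime keeps the wall-clock time (the scheduling behavior described above), whereas converting to UTC first, adding 24 hours there, and converting back yields what a stopwatch would measure. The start date is chosen, as an assumption for this example, to straddle the end of daylight saving time in Berlin on October 27, 2024.&lt;/p&gt;

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

berlin = ZoneInfo("Europe/Berlin")
start = datetime(2024, 10, 26, 10, 0, tzinfo=berlin)  # 10 am CEST (UTC+2)

# Wall-clock arithmetic: Python only shifts the local date and time fields,
# so 25 real hours pass until the next 10 am.
wall_clock = start + timedelta(hours=24)

# Elapsed-time arithmetic: do the addition in UTC, then convert back.
elapsed = (start.astimezone(timezone.utc) + timedelta(hours=24)).astimezone(berlin)

print(wall_clock.isoformat())  # 2024-10-27T10:00:00+01:00
print(elapsed.isoformat())     # 2024-10-27T09:00:00+01:00
```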

&lt;h2 id=&quot;how-to-avoid-it&quot;&gt;How to avoid it&lt;/h2&gt;

&lt;p&gt;There are two simple ways to avoid the problem:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Always convert to UTC when doing arithmetic operations.&lt;/li&gt;
  &lt;li&gt;Avoid using the datetime library and use a library like &lt;a href=&quot;https://pendulum.eustace.io/&quot;&gt;pendulum&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
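&lt;p&gt;Applied to the example from the beginning, the first approach could look like the following sketch: converting both operands to UTC before subtracting recovers the 13 hours a stopwatch would measure, because UTC has no daylight saving shifts.&lt;/p&gt;

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

berlin = ZoneInfo("Europe/Berlin")
sat = datetime(2024, 10, 26, 21, 0, tzinfo=berlin)  # 21:00 CEST = 19:00 UTC
sun = datetime(2024, 10, 27, 9, 0, tzinfo=berlin)   # 09:00 CET  = 08:00 UTC

# Same tzinfo: the offsets are ignored and the wall-clock difference is returned.
wall_clock_diff = sun - sat
# In UTC there are no daylight saving shifts, so the difference is the elapsed time.
utc_diff = sun.astimezone(timezone.utc) - sat.astimezone(timezone.utc)

print(wall_clock_diff.total_seconds() / 3600)  # 12.0
print(utc_diff.total_seconds() / 3600)         # 13.0
```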

&lt;h2 id=&quot;examples-with-surprising-results&quot;&gt;Examples with surprising results&lt;/h2&gt;

&lt;p&gt;Summary&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The additional hour of sleep at the end of daylight saving time is &lt;strong&gt;not counted&lt;/strong&gt;.&lt;/li&gt;
  &lt;li&gt;The additional hour of sleep at the end of daylight saving time is &lt;strong&gt;counted&lt;/strong&gt; if two different timezones with identical DST logic are used (e.g. Berlin and Paris).&lt;/li&gt;
  &lt;li&gt;Adding a day and 24 hours is the &lt;strong&gt;same&lt;/strong&gt; across daylight saving shifts.&lt;/li&gt;
  &lt;li&gt;Points in time are &lt;strong&gt;incorrectly ordered&lt;/strong&gt; across daylight saving shifts.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;amount-of-sleep-during-daylight-saving-switch&quot;&gt;Amount of sleep during daylight saving switch&lt;/h3&gt;

&lt;p&gt;In Europe, daylight saving time ends early in the morning on Sunday, October 27, 2024.
At 3:00 AM, clocks are set back to 2:00 AM, adding an extra hour to the night.
During daylight saving, Central European Summer Time (CEST) is UTC +2 hours. After the switch, Central European Time (CET) is UTC +1 hour.&lt;/p&gt;

&lt;p&gt;Let’s calculate how many hours of sleep someone in Germany would get if they went to bed on Saturday at 9:00 PM (CEST) and set their alarm for 9:00 AM (CET) the next morning.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;zoneinfo&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ZoneInfo&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;datetime&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;datetime&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;berlin&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ZoneInfo&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Europe/Berlin&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;sat&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;datetime&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2024&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;26&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;21&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tzinfo&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;berlin&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;sun&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;datetime&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2024&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;27&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;9&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tzinfo&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;berlin&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;assert&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sat&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;isoformat&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&apos;2024-10-26T21:00:00+02:00&apos;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;assert&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sun&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;isoformat&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&apos;2024-10-27T09:00:00+01:00&apos;&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;diff&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sun&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sat&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;assert&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;diff&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;total_seconds&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;12&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3600&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Python computes 12 hours of sleep; however, a person with a stopwatch would measure 13 hours.&lt;/strong&gt;&lt;/p&gt;
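&lt;p&gt;The 12 hours follow from a documented rule: when both operands are aware and share the same &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tzinfo&lt;/code&gt; object, subtraction ignores the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tzinfo&lt;/code&gt; and computes the plain wall-clock difference. A quick sketch that checks this by stripping the time zone from both operands:&lt;/p&gt;

```python
from datetime import datetime
from zoneinfo import ZoneInfo

berlin = ZoneInfo("Europe/Berlin")
sat = datetime(2024, 10, 26, 21, 0, tzinfo=berlin)
sun = datetime(2024, 10, 27, 9, 0, tzinfo=berlin)

# Same tzinfo object on both sides: the UTC offsets play no role.
aware_diff = sun - sat
# Dropping the time zone altogether gives the identical wall-clock difference.
naive_diff = sun.replace(tzinfo=None) - sat.replace(tzinfo=None)

print(aware_diff == naive_diff)  # True
print(aware_diff)                # 12:00:00
```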

&lt;h3 id=&quot;bus-ride-from-berlin-to-paris-across-a-daylight-saving-switch&quot;&gt;Bus ride from Berlin to Paris across a daylight saving switch&lt;/h3&gt;

&lt;p&gt;Now, instead of going to bed and setting an alarm,
let’s consider a person who boards a bus in Berlin on Saturday at 9:00 PM (CEST) and arrives in Paris on Sunday at 9:00 AM (CET).&lt;/p&gt;

&lt;p&gt;Germany and France share the same UTC offset and follow identical daylight saving time rules.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;zoneinfo&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ZoneInfo&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;datetime&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;datetime&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;berlin&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ZoneInfo&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Europe/Berlin&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;sat&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;datetime&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2024&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;26&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;21&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tzinfo&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;berlin&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;paris&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ZoneInfo&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Europe/Paris&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;sun&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;datetime&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2024&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;27&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;9&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tzinfo&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;paris&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;assert&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sat&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;isoformat&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&apos;2024-10-26T21:00:00+02:00&apos;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;assert&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sun&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;isoformat&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&apos;2024-10-27T09:00:00+01:00&apos;&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;diff&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sun&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sat&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;assert&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;diff&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;total_seconds&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;13&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3600&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

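&lt;p&gt;&lt;strong&gt;In contrast to the previous example, Python computes 13 hours of travel time.&lt;/strong&gt; This is because Berlin and Paris are, formally, different timezone objects: when the two &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tzinfo&lt;/code&gt; objects differ, Python converts both datetimes to UTC before subtracting.&lt;/p&gt;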
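&lt;p&gt;The outcome depends on which &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tzinfo&lt;/code&gt; objects are attached, not on the points in time themselves. In the following sketch, the arrival time is re-expressed in the Berlin zone; it still denotes the same instant, yet the computed duration drops back to 12 hours.&lt;/p&gt;

```python
from datetime import datetime
from zoneinfo import ZoneInfo

berlin = ZoneInfo("Europe/Berlin")
paris = ZoneInfo("Europe/Paris")
sat = datetime(2024, 10, 26, 21, 0, tzinfo=berlin)
sun = datetime(2024, 10, 27, 9, 0, tzinfo=paris)

# Different tzinfo objects: both operands are converted to UTC first.
travel = sun - sat

# The same instant, re-expressed in Berlin time, now shares sat's tzinfo ...
sun_berlin = sun.astimezone(berlin)
same_instant = sun_berlin == sun
# ... so the subtraction falls back to the wall-clock difference.
travel_berlin = sun_berlin - sat

print(travel, same_instant, travel_berlin)  # 13:00:00 True 12:00:00
```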

&lt;h3 id=&quot;adding-a-day-and-24-hours-is-the-same&quot;&gt;Adding a day and 24 hours is the same&lt;/h3&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;zoneinfo&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ZoneInfo&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;datetime&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;datetime&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;timedelta&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;berlin&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ZoneInfo&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Europe/Berlin&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;sat&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;datetime&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2024&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;26&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;21&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tzinfo&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;berlin&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sat&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;timedelta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;days&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
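&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sat&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;timedelta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hours&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;24&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;assert&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;assert&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;isoformat&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&apos;2024-10-27T21:00:00+01:00&apos;&lt;/span&gt;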
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Adding one day and adding 24 hours to the same time yield the same result: 9:00 PM on Sunday.&lt;/strong&gt;
However, because of the daylight saving shift, one might expect adding 24 hours to yield 8:00 PM instead.&lt;/p&gt;
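&lt;p&gt;Measuring in UTC makes the discrepancy visible. A sketch: the result of adding 24 hours lies 25 hours of elapsed time after the start, because the night of the switch is one hour longer.&lt;/p&gt;

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

berlin = ZoneInfo("Europe/Berlin")
sat = datetime(2024, 10, 26, 21, 0, tzinfo=berlin)
b = sat + timedelta(hours=24)  # wall-clock arithmetic: 9:00 PM on Sunday

# Elapsed time between the two points, measured in UTC.
gap = b.astimezone(timezone.utc) - sat.astimezone(timezone.utc)
print(gap / timedelta(hours=1))  # 25.0
```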

&lt;h3 id=&quot;incorrect-ordering&quot;&gt;Incorrect ordering&lt;/h3&gt;

&lt;p&gt;The following example is taken from GitHub issue &lt;a href=&quot;https://github.com/python/cpython/issues/116111#issuecomment-2427121958&quot;&gt;python/cpython#116111&lt;/a&gt;.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;zoneinfo&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ZoneInfo&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;datetime&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;datetime&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;timezone&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;berlin&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ZoneInfo&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Europe/Berlin&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;datetime&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2024&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;27&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;30&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;fold&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tzinfo&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;berlin&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;datetime&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2024&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;27&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;35&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;fold&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tzinfo&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;berlin&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;assert&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;astimezone&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;timezone&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;utc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;isoformat&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# 2024-10-27T01:30:00+00:00
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;astimezone&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;timezone&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;utc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;isoformat&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# 2024-10-27T00:35:00+00:00
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The point &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a&lt;/code&gt; is on the second fold, i.e., after the daylight saving shift with UTC offset &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;+1&lt;/code&gt;, while &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;b&lt;/code&gt; is on the first fold, i.e., before the shift with UTC offset &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;+2&lt;/code&gt;.
In UTC, the two points are unambiguously ordered: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;b&lt;/code&gt; occurs before &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a&lt;/code&gt;.
However, &lt;strong&gt;Python orders the two times incorrectly&lt;/strong&gt;, because datetimes that share the same &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tzinfo&lt;/code&gt; are compared by their wall-clock time, ignoring the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fold&lt;/code&gt; attribute.&lt;/p&gt;
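&lt;p&gt;Converting to UTC, as suggested in the section on avoiding the problem, restores the physical order. A sketch that sorts the two points once with the default comparison and once by their UTC instant:&lt;/p&gt;

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

berlin = ZoneInfo("Europe/Berlin")
a = datetime(2024, 10, 27, 2, 30, fold=1, tzinfo=berlin)  # 01:30 UTC
b = datetime(2024, 10, 27, 2, 35, fold=0, tzinfo=berlin)  # 00:35 UTC

# Default ordering: same tzinfo, so fold is ignored and wall-clock times are compared.
wall_clock_order = sorted([a, b])
# Ordering by the UTC instant gives the physically correct order.
utc_order = sorted([a, b], key=lambda dt: dt.astimezone(timezone.utc))

print([dt.isoformat() for dt in wall_clock_order])
# ['2024-10-27T02:30:00+01:00', '2024-10-27T02:35:00+02:00']
print([dt.isoformat() for dt in utc_order])
# ['2024-10-27T02:35:00+02:00', '2024-10-27T02:30:00+01:00']
```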

&lt;h2 id=&quot;is-this-a-bug&quot;&gt;Is this a bug?&lt;/h2&gt;

&lt;p&gt;The behavior is as specified in the documentation, so it is not a bug.
The problem is that this detail is hidden in a footnote, not well known, and even misleadingly described in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;zoneinfo&lt;/code&gt; documentation.
Help spread awareness of this issue.&lt;/p&gt;</content><author><name>frank</name></author><category term="python" /><category term="datetime" /><category term="internet" /><summary type="html">Python has two modes of dealing with dates and times: Timezone-naive and timezone-aware. The former is simpler, the latter is more powerful but has some pitfalls in store. This article summarizes a few edge cases involving daylight saving shifts where timezone-aware datetime objects behave in unexpected ways. At the time of writing, according to a non-representative quiz I launched online, only 20 % of all 354 responses were correct. Let’s see what the problem is.</summary></entry></feed>