<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.2.0">Jekyll</generator><link href="https://frank.sauerburger.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://frank.sauerburger.io/" rel="alternate" type="text/html" /><updated>2026-06-06T02:10:30+02:00</updated><id>https://frank.sauerburger.io/feed.xml</id><title type="html">Frank Sauerburger</title><entry><title type="html">Vibe code in Rust</title><link href="https://frank.sauerburger.io/2026/04/29/vibe-code-in-rust.html" rel="alternate" type="text/html" title="Vibe code in Rust" /><published>2026-04-29T00:00:00+02:00</published><updated>2026-04-29T00:00:00+02:00</updated><id>https://frank.sauerburger.io/2026/04/29/vibe-code-in-rust</id><content type="html" xml:base="https://frank.sauerburger.io/2026/04/29/vibe-code-in-rust.html">&lt;p&gt;If you are vibe coding and you don’t review any of the code, like a true vibe &lt;em&gt;coder&lt;/em&gt;,
you should build your system in Rust.  I know, that’s a controversial statement. But hear me out. We will see how good the advice is.&lt;/p&gt;

&lt;h2 id=&quot;why&quot;&gt;Why?&lt;/h2&gt;

&lt;p&gt;Letting AI write your code in Rust has a couple of advantages:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Better code quality.&lt;/strong&gt; Rust’s strict compiler and ownership model can help prevent common bugs and memory issues, leading to more robust code. If it compiles, you have some assurance it will not crash on Saturday at 3 am. No &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AttributeError: &apos;NoneType&apos; object has no attribute &apos;something&apos;&lt;/code&gt; or whatever.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Performance.&lt;/strong&gt; Before the advent of good Generative AI, there was always a trade-off between speed &lt;em&gt;of code&lt;/em&gt; and speed &lt;em&gt;to code&lt;/em&gt;. Often, especially for early market tests, prototypes, or personal experiments, the speed of execution was not the bottleneck, but the speed of development was. When AI writes the code for you at lightning speed, the bottleneck shifts to defining the task and execution speed.&lt;/li&gt;
&lt;/ul&gt;
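
&lt;p&gt;To make the first point concrete, here is a minimal sketch of how Rust surfaces a missing value at compile time instead of at 3 am. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;greeting&lt;/code&gt; function and its tiny user table are made up for illustration and are not part of the experiment.&lt;/p&gt;

```rust
/// Looks up a user name by id. `find` returns an Option, so "no such user"
/// is an ordinary value that has to be handled, not a hidden null.
fn greeting(id: u32) -> String {
    // Hypothetical in-memory user table, invented for this sketch.
    let users = [(1, "alice"), (2, "bob")];
    let hit = users.iter().find(|user| user.0 == id);
    // Writing `hit.1` here does not compile: the Option must be unpacked first.
    match hit {
        Some(user) => format!("hello, {}", user.1),
        None => String::from("no such user"),
    }
}

fn main() {
    println!("{}", greeting(2));
    println!("{}", greeting(7));
}
```

&lt;p&gt;A comparable Python lookup would typically run fine until the first id that is not in the table. In Rust, the missing case has to be spelled out before the program builds.&lt;/p&gt;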

&lt;!-- snip --&gt;

&lt;h2 id=&quot;why-not&quot;&gt;Why not?&lt;/h2&gt;
&lt;p&gt;There are also some arguments that speak against using Rust for vibe coding:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Less prevalent in training data.&lt;/strong&gt; Rust became very popular in the last few years, but is still much less prevalent than Python or JavaScript/TypeScript.
AI models have therefore likely seen much less Rust code during training than code in those languages,
so the quality of generated Rust code may be comparatively lower.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Language complexity.&lt;/strong&gt; The Rust programming language is complex with a steep learning curve and novel memory management concepts. What’s difficult for human developers to learn and master is also difficult for AI models to generate correctly. This can lead to more errors during implementation.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Maintenance challenges.&lt;/strong&gt; Due to the complexity of Rust, in the long term, it might be more difficult to maintain and review the generated code. However, since our initial premise was that we are not reviewing the code anyway, this point is moot.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Higher generation costs.&lt;/strong&gt; The language complexity, longer AI reasoning, and additional iterations until the code compiles can lead to longer generation times and higher token costs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s put these arguments to the test and see which side outweighs the other.&lt;/p&gt;

&lt;h1 id=&quot;the-experiment&quot;&gt;The experiment&lt;/h1&gt;

&lt;p&gt;I let Claude Code implement three different spec.md files, each in Python, Rust, and TypeScript, for a total of 9 implementations. The spec files were identical across languages; only the language-specific instructions and conventions were adapted.&lt;/p&gt;

&lt;p&gt;The experiment uses an automated framework that runs Claude Code in headless mode inside isolated Docker containers, one per language and project. Each container receives the same specification file and is tasked with producing a working implementation. The framework records three metrics for each run: the USD cost charged by the Claude API, the wall-clock time, and the correctness of the generated code, verified via a shared test suite (which is also AI-generated, but reviewed to ensure that it is applicable and consistent with the given specs).&lt;/p&gt;

&lt;p&gt;Three language environments were tested:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Python 3.14&lt;/strong&gt; (using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;uv&lt;/code&gt; package manager)&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;TypeScript&lt;/strong&gt; (Node 25)&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Rust&lt;/strong&gt; (1.94 with Cargo)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why these three? They are all very popular languages. All three are memory safe. Python is interpreted and dynamically typed.
TypeScript is statically typed and runs on Node, whose V8 engine uses just-in-time compilation,
while Rust is also statically typed and ahead-of-time compiled.
The three languages are usually used for different purposes: Python is often used for data science, scripting, and web development; TypeScript is popular for frontend and backend web development; Rust is favored for systems programming and performance-critical applications.
This gives us a good variety of language features and paradigms to compare.&lt;/p&gt;

&lt;p&gt;The setup is documented at &lt;a href=&quot;https://gitlab.sauerburger.com/vibe-code-language/utils&quot;&gt;gitlab.sauerburger.com/vibe-code-language/utils&lt;/a&gt;.&lt;/p&gt;

&lt;h1 id=&quot;the-benchmark-projects&quot;&gt;The benchmark projects&lt;/h1&gt;

&lt;p&gt;Three projects of increasing complexity were chosen as benchmarks. The specs are available at &lt;a href=&quot;https://gitlab.sauerburger.com/vibe-code-language/utils/-/tree/main/specs&quot;&gt;gitlab.sauerburger.com/vibe-code-language/utils/-/tree/main/specs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://gitlab.sauerburger.com/vibe-code-language/utils/-/blob/main/specs/mandelbrot.md?ref_type=heads&quot;&gt;&lt;strong&gt;Mandelbrot&lt;/strong&gt;:&lt;/a&gt; The simplest project. Claude had to implement an HTTP API that renders a blue-scale image of the Mandelbrot set for a given coordinate region, plus a single-page HTML/JS viewer with click-to-zoom navigation. Pure computation with no external dependencies or database.&lt;/p&gt;
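
&lt;p&gt;The core of such a renderer is typically an escape-time loop that runs once per pixel. The following is a minimal sketch of that loop under stated assumptions: the function names, the iteration limit, and the mapping to a blue channel are hypothetical and not taken from the spec or from the generated implementations.&lt;/p&gt;

```rust
/// Number of iterations until z = z^2 + c leaves the circle of radius 2,
/// or `max_iter` if it never does (the point is then treated as inside the set).
fn escape_time(cr: f64, ci: f64, max_iter: u32) -> u32 {
    let (mut zr, mut zi) = (0.0_f64, 0.0_f64);
    for i in 0..max_iter {
        // |z|^2 above 4 means |z| above 2: the orbit is guaranteed to diverge.
        if zr * zr + zi * zi > 4.0 {
            return i;
        }
        let next_zr = zr * zr - zi * zi + cr;
        zi = 2.0 * zr * zi + ci;
        zr = next_zr;
    }
    max_iter
}

/// Hypothetical blue-scale mapping: points inside the set are black,
/// points outside get brighter the longer they take to escape.
fn blue_channel(iterations: u32, max_iter: u32) -> u8 {
    if iterations == max_iter {
        0
    } else {
        (255 * iterations / max_iter) as u8
    }
}

fn main() {
    let max_iter = 100;
    // The origin never escapes; the point 2 + 2i escapes after one step.
    println!("{}", escape_time(0.0, 0.0, max_iter));
    println!("{}", escape_time(2.0, 2.0, max_iter));
    println!("{}", blue_channel(50, max_iter));
}
```

&lt;p&gt;Because every pixel is computed independently and without I/O, this kind of loop is a reasonable stand-in for raw CPU performance.&lt;/p&gt;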

&lt;p&gt;&lt;a href=&quot;https://gitlab.sauerburger.com/vibe-code-language/utils/-/blob/main/specs/vocab.md?ref_type=heads&quot;&gt;&lt;strong&gt;Vocab Trainer&lt;/strong&gt;:&lt;/a&gt; A mid-complexity CLI application for spaced-repetition vocabulary learning. The tool has two commands: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;search&lt;/code&gt; (look up a word via a free online dictionary API and persist it to SQLite) and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;train&lt;/code&gt; (quiz the user on stored definitions using a 6-bucket spaced-repetition scheduler). Requires file I/O, HTTP requests, and an embedded SQLite database.&lt;/p&gt;
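
&lt;p&gt;A bucket-based scheduler of this kind is commonly built as a Leitner system. The sketch below shows one possible variant; the review intervals and the promote-or-reset rule are assumptions for illustration and may differ from what the spec prescribes.&lt;/p&gt;

```rust
/// Hypothetical review intervals in days for buckets 0 to 5.
const INTERVAL_DAYS: [u32; 6] = [0, 1, 2, 4, 8, 16];

/// Assumed Leitner rule: a correct answer promotes the word by one bucket
/// (capped at the last bucket), a wrong answer sends it back to bucket 0.
fn next_bucket(bucket: usize, correct: bool) -> usize {
    if correct {
        (bucket + 1).min(INTERVAL_DAYS.len() - 1)
    } else {
        0
    }
}

fn main() {
    let mut bucket = 0;
    // Three correct answers followed by one mistake.
    for correct in [true, true, true, false] {
        bucket = next_bucket(bucket, correct);
        println!("bucket {}, next review in {} days", bucket, INTERVAL_DAYS[bucket]);
    }
}
```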

&lt;p&gt;&lt;a href=&quot;https://gitlab.sauerburger.com/vibe-code-language/utils/-/blob/main/specs/spendingapi.md?ref_type=heads&quot;&gt;&lt;strong&gt;Spending API&lt;/strong&gt;:&lt;/a&gt; The most complex project. A production-oriented LLM usage-tracking system consisting of an HTTP API service backed by PostgreSQL and an admin CLI. It handles API-key authentication (Argon2id hashed secrets), per-request token consumption recording, cost computation with fixed-precision decimal arithmetic, and aggregated usage reporting grouped by model or API key.&lt;/p&gt;
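
&lt;p&gt;Fixed-precision cost computation usually means keeping floating-point numbers out of the money path. One common way to do that, assumed here and not taken from the spec, is to count in integer micro-dollars. The function names and the price format below are hypothetical.&lt;/p&gt;

```rust
/// Cost in micro-USD for `tokens` tokens, with the price given in
/// micro-USD per one million tokens. Integer arithmetic only; the result
/// is rounded half up to the nearest micro-USD.
/// Note: the multiplication can overflow for extreme inputs; a real
/// service would use checked arithmetic or a decimal type.
fn cost_micro_usd(tokens: u64, price_micro_usd_per_million: u64) -> u64 {
    (tokens * price_micro_usd_per_million + 500_000) / 1_000_000
}

/// Formats micro-USD as a decimal string with six fractional digits.
fn format_usd(micro_usd: u64) -> String {
    format!("{}.{:06}", micro_usd / 1_000_000, micro_usd % 1_000_000)
}

fn main() {
    // 1234 tokens at an assumed price of 3.00 USD per million tokens.
    let cost = cost_micro_usd(1234, 3_000_000);
    println!("{} USD", format_usd(cost));
}
```

&lt;p&gt;With integers, summing many small per-request costs gives the same total regardless of the order of addition, which is not guaranteed with binary floating point.&lt;/p&gt;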

&lt;h1 id=&quot;the-results&quot;&gt;The results&lt;/h1&gt;

&lt;p&gt;Feel free to review the 
&lt;a href=&quot;https://gitlab.sauerburger.com/vibe-code-language&quot;&gt;9 implementations&lt;/a&gt;
yourself.&lt;/p&gt;

&lt;p&gt;The following chart summarizes the API cost and generation time for each language and project.
The numeric values can be found in the table at the end of this article.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/vibe_chart.png&quot; alt=&quot;Cost and time comparison of vibe coding in Python, TypeScript, and Rust across three benchmark projects&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Let’s break it down.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; For the simple Mandelbrot project, Rust was actually the cheapest to generate at $0.23, beating Python ($0.38) and TypeScript ($0.35). However, that advantage reverses as complexity grows: for the Vocab Trainer, Rust already cost more ($0.87) than Python ($0.80) and TypeScript ($0.72), and for the Spending API, Rust was the most expensive at $1.31, 42% more than Python ($0.92) and 30% more than TypeScript ($1.01).&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Time:&lt;/strong&gt; Rust took the longest to generate across all three projects by a wide margin, between 2.1 and 3.1 times the time of Python and 75–95% more than TypeScript. However, the time difference might also stem from the toolchain and environment setup. For example, downloading all the Rust dependencies and compiling them into a binary usually takes longer than installing Python packages or Node modules.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Correctness:&lt;/strong&gt; All three languages passed the Vocab and Mandelbrot test suites completely. The Spending API is where things diverged: Rust passed 23 out of 24 test groups (failing only 1), and Python passed 21 out of 24 (failing 3). These failures are due to different returned HTTP status codes (400 vs 422) for invalid input, which is a minor correctness issue. In contrast, the TypeScript version could not even run the database migration and therefore produced no passing results at all.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Runtime performance:&lt;/strong&gt; The Mandelbrot project is purely CPU-bound and serves as a test bed for CPU performance. I measured the time it takes to generate one image. Rust and TypeScript matched each other at ~0.4 s, while Python was about 3.5× slower at 1.4 s.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
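
&lt;p&gt;The relative differences quoted above can be recomputed from the table at the end of this article. The following sketch does exactly that; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;percent_more&lt;/code&gt; is a helper introduced here and is not part of the benchmark framework.&lt;/p&gt;

```rust
/// How many percent `value` lies above `baseline` (negative if below).
fn percent_more(value: f64, baseline: f64) -> f64 {
    (value / baseline - 1.0) * 100.0
}

fn main() {
    // (project, cost in USD, time in seconds), each as (Python, TypeScript, Rust),
    // copied from the table at the end of the article.
    let runs = [
        ("Mandelbrot", (0.38, 0.35, 0.23), (127.0, 166.0, 323.0)),
        ("Vocab Trainer", (0.80, 0.72, 0.87), (171.0, 301.0, 526.0)),
        ("Spending API", (0.92, 1.01, 1.31), (367.0, 422.0, 771.0)),
    ];
    for (project, cost, time) in runs {
        println!(
            "{}: Rust cost {:+.0}% vs Python, {:+.0}% vs TypeScript; time {:+.0}% vs Python, {:+.0}% vs TypeScript",
            project,
            percent_more(cost.2, cost.0),
            percent_more(cost.2, cost.1),
            percent_more(time.2, time.0),
            percent_more(time.2, time.1),
        );
    }
}
```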

&lt;h1 id=&quot;the-conclusion&quot;&gt;The conclusion&lt;/h1&gt;

&lt;p&gt;The hypothesis that Rust’s compile-time checks help vibe coding partially holds. Rust produced the most correct Spending API implementation and never crashed. But it comes at a cost: generation takes longer and is more expensive for complex projects. Python is the cheapest and fastest to generate and was surprisingly competitive on correctness. TypeScript was middle-ground on cost, but its generated code failed hardest on the most complex project.&lt;/p&gt;

&lt;p&gt;So, should you vibe code in Rust? As always, it depends. It depends on your priorities: cost vs execution speed and stability.&lt;/p&gt;

&lt;p&gt;The choice of language should depend on your application and your capabilities to review and maintain generated code.
Let’s be honest, not reviewing any of the generated code is a bad idea.&lt;/p&gt;

&lt;hr /&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Project&lt;/th&gt;
      &lt;th&gt;Language&lt;/th&gt;
      &lt;th style=&quot;text-align: right&quot;&gt;Cost (USD)&lt;/th&gt;
      &lt;th style=&quot;text-align: right&quot;&gt;Time (s)&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Mandelbrot&lt;/td&gt;
      &lt;td&gt;Python&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;$0.38&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;127&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Mandelbrot&lt;/td&gt;
      &lt;td&gt;TypeScript&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;$0.35&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;166&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Mandelbrot&lt;/td&gt;
      &lt;td&gt;Rust&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;$0.23&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;323&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Vocab Trainer&lt;/td&gt;
      &lt;td&gt;Python&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;$0.80&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;171&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Vocab Trainer&lt;/td&gt;
      &lt;td&gt;TypeScript&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;$0.72&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;301&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Vocab Trainer&lt;/td&gt;
      &lt;td&gt;Rust&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;$0.87&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;526&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Spending API&lt;/td&gt;
      &lt;td&gt;Python&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;$0.92&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;367&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Spending API&lt;/td&gt;
      &lt;td&gt;TypeScript&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;$1.01&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;422&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Spending API&lt;/td&gt;
      &lt;td&gt;Rust&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;$1.31&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;771&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;</content><author><name>frank</name></author><category term="AI" /><summary type="html">If you are vibe coding and you don’t review any of the code, like a true vibe coder, you should build your system in Rust. I know, that’s a controversial statement. But hear me out. We will see how good the advice is. Why? Letting AI write your code in Rust has a couple of advantages: Better code quality. Rust’s strict compiler and ownership model can help prevent common bugs and memory issues, leading to more robust code. If it compiles, you have some assurance it will not crash on Saturday at 3 am. No AttributeError: &apos;NoneType&apos; object has no attribute &apos;something&apos; or whatever. Performance. Before the advent of good Generative AI, there was always a trade-off between speed of code and speed to code. Often, especially for early market tests, prototypes, or personal experiments, the speed of execution was not the bottleneck, but the speed of development was. When AI writes the code for you at lightning speed, the bottleneck shifts to defining the task and execution speed.</summary></entry><entry><title type="html">Claude Code as member of the engineering team</title><link href="https://frank.sauerburger.io/2026/03/03/claude-as-team-member.html" rel="alternate" type="text/html" title="Claude Code as member of the engineering team" /><published>2026-03-03T00:00:00+01:00</published><updated>2026-03-03T00:00:00+01:00</updated><id>https://frank.sauerburger.io/2026/03/03/claude-as-team-member</id><content type="html" xml:base="https://frank.sauerburger.io/2026/03/03/claude-as-team-member.html">&lt;p&gt;I made the experiment: Claude Code as a full engineer on GitLab, as part of the development team for a mini game. I create tickets and review merge requests, while Claude Code submits code as merge requests and iterates on them if required. It worked surprisingly well.&lt;/p&gt;

&lt;p&gt;What’s the project? HTTP is the default protocol we use to connect with people and look up information on the web. For a long time, I wanted to offer a service accessible via SSH, where the terminal replaces the browser and SSH replaces HTTPS. To achieve that, I wanted to build a small terminal-based game. Voilà: AsciiMoria. An SSH-based game where players navigate their way through deep and dangerous mines, implemented in Rust.&lt;/p&gt;

&lt;h2 id=&quot;the-development-workflow&quot;&gt;The development workflow&lt;/h2&gt;
&lt;p&gt;Claude Code was wired into a project on GitLab, waiting to be assigned to issues with instructions on what to implement. Once assigned, Claude would work on a feature branch and interact with the ticket, asking for clarification if needed, for example. Each ticket would eventually result in a merge request for me to review. If I requested changes, or if the CI/CD pipeline failed, Claude would give it another shot and improve on the previous implementation.&lt;/p&gt;

&lt;p&gt;What I liked most about this workflow was remaining in charge. I create tickets, write specs, review merge requests, request changes, and make decisions about the direction of the project. This meant I could focus on thinking about the game’s direction, corner cases, and new features without getting lost in the nitty-gritty details of building a game or an SSH server in Rust. The separation of concerns felt natural: I owned the vision, Claude owned the implementation.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/elv.png&quot; alt=&quot;Workflow of Claude code as an engineer and part of the development team&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;the-outcome&quot;&gt;The outcome&lt;/h2&gt;

&lt;p&gt;Coming straight to the point: the outcome of my experiment is a fun little game, free for everyone to play or fork.&lt;/p&gt;

&lt;p&gt;To play it, you need a terminal and an SSH client with a local private key for authentication (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ssh-keygen -t ed25519&lt;/code&gt; if you don’t have a key already).&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;ssh asciimoria.com
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;img src=&quot;/assets/asciimoria.png&quot; alt=&quot;Screenshots from the game&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The code is open source and available at &lt;a href=&quot;https://gitlab.sauerburger.com/frank/asciimoria&quot;&gt;https://gitlab.sauerburger.com/frank/asciimoria&lt;/a&gt;.
The repository contains all interactions and conversations with Claude Code.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/asciimoria_issues_mr.png&quot; alt=&quot;Entire backlog for the project and the corresponding merge requests opened by Claude&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Is the game bug free? I don’t think so. Would it be bug free if I wrote the game? Also no. I made some contributions to prove to myself that I understand the code.&lt;/p&gt;

&lt;h2 id=&quot;what-did-i-learn&quot;&gt;What did I learn?&lt;/h2&gt;
&lt;p&gt;A lot. I don’t want to sound foolish claiming that AI taught me things, but I freely admit I picked up many tricks along the way: how to approach certain patterns in Rust, the landscape of open source Rust libraries, new(-ish) features in Docker Compose files, new(-ish) features in GitLab CI/CD configs. Claude is extremely versatile and proficient across a remarkable breadth of technologies.&lt;/p&gt;

&lt;p&gt;I also learned just how good AI is at getting things right the first time.
We have all committed endless commit chains to fix a CI problem, caused by stupid mistakes, syntax errors, typos, or a general lack of understanding of how to configure the pipeline. Claude did none of that. In nearly every case, it had things right on the first or second attempt.&lt;/p&gt;

&lt;p&gt;Perhaps the most surprising lesson was about the level of detail in tickets. When I was thorough in the issue description, Claude followed it to the letter: it had all the specification it needed and delivered exactly what I asked for. On the other end, when tickets were brief and superficial, Claude found sensible ways to fill in the gaps, and I had very few bad surprises. However, there is a dangerous middle ground. When I tried to be clever in the ticket description without fully understanding the underlying complexity and tricky details, Claude would take me at my word and implement something rather nonsensical. To be clear, Claude did see the hidden complexity; it simply trusted that I had made a deliberate choice when writing the specs. The lesson: be either very detailed and understand the underlying complexity or be very brief. Vague specificity is the worst of both worlds.&lt;/p&gt;

&lt;h2 id=&quot;how-was-the-experience&quot;&gt;How was the experience?&lt;/h2&gt;
&lt;p&gt;Creating this game with AI feels a bit like cheating.&lt;/p&gt;

&lt;p&gt;On the other hand, it also gave a very rewarding feeling. Closing tickets and burning through your backlog in very little time releases a lot of dopamine. Going back to writing code the hard way afterwards feels a bit like going through detox.&lt;/p&gt;

&lt;p&gt;I was incredibly fascinated by how skilled my new engineer was. And I am not going to lie: it was very convenient to write tickets from my phone on the train and have a merge request ready by the time I got home, with passing unit tests and the desired feature materialized in code.&lt;/p&gt;

&lt;p&gt;However, I can see people trying to keep an AI agent working 24/7, which is not going to be healthy for our work-life balance. Just because you can create tickets (or maybe even review code) on your phone does not mean you should.&lt;/p&gt;

&lt;h2 id=&quot;where-to-go-from-here&quot;&gt;Where to go from here&lt;/h2&gt;
&lt;p&gt;For hobby projects, there is one important question to ask: do you prefer the activity of coding, or do you prefer having the finished project? To code vs the code.&lt;/p&gt;

&lt;p&gt;Code generation has become very cheap, but the cost of ownership remains. You still have to understand the code you ship, maintain it, and make architectural decisions that go beyond what any AI can infer from a ticket. Even before good code generation was available, a key skill in software engineering was to focus on outcome rather than output. That principle is more relevant than ever, now that generating code is just a ticket away. The real work, deciding what to build and why, has not changed.&lt;/p&gt;</content><author><name>frank</name></author><category term="ai" /><category term="rust" /><summary type="html">I made the experiment: Claude Code as a full engineer on GitLab, as part of the development team for a mini game. I create tickets and review merge requests, while Claude Code submits code as merge requests and iterates on them if required. It worked surprisingly well.</summary></entry><entry><title type="html">Async database access in Rust with Diesel 101</title><link href="https://frank.sauerburger.io/2026/02/08/async-diesel-101.html" rel="alternate" type="text/html" title="Async database access in Rust with Diesel 101" /><published>2026-02-08T00:00:00+01:00</published><updated>2026-02-08T00:00:00+01:00</updated><id>https://frank.sauerburger.io/2026/02/08/async-diesel-101</id><content type="html" xml:base="https://frank.sauerburger.io/2026/02/08/async-diesel-101.html">&lt;p&gt;&lt;a href=&quot;https://diesel.rs/&quot;&gt;Diesel&lt;/a&gt;,
a database ORM in Rust, provides compile-time type checks for its database operations, seamlessly integrating with blazingly fast, robust API applications built with
&lt;a href=&quot;https://github.com/tokio-rs/axum&quot;&gt;Axum&lt;/a&gt;.
As always, when working with a new technology, some of the details are difficult to remember (assuming you’re not vibe coding the entire app). For example, is it &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;load(&amp;amp;mut conn)&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;execute(&amp;amp;mut conn)&lt;/code&gt; for a select query? This article showcases a few simple queries. It doesn’t try to be a comprehensive tutorial. In the following, upper-case text is meant as a placeholder. The code assumes an auto-generated &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;schema.rs&lt;/code&gt; file and a hand-written &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;models.rs&lt;/code&gt; file with Rust representations of database data structures.&lt;/p&gt;

&lt;h2 id=&quot;select&quot;&gt;Select&lt;/h2&gt;

&lt;div class=&quot;language-rust highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;values&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;Vec&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SELECTED_TYPE&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;schema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;table&lt;/span&gt;
  &lt;span class=&quot;c&quot;&gt;// .inner_join(schema::OTHER_TABLE::table)&lt;/span&gt;
  &lt;span class=&quot;nf&quot;&gt;.filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;schema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FIELD&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.eq&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;VALUE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
  &lt;span class=&quot;nf&quot;&gt;.select&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;schema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FIELD&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;c&quot;&gt;// or   .select((schema::TABLE::FIELD, …) )&lt;/span&gt;
  &lt;span class=&quot;c&quot;&gt;// or   .select(models::MODEL::as_select())&lt;/span&gt;
  &lt;span class=&quot;c&quot;&gt;// .offset(OFFSET)&lt;/span&gt;
  &lt;span class=&quot;c&quot;&gt;// .limit(LIMIT)&lt;/span&gt;
  &lt;span class=&quot;nf&quot;&gt;.load&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;mut&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;conn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;c&quot;&gt;// .first(&amp;amp;mut conn)&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;.await&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;?&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;insert&quot;&gt;Insert&lt;/h2&gt;

&lt;div class=&quot;language-rust highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;values&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SELECTED_TYPE&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;VALUES_OBJECT&lt;/span&gt;
  &lt;span class=&quot;nf&quot;&gt;.insert_into&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;schema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;table&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;nf&quot;&gt;.returning&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;schema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FIELD&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;nf&quot;&gt;.get_result&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;mut&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;conn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;c&quot;&gt;// or without returning&lt;/span&gt;
  &lt;span class=&quot;c&quot;&gt;// .execute(&amp;amp;mut conn) &lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;.await&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;?&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;update&quot;&gt;Update&lt;/h2&gt;

&lt;div class=&quot;language-rust highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nn&quot;&gt;diesel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;update&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;schema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;table&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;nf&quot;&gt;.filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;schema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FIELD&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.eq&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;VALUE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;nf&quot;&gt;.set&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;schema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FIELD&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.eq&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;VALUE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;nf&quot;&gt;.execute&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;mut&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;conn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;.await&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;?&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;</content><author><name>frank</name></author><category term="rust" /><category term="internet" /><summary type="html">Diesel, a database ORM in Rust, provides compile-time type checks for its database operations, seamlessly integrating with blazingly fast, robust API applications built with Axum. As always, when working with a new technology, some of the details are difficult to remember (assuming you’re not vibe coding the entire app). For example, is it load(&amp;amp;mut conn) or execute(&amp;amp;mut conn) for a select query? This article showcases a few simple queries. It doesn’t try to be a comprehensive tutorial. In the following, upper-case text is meant as a placeholder. The code assumes an auto-generated schema.rs file and a hand-written models.rs file with Rust representations of database data structures.</summary></entry><entry><title type="html">Virt-install a Ubuntu VM from the terminal</title><link href="https://frank.sauerburger.io/2026/01/03/virt-install-ubuntu-on-terminal.html" rel="alternate" type="text/html" title="Virt-install a Ubuntu VM from the terminal" /><published>2026-01-03T00:00:00+01:00</published><updated>2026-01-03T00:00:00+01:00</updated><id>https://frank.sauerburger.io/2026/01/03/virt-install-ubuntu-on-terminal</id><content type="html" xml:base="https://frank.sauerburger.io/2026/01/03/virt-install-ubuntu-on-terminal.html">&lt;p&gt;In recent releases of Ubuntu, the installation experience has changed significantly. Starting with Ubuntu 24.04, the default server ISO uses the new Subiquity-based live installer, and the traditional text-based “mini.iso” workflow is no longer provided in the same way.&lt;/p&gt;

&lt;p&gt;If you’re provisioning virtual machines via automation, especially on a headless host using KVM and virt-install, this can be frustrating. By default, virt-install may attempt to open a graphical console, pushing you toward VNC or SPICE even when you prefer a pure terminal-based workflow.
If you boot the ISO as CDROM media and attach to the text console instead, virt-install prints a warning like:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;WARNING CDROM media does not print to the text console by default, so you likely will not see text install output. You might want to use --location. See the man page for examples of using --location with CDROM media
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This guide shows how to install Ubuntu 24.04 LTS using the live server ISO entirely from the terminal, launching the installer kernel and initrd directly from the ISO.&lt;/p&gt;

&lt;h2 id=&quot;define-virtual-machine-parameters&quot;&gt;Define Virtual Machine Parameters&lt;/h2&gt;

&lt;p&gt;Start by defining the VM configuration. The block below is fully editable — adjust CPU, RAM, disk size, and paths to fit your environment before running it.&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot; contenteditable=&quot;&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nb&quot;&gt;export &lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;VM_NAME&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;ubuntu-vm&quot;&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;export &lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;VM_ISO&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;https://releases.ubuntu.com/24.04.3/ubuntu-24.04.3-live-server-amd64.iso&quot;&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;export &lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;VM_OS&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;ubuntu-stable-latest&quot;&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;export &lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;VM_IMG&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;/var/lib/libvirt/images/&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;${&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;VM_NAME&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;.qcow2&quot;&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;export &lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;VM_CORES&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;2
&lt;span class=&quot;nb&quot;&gt;export &lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;VM_DISKSIZE&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;50
&lt;span class=&quot;nb&quot;&gt;export &lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;VM_RAMSIZE&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;4096
&lt;span class=&quot;nb&quot;&gt;export &lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;VM_LOCAL_ISO&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;/tmp/ubuntu.iso
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Variable&lt;/th&gt;
      &lt;th&gt;Purpose&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VM_NAME&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;Name of the virtual machine in libvirt&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VM_ISO&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;Official Ubuntu 24.04 live server ISO&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VM_OS&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;OS variant for optimization&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VM_IMG&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;Path to the VM disk image&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VM_CORES&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;Number of virtual CPUs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VM_DISKSIZE&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;Disk size in GB&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VM_RAMSIZE&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;RAM in MB&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VM_LOCAL_ISO&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;Local ISO download path&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;


&lt;h2 id=&quot;download-the-ubuntu-2404-iso&quot;&gt;Download the Ubuntu 24.04 ISO&lt;/h2&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot; contenteditable=&quot;&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;curl -o &quot;$VM_LOCAL_ISO&quot; -L &quot;$VM_ISO&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;install-ubuntu-24-using-terminal-installer&quot;&gt;Install Ubuntu 24.04 Using the Terminal Installer&lt;/h2&gt;

&lt;p&gt;Here’s the key part: we instruct virt-install to boot directly from the installer kernel and initrd inside the ISO.&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot; contenteditable=&quot;&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;virt-install &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;--virt-type&lt;/span&gt; kvm &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;--name&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$VM_NAME&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;--os-variant&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$VM_OS&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;--disk&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;path&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;${&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;VM_IMG&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;}&lt;/span&gt;,size&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;${&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;VM_DISKSIZE&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;}&lt;/span&gt;,bus&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;virtio,format&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;qcow2 &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;--memory&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$VM_RAMSIZE&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;--vcpus&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$VM_CORES&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;--graphics&lt;/span&gt; none &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;--console&lt;/span&gt; pty,target_type&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;serial &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;--location&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;${&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;VM_LOCAL_ISO&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;}&lt;/span&gt;,kernel&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;casper/vmlinuz,initrd&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;casper/initrd &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;--extra-args&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;console=ttyS0,115200n8 --- console=ttyS0,115200n8&quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
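
&lt;p&gt;When the installation is driven from an automation script rather than an interactive shell, the same command can be assembled as an argument list. The following Python sketch only mirrors the flags shown above. The helper name &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;virt_install_args&lt;/code&gt; is hypothetical, and the mapping passed to it is assumed to contain the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VM_*&lt;/code&gt; variables defined earlier.&lt;/p&gt;

```python
def virt_install_args(env):
    """Build the virt-install argument list from a mapping of VM_* settings.

    Hypothetical helper: it reproduces the flags of the shell command above
    and adds nothing beyond them.
    """
    # Comma-separated suboptions, exactly as in the shell version.
    disk = "path={VM_IMG},size={VM_DISKSIZE},bus=virtio,format=qcow2".format(**env)
    location = "{VM_LOCAL_ISO},kernel=casper/vmlinuz,initrd=casper/initrd".format(**env)
    return [
        "virt-install",
        "--virt-type", "kvm",
        "--name", str(env["VM_NAME"]),
        "--os-variant", str(env["VM_OS"]),
        "--disk", disk,
        "--memory", str(env["VM_RAMSIZE"]),
        "--vcpus", str(env["VM_CORES"]),
        "--graphics", "none",
        "--console", "pty,target_type=serial",
        "--location", location,
        # One argument: the kernel command line for the installer.
        "--extra-args=console=ttyS0,115200n8 --- console=ttyS0,115200n8",
    ]
```

&lt;p&gt;Passing the resulting list to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;subprocess.run(..., check=True)&lt;/code&gt; would start the installer without an intermediate shell, so the values need no additional quoting.&lt;/p&gt;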

&lt;h2 id=&quot;ubuntu-vm&quot;&gt;Ubuntu VM&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;/assets/screenshot-ubuntu-vm-1.png&quot; alt=&quot;Screenshot from the Ubuntu installer&quot; /&gt;
&lt;img src=&quot;/assets/screenshot-ubuntu-vm-2.png&quot; alt=&quot;Screenshot from the Ubuntu installer&quot; /&gt;&lt;/p&gt;</content><author><name>frank</name></author><category term="sysadmin" /><category term="vm" /><summary type="html">In recent releases of Ubuntu, the installation experience has changed significantly. Starting with Ubuntu 24.04, the default server ISO uses the new Subiquity-based live installer, and the traditional text-based “mini.iso” workflow is no longer provided in the same way.</summary></entry><entry><title type="html">Creating an Elasticsearch API token for another user</title><link href="https://frank.sauerburger.io/2025/11/01/create-elastic-api-token-for-other-user.html" rel="alternate" type="text/html" title="Creating an Elasticsearch API token for another user" /><published>2025-11-01T00:00:00+01:00</published><updated>2025-11-01T00:00:00+01:00</updated><id>https://frank.sauerburger.io/2025/11/01/create-elastic-api-token-for-other-user</id><content type="html" xml:base="https://frank.sauerburger.io/2025/11/01/create-elastic-api-token-for-other-user.html">&lt;p&gt;Using the superuser &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;elastic&lt;/code&gt; API keys to access the Elasticsearch API is not recommended.
The API key is often used by remote clients on remote systems. An attacker who gains access to the
key can compromise the entire Elasticsearch instance.&lt;/p&gt;

&lt;p&gt;The solution is to use API keys for unprivileged roles and users. Creating these API keys, however, is not
straightforward. Unprivileged users usually don’t have permission to log in and create the API keys for themselves.
Instead, run the following request as superuser, for example in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/app/dev_tools#/console&lt;/code&gt; console, to create an API key on behalf of an unprivileged user.&lt;/p&gt;

&lt;div class=&quot;language-shell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;POST /_security/api_key/grant
&lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;s2&quot;&gt;&quot;grant_type&quot;&lt;/span&gt;: &lt;span class=&quot;s2&quot;&gt;&quot;password&quot;&lt;/span&gt;,
    &lt;span class=&quot;s2&quot;&gt;&quot;username&quot;&lt;/span&gt;: &lt;span class=&quot;s2&quot;&gt;&quot;elastic&quot;&lt;/span&gt;,
    &lt;span class=&quot;s2&quot;&gt;&quot;password&quot;&lt;/span&gt;: &lt;span class=&quot;s2&quot;&gt;&quot;YOUR SUPERUSER PASSWORD&quot;&lt;/span&gt;,
    &lt;span class=&quot;s2&quot;&gt;&quot;run_as&quot;&lt;/span&gt;: &lt;span class=&quot;s2&quot;&gt;&quot;UNPRIVILEGED USERNAME&quot;&lt;/span&gt;,
    &lt;span class=&quot;s2&quot;&gt;&quot;api_key&quot;&lt;/span&gt;: &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;s2&quot;&gt;&quot;name&quot;&lt;/span&gt;: &lt;span class=&quot;s2&quot;&gt;&quot;NEW NAME FOR THAT API KEY&quot;&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;More details can be found in the &lt;a href=&quot;https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-security-grant-api-key&quot;&gt;API docs&lt;/a&gt;.&lt;/p&gt;
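
&lt;p&gt;Outside the console, the same request can be prepared from a script. The following Python sketch builds the request with the standard library and does not send it. The function name, the base URL, and the use of HTTP basic authentication for the calling superuser are assumptions for illustration; only the endpoint and the body fields are taken from the request above.&lt;/p&gt;

```python
import base64
import json
import urllib.request


def build_grant_request(base_url, superuser, password, run_as, key_name):
    """Prepare (but do not send) a grant-API-key request.

    Hypothetical helper; base_url and basic auth are assumptions.
    """
    body = {
        "grant_type": "password",
        "username": superuser,
        "password": password,
        "run_as": run_as,
        "api_key": {"name": key_name},
    }
    # Assumption: the caller authenticates with HTTP basic auth.
    credentials = base64.b64encode(
        "{}:{}".format(superuser, password).encode("utf-8")
    ).decode("ascii")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_security/api_key/grant",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + credentials,
        },
    )
```

&lt;p&gt;Sending the prepared request with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;urllib.request.urlopen&lt;/code&gt; would return the JSON response from Elasticsearch.&lt;/p&gt;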

&lt;h2 id=&quot;dedicated-api-keys-with-restricted-permissions&quot;&gt;Dedicated API keys with restricted permissions&lt;/h2&gt;

&lt;p&gt;API keys for dedicated tasks, for example for
&lt;a href=&quot;https://www.elastic.co/beats/filebeat&quot;&gt;Filebeat&lt;/a&gt; or &lt;a href=&quot;https://www.elastic.co/beats/metricbeat&quot;&gt;Metricbeat&lt;/a&gt;
clients, can be created through another endpoint.&lt;/p&gt;

&lt;p&gt;Create an API key for Metricbeat clients with the following request. See the &lt;a href=&quot;https://www.elastic.co/docs/reference/beats/metricbeat/beats-api-keys&quot;&gt;Metricbeat docs&lt;/a&gt; for more details.&lt;/p&gt;

&lt;div class=&quot;language-shell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;POST /_security/api_key
&lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;s2&quot;&gt;&quot;name&quot;&lt;/span&gt;: &lt;span class=&quot;s2&quot;&gt;&quot;API KEY NAME&quot;&lt;/span&gt;, 
  &lt;span class=&quot;s2&quot;&gt;&quot;role_descriptors&quot;&lt;/span&gt;: &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;s2&quot;&gt;&quot;metricbeat_writer&quot;&lt;/span&gt;: &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt; 
      &lt;span class=&quot;s2&quot;&gt;&quot;cluster&quot;&lt;/span&gt;: &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;monitor&quot;&lt;/span&gt;, &lt;span class=&quot;s2&quot;&gt;&quot;read_ilm&quot;&lt;/span&gt;, &lt;span class=&quot;s2&quot;&gt;&quot;read_pipeline&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt;,
      &lt;span class=&quot;s2&quot;&gt;&quot;index&quot;&lt;/span&gt;: &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
          &lt;span class=&quot;s2&quot;&gt;&quot;names&quot;&lt;/span&gt;: &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;metricbeat-*&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt;,
          &lt;span class=&quot;s2&quot;&gt;&quot;privileges&quot;&lt;/span&gt;: &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;view_index_metadata&quot;&lt;/span&gt;, &lt;span class=&quot;s2&quot;&gt;&quot;create_doc&quot;&lt;/span&gt;, &lt;span class=&quot;s2&quot;&gt;&quot;auto_configure&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
      &lt;span class=&quot;o&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Create an API key for Filebeat clients with the following request. See the &lt;a href=&quot;https://www.elastic.co/docs/reference/beats/filebeat/beats-api-keys&quot;&gt;Filebeat docs&lt;/a&gt; for more details.&lt;/p&gt;

&lt;div class=&quot;language-shell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;POST /_security/api_key
&lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;s2&quot;&gt;&quot;name&quot;&lt;/span&gt;: &lt;span class=&quot;s2&quot;&gt;&quot;API KEY NAME&quot;&lt;/span&gt;,
  &lt;span class=&quot;s2&quot;&gt;&quot;role_descriptors&quot;&lt;/span&gt;: &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;s2&quot;&gt;&quot;filebeat_writer&quot;&lt;/span&gt;: &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;s2&quot;&gt;&quot;cluster&quot;&lt;/span&gt;: &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;monitor&quot;&lt;/span&gt;, &lt;span class=&quot;s2&quot;&gt;&quot;read_ilm&quot;&lt;/span&gt;, &lt;span class=&quot;s2&quot;&gt;&quot;read_pipeline&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt;,
      &lt;span class=&quot;s2&quot;&gt;&quot;index&quot;&lt;/span&gt;: &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
          &lt;span class=&quot;s2&quot;&gt;&quot;names&quot;&lt;/span&gt;: &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;filebeat-*&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt;,
          &lt;span class=&quot;s2&quot;&gt;&quot;privileges&quot;&lt;/span&gt;: &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;view_index_metadata&quot;&lt;/span&gt;, &lt;span class=&quot;s2&quot;&gt;&quot;create_doc&quot;&lt;/span&gt;, &lt;span class=&quot;s2&quot;&gt;&quot;auto_configure&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
      &lt;span class=&quot;o&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;</content><author><name>frank</name></author><category term="internet" /><category term="sysadmin" /><summary type="html">Using the superuser elastic API keys to access the Elasticsearch API is not recommended. The API key is often used by remote clients on remote systems. Any attacker who might get access to the token, can compromise the entire Elasticsearch instance.</summary></entry><entry><title type="html">llms.txt adoption</title><link href="https://frank.sauerburger.io/2025/09/03/llms-txt-adoption.html" rel="alternate" type="text/html" title="llms.txt adoption" /><published>2025-09-03T00:00:00+02:00</published><updated>2025-09-03T00:00:00+02:00</updated><id>https://frank.sauerburger.io/2025/09/03/llms-txt-adoption</id><content type="html" xml:base="https://frank.sauerburger.io/2025/09/03/llms-txt-adoption.html">&lt;p&gt;Exactly one year ago, Jeremy Howard published a &lt;a href=&quot;https://llmstxt.org&quot;&gt;proposal&lt;/a&gt; to make the web more accessible to AI and, in particular, to LLMs. How many of the top one million websites adopt this approach?&lt;/p&gt;

&lt;p&gt;The proposed standard suggests creating a file at the root of a website, e.g., &lt;a href=&quot;https://llmstxt.org/&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/llms.txt&lt;/code&gt;&lt;/a&gt;,
intended to be consumed by LLMs and AI tools, loosely taking inspiration from &lt;a href=&quot;https://www.robotstxt.org/&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/robots.txt&lt;/code&gt;&lt;/a&gt;.
The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/llms.txt&lt;/code&gt; serves as an entry point or site map of the website, potentially linking to other pages.
As the source code of a webpage is often very verbose and its content is mingled with style sheets, JavaScript, and HTML markup,
parsing the source with an LLM might exceed the LLM’s context window or consume too many tokens.
Therefore, the idea is to use Markdown for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/llms.txt&lt;/code&gt; entry point and to link to Markdown versions of each page.&lt;/p&gt;

&lt;p&gt;How many websites adopted this approach?&lt;/p&gt;

&lt;!-- snip --&gt;

&lt;h1 id=&quot;lets-measure&quot;&gt;Let’s measure.&lt;/h1&gt;

&lt;p&gt;Starting with a dataset of the &lt;a href=&quot;https://www.domcop.com/top-10-million-websites&quot;&gt;10 million highest-ranked domains&lt;/a&gt;
from &lt;a href=&quot;https://www.domcop.com/openpagerank/what-is-openpagerank&quot;&gt;Open Page Rank&lt;/a&gt;,
we can send a GET request to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/llms.txt&lt;/code&gt; for each of them and see how many web servers respond with an HTTP success code.
It turns out that many web servers respond with a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;200&lt;/code&gt; status code but actually send a
page informing the client that the page doesn’t exist.
For example, the top-ranked domain, facebook.com, behaves that way.
A GET request to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;https://facebook.com/llms.txt&lt;/code&gt; returns a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;200&lt;/code&gt; status code, but the page says the content is not available.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/facebook-llmstxt.png&quot; alt=&quot;Unusual behavior of facebook.com&quot; /&gt;&lt;/p&gt;

&lt;p&gt;A typical feature of these fake-success responses is that the response page is an HTML document.
However, the Content-Type header field alone is not always a reliable discriminator for detecting fake-success pages.
A good way to distinguish HTML content from Markdown is to compare the number of left angle brackets (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;&lt;/code&gt;) with the number of
left square brackets (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[&lt;/code&gt;).
The former is ubiquitous in HTML, while the latter is common in Markdown.
I came up with the following somewhat arbitrary rules. I count a response as a valid &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/llms.txt&lt;/code&gt; response only if it satisfies all of them.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;HTTP response code must be less than 400&lt;/li&gt;
  &lt;li&gt;Content-Type header must start with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;text/plain&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;The web server must accept the connection within 3 seconds&lt;/li&gt;
  &lt;li&gt;The response must arrive within 10 seconds&lt;/li&gt;
  &lt;li&gt;The site must support HTTPS&lt;/li&gt;
  &lt;li&gt;The response content must be longer than 500 chars&lt;/li&gt;
  &lt;li&gt;There must be more left square brackets than left angle brackets&lt;/li&gt;
&lt;/ul&gt;
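
&lt;p&gt;The rules that depend only on the response itself can be sketched as a small Python predicate. This is an illustration of the rules listed above, not the code from the actual crawler: the function name is hypothetical, the case-insensitive comparison of the media type is an added assumption, and the two timeouts and the HTTPS requirement are left to whatever HTTP client fetches the page.&lt;/p&gt;

```python
LEFT_ANGLE = chr(60)  # the left angle bracket that opens every HTML tag
LEFT_SQUARE = "["     # the left square bracket that opens a Markdown link


def looks_like_llms_txt(status, content_type, body):
    """Apply the content-based rules to one already-fetched response.

    Hypothetical helper; timeouts and HTTPS are enforced by the caller.
    """
    # HTTP response code must be less than 400.
    if status >= 400:
        return False
    # Content-Type must start with text/plain.
    # Assumption: compare case-insensitively and ignore surrounding spaces.
    if not content_type.strip().lower().startswith("text/plain"):
        return False
    # The response content must be longer than 500 characters.
    if not len(body) > 500:
        return False
    # More left square brackets than left angle brackets.
    return body.count(LEFT_SQUARE) > body.count(LEFT_ANGLE)
```

&lt;p&gt;A long Markdown link list served as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;text/plain&lt;/code&gt; passes this check, whereas an HTML error page fails the bracket comparison even if it is served with a success status code.&lt;/p&gt;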

&lt;p&gt;To speed up the analysis, I limited it to the top one million domains.
I ran the analysis on September 2, 2025.&lt;/p&gt;

&lt;h1 id=&quot;results&quot;&gt;Results&lt;/h1&gt;

&lt;p&gt;Highly ranked domains might be faster to adopt new technological ideas.
To test this, I counted the fraction of domains with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/llms.txt&lt;/code&gt; among the top n domains.
The result is shown in the following chart.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/llmstxt.png&quot; alt=&quot;Bar chart of the fraction of domains with llms.txt among the top n domains&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Based on these results, the adoption rate is highest, at 4 %, among the top 300 domains.
The fraction decreases continuously further down the ranking. Across the top one million domains,
the overall adoption rate drops to around 1.2 %.
In total, that corresponds to 12174 domains.&lt;/p&gt;
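
&lt;p&gt;The fraction among the top n domains is a running count divided by the rank. The following Python sketch shows that computation under stated assumptions: the function name is hypothetical, and the input is assumed to be one boolean per domain in rank order, where true means the domain passed the validity rules.&lt;/p&gt;

```python
def adoption_by_cutoff(has_llms_txt, cutoffs):
    """Return {n: fraction of the top n domains with a valid llms.txt}.

    Hypothetical helper; has_llms_txt holds one boolean per domain,
    ordered from rank 1 downwards.
    """
    wanted = set(cutoffs)
    fractions = {}
    hits = 0
    for rank, passed in enumerate(has_llms_txt, start=1):
        hits += bool(passed)  # running count of adopting domains
        if rank in wanted:
            fractions[rank] = hits / rank
    return fractions
```

&lt;p&gt;For the full sample, the same division gives 12174 / 1000000 = 0.012174, which is the overall rate of around 1.2 % quoted above.&lt;/p&gt;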

&lt;p&gt;The &lt;a href=&quot;https://gitlab.sauerburger.com/frank/llmstxt-adoption&quot;&gt;crawler, the analysis code, and the result dataset&lt;/a&gt;
are available in a Git repository.&lt;/p&gt;</content><author><name>frank</name></author><category term="internet" /><category term="ai" /><summary type="html">Exactly one year ago, Jeremy Howard published a proposal to make the web more accessible to AI and, in particular, to LLMs. How many of the top one million websites adopt this approach? The proposed standard suggests creating a file at the root of a website, e.g., /llms.txt, intended to be consumed by LLMs and AI tools, loosely taking inspiration from /robots.txt. The /llms.txt serves as an entry point or site map of the website, potentially linking to other pages. As the source code of a webpage is often very verbose and its content is mingled with style sheets, JavaScript, and HTML markup, parsing the source with an LLM might exceed the LLM’s content window or consume too many tokens. Therefore, the idea is to use Markdown for the /llms.txt entry point and to link to Markdown versions of each page. How many websites adopted this approach?</summary></entry><entry><title type="html">Magic floating-point numbers: NaNs</title><link href="https://frank.sauerburger.io/2025/08/18/magic-floating-point-numbers-nans.markdown.html" rel="alternate" type="text/html" title="Magic floating-point numbers: NaNs" /><published>2025-08-18T00:00:00+02:00</published><updated>2025-08-18T00:00:00+02:00</updated><id>https://frank.sauerburger.io/2025/08/18/magic-floating-point-numbers-nans.markdown</id><content type="html" xml:base="https://frank.sauerburger.io/2025/08/18/magic-floating-point-numbers-nans.markdown.html">&lt;p&gt;After following &lt;a href=&quot;https://www.youtube.com/watch?v=y-NOz94ZEOA&quot;&gt;Laurie Kirk down a rabbit hole on subnormal numbers in the IEEE 754 float specification&lt;/a&gt;,
I stumbled upon other interesting properties of floating-point numbers, specifically how NaNs (Not a Number) are represented in binary.
After more than 10 years of scientific computing and data science, I thought there was nothing about floats that could surprise me, but oh, was I wrong.
Let’s see if I can surprise you.
I’ve built the computer-science equivalent of a magic trick to showcase these properties.&lt;/p&gt;

&lt;h2 id=&quot;the-magic-trick&quot;&gt;The magic trick&lt;/h2&gt;
&lt;p&gt;The trick works in two stages:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;You choose a phrase of your liking. With a special Python function, you can convert it into a numpy array of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NaN&lt;/code&gt;s. It’s a normal array. It’s normal NaNs. Your phrase is nowhere to be seen.&lt;/li&gt;
  &lt;li&gt;You send the numpy array to an API endpoint at &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;https://magicfloat.sauerburger.io/unravel&lt;/code&gt;. Using advanced magic (knowledge of &lt;a href=&quot;https://en.wikipedia.org/wiki/IEEE_754&quot;&gt;IEEE 754&lt;/a&gt;), I can unravel your secrets by looking at the array of NaNs.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3 id=&quot;step-one-enchanting-your-phrase&quot;&gt;Step one: Enchanting your phrase&lt;/h3&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;numpy&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;enchant&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;phrase&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ndarray&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;frombuffer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sa&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;join&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;
        &lt;span class=&quot;nb&quot;&gt;bytes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;sa&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\xff\x80\x7f&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;phrase&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;encode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;utf-8&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;]),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dtype&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;float32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If you call that with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&quot;Computers are fun!&quot;&lt;/code&gt;, you get a numpy array of floats with no sign of the phrase. It seems the message is gone.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;box&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;enchant&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Computers are fun!&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;box&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;array&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
       &lt;span class=&quot;n&quot;&gt;nan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dtype&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;float32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;box&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shape&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;18&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,)&lt;/span&gt;

&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;box&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dtype&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;dtype&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;float32&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;step-two-open-the-magic-nan-array&quot;&gt;Step two: Open the magic NaN array&lt;/h3&gt;

&lt;p&gt;I’m providing an API endpoint at &lt;a href=&quot;https://magicfloat.sauerburger.io/unravel&quot;&gt;https://magicfloat.sauerburger.io/unravel&lt;/a&gt; that takes the binary version of the numpy array and responds with your original phrase.
The following function does the necessary encoding and request handling.&lt;/p&gt;

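&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;numpy&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;requests&lt;/span&gt;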

&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;unravel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;box&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ndarray&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;box&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dtype&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;float32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;raise&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;ValueError&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Magic box must be float32.&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;response&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;requests&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;post&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;https://magicfloat.sauerburger.io/unravel&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;box&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tobytes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;response&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ok&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;raise&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;RuntimeError&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;The planets don&apos;t seem to align: %s&quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;response&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;text&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;response&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;text&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If we continue the example from above, we get: &lt;em&gt;drum roll&lt;/em&gt;&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;unravel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;box&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt;&apos;Computers are fun!&apos;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;how-does-it-work&quot;&gt;How does it work?&lt;/h2&gt;

&lt;p&gt;Floating-point numbers are represented using three components,&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;the sign of the number,&lt;/li&gt;
  &lt;li&gt;the exponent used with base 2, and&lt;/li&gt;
  &lt;li&gt;the fractional part of the number, the mantissa.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In memory, they are arranged as follows. The order might be different depending on the &lt;a href=&quot;https://frank.sauerburger.io/2022/01/26/big-and-little-endian.html&quot;&gt;endianness&lt;/a&gt; of your platform.&lt;/p&gt;

&lt;table&gt;
&lt;tr style=&quot;color: #fff; text-align: center&quot;&gt;
&lt;td style=&quot;width: 1em;&quot;&gt; &lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9C165B&quot;&gt;x&lt;/td&gt;

&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #16539C&quot;&gt;1&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #16539C&quot;&gt;1&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #16539C&quot;&gt;1&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #16539C&quot;&gt;1&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #16539C&quot;&gt;1&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #16539C&quot;&gt;1&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #16539C&quot;&gt;1&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #16539C&quot;&gt;1&lt;/td&gt;

&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;

&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;

&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;
&lt;td style=&quot;padding: 10px 0; width: 1em; background-color: #9F4214&quot;&gt;x&lt;/td&gt;

&lt;/tr&gt;
&lt;tr style=&quot;color: #000; background-color: #fff&quot;&gt;
&lt;td style=&quot;text-align: center&quot; colspan=&quot;2&quot;&gt;Sign&lt;/td&gt;
&lt;td style=&quot;text-align: center&quot; colspan=&quot;8&quot;&gt;Biased exponent (8 bits)&lt;/td&gt;
&lt;td style=&quot;text-align: center&quot; colspan=&quot;23&quot;&gt;Mantissa (23 bits)&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;A few combinations of bits have a special meaning, such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;+inf&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-inf&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NaN&lt;/code&gt;.
When the exponent is all-ones (as shown in the chart), it represents one of the aforementioned three cases.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Sign&lt;/th&gt;
      &lt;th&gt;Biased exponent&lt;/th&gt;
      &lt;th&gt;Mantissa&lt;/th&gt;
      &lt;th&gt;Special meaning&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;all ones: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1111 1111&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;all zero&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;+inf&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;1&lt;/td&gt;
      &lt;td&gt;all ones: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1111 1111&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;all zero&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-inf&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;any&lt;/td&gt;
      &lt;td&gt;all ones: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1111 1111&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;at least one bit not zero&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NaN&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
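
&lt;p&gt;To make the last row of the table concrete, here is a minimal sketch that hides one byte in the lowest eight mantissa bits of each &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;float32&lt;/code&gt; NaN. The helper names &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pack&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;unpack&lt;/code&gt; and the chosen bit layout are assumptions made for illustration only; the layout used by the actual service may well differ.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;import numpy as np

# Exponent bits all ones plus the top mantissa bit: NaN for any payload.
# (Hypothetical layout, chosen for this illustration.)
NAN_BASE = np.uint32(0x7FC00000)

def pack(text):
    """Store each UTF-8 byte in the lowest 8 mantissa bits of a float32 NaN."""
    payload = np.frombuffer(text.encode("utf-8"), dtype=np.uint8).astype(np.uint32)
    # Reinterpret the 32-bit patterns as floats without changing any bit.
    return (payload | NAN_BASE).view(np.float32)

def unpack(box):
    """Read the lowest 8 bits of every float32 back into a string."""
    bits = box.view(np.uint32)
    return (bits % 256).astype(np.uint8).tobytes().decode("utf-8")

box = pack("Computers are fun!")
print(np.isnan(box).all(), box.shape, box.dtype)
print(unpack(box))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The sketch relies on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;view&lt;/code&gt;, which reinterprets the same memory under a different dtype. A numeric conversion of the floats is not guaranteed to keep the payload bits intact.&lt;/p&gt;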

&lt;p&gt;We observe that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;+inf&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-inf&lt;/code&gt; each have a unique binary representation.
However, for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NaN&lt;/code&gt;, we have 2^24 - 2 possible binary representations: two possible signs times 2^23 - 1 non-zero mantissa patterns.
For my little magic trick, I pack one UTF-8 encoded byte in each 32-bit float number.
I invite you to &lt;a href=&quot;https://gitlab.sauerburger.com/frank/magicfloat&quot;&gt;discover&lt;/a&gt; the details yourself.&lt;/p&gt;</content><author><name>frank</name></author><category term="python" /><summary type="html">After following Laurie Kirk down a rabbit hole on subnormal numbers in the IEEE 754 float specification, I stumbled upon other interesting properties of floating-point numbers, specifically how NaNs (Not a Number) are represented in binary. After more than 10 years of scientific computing and data science, I thought there was nothing about floats that could surprise me, but oh, was I wrong. Let’s see if I can surprise you. I’ve built the computer-science equivalent of a magic trick to showcase these properties.</summary></entry><entry><title type="html">LLM basics and its applications</title><link href="https://frank.sauerburger.io/2025/07/16/llm-basic-and-its-applications.html" rel="alternate" type="text/html" title="LLM basics and its applications" /><published>2025-07-16T00:00:00+02:00</published><updated>2025-07-16T00:00:00+02:00</updated><id>https://frank.sauerburger.io/2025/07/16/llm-basic-and-its-applications</id><content type="html" xml:base="https://frank.sauerburger.io/2025/07/16/llm-basic-and-its-applications.html">&lt;p&gt;Technical innovations often offer numerous applications with tremendous added value, making work easier. However, this also increases the potential for misuse, with the opposite effect—whether deliberately or through improper use. This also applies to advances in the field of Artificial Intelligence (AI). The aim of this article is to shed light on how current text-based AI applications work, so that they can be used meaningfully and appropriately in the field of safety science. The focus here is not to discourage their use, but to encourage an active discussion about sensible applications and their adoption.&lt;/p&gt;

&lt;!-- snip --&gt;

&lt;div class=&quot;alert alert-primary&quot; role=&quot;alert&quot;&gt;
&lt;i class=&quot;fa fa-info-circle&quot;&gt;&lt;/i&gt; &lt;b&gt;Note:&lt;/b&gt;
This is the written version of my presentation titled &lt;i&gt;Challenge &quot;Artificial Intelligence&quot; in relation to the assessment of safety&lt;/i&gt; at the &lt;a href=&quot;https://auva.at/veranstaltungen/forum-praevention-international-2025/programm-program/#MIE&quot;&gt;XXXIX. International GfS Symposium&lt;/a&gt; in Vienna on May 21, 2025.
The text was translated from German to English using a Large Language Model (LLM) and is not a verbatim transcript of the presentation.
&lt;/div&gt;

&lt;p&gt;From a user’s perspective, lack of understanding of how AI applications work creates several obstacles. In the following, I will focus on text-based applications powered by Large Language Models (LLMs). Since the launch of ChatGPT in November 2022, it took only a few months before insufficient understanding during use made headlines. As various media reported, a lawyer in New York used ChatGPT to research legal precedents, which he then submitted to the court. It later emerged that most of the cases presented by ChatGPT were either incorrectly cited or completely made up. This behavior, known among experts as “hallucination,” is a characteristic of LLMs. The lawyer claimed to have acted under the assumption that ChatGPT was a search engine. The legal consequences in this case led to the lawyer being fined. Numerous reports of similar cases have emerged over the past two years.&lt;/p&gt;

&lt;p&gt;The example above illustrates how important it is to understand how AI applications work. This is not limited to ChatGPT and can also be transferred to other AI tools. Below, I describe how language models function and how modern AI tools are derived from them.&lt;/p&gt;

&lt;h2 id=&quot;large-language-models&quot;&gt;Large Language Models&lt;/h2&gt;
&lt;p&gt;Large Language Models are neural networks, that is, in a broad sense, nonlinear mathematical functions that compute an output from an input. In the case of language models, both the input and output are text. LLMs are used in almost all generative, text-based AI services. However, they are often hidden behind several layers of application-specific logic. The foundation for today’s models was laid by Google employees in 2017 with the Transformer architecture and the so-called Attention mechanism.&lt;/p&gt;

&lt;p&gt;The term “Large” in Large Language Models refers to the number of parameters and the associated memory requirement. There is no clear cutoff to define the term and it is expected that the development of ever larger language models will continue. If the language model is viewed as a mathematical function, the number of parameters becomes comparable: A first-degree polynomial, i.e. a function that describes a straight line in a plane, has two parameters; a second-degree polynomial (parabola) has three. In general, GPT-1 is considered the first LLM, which is described by 117 million parameters. Today’s models have up to 700 billion parameters, requiring specialized hardware for their application. Further examples of large language models include OpenAI’s GPT-4.1 or o3, as well as openly available models such as BLOOM, Llama 3, and Mixtral.&lt;/p&gt;

&lt;h3 id=&quot;functionality&quot;&gt;Functionality&lt;/h3&gt;
&lt;p&gt;LLMs do not work directly with words or characters, but with “tokens”—essentially the alphabet of the language model. A token reflects a semantic unit of a word and is the smallest element the model understands. In English, a token is often equated to about ¾ of a word, so on average about 1⅓ tokens are required to form a word. For a language model to process text, it is first broken down into a sequence of tokens and each token is identified by an integer. The original text is thus translated into a chain of numbers. Typically, the vocabulary of modern language models contains about 20,000 to 200,000 different tokens.&lt;/p&gt;
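
&lt;p&gt;The translation of text into a chain of numbers can be sketched with a toy tokenizer. This is only an illustration under a strong simplification: it treats whole lowercase words as tokens, whereas real tokenizers split text into learned sub-word units. The function names are made up for this example.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;def build_vocabulary(texts):
    """Assign a running integer id to every distinct word (toy stand-in for tokens)."""
    vocabulary = {}
    for text in texts:
        for piece in text.lower().split():
            vocabulary.setdefault(piece, len(vocabulary))
    return vocabulary

def encode(text, vocabulary):
    """Translate a text into its chain of token ids."""
    return [vocabulary[piece] for piece in text.lower().split()]

def decode(ids, vocabulary):
    """Translate token ids back into text."""
    inverse = {index: piece for piece, index in vocabulary.items()}
    return " ".join(inverse[index] for index in ids)

vocabulary = build_vocabulary(["LLMs predict tokens", "tokens are numbers"])
ids = encode("tokens are tokens", vocabulary)
print(ids)
print(decode(ids, vocabulary))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;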

&lt;p&gt;Large language models for generative applications are mostly trained to predict the next token based on a given sequence of tokens—so-called Next Token Prediction. Put simply, the goal of training is to optimize the vast number of parameters in the language model so that it can predict the next token for texts from the training corpus as accurately as possible. To train such a large number of parameters, a correspondingly large dataset of texts is required. For high precision, the language model must be capable of understanding context both within sentences and across entire texts. Earlier approaches in computational linguistics, such as Markov chains or neural networks with Long Short-Term Memory (LSTM), do not achieve comparable results.&lt;/p&gt;

&lt;p&gt;Sticking with the view of language models as mathematical functions, so far the model appears to compute the next token from a sequence of input tokens. By repeatedly applying the function, always appending the predicted token to the input sequence, it is possible to continue or complete a started text. This is illustrated in the following figure. To enhance readability, input and output texts are shown as plain text rather than tokenized. The model completes the started sentence. Although this appears to work well at first glance, the phenomenon of hallucination also appears here, as the described “FINTURBO Cup” does not exist.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/vienna1.png&quot; alt=&quot;Example of text completion by language model Qwen2-72B. Words generated by the model are shown in blue.
&quot; /&gt;&lt;/p&gt;
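
&lt;p&gt;The loop of predicting a token and appending it to the input can be sketched in a few lines. As a stand-in for the language model, the sketch uses a bigram counter, which is a simple Markov chain and far weaker than an LLM. Only the outer loop in the hypothetical function &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;complete&lt;/code&gt; mirrors how a language model continues a text.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count which token follows which (toy replacement for model training)."""
    counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts

def complete(counts, prompt, max_new_tokens):
    """Repeatedly predict the next token and append it to the sequence."""
    sequence = list(prompt)
    for _ in range(max_new_tokens):
        candidates = counts.get(sequence[-1])
        if not candidates:
            break  # the toy model has never seen this token followed by anything
        next_token = candidates.most_common(1)[0][0]
        sequence.append(next_token)
    return sequence

corpus = "the sun rises and the sun sets and the sun rises".split()
model = train_bigram(corpus)
print(" ".join(complete(model, ["the"], 4)))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The toy model looks only at the last token, which is exactly the lack of context that limits Markov chains. An LLM conditions each prediction on the entire sequence generated so far.&lt;/p&gt;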

&lt;h3 id=&quot;mental-model-for-the-model&quot;&gt;Mental Model for the Model&lt;/h3&gt;
&lt;p&gt;According to a 2022 survey, computational linguists were divided on whether LLMs truly “understand” natural language in a non-trivial sense, or merely remix and parrot back training texts. It is unclear how language models solve linguistic tasks, and they are often described as black boxes.&lt;/p&gt;

&lt;p&gt;The author believes this question is not productive, as it enters deeply philosophical territory. For instance, to answer it, we would first need to define what is meant by “understanding” and to what extent a machine can actually understand anything.&lt;/p&gt;

&lt;p&gt;From a practical perspective, it makes sense to use the following mental model for LLMs: Since LLMs are above all trained to predict the next word or token, it is reasonable to say that LLMs merely imitate human language and conversation. It shouldn’t be surprising that such imitation can still be extremely useful for solving tasks. Throughout this article, I will refer back to this perspective to explain the behavior of LLMs in specific examples.&lt;/p&gt;

&lt;h2 id=&quot;techniques&quot;&gt;Techniques&lt;/h2&gt;
&lt;p&gt;The following sections highlight various techniques, all of which are applied in one form or another in today’s AI tools.&lt;/p&gt;

&lt;h3 id=&quot;paradigm-shift-prompt-engineering&quot;&gt;Paradigm Shift: Prompt Engineering&lt;/h3&gt;
&lt;p&gt;Programming has traditionally meant that a computer executes the instructions in program code exactly as written. This can be surprising when, for example, an “obviously” correct algorithm is carried out by the machine differently than a human would expect. Translating abstract ideas into concrete, specific computer instructions is a core part of programming.&lt;/p&gt;

&lt;p&gt;Prompt Engineering marks a radical departure from this principle. When language models are used, instructions are no longer executed with mathematical precision. Prompt Engineering refers to the practice of writing instructions—called prompts—to a language model so that it completes a task as reliably and accurately as possible. A prompt is executed by the model in the context of simulated conversation. Small, seemingly insignificant changes to the prompt’s wording can lead to radically different answers. For instance, if a language model’s output must adhere to a specific format to work with traditional software, it is not uncommon to remind the model several times within the prompt about the desired output format, as shown in the following figure.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/vienna2.png&quot; alt=&quot;Instruction to a language model to analyze product reviews.
&quot; /&gt;&lt;/p&gt;

&lt;p&gt;LLMs tend to invent or guess answers when they do not know the correct response—this is the already described phenomenon of hallucination. Providers of AI services attempt to avoid or reduce hallucination in their products. Because of this tendency, pure language models are not suitable as factual knowledge bases or search engines. Ignorance of this property led to the initial example with the lawyer.&lt;/p&gt;

&lt;p&gt;As radical and paradoxical as the departure from mathematical precision in programming may seem, this paradigm shift dramatically increases the range of possible applications. Writing a short textual instruction to an LLM can solve problems that could not be addressed with conventional programming, or only through great effort using traditional computational linguistics techniques. Applications can more quickly adapt to changing requirements simply by changing prompts, whereas retraining a classic computational linguistics model takes significant time.&lt;/p&gt;

&lt;h3 id=&quot;in-context-learning&quot;&gt;In-Context Learning&lt;/h3&gt;
&lt;p&gt;Language models have a limited knowledge base, known as the knowledge cut-off. During training, a large corpus of texts is used. The model cannot know anything about events not included in its training set, for example because they occurred after training was completed. Language models generally cannot access all information from the training dataset as a knowledge base would. The purpose of the dataset is to teach the model language and conversation—not to memorize facts and events.&lt;/p&gt;

&lt;p&gt;Since training LLMs is time- and resource-intensive, there are other ways to convey new information to them. A fundamental method is so-called in-context learning, in which, at the start of a conversation with the language model—i.e. after training is finished—the model is given all necessary information for the task as part of the input text. This can include background knowledge required for a task, or examples that demonstrate how the task should be solved (“few-shot learning”). The following figure shows a conversation in which the language model is taught, using examples, how to rephrase sentences.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/vienna3.png&quot; alt=&quot;Conversation using in-context learning, where Qwen2-72B-Instruct is given example instructions. Words generated by the model are shown in blue.&quot; /&gt;&lt;/p&gt;
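
&lt;p&gt;In code, few-shot learning amounts to assembling the examples into the input text. The following sketch shows one possible layout. The labels &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Input:&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Output:&lt;/code&gt; and the function name are assumptions for illustration; they are not a format prescribed by any particular model.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;def build_few_shot_prompt(instruction, examples, query):
    """Put the instruction, the worked examples, and the new input into one text."""
    lines = [instruction, ""]
    for source, target in examples:
        lines.append("Input: " + source)
        lines.append("Output: " + target)
        lines.append("")
    # The prompt ends with an open "Output:" for the model to complete.
    lines.append("Input: " + query)
    lines.append("Output:")
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Rewrite each sentence politely.",
    [("Give me water.", "Could I have some water, please?")],
    "Open the door.",
)
print(prompt)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;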

&lt;h3 id=&quot;external-knowledge-and-logical-reasoning&quot;&gt;External Knowledge and Logical Reasoning&lt;/h3&gt;
&lt;p&gt;The concept of in-context learning can be extended and combined with classic knowledge bases. Information required for answering a question or carrying out a task is added to the conversational context by extracting it from a knowledge base. This technique is called Retrieval Augmented Generation (RAG). RAG applications differ greatly depending on the data source. A RAG application can consult a single document, an encyclopedia like Wikipedia, or all freely available internet content.&lt;/p&gt;

&lt;p&gt;Developing the retrieval aspect is essential for a successful RAG system. If the extracted content does not include the information needed to answer the question, the language model cannot provide a meaningful response. RAG is often seen as another method to reduce LLM hallucinations. The following illustrates the workings of a RAG system with an example.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/vienna4.png&quot; alt=&quot;Example of a RAG system using the language model Qwen2-72B-Instruct. Relevant background info is inserted into the conversation based on the user&apos;s question. Words generated by the model are shown in blue.
&quot; /&gt;&lt;/p&gt;
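
&lt;p&gt;The two steps of a RAG system, retrieval and prompt construction, can be sketched as follows. The retrieval here simply counts shared words, which is a deliberately crude assumption. Production systems typically use search indexes or vector similarity instead, and all names in the sketch are hypothetical.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;def retrieve(question, documents):
    """Return the document sharing the most words with the question (toy retrieval)."""
    question_words = set(question.lower().split())

    def overlap(document):
        return len(question_words.intersection(document.lower().split()))

    return max(documents, key=overlap)

def build_rag_prompt(question, documents):
    """Insert the retrieved background information ahead of the question."""
    context = retrieve(question, documents)
    return "Context: " + context + "\n\nQuestion: " + question + "\nAnswer:"

documents = [
    "Basel hosts the Eurovision Song Contest 2025.",
    "Qdrant is a vector database.",
]
print(build_rag_prompt("Where is the Eurovision Song Contest 2025 held?", documents))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If the retrieval step picks the wrong document, the prompt contains no useful context, which is why the quality of retrieval decides the quality of the answer.&lt;/p&gt;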

&lt;p&gt;The analogy—that language models merely imitate conversation—points to another limitation that was humorously noted online in the early days of ChatGPT: the lack of logical reasoning ability.&lt;/p&gt;

&lt;p&gt;The first figure below shows a case in which the language model did not reach the correct logical or mathematical conclusion. The example uses in-context learning to demonstrate what the model is being asked to do. It remains unclear how the model arrived at the answer “27”; the correct answer is 9.&lt;/p&gt;

&lt;p&gt;A common technique to guide language models toward logical reasoning is called Chain-of-Thought Prompting (COT prompting), in which the model is encouraged to document and explain intermediate steps. Because of the way the model works, each subsequent step is generated only after the preceding intermediate steps have been “put on paper” and made part of the input. The second figure below shows the same task, this time solved correctly with COT prompting.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/vienna5.png&quot; alt=&quot;Example of conversation with a language model, in which the model incorrectly solves a mathematical riddle. Words generated by the model are shown in blue.
&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/vienna6.png&quot; alt=&quot;Example of a conversation with a language model using chain-of-thought prompting to correctly solve a mathematical riddle. Words generated by the model are shown in blue.
&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;agents-and-fine-tuning&quot;&gt;Agents and Fine-Tuning&lt;/h3&gt;
&lt;p&gt;Innovations in prompt engineering and related techniques allow a language model to be used for a wide range of applications. However, its capabilities are limited to pure text (or audio/image) output. Interaction with external systems requires so-called agents, which can enable complex computations and allow language models to interact with external systems.&lt;/p&gt;

&lt;p&gt;Through prompt engineering, a language model is told, in its instruction, what external tools are available for solving a task. This could include a calculator, an external API (to manage emails, appointments, or contacts), a RAG-based search, a free web search, or a coding environment.&lt;/p&gt;

&lt;p&gt;Users can assign a task to the language model. The model is instructed to select, from the available tools, the one required to solve the problem. The model communicates the selection and use of tools via a special answer format. Complex systems allow for the use of multiple tools to solve the task step by step.&lt;/p&gt;

&lt;p&gt;A language model with internet search and route planner abilities could, for example, handle the request “How long does it take to drive from Vienna to the host venue of the Eurovision Song Contest 2025?” by first using a web search to find where Eurovision will be held in 2025, then using the route planner to calculate the travel time from Vienna to Basel. The answer is about eight hours.&lt;/p&gt;
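
&lt;p&gt;The application side of such a tool call can be sketched as follows. The sketch assumes that the model was instructed to answer with a small JSON object naming the tool and its input. This format, the tool registry, and the hard-coded model reply are all assumptions for illustration; real systems define their own answer formats.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;import json

def calculator(expression):
    """Evaluate a tiny expression of the form: number operator number."""
    left, operator, right = expression.split()
    operations = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}
    return str(operations[operator](float(left), float(right)))

# Registry of tools the model was told about in its instruction (hypothetical).
TOOLS = {"calculator": calculator}

def run_tool_call(model_reply):
    """Parse the model's special answer format and execute the selected tool."""
    call = json.loads(model_reply)
    tool = TOOLS[call["tool"]]
    return tool(call["input"])

# Stand-in for a reply generated by the language model.
model_reply = '{"tool": "calculator", "input": "3 * 4"}'
print(run_tool_call(model_reply))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In a complete agent, the tool result would be appended to the conversation so that the model can use it for its next step or for the final answer.&lt;/p&gt;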

&lt;p&gt;The potential of agent-based applications is virtually limitless in our connected digital world. Today’s language models are often trained on both natural and programming languages, allowing them to generate functional code. In combination with a programming environment, language models can write complex algorithms needed to solve a task and then execute them. The limits of such systems in the future are currently unforeseeable.&lt;/p&gt;

&lt;p&gt;All the techniques discussed so far use prompt engineering to influence the work of the language model. To conclude, there is another technique that does not rely on prompt engineering: Through fine-tuning, a pre-trained language model can be customized so that it automatically follows a fixed, predefined instruction. In a way, the model loses its ability to react generically to all prompts. On the other hand, the model doesn’t have to be told its task at the start of every conversation. Fine-tuning is suitable when a language model will be used for a large number of identical tasks. Depending on the fine-tuning method, all the parameters of the model may be adjusted. However, this is often too computationally expensive. Alternatives like PEFT enable fine-tuning by adding or adjusting only a small number of new parameters.&lt;/p&gt;

&lt;h2 id=&quot;final-thoughts&quot;&gt;Final Thoughts&lt;/h2&gt;
&lt;p&gt;Meaningful use of AI in safety sciences requires understanding both the underlying technology and its limitations. By presenting some essential techniques used in today’s AI tools, this article aims to make a contribution here.&lt;/p&gt;

&lt;p&gt;The landscape of commercial and open-source language models, and their abilities and methods, is changing rapidly. It is difficult to make predictions about the future of AI. But AI will certainly change the way many industries work, and companies that fail to use AI tools will have a hard time competing. It is therefore all the more important to seize the opportunities that AI provides. To make this easier, this article has highlighted some technical boundaries, to help avoid mistakes in application.&lt;/p&gt;</content><author><name>frank</name></author><category term="ai" /><summary type="html">Technical innovations often offer numerous applications with tremendous added value, making work easier. However, this also increases the potential for misuse, with the opposite effect—whether deliberately or through improper use. This also applies to advances in the field of Artificial Intelligence (AI). The aim of this article is to shed light on how current text-based AI applications work, so that they can be used meaningfully and appropriately in the field of safety science. The focus here is not to discourage their use, but to encourage an active discussion about sensible applications and their adoption.</summary></entry><entry><title type="html">Restore a collection from a Qdrant snapshot stored in S3</title><link href="https://frank.sauerburger.io/2025/02/26/restore-qdrant-snapshot-from-s3.html" rel="alternate" type="text/html" title="Restore a collection from a Qdrant snapshot stored in S3" /><published>2025-02-26T00:00:00+01:00</published><updated>2025-02-26T00:00:00+01:00</updated><id>https://frank.sauerburger.io/2025/02/26/restore-qdrant-snapshot-from-s3</id><content type="html" xml:base="https://frank.sauerburger.io/2025/02/26/restore-qdrant-snapshot-from-s3.html">&lt;p&gt;The lightning-fast vector database &lt;a href=&quot;https://qdrant.tech&quot;&gt;Qdrant&lt;/a&gt; has supported 
creating snapshots of its collections on S3 since version v1.10. That’s convenient and simplifies storing
snapshots as backups on multiple machines. However, as of 2025, there is no way to restore
a collection from a snapshot on S3. This article describes a workaround using pre-signed URLs in S3.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/qdrant-s3@2x.png&quot; alt=&quot;Illustration of a backup and restore workflow in Qdrant via an S3 bucket&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;creating-the-snapshot&quot;&gt;Creating the snapshot&lt;/h2&gt;

&lt;p&gt;A Qdrant instance can be &lt;a href=&quot;https://qdrant.tech/documentation/concepts/snapshots/#s3&quot;&gt;configured to use S3 as its storage for snapshots&lt;/a&gt;.&lt;/p&gt;

&lt;div class=&quot;language-yaml highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;na&quot;&gt;storage&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;snapshots_config&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;snapshots_storage&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;s3&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;s3_config&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;bucket&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;your_bucket_here&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;region&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;your_bucket_region_here&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;access_key&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;your_access_key_here&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;secret_key&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;your_secret_key_here&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;endpoint_url&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;your_url_here&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The next snapshot request will be written to the S3 bucket.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;qdrant_client&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;QdrantClient&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;client&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;QdrantClient&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;url&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;http://localhost:6333&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;snapshot_info&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;create_snapshot&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;collection_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;my-collection&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;creating-a-pre-signed-url&quot;&gt;Creating a pre-signed URL&lt;/h2&gt;

&lt;p&gt;From the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;snapshot_info&lt;/code&gt; object, we can determine the name of the snapshot within the bucket.
Alternatively, inspecting the bucket directly works as well. Let’s assume the snapshot is called&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;snapshots/my-collection/my-collection_123456789.snapshot
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
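
&lt;p&gt;As a minimal sketch, the object key can be assembled from the collection name and the snapshot name. The helper &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;snapshot_key&lt;/code&gt; is hypothetical, and the key layout is an assumption read off the example above, so it is worth checking against the actual bucket. With the client from the previous section, the snapshot name should be available as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;snapshot_info.name&lt;/code&gt;.&lt;/p&gt;

```python
def snapshot_key(collection_name, snapshot_name, prefix="snapshots"):
    """Assemble the assumed S3 object key of a collection snapshot.

    The layout prefix/collection/snapshot mirrors the example path in
    this article; it is an assumption, not a documented guarantee.
    """
    return f"{prefix}/{collection_name}/{snapshot_name}"


# Hypothetical values; in practice the second argument would be snapshot_info.name.
key = snapshot_key("my-collection", "my-collection_123456789.snapshot")
print(key)
```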

&lt;p&gt;Since Qdrant is not able to read and restore a snapshot from S3 directly, we can use S3’s pre-signed URLs instead.
A pre-signed URL contains a signature that authorizes anyone with the URL to access a specific object in the
bucket, assuming that the API and the bucket are not configured to be entirely private.&lt;/p&gt;

&lt;p&gt;AWS’s Python client &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;boto3&lt;/code&gt; is an easy way to create pre-signed URLs.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;boto3&lt;/span&gt;


&lt;span class=&quot;n&quot;&gt;session&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;boto3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;session&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Session&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;s3&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;session&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;service_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;s3&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;aws_access_key_id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;...&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;aws_secret_access_key&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;...&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;endpoint_url&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;...&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;url&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;generate_presigned_url&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;s&quot;&gt;&apos;get_object&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;Params&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;s&quot;&gt;&apos;Bucket&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&apos;...&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;s&quot;&gt;&apos;Key&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&apos;snapshots/my-collection/my-collection_123456789.snapshot&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;ExpiresIn&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3600&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;24&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;7&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# Expires in 7 days
&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Creating pre-signed URLs doesn’t require internet access. The access key ID and the secret access key are sufficient
to compute the signature offline. The S3 server validates the signature and evaluates permissions only when
the resource is requested.&lt;/p&gt;
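
&lt;p&gt;To illustrate why no network access is needed, here is a minimal standard-library sketch of the AWS Signature Version 4 query signing scheme that pre-signed URLs are based on. It is not how &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;boto3&lt;/code&gt; is implemented internally; the function names, the host, and the credentials are placeholders, and path-style addressing is assumed. In practice, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;generate_presigned_url&lt;/code&gt; remains the better choice.&lt;/p&gt;

```python
import hashlib
import hmac
from datetime import datetime, timezone
from urllib.parse import quote, urlencode


def hmac_sha256(key, message):
    """One HMAC-SHA256 step, used for the signing key derivation."""
    return hmac.new(key, message.encode(), hashlib.sha256).digest()


def presign_get(host, path, access_key, secret_key, region, expires, now):
    """Build a SigV4 pre-signed GET URL using local computation only."""
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date = amz_date[:8]
    scope = f"{date}/{region}/s3/aws4_request"
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    uri = quote(path, safe="/")
    # Canonical query string: sorted by name, strictly percent-encoded.
    query = urlencode(sorted(params.items()), quote_via=quote, safe="")
    canonical_request = "\n".join(
        ["GET", uri, query, f"host:{host}", "", "host", "UNSIGNED-PAYLOAD"]
    )
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode()).hexdigest(),
    ])
    # Derive the signing key from the secret: date, region, service, terminator.
    key = ("AWS4" + secret_key).encode()
    for part in (date, region, "s3", "aws4_request"):
        key = hmac_sha256(key, part)
    params["X-Amz-Signature"] = hmac.new(
        key, string_to_sign.encode(), hashlib.sha256
    ).hexdigest()
    return f"https://{host}{uri}?" + urlencode(params, quote_via=quote, safe="")


# Placeholder host and credentials; nothing is sent over the network.
url = presign_get(
    host="s3.example.invalid",
    path="/my-bucket/snapshots/my-collection/my-collection_123456789.snapshot",
    access_key="EXAMPLEKEYID",
    secret_key="example-secret",
    region="eu-central-1",
    expires=3600,
    now=datetime.now(timezone.utc),
)
print(url)
```

&lt;p&gt;Every input to the signature is known locally: the credentials, the timestamp, the region, and the object path. The server only repeats the same computation when the URL is used.&lt;/p&gt;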

&lt;p&gt;The resulting HTTPS URL points to the snapshot file. It could look like this:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;url&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt;&apos;https://s3.sauerburger.com/&apos;&lt;/span&gt; \
    &lt;span class=&quot;s&quot;&gt;&apos;qdrant-backup/snapshots/my-collection/my-collection_123456789.snapshot?&apos;&lt;/span&gt; \
    &lt;span class=&quot;s&quot;&gt;&apos;X-Amz-Algorithm=AWS4-HMAC-SHA256&amp;amp;&apos;&lt;/span&gt; \
    &lt;span class=&quot;s&quot;&gt;&apos;X-Amz-Credential=a4tw715etr74zq2dhxdp70s1%2F20250212%2Feu-central-1%2Fs3%2Faws4_request&amp;amp;&apos;&lt;/span&gt; \
    &lt;span class=&quot;s&quot;&gt;&apos;X-Amz-Date=20250212T194041Z&amp;amp;&apos;&lt;/span&gt; \
    &lt;span class=&quot;s&quot;&gt;&apos;X-Amz-Expires=604800&amp;amp;&apos;&lt;/span&gt; \
    &lt;span class=&quot;s&quot;&gt;&apos;X-Amz-SignedHeaders=host&amp;amp;&apos;&lt;/span&gt; \
    &lt;span class=&quot;s&quot;&gt;&apos;X-Amz-Signature=23ea0fcfa91bb5e3ebea847e79cc69f08224e58275890097e28bed9e1de018df&apos;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;restoring-a-collection&quot;&gt;Restoring a collection&lt;/h2&gt;

&lt;p&gt;The final step is to instruct Qdrant to restore a collection from a snapshot via HTTP.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;qdrant_client&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;QdrantClient&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;client&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;QdrantClient&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;url&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;http://localhost:6333&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;success&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;recover_snapshot&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;s&quot;&gt;&quot;my-collection&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;url&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;wait&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;priority&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;snapshot&apos;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
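
&lt;p&gt;Without the Python client, the same recovery can presumably be triggered through Qdrant’s REST API. The following is a minimal sketch, assuming the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PUT /collections/{collection_name}/snapshots/recover&lt;/code&gt; endpoint from the Qdrant documentation. The helper &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;build_recover_request&lt;/code&gt; and the placeholder snapshot URL are hypothetical, and the request is only constructed, not sent.&lt;/p&gt;

```python
import json
import urllib.request


def build_recover_request(base_url, collection_name, snapshot_url):
    """Prepare (but do not send) a snapshot recovery request."""
    body = json.dumps({"location": snapshot_url, "priority": "snapshot"})
    return urllib.request.Request(
        f"{base_url}/collections/{collection_name}/snapshots/recover?wait=true",
        data=body.encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Placeholder URL; in practice this would be the pre-signed URL from above.
request = build_recover_request(
    "http://localhost:6333",
    "my-collection",
    "https://s3.example.invalid/my-bucket/my-collection.snapshot",
)
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request)
```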

&lt;p&gt;Done.&lt;/p&gt;

&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;/h2&gt;

&lt;p&gt;Qdrant → S3 → Pre-signed HTTP URL → Qdrant&lt;/p&gt;</content><author><name>frank</name></author><category term="ai" /><category term="python" /><summary type="html">The lightning-fast vector database Qdrant has supported creating snapshots of its collections on S3 since version v1.10. That’s convenient and simplifies storing snapshots as backups on multiple machines. However, as of 2025, there is no way to restore a collection from a snapshot on S3. This article describes a workaround using pre-signed URLs in S3.</summary></entry><entry><title type="html">Network analysis with Scapy and Polars</title><link href="https://frank.sauerburger.io/2025/01/29/network-analysis-scapy-polars.html" rel="alternate" type="text/html" title="Network analysis with Scapy and Polars" /><published>2025-01-29T00:00:00+01:00</published><updated>2025-01-29T00:00:00+01:00</updated><id>https://frank.sauerburger.io/2025/01/29/network-analysis-scapy-polars</id><content type="html" xml:base="https://frank.sauerburger.io/2025/01/29/network-analysis-scapy-polars.html">&lt;p&gt;Sometimes, debugging state-of-the-art AI applications in an on-premise Kubernetes cluster requires capturing network packets and performing complex statistical traffic exploration and analysis.
Traffic is easily captured with&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;sudo tcpdump -i any -s 65535 -w /tmp/capture.pcap
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;and decoded with &lt;a href=&quot;https://wireshark.org&quot;&gt;Wireshark&lt;/a&gt;.
However, complex analyses require other tools.
Let’s open the data scientists’ toolbox: &lt;a href=&quot;https://pola.rs&quot;&gt;Polars&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;loading&quot;&gt;Loading&lt;/h2&gt;

&lt;p&gt;The basic idea is to use &lt;a href=&quot;https://scapy.net/&quot;&gt;Scapy&lt;/a&gt; to read the capture file, decode the packets and various protocols, and organize the data in a Polars dataframe.
In this example, let’s extract the source and destination IP addresses, the packet length,
and, for DNS packets, the opcode and the queried domain names.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;polars&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pl&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;plt&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;scapy.all&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sa&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;scapy.all&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PcapReader&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;seaborn&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sns&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;tqdm&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tqdm&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PcapReader&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;capture.pcap&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;reader&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pl&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataFrame&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
            &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
                &lt;span class=&quot;s&quot;&gt;&quot;IP:src&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;     &lt;span class=&quot;n&quot;&gt;packet&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;IP&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;src&lt;/span&gt;                    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;IP&quot;&lt;/span&gt;  &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;packet&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                &lt;span class=&quot;s&quot;&gt;&quot;IP:dst&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;     &lt;span class=&quot;n&quot;&gt;packet&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;IP&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dst&lt;/span&gt;                    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;IP&quot;&lt;/span&gt;  &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;packet&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                &lt;span class=&quot;s&quot;&gt;&quot;IP:len&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;     &lt;span class=&quot;n&quot;&gt;packet&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;IP&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;len&lt;/span&gt;                    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;IP&quot;&lt;/span&gt;  &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;packet&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                &lt;span class=&quot;s&quot;&gt;&quot;DNS:qcode&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;packet&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sprintf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;%DNS.opcode%&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;      &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;DNS&quot;&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;packet&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                &lt;span class=&quot;s&quot;&gt;&quot;DNS:qnames&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;qname&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;packet&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;DNS&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;qd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;DNS&quot;&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;packet&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[],&lt;/span&gt;
            &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;packet&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tqdm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;reader&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;schema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pl&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Schema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;({&lt;/span&gt;
            &lt;span class=&quot;s&quot;&gt;&quot;IP:src&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pl&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt;
            &lt;span class=&quot;s&quot;&gt;&quot;IP:dst&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pl&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt;
            &lt;span class=&quot;s&quot;&gt;&quot;IP:len&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pl&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Int32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt;
            &lt;span class=&quot;s&quot;&gt;&quot;DNS:qcode&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pl&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt;
            &lt;span class=&quot;s&quot;&gt;&quot;DNS:qnames&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pl&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;List&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pl&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()),&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;})&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;# Derive additional columns
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;with_columns&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;internal&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pl&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;col&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;IP:src&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;starts_with&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;10.&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pl&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;col&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;IP:dst&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;starts_with&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;10.&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The last statement derives an additional column that flags whether a packet is internal or external, based on a simple prefix match on the IP addresses. A more robust analysis could store the packet’s
IP addresses as 32-bit integers and apply bitwise operations to determine membership in a network subnet.&lt;/p&gt;
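
&lt;p&gt;The integer-based membership test could be sketched as follows, using only the standard library. The helper names &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ip_to_int&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;in_subnet&lt;/code&gt; are hypothetical, and the sketch covers IPv4 only. It compares the leading bits of two addresses with a right shift, which also handles ranges such as 172.16.0.0/12 that a string prefix cannot express.&lt;/p&gt;

```python
import ipaddress


def ip_to_int(address):
    """Convert a dotted IPv4 string to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(address))


def in_subnet(address, network, prefix_len):
    """Return True if the leading prefix_len bits of both addresses match."""
    shift = 32 - prefix_len
    return ip_to_int(address) >> shift == ip_to_int(network) >> shift


print(in_subnet("10.42.0.7", "10.0.0.0", 8))       # inside 10.0.0.0/8
print(in_subnet("162.55.242.49", "10.0.0.0", 8))   # outside 10.0.0.0/8
print(in_subnet("172.20.1.1", "172.16.0.0", 12))   # inside 172.16.0.0/12
print(in_subnet("172.32.0.1", "172.16.0.0", 12))   # outside 172.16.0.0/12
```

&lt;p&gt;In a dataframe, the same idea would amount to storing the integer value in an extra column and applying the shift and comparison column-wise.&lt;/p&gt;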

&lt;p&gt;The resulting, redacted dataframe looks something like:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;IP:src&lt;/th&gt;
      &lt;th&gt;IP:dst&lt;/th&gt;
      &lt;th&gt;IP:len&lt;/th&gt;
      &lt;th&gt;DNS:qcode&lt;/th&gt;
      &lt;th&gt;DNS:qnames&lt;/th&gt;
      &lt;th&gt;internal&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;“10.x.x.x”&lt;/td&gt;
      &lt;td&gt;“10.x.x.x”&lt;/td&gt;
      &lt;td&gt;128&lt;/td&gt;
      &lt;td&gt;null&lt;/td&gt;
      &lt;td&gt;[]&lt;/td&gt;
      &lt;td&gt;true&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“162.55.242.49”&lt;/td&gt;
      &lt;td&gt;“91.59.x.x”&lt;/td&gt;
      &lt;td&gt;188&lt;/td&gt;
      &lt;td&gt;null&lt;/td&gt;
      &lt;td&gt;[]&lt;/td&gt;
      &lt;td&gt;false&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“10.x.x.x”&lt;/td&gt;
      &lt;td&gt;“10.x.x.x”&lt;/td&gt;
      &lt;td&gt;93&lt;/td&gt;
      &lt;td&gt;null&lt;/td&gt;
      &lt;td&gt;[]&lt;/td&gt;
      &lt;td&gt;true&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“10.x.x.x”&lt;/td&gt;
      &lt;td&gt;“10.x.x.x”&lt;/td&gt;
      &lt;td&gt;93&lt;/td&gt;
      &lt;td&gt;null&lt;/td&gt;
      &lt;td&gt;[]&lt;/td&gt;
      &lt;td&gt;true&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“10.x.x.x”&lt;/td&gt;
      &lt;td&gt;“10.x.x.x”&lt;/td&gt;
      &lt;td&gt;843&lt;/td&gt;
      &lt;td&gt;null&lt;/td&gt;
      &lt;td&gt;[]&lt;/td&gt;
      &lt;td&gt;true&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“10.x.x.x”&lt;/td&gt;
      &lt;td&gt;“10.x.x.x”&lt;/td&gt;
      &lt;td&gt;139&lt;/td&gt;
      &lt;td&gt;“QUERY”&lt;/td&gt;
      &lt;td&gt;[“ns-2.sit-servers.net.”]&lt;/td&gt;
      &lt;td&gt;true&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h2 id=&quot;packet-length-analysis&quot;&gt;Packet length analysis&lt;/h2&gt;

&lt;p&gt;So far so good. Assume we want to investigate elevated retransmission rates. We might want to look at
the distribution of packet lengths, separately for internal and external traffic. With the current setup,
we can hand the dataframe to &lt;a href=&quot;https://seaborn.pydata.org&quot;&gt;seaborn&lt;/a&gt; for visualization.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;seaborn&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sns&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;sns&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;histplot&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;IP:len&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bins&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;40&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;internal&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;element&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;step&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;plt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;yscale&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;log&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;plt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;xlabel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Packet size / Bytes&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;img src=&quot;/assets/packet-length.png&quot; alt=&quot;Distribution of packet lengths&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;dns-server-analysis&quot;&gt;DNS server analysis&lt;/h2&gt;

&lt;p&gt;Next, we might want to investigate the DNS queries. Let’s look at the frequency of query names.
Since we captured traffic on all interfaces, we want to filter out queries for internal
servers. That’s easily done with Polars. Furthermore, since we don’t filter by the direction of the query,
we capture both incoming DNS queries to the authoritative server where tcpdump was running and name lookups originating from that server.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;dns_stats&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;explode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;DNS:qnames&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;DNS:qnames&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;drop_nulls&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;value_counts&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;pl&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;col&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;DNS:qnames&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ends_with&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;in-addr.arpa.&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;not_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;pl&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;col&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;DNS:qnames&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ends_with&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;local.&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;not_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;dns_stats&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sort&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;count&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;descending&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;DNS:qnames&lt;/th&gt;
      &lt;th&gt;count&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;“gitlab.sauerburger.com.”&lt;/td&gt;
      &lt;td&gt;91&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;em&gt;(redacted)&lt;/em&gt;&lt;/td&gt;
      &lt;td&gt;55&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“ns-1.sit-servers.net.”&lt;/td&gt;
      &lt;td&gt;30&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“ns-2.sit-servers.net.”&lt;/td&gt;
      &lt;td&gt;30&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“frank.sauerburger.io.”&lt;/td&gt;
      &lt;td&gt;12&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“fjell.ai.”&lt;/td&gt;
      &lt;td&gt;8&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“neodns.io.”&lt;/td&gt;
      &lt;td&gt;8&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“sauerburger.io.”&lt;/td&gt;
      &lt;td&gt;8&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“debugci.dev.”&lt;/td&gt;
      &lt;td&gt;6&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“www.fjellai.cloud.”&lt;/td&gt;
      &lt;td&gt;6&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“sAUeRbuRgEr.DeV.”&lt;/td&gt;
      &lt;td&gt;6&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“ds.sit-servers.net.”&lt;/td&gt;
      &lt;td&gt;6&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“nEodns.teCH.”&lt;/td&gt;
      &lt;td&gt;6&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“Ns-1.sIT-servErs.neT.”&lt;/td&gt;
      &lt;td&gt;6&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“NS-2.SiT-SErVerS.NeT.”&lt;/td&gt;
      &lt;td&gt;6&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“ns-2.sit-SeRveRS.nET.”&lt;/td&gt;
      &lt;td&gt;6&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“.uhepp.org.”&lt;/td&gt;
      &lt;td&gt;6&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“net.stratus.sit-servers.net.”&lt;/td&gt;
      &lt;td&gt;6&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“NS-1.sIT-SERVErs.NEt.”&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“neODNS.TecH.”&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;“nS-1.siT-SerVERS.NEt.”&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
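
&lt;p&gt;DNS names are compared case-insensitively, so the mixed-case rows in the table refer to the same names as their lowercase counterparts. As a minimal sketch in plain Python, assuming the name and count pairs have been copied out of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dns_stats&lt;/code&gt;, the variants could be merged by lowercasing each name before summing. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rows&lt;/code&gt; sample and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;merge_case_variants&lt;/code&gt; helper are hypothetical and not part of the original analysis.&lt;/p&gt;

```python
from collections import Counter

# Hypothetical sample: a few (qname, count) pairs copied from the table.
rows = [
    ("ns-1.sit-servers.net.", 30),
    ("Ns-1.sIT-servErs.neT.", 6),
    ("NS-1.sIT-SERVErs.NEt.", 2),
    ("nS-1.siT-SerVERS.NEt.", 2),
    ("nEodns.teCH.", 6),
    ("neODNS.TecH.", 2),
]


def merge_case_variants(rows):
    """Sum the counts of names that differ only in letter case."""
    merged = Counter()
    for qname, count in rows:
        # Lowercasing maps every case variant onto one canonical key.
        merged[qname.lower()] += count
    return merged


merged = merge_case_variants(rows)

# Print the merged names, most frequent first.
for qname, count in merged.most_common():
    print(qname, count)
```

&lt;p&gt;The same idea should carry over to the data frame itself by lowercasing the column before counting, which is left out here to keep the sketch free of dependencies.&lt;/p&gt;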

&lt;p&gt;If you’re wondering why some of the DNS entries have random capitalization,
that’s just &lt;a href=&quot;https://xkcd.com/1361/&quot;&gt;Google focussing on its core business&lt;/a&gt;.&lt;/p&gt;</content><author><name>frank</name></author><category term="python" /><category term="internet" /><category term="sysadmin" /><summary type="html">Sometimes, debugging state-of-the-art AI applications in an on-premise Kubernetes cluster requires capturing network packets and performing complex statistical traffic exploration and analysis. Traffic is easily captured with</summary></entry></feed>