Frank Sauerburger - Blog
Character encoding issues
If you’re parsing text or processing old non-English websites, you’ve probably encountered strings like Ã¤ where you expected ä. These strings are symptoms of character encoding issues. This article summarizes common encoding errors and what the garbled text was probably meant to say. Use the document as a reference to quickly identify the encoding issue you’re facing: press CTRL+F and search for the symptom you’re seeing.
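A minimal sketch of how such a symptom arises, assuming the text was written as UTF-8 and read back as Latin-1 (Windows-1252 is another common culprit). The function names mojibake and repair are hypothetical helpers for illustration, not part of any library:

```python
def mojibake(text: str, written: str = "utf-8", read: str = "latin-1") -> str:
    """Encode text with one codec and (wrongly) decode the bytes with another."""
    return text.encode(written).decode(read)


def repair(garbled: str, written: str = "utf-8", read: str = "latin-1") -> str:
    """Undo the mix-up by reversing both steps."""
    return garbled.encode(read).decode(written)


garbled = mojibake("Käse")  # "ä" is the two UTF-8 bytes 0xC3 0xA4
print(garbled)              # KÃ¤se
print(repair(garbled))      # Käse
```

Each non-ASCII character that takes two bytes in UTF-8 turns into two Latin-1 characters, which is why the garbled strings are longer than the originals.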
Deutsche Bahn delay
Deutsche Bahn is not known for its punctuality. Since Euro 2024, it has been known for its delays. I’m a frequent train commuter and happened to ride an ICE on September 2, 2024 that was scheduled to arrive at 7:47. When the train left my home station earlier that morning, it was delayed due to technical difficulties on the train. The actual arrival was 7:54. While disembarking, I took the following photo.
AI infrastructure at reasonable scale
From data analysis in Jupyter Notebooks to production applications: in this blog post, I’d like to introduce ideas for an AI infrastructure at a reasonable scale that bridges the gap between doing data analysis with artificial intelligence (AI) and machine learning (ML) in a Jupyter Notebook and building production applications. So, let’s start with a quote by Michael Dell: “We are unleashing this super genius power. Everyone is going to have access to this technology […]”
Semantic versioning and its contradictions
Semantic versioning, or SemVer, is great. There are alternative ideas, like Jacob Tomlinson’s EffVer. In this article, I’ll showcase two internal contradictions of SemVer. Despite all the criticism, I still think it’s the best versioning method we have. The two examples are probably not relevant in practice, and the fact that people fail to use SemVer consistently is not an argument against it.
Sticky bits
Native Linux file permissions are often denoted as “r” (read), “w” (write), and “x” (execute, which in the case of a directory means that a user can change into the directory). Treating them as binary flags with the values r = 4, w = 2, and x = 1, we can express any combination in a single octal digit. This scheme is repeated three times to denote independent permissions for the file owner, any user in the group of the file, and everyone else. For example, a file with permissions 0644, or rw-r--r--, is readable and writable by the owner (rw-), while users in the group of the file and everybody else have read-only access (r--). However, what does the leading 0 mean? There is more to it. Let me introduce you to the sticky bit and its friends.
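The digit arithmetic can be checked with Python’s standard library. The helper perm_digits below is a hypothetical illustration; stat.filemode and the constants S_IFREG, S_IFDIR, and S_ISVTX are part of the stat module. The leading digit holds three further flags: setuid (4), setgid (2), and sticky (1).

```python
import stat


def perm_digits(symbolic: str) -> int:
    """Convert a 9-character string like 'rw-r--r--' into a numeric mode."""
    if len(symbolic) != 9:
        raise ValueError("expected exactly three rwx triples")
    weights = {"r": 4, "w": 2, "x": 1}
    mode = 0
    for i in range(0, 9, 3):
        # Each triple (owner, group, others) becomes one octal digit.
        digit = sum(weights[c] for c in symbolic[i:i + 3] if c != "-")
        mode = mode * 8 + digit
    return mode


print(oct(perm_digits("rw-r--r--")))                         # 0o644
print(stat.filemode(stat.S_IFREG | 0o644))                   # -rw-r--r--
# Sticky bit (0o1000) on a world-writable directory, as on /tmp:
print(stat.filemode(stat.S_IFDIR | stat.S_ISVTX | 0o777))    # drwxrwxrwt
```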
Nvidia library/driver version mismatch
On systems with an NVIDIA GPU, a simple apt upgrade can leave you in a dreaded situation where the GPU still works, but NVIDIA tooling like nvidia-smi doesn’t work anymore. Such tools just print an error message like

$ nvidia-smi
NVML: Driver/library version mismatch

This happens when apt upgrades the user-space libraries and tooling while the NVIDIA kernel modules of the previous version are still loaded in memory. The usual approach is to reboot your machine. Sometimes this is not acceptable. In these cases, you can instead unload the outdated kernel modules and load the updated version.
httpx as a basic gRPC client
gRPC is a concise, fast, and powerful remote procedure call protocol. It leverages Google’s ProtoBuf as the wire format and HTTP/2 as transport. Besides simple RPC invocations, gRPC supports additional metadata and client- and/or server-side streaming of messages (providing support for server push messages). Getting started can be daunting, and understanding the internals is even more challenging. Assuming prior knowledge of HTTP/2, can we use HTTP/2 and ProtoBuf by hand to create a gRPC client? This article implements a rudimentary client for educational purposes based on the Python library httpx.
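One piece a hand-written client needs is the gRPC message framing. Per the gRPC-over-HTTP/2 specification, every serialized ProtoBuf message in the request or response body is prefixed with a 1-byte compressed flag and a 4-byte big-endian length. The sketch below covers only that framing with the standard struct module; it is not necessarily how the article structures its client, and the function names are hypothetical. The framed bytes would then be POSTed (for example with httpx over HTTP/2) to a path like /package.Service/Method with the content type application/grpc.

```python
import struct
from typing import Iterator, Tuple

# Length-prefixed message: 1-byte compressed flag + 4-byte big-endian length.
_PREFIX = struct.Struct(">BI")


def frame_message(payload: bytes, compressed: bool = False) -> bytes:
    """Prefix a serialized ProtoBuf message for an HTTP/2 request body."""
    return _PREFIX.pack(int(compressed), len(payload)) + payload


def unframe_messages(body: bytes) -> Iterator[Tuple[bool, bytes]]:
    """Split a response body into (compressed, payload) tuples."""
    offset = 0
    while offset < len(body):
        flag, length = _PREFIX.unpack_from(body, offset)
        offset += _PREFIX.size  # 5 bytes
        payload = body[offset:offset + length]
        if len(payload) != length:
            raise ValueError("truncated gRPC message")
        yield bool(flag), payload
        offset += length


# b"\x08\x01" encodes a hypothetical message with field 1 set to 1.
print(frame_message(b"\x08\x01").hex(" "))  # 00 00 00 00 02 08 01
```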
Retag docker images
It is common practice to build Docker images in CI pipelines using tools like Kaniko. It is also common practice to version Docker images with tags like 1.0.1, 1.0, 1, and latest. At some point in time, all these tags probably pointed to the same image. Is it possible to write a CI/CD job that retags images without downloading the full image first?
Kubernetes tricks
This article is a collection of techniques that have proven valuable when interacting with a Kubernetes cluster, especially when developing or debugging applications deployed to the cluster.
Let's Encrypt certificates on internal machines
Many system administrators face the challenge of obtaining TLS certificates for internal machines that are not exposed to the public. In these scenarios, a domain is mapped to a private IP address by an internal DNS server. For example, when a user on the internal network accesses https://example.com, their browser connects to 172.16.0.1:443. HTTP-based challenges don’t work for these internal machines because Let’s Encrypt cannot connect to them. However, there are a few solutions to obtain TLS certificates from a CA.
MkDocs without loading Google fonts
MkDocs is one of the greatest tools to document your software projects. Combined with the Material theme, developers can build high-quality documentation websites in minutes, for example, debugci.dev.
Handling static files in Kubernetes
How do you host static assets in Kubernetes? The standard answer is: you don’t; use a CDN. OK, fair enough. While a CDN might be the most performant solution, especially considering geo-routed options, it is simply not necessary for many deployments to a Kubernetes cluster. What are the alternatives?
SSH over wireguard
SSH is the default technology to connect to remote servers. Millions of servers can be reached via SSH. Everyone who has ever administered a server with SSH exposed to the internet knows that there is a constant stream of failed login attempts. Assuming the security of the SSH protocol and the correctness of its implementation, this would be just a nuisance. However, software is hardly perfect, and even SSH is subject to security vulnerabilities. It is therefore advisable to use additional techniques to protect the SSH server. This article describes a solution that’s applicable to rented root servers where customers cannot add a hardware firewall or DMZ.
Common IPv6 prefixes
Although it has long been foreseen that the IPv4 address space will not be enough for the tremendous growth of the internet, the rollout of IPv6 has been much slower than initially anticipated. There are many reasons for this, but I won’t discuss them in this article. Superficially, IPv6 simply enlarges the address space by using 128-bit addresses. However, there are a few details and conventions beyond the sheer length of the address. This article summarizes a few points, without claiming to be exhaustive, and introduces common address prefixes.
GitLab secret variable or not
Picture a typical GitLab CI/CD deployment pipeline of a web app. The actual technical implementation does not matter. The deployment is most likely configured through variables. For example, the app receives the following set of variables:
Random random seeds
We need random numbers for many applications. Especially for physics simulations or machine learning (ML) model training, we are interested in a reproducible stream of random numbers. (I’m not talking about cryptographic applications here.) For simulations or model training, we are primarily interested in the reproducibility of the task. This can be achieved by seeding a random number generator. For example, in Python,
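A minimal sketch of seeding with the standard-library random module (the article itself may well use NumPy or another library). Using a private random.Random instance keeps the seeded stream independent of the global generator; the function name sample is a hypothetical helper. Python’s random module is not suitable for cryptography, which matches the scope above.

```python
import random


def sample(seed: int, n: int = 5) -> list[float]:
    """Draw n numbers from an independent generator seeded with `seed`."""
    rng = random.Random(seed)  # private instance, does not touch the global state
    return [rng.random() for _ in range(n)]


# The same seed reproduces the same stream; a different seed gives a different one.
print(sample(42) == sample(42))  # True
print(sample(42) == sample(43))  # False
```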
Send signals to dockerized app with Ansible
Automating common tasks can save a lot of time, increase the reliability of a system, and reduce maintenance costs. One such task is making services reload their configuration files by sending them the HUP signal. So far, so good. It gets more difficult to automate this with Ansible when the service runs inside a container. Here are two options.
XKCD easter egg in Google's DNS server
While working on a custom DNS64 implementation for DNS over TLS, I realized that the rumors are true: Google will shut down its services and focus on its core project, the 8.8.8.8 DNS server.
Wireguard on ICE
No, I’m not talking about a new cocktail; I’m talking about using Wireguard with Deutsche Bahn’s WiFi on ICE trains. I like to work remotely while traveling by train, and Wireguard seems like a sensible choice for this occasion. My remote Debian server has a static IP address with a DNS record mapped to it. The Wireguard UDP port is exposed to the internet. This means my MacBook on the train should have no problem traversing any NAT or firewall and establishing a secret tunnel. However, a very strange problem occurred:
Airplay port hijacking
As part of my new role as senior machine learning engineer at MDPI, I set up an MLflow service and experimented with the mlflow CLI to serve models stored in the Model Registry. Strangely, all requests to the endpoint returned an empty result.
Rigol 5435 rise time
Recently, I upgraded my makeshift electronics lab with a new oscilloscope. I bought the Rigol MSO5074, an 8 GSa/s 4-channel mixed-signal scope with a built-in logic analyzer and two 25 MHz arbitrary waveform generators. This version has a software bandwidth limit of 70 MHz. Remarkably, its hardware is identical to that of the 2-channel version and the 350 MHz version; the channels and the bandwidth are disabled in software.
Tracking Deutsche Bahn
ICE trains in Germany offer free WiFi (sometimes at least) with an “in-flight” portal iceportal.de. The portal displays information about the current trip, the next stop, the location, and the current speed. These are interesting pieces of information, especially since GPS signals are usually hard to receive in modern train cars. How about querying and recording these pieces of information programmatically?
Directly running Ansible playbooks
Over a year ago, I wrote about creating a YAML/Bash hybrid file that contains YAML configuration settings and an executable Bash part. Lately, I started using a slight modification of this that allows executing Ansible playbooks with ./playbook.yaml as if they were executables.
SCPI with PeakTech 4095
I’ve been intrigued by the possibility of controlling PeakTech’s 4095 bench multimeter remotely via SCPI commands. After unboxing, it turned out that the device does not support DHCP, so one has to set the IP address manually. Furthermore, one has to enable network access manually after each power cycle.
Rate limiting a Twitter Stream
Consider an application that uses the Twitter Streaming API as input to perform natural language processing. The analysis could require a lot of computing resources, and we might want to control the average number of tweets per unit of time. For the analysis, we might want to focus on tweets from people with many followers. Twitter streams are very volatile, so how can we control this? This article discusses the issue of limiting the average number of events and assumes that a stream of tweets is already in place.
Python's datetime.utcnow()
Time is difficult, not only from the point of view of special and general relativity but also for simple calendars. Time zones, Gregorian vs. Julian calendars, leap years, and leap seconds make this awfully difficult. Time and date are complex concepts in our society, so we must capture this complexity in software. In general, a computer program has two easy-to-use approaches: work with timestamps or work with ISO-8601 date-time strings. Both are perfectly valid approaches with different advantages and disadvantages. This article discusses issues introduced by leap seconds and Python’s approach with the datetime package.
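A short sketch of both representations with the standard datetime module. It uses datetime.now(timezone.utc), which returns a timezone-aware value, instead of datetime.utcnow(), which returns a naive value without tzinfo (and is deprecated since Python 3.12). The leap-second check at the end illustrates that datetime cannot represent a second value of 60; whether the article draws exactly this conclusion is an assumption.

```python
from datetime import datetime, timezone

# Timezone-aware "now" in UTC, serialized as an ISO-8601 string.
now = datetime.now(timezone.utc)
print(now.isoformat())  # e.g. 2024-01-01T12:00:00.123456+00:00

# Unix timestamp -> ISO-8601 string.
epoch = datetime.fromtimestamp(0, tz=timezone.utc)
print(epoch.isoformat())  # 1970-01-01T00:00:00+00:00

# The leap second 2016-12-31T23:59:60Z has no datetime representation.
try:
    datetime(2016, 12, 31, 23, 59, 60, tzinfo=timezone.utc)
except ValueError as err:
    print("leap second rejected:", err)
```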
Machine learning with uncertainties
Machine learning provides a scalable way to go from raw data to meaning. However, input data often contains noise and uncertainties. From the perspective of a physicist, every non-negligible uncertainty must be quantified to be able to state a quantitative result of a measurement. In the case of a machine learning classifier, could we go one step further and teach the classifier which input variables are noisy to minimize their impact on the result? Yes, ML classifiers can exploit our knowledge about uncertainties in the inputs. This article shows an example.
Namespaced GitLab agent
GitLab’s Kubernetes agent is a great tool to deploy your applications directly to a Kubernetes cluster, ensuring the deployed version matches the state of the repository. The repository is the single source of truth.
Debian OS versions
Each Debian release has a name, probably intended to make releases easier to remember. “I’m running bullseye” is much easier to say than “I have Debian 11.3.” However, with names, it’s much more difficult to tell which release is newer or how many releases lie between two releases. The situation is exacerbated by Ubuntu, which, although based on Debian, comes with its own release names. This article summarizes the release names of Debian and the corresponding Ubuntu LTS releases.
Manual PDF editing
Recently, I obtained the Professional Scrum Master certification. The preparation for the examination involved dissecting the official Scrum Guide. Scrum as a framework is well structured into principles, accountabilities, events, and artifacts with important relations between them. However, while going through the guide, I would have preferred numbered section headings to highlight the hierarchy of the sections. This is a great opportunity to try editing a real-life PDF with a text editor such as vim.
Nginx domain fronting
It is speculated that Google and Amazon blocked domain fronting upon request from the Russian government, since domain fronting allows circumventing censorship. What is domain fronting, and how can it be used with Nginx?
Zoom on Debian/Bookworm with Wayland
After a system upgrade, I was suddenly unable to share my screen in Zoom on my Debian laptop. I’m running Debian Bookworm (unstable). This article serves as a reminder to myself. When I click on ‘Share’, I see the following error message:
Anatomy of IPFS identifiers
The InterPlanetary File System (IPFS) is an interesting idea of a distributed web based on content addressing. In short, it’s a peer-to-peer network in which objects are located by their hash value. Public gateways provide easy access via regular client-server HTTP. The system has a bold name, and some people even call it Web 3.0. When I first encountered the system, I was a bit overwhelmed by the multitude of different hash formats and their usage. This article serves as a summary to dissect commonly encountered hash patterns.
Big-endian and little-endian
This article is for everybody who knows the difference between big-endian and little-endian but frequently confuses the two terms. If you are like me, you might think “Big-endian ends with the big part of the number (i.e., the most significant byte, MSB).” This is wrong, so please forget that notion right away. This article shows the correct mental model for big- and little-endian.
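A quick way to check which is which, using Python’s int.to_bytes and the struct module: big-endian stores the most significant byte first (at the lowest address), little-endian stores the least significant byte first. The value 0x12345678 is just an arbitrary example.

```python
import struct

value = 0x12345678

# Big-endian: the most significant byte (0x12) comes first.
big = value.to_bytes(4, byteorder="big")
# Little-endian: the least significant byte (0x78) comes first.
little = value.to_bytes(4, byteorder="little")

print(big.hex(" "))     # 12 34 56 78
print(little.hex(" "))  # 78 56 34 12

# struct uses ">" for big-endian (network byte order) and "<" for little-endian.
assert struct.pack(">I", value) == big
assert struct.pack("<I", value) == little
```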
Median Stack with Python
This article shows how to do image processing in Python in just a few lines. During a vacation in Venice a couple of years ago, I took several photos of the Piazza San Marco from the same vantage point without moving the camera. Although it was very early in the morning, a few people, and an even larger number of pigeons, were already on the Piazza. Wouldn’t it be nice to have a picture without other people (and maybe even without pigeons)?
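The underlying idea of a median stack fits in a few lines of NumPy: for every pixel, take the median over all photos, so anything that appears at a given spot in fewer than half of the images disappears. This sketch assumes the photos are already loaded as aligned arrays of identical shape (loading, e.g. with Pillow, is omitted), and the function name median_stack is hypothetical.

```python
import numpy as np


def median_stack(images: list[np.ndarray]) -> np.ndarray:
    """Pixel-wise median of aligned photos with identical shape."""
    stack = np.stack(images, axis=0)  # shape: (n_images, height, width[, channels])
    return np.median(stack, axis=0).astype(stack.dtype)


# Tiny grayscale "photos": a constant background with a different intruder each time.
background = np.full((2, 2), 100, dtype=np.uint8)
a = background.copy()
a[0, 0] = 255  # a pigeon in the first photo
b = background.copy()
b[1, 1] = 0    # a tourist in the second photo
c = background.copy()

print(median_stack([a, b, c]))  # only the background value 100 remains
```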
Inspect a Kubernetes PersistentVolumeClaim
Containers made the idea of ephemeral systems famous, but almost all systems need to store data permanently. In Kubernetes, PersistentVolumeClaim (PVC) resources allow you to claim a volume that can be mounted in a container. With systems like rook-ceph, dynamic provisioning of persistent volumes is very convenient.
Bayes' Theorem
The coronavirus pandemic has changed quite a lot. One of the less important changes is that the people in typical math problems are not as far-fetched anymore as they used to be. And no, I’m not talking about people buying 1000 eggs or 200 rolls of toilet paper. I’m talking about math exercises concerned with Bayes’ Theorem. This article answers: What is Bayes’ Theorem, and why is it relevant?
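A worked example of Bayes’ Theorem, P(A|B) = P(B|A) P(A) / P(B), applied to a test result. The numbers (1% prevalence, 99% sensitivity, 99% specificity) are hypothetical and chosen only to show that a positive result from a very accurate test can still mean only a 50% chance of being infected when the condition is rare.

```python
def posterior(prior: float, sensitivity: float, specificity: float) -> float:
    """P(infected | positive test) via Bayes' theorem."""
    true_pos = sensitivity * prior               # P(positive | infected) * P(infected)
    false_pos = (1 - specificity) * (1 - prior)  # P(positive | healthy) * P(healthy)
    return true_pos / (true_pos + false_pos)     # divide by the total P(positive)


# Hypothetical numbers: 1% prevalence, 99% sensitivity, 99% specificity.
print(round(posterior(0.01, 0.99, 0.99), 3))  # 0.5
```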
Debugging Tensorflow 2
TensorFlow 2 comes with eager execution enabled by default, which makes debugging much more straightforward. What does eager execution mean, and why is it not always so eager?
Banana Pi: Getting started
I’ve got my hands on a brand-new Banana Pi BPI-R64 with a dual-core 64-bit ARM Cortex CPU. Its multiple Ethernet ports make it the perfect toy to build a router. This article covers the very basics of getting started with the device and is intended as a more verbose version of the official getting-started guide.
Alpakka FCM options
In a recent article, I posted a snippet showing how to get an interactive terminal in the Play Framework. Back then, I was working on a web service that forwards webhook triggers to WebSockets and to an Android app connected via Google’s Firebase platform.
Go Lang channel blocking
Channels are one of the selling points of the Go language. Whether a read or write operation on a channel blocks depends on a number of factors. This article summarizes common scenarios.
Decoding the digital EU COVID vaccination certificate
If you receive a vaccine shot against the coronavirus within the EU, you might get a digital certificate in the form of a QR code to prove your vaccination status. The QR code can be verified with specialized apps, like the CovPass Check app.
Pattern Matching in Python 3.10
The release of Python 3.10 is expected on October 4, 2021. As with previous releases, this release will introduce new language features and constructs. I’m looking forward to one special new feature in the upcoming release, namely Pattern Matching. If you have experience in a functional programming language, like Haskell, or a language with support for programming in a functional style like Scala (just to name two), pattern matching is one of the standard items in your toolbox. However, if you are not yet familiar with the concept or its implementation in Python, this article is for you. I will walk you through two examples of increasing complexity to whet your appetite until it’s finally released.
Interactive repositories with Play Framework
According to the Stack Overflow Developer Survey 2020, the Scala programming language is mainly used in connection with Hadoop and Spark. In contrast, the Play Framework is a modern web framework inspired by Django and Ruby on Rails. Coming from a Django background, I think the learning curve of the Play Framework is much steeper, and some things that come naturally in Django are much harder to achieve in Play. This article shows how to get an interactive shell to work with your models in Play, similar to Django’s python manage.py shell.
VBox DHCP IP addresses exhausted
As described in earlier articles, I build Docker images in a CI pipeline with a privileged container that runs inside an ephemeral virtual machine and has access to the virtual machine’s Docker daemon. Occasionally, one of the machines hosting the virtual machines breaks and stops spinning up new virtual machines on demand. The pending docker-machine instances display the error message
Yaml Bash hybrid
The default Netfilter (nftables) configuration file under Debian is stored at /etc/nftables.conf. The file is a plain-text configuration file with a clever shebang line, which means you can execute the configuration file to load its contents. Nifty, right?
Jekyll and server-side code
Jekyll is a convenient yet powerful tool to build static web pages. It is particularly useful in combination with automatic deployment via GitHub or GitLab Pages. You might have noticed that this blog is made using Jekyll. However, sometimes you want to add a bit of server-side logic to the website. It might be as simple as a contact form or a comment box. In contrast to other online resources that suggest embedding third-party services with JavaScript, this article illustrates how to combine server-side PHP with Jekyll.
Hadoop and friends
When I first started working with Hadoop and Spark, it felt like walking through a jungle: Each project depends on other projects or at least references many different projects with which this project can interoperate or to which it is the “better” alternative. In this article, I’ll try to clear this up a bit. The article should serve as a glossary.
Python in HEP community
Python has become one of the most popular programming languages. Its concise and straightforward syntax, as well as the easy-to-use type system, ensure a flat learning curve. Besides C++, Python has become the programming language of choice in the high-energy physics community. The small entry barrier has the inadvertent disadvantage that more low-quality code is contributed to shared codebases.
CSS flex attributes
Flexbox layout in CSS has unlocked countless design possibilities and greatly reduced the implementation burden of common layouts. Many layout options are now supported without the use of additional JavaScript. The use of Flexbox is detailed in A Complete Guide to Flexbox. At first, the large set of CSS attributes can be a bit confusing. Let’s clear that up. Many attributes follow a composite verb-object scheme.
SequenceFile compression
The book Hadoop: The Definitive Guide by Tom White builds countless examples around NCDC’s sizable Integrated Surface Database. The dataset contains hourly temperature readings from thousands of weather stations around the globe, dating back to 1901. Each measurement is represented by a cryptic-looking string:
Working with Prolog
From time to time, I like to learn a new language to see what it offers and to broaden my mind. Intrigued by a chapter on predicate logic in Russell and Norvig’s book Artificial Intelligence, I had a look at Prolog.
Testing Python C-extensions
sortednp is my first open-source Python package which uses Python’s and Numpy’s C-API. There are huge ecosystems for testing Python libraries and C or C++ libraries. However, I found it quite cumbersome to test the C extension. This article is a short summary of the steps I took to have low-level tests of the C extension in a manylinux environment and high-level tests of the Python API provided by sortednp.
Command-line copy and paste: fclip
Wouldn’t it be nice to have a command-line tool to cut, copy and paste files from one directory to another, even across terminal windows just like you could with nautilus or any other file browser? I present to you: fclip. With fclip it’s easy to copy files from the working directory of one terminal to the working directory of another terminal.
Universal Notifications
This week’s article is about how we spend the time while code is compiling, a web app is being deployed, a neural network is learning or a job sent to an HPC facility is being executed. Some of these tasks might take quite some time, but in the day-to-day life of a programmer, there are a lot of tasks that take something between 20 seconds and 5 minutes. One can stare idly at the blinking console cursor and wait for the prompt to reappear, or follow Randall Munroe’s suggestion:
Fix linter issues
Exhibit 1: A commit message from last year. The commit looks innocent enough. The title says “Fix linter issues”. The commit changes line breaks and white spaces to pass the linter in the GitLab pipeline. So what’s wrong with that?
Project management with GitLab
One of the more useful tools I’ve recently discovered is the Eisenhower matrix. In short, it lets you categorize issues or tasks into four categories based on their urgency and importance. With GitLab, one can easily create the four labels, tag issues with the appropriate label, and optionally even use an issue board to get an overview. Creating this over and over for every project becomes tedious very quickly.
Server-side rendering with matplotlib
A couple of months ago, I strung together a small web app to create a customizable plot showing which particle-physics reactions happen at the LHC at which rate.
Shades of gray
Whenever one creates a web application, one probably needs some shades of gray: for the text, for the background, for shadows, and much more. In my experience, I usually pick rather light colors for dark gray and rather dark colors for a light background gray/white, which, if not corrected, won’t make the app look cool. For people who have the same issue, I’ve created a survey of grays used around the web. The selection of websites is very subjective. As it turns out, there is a lot of nuance in the last few least significant bits of the color code.
Credit card verification process
When you use your credit card online to place an order, you might need to follow the Verified by Visa procedure. This procedure should authenticate the owner of the card using an additional authentication method provided by your bank, in order to prevent misuse of your credit card details. However, I recently learned that ING introduced a completely new type of threat when an account owner follows their Verified by Visa procedure.
Prevent CSS parent margin collapse
Collapsing margins are probably one of the most annoying features in CSS. Initially, collapsing margins of adjacent objects should help to style collections of objects. However, I never understood how margins of children falling outside the parent should be useful, or why adding background color does not prevent the collapse.
Web2Print with CUPS
We are all happily working from home nowadays, and I hope everybody is well. In many workplaces, teleworking was introduced in a hurry. The IT departments in some companies were not well prepared for this rapid increase in teleworking employees. I’ve come across a setup where a desktop PC is deployed to the employees’ homes. The computer connects to the company’s internal network via VPN and from there accesses the actual work PC using remote desktop software.
Corona App
Currently, there are a lot of ideas floating around on how to flatten the curve of the SARS-CoV-2 virus pandemic. One of these ideas is an app that records the presence of other devices in the vicinity via Bluetooth. In case of a confirmed infection with the new coronavirus, people who were in close contact with the infected person can be notified and asked (or forced) to self-quarantine.
ARM GitLab CI Runner
It is straightforward to install a GitLab runner with the Docker executor on a small Raspberry Pi. The runner can then be attached to a specific GitLab project or the whole GitLab instance. I would, however, recommend disabling the option “Run untagged jobs” and adding a tag, e.g., arm.
Awkward arrays and numba
In high-energy physics, a lot of data is presented in the form of ragged or jagged arrays. Often, each row represents an event produced by the collision of highly energetic particles at a particle accelerator. The number of items per row depends on the event itself. For example, this could depend on the number of jets produced in a proton-proton interaction at the LHC. Awkward Array’s JaggedArray makes working with these data structures easy and performant in Python.
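Jagged arrays are typically stored not as lists of lists but as one flat content array plus row offsets, which is what makes vectorized operations fast. The plain-Python class below is a hypothetical illustration of that offsets-plus-content layout, similar in spirit to how Awkward Array represents jagged data; it is not the awkward API. The jet transverse momenta are made-up numbers.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Jagged:
    """Ragged array stored as one flat list plus row offsets (illustration only)."""
    content: List[float]
    offsets: List[int]  # row i spans content[offsets[i]:offsets[i + 1]]

    @classmethod
    def from_rows(cls, rows: Sequence[Sequence[float]]) -> "Jagged":
        content: List[float] = []
        offsets = [0]
        for row in rows:
            content.extend(row)
            offsets.append(len(content))
        return cls(content, offsets)

    def __len__(self) -> int:
        return len(self.offsets) - 1

    def __getitem__(self, i: int) -> List[float]:
        if i < 0:
            i += len(self)
        return self.content[self.offsets[i]:self.offsets[i + 1]]

    def counts(self) -> List[int]:
        """Number of items per row, e.g. the number of jets per event."""
        return [b - a for a, b in zip(self.offsets, self.offsets[1:])]


# Hypothetical jet pT values (GeV) for three events; the second has no jets.
jets = Jagged.from_rows([[45.0, 30.1], [], [120.5, 60.2, 25.0]])
print(jets.offsets)   # [0, 2, 2, 5]
print(jets.counts())  # [2, 0, 3]
print(jets[2])        # [120.5, 60.2, 25.0]
```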
Outline stuck to origin in histograms
Over the last months, I’ve developed two Python packages: atlasify and nnfwtbn. The former is a package that applies the ATLAS collaboration’s plotting style to matplotlib plots, while the latter is a framework based on Keras to train neural networks tailored to high-energy physics. Both frameworks have a special focus on plotting. While the demand and the code base of both projects grew rapidly, I ran into two independent issues with matplotlib. At first, I assumed there was an issue in the way I used matplotlib; however, it turned out to be a bug in matplotlib itself.
Dangerous type handling in PHP
PHP is a dynamically typed language. In addition, it performs some automatic type conversion. Since PHP 7.0, it is possible to declare strict_types=1, which means that type hints are enforced. For me, this leads to unexpected behavior. Consider the following (condensed) example.
Install vim spellfiles
Has it ever happened to you that you change the language of vim’s spellchecker (:set spelllang=de), but when asked if vim should install the spell files you accidentally hit no? You can easily install the files yourself. Let’s assume you want to install German spell files. Execute
List of OpenPGP Keyservers
Every time I update my personal or other people’s OpenPGP certificates, I cannot remember the hostname of a public OpenPGP keyserver. This list should function as a quick reference for commonly used and (somewhat) reliable keyservers.
Invalid unicode characters in PDF files
Over the last months, I’ve developed two Python packages: atlasify and nnfwtbn. The former is a package that applies the ATLAS collaboration’s plotting style to matplotlib plots, while the latter is a framework based on Keras to train neural networks tailored to high-energy physics. Both frameworks have a special focus on plotting. While the demand and the code base of both projects grew rapidly, I ran into two independent issues with matplotlib. At first, I assumed there was an issue in the way I used matplotlib; however, it turned out to be a bug in matplotlib itself.
TICK stack demo in docker
The TICK software stack seems to be a natural fit for low-frequency measurements in laboratories (ranging from once per hour up to 100 Hz or so, rather than GHz), such as temperature, pressure, and other properties of experimental setups. The TICK stack is a composition of InfluxDB, Chronograf, Kapacitor, and Telegraf.
Provision a Raspberry Pi without keyboard and monitor
This article explains how to provision a Raspberry Pi (3B+ in my case) with Raspbian Buster Lite. In principle, this should be easy but gets quite challenging when you don’t have a keyboard or a monitor. All I have is a laptop with an SD card reader and the Wi-Fi at home. The task is to copy the Raspbian image to the SD card, activate SSH, and set up Wi-Fi.
Restoring partitions with parted
TL;DR: Use parted rescue.

The Problem

Last Sunday evening, I accidentally deleted the partitions of my laptop while it was running. Initially, I wanted to remove all partitions from an SD card and create a single partition spanning the whole card.
PySpark with Python 3
The end of Python 2 is near. (The end might be a bit later though.) The latest release of Apache Spark (2.4.4) still uses Python 2 by default. Configuring PySpark to run with Python 3 and IPython is pretty straightforward.
Setting up Guake under Fedora 30
New OS, new problems. Recently, I wanted to try out Fedora. In my daily work routine, I heavily rely on Guake, the terminal that pops up when a global hotkey is pressed. However, when I started Guake for the first time on Fedora 30, there was an error message saying that it could not bind the global hotkey F12.
Wake-on-LAN on CentOS 7
This post shows the steps required to set up Wake-on-LAN (WOL) on a 15-year-old computer running CentOS 7. I can imagine that different hardware configurations might need fewer or different steps.
Internet threats in real life
When someone mentions man-in-the-middle attacks, I immediately think about security concerns regarding an internet protocol such as TLS. However, I recently rented a new apartment in a city far away from my previous place. I found the offer for the new apartment online on a popular website and received a positive reply from the landlord. It was time to transfer the deposit and the rent for the first month, without ever having met the recipient or seen the apartment. Could this be a scam?
Project Templates with GiTemplet
A lot of projects start from a set of identical skeletons. For example, a project for a Python library should have the basic structure, including a setup.py and a few CI jobs. A React/webpack project should have a node package configuration and the webpack configuration. Only a few properties have to be adjusted for the initial commit of the project, such as the name of the packages and the project URL. Over time, I’ve accumulated a couple of scripts that replace placeholders in template projects with the new project’s name. In the last couple of weeks, I’ve converted these scripts into a proper Python project. The project is called GiTemplet, a concatenation of Git and template, because it takes Git repositories as templates. -
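GiTemplet’s actual interface is not shown in this teaser. As a rough sketch of the underlying idea only, assuming a hypothetical {{key}} placeholder syntax, the following walks a checked-out template and substitutes placeholders in every text file, skipping the .git directory. The function fill_template is hypothetical.

```python
from pathlib import Path


def fill_template(root: str, values: dict, pattern: str = "{{%s}}") -> list:
    """Replace placeholders such as {{name}} in all text files below `root`.

    Returns the paths of the files that were modified.
    """
    base = Path(root)
    changed = []
    for path in sorted(base.rglob("*")):
        if not path.is_file() or ".git" in path.relative_to(base).parts:
            continue  # skip directories and Git's internal files
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # skip binary files
        new_text = text
        for key, value in values.items():
            new_text = new_text.replace(pattern % key, value)
        if new_text != text:
            path.write_text(new_text, encoding="utf-8")
            changed.append(path)
    return changed
```

For example, fill_template("my-new-lib", {"name": "mylib"}) would turn name='{{name}}' in setup.py into name='mylib'.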
Local dark matter density with pyveu
One of last week’s xkcd comics was about the local dark matter density. Wait. If only 5% of the universe is ordinary matter and 25% is dark matter (the rest being dark energy), see 1 and 2, then why is there only one dark matter squirrel on Earth? This is a good opportunity to play around with the Python package pyveu. -
Exclude traffic to an IP in wireshark
Have you ever tried to remove records to and from a specific IP address in Wireshark? I played with funneling traffic from a program through a proxy server and wanted to check whether the program sends any requests that ignore the proxy settings. This is a simple task for tools like Wireshark: start it, hide every record going through the proxy, and check if there is anything else. -
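In Wireshark’s display-filter language, a robust way to hide everything to and from one address is to negate a match on ip.addr, which matches either the source or the destination field. The tempting ip.addr != X has historically matched a packet as soon as one of its two addresses differs, so traffic to the proxy would still show up; Wireshark 3.6 changed the meaning of !=, which makes the negated form the version-independent choice. The Python helper below only builds the filter string; its name and the example addresses are hypothetical.

```python
import ipaddress


def exclude_host_filter(address: str) -> str:
    """Build a Wireshark display filter that hides all packets to or from `address`."""
    ip = ipaddress.ip_address(address)  # validates the input, raises ValueError otherwise
    field = "ip.addr" if ip.version == 4 else "ipv6.addr"
    # ip.addr matches source OR destination, so negating the equality hides both directions.
    return f"!({field} == {ip})"
```

For a proxy at 192.0.2.10, this yields !(ip.addr == 192.0.2.10), which can be pasted into the filter bar and combined with further conditions using and.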
Links to GitLab CI artifacts
I find it convenient to create build artifacts using GitLab’s CI and link to the artifacts from the README file. It is important that the links point to the most recent version and not to a specific CI job. This article is a short summary of the different URLs to access the artifacts. -
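As a sketch, assuming the URL scheme of recent GitLab versions for the artifacts of the latest successful job on a given ref (older versions may differ), such links can be assembled as follows. The project path, branch, job name and file path in the usage example are hypothetical.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def latest_artifact_url(base: str, project: str, ref: str, job: str,
                        path: Optional[str] = None) -> str:
    """URL of the artifacts of the latest successful `job` on `ref`.

    Without `path`, the URL downloads the whole artifacts archive; with `path`,
    it serves that single file from the archive.
    """
    root = f"{base.rstrip('/')}/{project}/-/jobs/artifacts/{quote(ref)}"
    query = urlencode({"job": job})
    if path is None:
        return f"{root}/download?{query}"
    return f"{root}/raw/{quote(path)}?{query}"
```

For instance, latest_artifact_url("https://gitlab.com", "group/project", "master", "build", "public/report.pdf") gives https://gitlab.com/group/project/-/jobs/artifacts/master/raw/public/report.pdf?job=build, a link that keeps pointing to the newest build.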
Git: Reverting a merge commit
Git is an excellent tool that boosted my productivity tenfold. I like Git’s clean data model and the directed acyclic graph of commits. I use Git on a daily basis and, naive as I am, thought I knew all the tricks and corner cases, until I recently stumbled over a reverted merge request. Have you ever reverted a merge request? To quote the Git documentation, reverting a merge request “may or may not be what you want.” So let’s dissect git revert -m to see what it does and what the consequences are. -
My first magnetic tape drive
Recently, I bought my very first tape drive. Yes, it is 2019, but tape drives are not archaic, obsolete pieces of technology. They excel at particular use cases. I bought an LTO-7 drive, which offers 6 TB of storage per tape, or even more if the input is compressible. Earlier this year, I wrote a piece about my NAS setup. The final missing piece of my setup was a way to create off-site backups, needed in case of catastrophic events (fire, etc.). Using my new tape drive, I can quickly ship backups to other geographical locations. -
ROOT.TString as Python dict key
Python dictionaries give access to their data items via keys. If you store two numbers under the same key, the second assignment overwrites the first. For a research project, I worked on a script that had a list of ROOT objects, possibly with duplicates. The objects are uniquely identified by their name, which is returned by a custom method called getName(). The task was to eliminate all duplicates. So, in a single line of Python code, I put all the objects in a dictionary with the objects’ names as keys, thus removing all duplicates. -
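The one-liner described above can be sketched with plain Python strings as keys; the Histogram class is a hypothetical stand-in for the ROOT objects. Whether a ROOT.TString returned by a real getName() behaves like a str in this role is the question the post’s title raises, so the sketch deliberately sticks to str.

```python
class Histogram:
    """Hypothetical stand-in for a ROOT object that exposes getName()."""

    def __init__(self, name: str, entries: int) -> None:
        self._name = name
        self.entries = entries

    def getName(self) -> str:
        return self._name


objects = [Histogram("pt", 10), Histogram("eta", 5), Histogram("pt", 10)]

# One line: later objects with an already-seen name overwrite earlier ones,
# so every name survives exactly once (ordered by first occurrence).
unique = list({obj.getName(): obj for obj in objects}.values())

print([obj.getName() for obj in unique])  # ['pt', 'eta']
```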
Testing code snippets in documentation
Every major programming language offers a convenient way to write unit tests. Python and some other languages additionally make it easy to test the examples in a function’s documentation string. -
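A minimal example of Python’s built-in doctest module: lines in a docstring that look like an interactive session are executed, and their output is compared with the text that follows them. The function mean is just an illustration.

```python
import doctest


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers.

    >>> mean([1, 2, 3])
    2.0
    >>> mean([4])
    4.0
    """
    return sum(values) / len(values)


if __name__ == "__main__":
    # Collect and run every example found in this module's docstrings.
    doctest.testmod(verbose=True)
```

Alternatively, python -m doctest -v file.py runs the examples without the __main__ block.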
How large is the LHC
My newest project is titled HowLargeIsTheLHC.com. Its centerpiece is a Google Maps instance with an overlay of the shape of the Large Hadron Collider. The site tries to answer the question in its title, how large the Large Hadron Collider is, by showing it on a map. The key here is that the user can drag the map and thus move the collider overlay to any point on Earth. -
SKS dumps
Keyservers play an essential role in the world of (Open)PGP. Simson Garfinkel put it as follows in his classic book “PGP: Pretty Good Privacy” from 1995: The PGP Internet key servers are an attempt to solve the fundamental problem of public key cryptography: how to get the public key of a person with whom you wish to communicate. -
Install Guest Additions for CentOS
This article summarizes how to install the VBox 5.2 Guest Additions for a CentOS 7 guest OS. Before installing the Guest Additions, we need to prepare the guest operating system by installing various programs needed by the installer. -
Joining Google maps routes
This article is about Google Maps’ route planning and shouldn’t be understood as an endorsement. It relies on the current API and URL scheme, so the methods presented here might stop working at any time. -
Scientific rounding with pyveu
Recently, one of my colleagues needed to print a collection of measurements with uncertainties in scientific rounding to publish them in a paper. If there are only two or three fixed numbers, it’s probably easier to do this by hand. As soon as there are more numbers or if the numbers might change in the future, it is more convenient to have a method which formats the number and the uncertainty automatically. -
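This is not pyveu’s API. A standalone sketch, assuming the common convention of quoting the uncertainty with two significant digits and rounding the central value to the same decimal place (other conventions, such as one digit or the PDG rule, exist), could look like this; sci_round is a hypothetical name.

```python
import math


def _decimals(uncertainty: float, sig: int) -> int:
    """Decimal places needed to show `uncertainty` with `sig` significant digits."""
    return sig - 1 - math.floor(math.log10(uncertainty))


def sci_round(value: float, uncertainty: float, sig: int = 2) -> str:
    """Format value and uncertainty, both rounded to the same decimal place."""
    if uncertainty <= 0:
        raise ValueError("uncertainty must be positive")
    decimals = _decimals(uncertainty, sig)
    # Rounding can carry into a new digit (0.0996 -> 0.100), so recompute once.
    decimals = _decimals(round(uncertainty, decimals), sig)
    if decimals > 0:
        return f"{value:.{decimals}f} +/- {uncertainty:.{decimals}f}"
    # Uncertainty >= 10**(sig - 1): round to tens, hundreds, ... and print integers.
    return f"{round(value, decimals):.0f} +/- {round(uncertainty, decimals):.0f}"
```

For example, sci_round(1.23456, 0.0321) gives 1.235 +/- 0.032, and sci_round(12345.6, 678) gives 12350 +/- 680.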
Tagging Docker images in GitLab's CI
GitLab’s continuous integration and deployment are great. If you have special runners, you can even build and deploy Docker images of your software in the CI. This new possibility immediately leads to the following questions: When should you build a Docker image? How should you tag it? -
Exploiting TLS and HSTS to track clients
The method I’m explaining in this article is not new; however, it is interesting enough to build a live demonstration. I’m talking about a technique to track web users (or, more precisely, their web browsers) using Transport Layer Security (TLS) with HTTP Strict Transport Security (HSTS). Tracking users via regular cookies is very common. Tracking via TLS and HSTS is somewhat rare. Users don’t have an easy-to-use way to remove the tracking information. Until a few years ago, this tracking method even survived the private browsing mode. So, how does it work? -
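A sketch of the commonly described encoding, assuming one tracker-controlled subdomain per bit (hostnames and function names are hypothetical): on the first visit, the tracker makes the browser load over HTTPS exactly those subdomains that correspond to 1-bits of the ID, each answering with a Strict-Transport-Security header. On later visits, the page requests every subdomain over plain HTTP; the browser silently upgrades only the pinned ones, and the server reconstructs the ID from which requests arrived via HTTPS. The live demonstration in the post may differ in detail.

```python
def hosts_to_pin(user_id: int, n_bits: int, domain: str) -> list[str]:
    """Subdomains that must send an HSTS header to store `user_id`."""
    if not 0 <= user_id < 2 ** n_bits:
        raise ValueError("user_id does not fit into n_bits")
    return [f"b{i}.{domain}" for i in range(n_bits) if (user_id >> i) & 1]


def recover_id(https_hosts: set[str], domain: str) -> int:
    """Rebuild the ID from the subdomains whose requests arrived over HTTPS."""
    user_id = 0
    suffix = "." + domain
    for host in https_hosts:
        if host.startswith("b") and host.endswith(suffix):
            bit = int(host[1:-len(suffix)])  # "b3.tracker.example" -> 3
            user_id |= 1 << bit
    return user_id
```

With 32 subdomains, a tracker could distinguish about four billion browsers, and clearing ordinary cookies leaves the HSTS entries in place.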
My NAS setup
Over the last year, I changed my personal storage and backup strategy. For a long time (almost decades), I was using a commercial RAID-5 NAS device. Over time, the NAS ran out of disk space, and I added a server with a two-disk RAID-1. To protect my data from accidental deletion or ransomware attacks, I ran hand-made bash scripts to back up both systems to external hard drives. The scripts are based on rsync and exploit hard links to create incremental backups. The external disks are connected only while a backup is being taken. -
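One common way to get hard-linked incremental backups from rsync is its --link-dest option; the post’s scripts may do it differently. The sketch below only builds the command line (the snapshot names and paths in the example are hypothetical), so it can be inspected before handing it to subprocess.run.

```python
import subprocess
from pathlib import PurePosixPath
from typing import Optional


def snapshot_command(source: str, backup_root: str, name: str,
                     previous: Optional[str] = None) -> list:
    """Build an rsync command for an incremental, hard-linked snapshot.

    Use absolute paths: rsync resolves a relative --link-dest against the
    destination directory, not the current working directory.
    """
    root = PurePosixPath(backup_root)
    cmd = ["rsync", "--archive"]
    if previous is not None:
        # Files unchanged since the previous snapshot become hard links to it,
        # so each snapshot looks complete but only changed files take new space.
        cmd.append(f"--link-dest={root / previous}")
    # Trailing slash: copy the *contents* of source into the snapshot directory.
    cmd += [str(source).rstrip("/") + "/", str(root / name)]
    return cmd


def take_snapshot(*args, **kwargs) -> None:
    subprocess.run(snapshot_command(*args, **kwargs), check=True)
```

For example, snapshot_command("/data", "/mnt/backup", "2019-05-02", "2019-05-01") links unchanged files against the snapshot from the day before.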
Assignment to a function call in C++?
Recently, during a technical meeting, a single line of code blew my mind. The talk was about a software library and how to use the C++ API. On one slide there was a statement of the sort some_function(some_object) = some_value; -
Fail2ban failed with Docker timezones
The title makes it sound like this article is a bug report. However, it is merely a summary of an effect I observed when time zones are not set correctly in a Docker container. It’s not a bug; it is a misconfiguration. My setup was as follows. I have a public-facing application in a Docker container. Users need to authenticate by public key before they can use the service. If authentication fails, the occurrence is logged in a text file. Outside the Docker container, on the host system, I use fail2ban to block random brute-force attacks, in order to avoid cluttering log files and draining resources. -
Exploit suid on SSHFS?
Have you ever wondered if you could leverage sshfs and the suid flag to run any program as root? Well, the idea is simple. You prepare an executable on a server under your control. The file should be owned by root, and the suid flag should be set. When the containing directory is mounted on the target machine via sshfs, you have the prepared file owned by root with the suid flag set, so the program should run as root. Right? Let’s see how it goes. -
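Two conditions are worth checking on the target machine: whether the file actually carries the set-user-ID bit, and whether the filesystem holding it is mounted with nosuid, in which case the kernel ignores the bit. The sketch inspects both with Python’s standard library (Unix only); which options an sshfs mount gets on a particular system is exactly what the post investigates, so the code makes no claim about it. The name inspect_suid and the example path are hypothetical.

```python
import os
import stat


def inspect_suid(path: str) -> tuple[bool, bool]:
    """Return (file has the suid bit, filesystem is mounted nosuid). Unix only."""
    has_suid = bool(os.stat(path).st_mode & stat.S_ISUID)
    nosuid_mount = bool(os.statvfs(path).f_flag & os.ST_NOSUID)
    return has_suid, nosuid_mount
```

Running print(inspect_suid("/mnt/remote/prepared-binary")) on the mounted file shows both flags at once.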
Security of Git commit signatures
In 2017, CWI and Google published the first known collision for the SHA-1 hash function. This immediately sparked discussions about the security of Git, because its beautiful data model relies heavily on SHA-1. -
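To see why SHA-1 matters so much for Git: every object ID is the SHA-1 hash of a short header plus the object’s content. For a blob (file content), the header is the word blob, a space, the content length in bytes, and a NUL byte. The sketch reproduces what git hash-object prints for a file; the function name is illustrative.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the object ID Git assigns to a file with the given content."""
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_id(b"hello world\n"))  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

Commits and trees are hashed the same way, so each commit ID transitively covers the whole history, which is why a practical SHA-1 collision raised concerns.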
My experience in the ATLAS Control Room
This year, I had the opportunity to work in the ATLAS Control Room at CERN as part of my Ph.D. project. The ATLAS experiment operates one of the two large-scale, general-purpose detectors at the LHC that discovered the Higgs boson in 2012. Currently, the ATLAS collaboration is taking data at the highest center-of-mass energies in proton-proton collisions. Unlike the other posts here, this post is not about a technical problem that I have encountered; it is about my personal experience in the Control Room. -
Using ROOT with Python 3 in Docker
A couple of weeks ago, I was asked to prepare a lecture for bachelor students on ROOT for the advanced laboratory in Physics at the University of Freiburg. I knew the students had prior knowledge of Python, so I decided to show them PyROOT. One of the obvious benefits is that data files can be read really easily. While developing the talks, I wanted to add a CI pipeline to test all the code examples on my slides. (I think there is nothing worse than a talk with buggy code examples; see doxec, a tool to test code examples.) I quickly realized that there was no Docker image for ROOT using Python 3. -
Determine start of download
Back in 2011, I answered a StackOverflow question. The question boils down to showing different content on the web page once the user’s download has started. An event listener on the download link doesn’t work, because the download might be generated dynamically and therefore start only after a delay. The page, however, should update as soon as the download actually starts. -
Create docker machine with custom certificates
Have you ever tried to set up a new virtual machine with docker-machine using custom certificates? The docker-machine create command offers a couple of options to set custom certificates. However, it was not clear to me which option has what effect. This article discusses which certificates and keys are involved and which can be overridden with custom files. -
Sign CI artifacts
When you build your software with CI, you might want to sign the build artifacts and distribute the signature with the software in an automated way. There are a few things to note. -
Trust signatures
Modern encryption technologies for day-to-day use are based on public key algorithms. This reduces the problem of exchanging secret keys to the problem of (exchanging and) validating public keys, which are, as the name indicates, not secret. However, the challenge is that there is a priori no way to tell which public key belongs to whom. -
GitLab internal vs. external calendar
According to this issue at GitLab.com, version 11.1 of GitLab will add an option to include private contributions in the activity calendar. The activity calendar can be found on a profile at GitLab and shows the number of contributions of the user for every day in the past year. What would it mean if private contributions are included? -
Building Docker images in GitLab's CI
Last week’s article was about ways to make the Docker daemon available inside a container. One of the motivations was GitLab’s CI. While the article discussed three different methods and advocated the docker-machine-based solution, it did not explain how to use these methods to set up a GitLab runner and then run CI jobs with Docker access. This is what this article illustrates. -
Using docker inside a container
Docker is great. In my opinion, Docker has the power to redefine the way we develop, test, deploy, maintain and even view applications. It might have redefined them already. For a lot of people, their first interaction with Docker came when they started using GitLab pipelines. With more experience, and after I saw how cool Docker can be, I wanted to dive deeper and develop applications that are shipped as Docker images. However, this immediately leads to problems. -
Thoughts about GDPR
Privacy and the protection of personal data are very important topics in my opinion. The new European General Data Protection Regulation (GDPR) is probably a step in the right direction. However, some of the regulations sound rather vague and bureaucratic, and therefore there has been a lot of confusion about how the rules will be applied in practice. It is up to the courts whether GDPR will mainly bother owners of small blogs or club websites, or whether it will force big data companies to implement and apply efficient privacy protection. In this post, I’d like to discuss a few thoughts about GDPR, what it applies to, and which types of service might no longer be possible. -
Make your iptables rules persistent in CentOS 7
In CentOS 7, firewalld is used by default to administrate the firewall. If you want to switch back to iptables, you might run into the problem that your firewall rules are not automatically loaded when the system boots. So let’s look at this more closely. -
Installing CentOS 7.4 as Xen PV guest
After I discovered GitLab CI, I became a huge fan of Docker and its containers. I wanted to dig deeper into virtualization and look at alternative approaches. Recently, I thought I’d give Xen a try. I’m also rather new to CentOS, but I planned to set up a CentOS dom0 and a para-virtualized domU anyway. It didn’t go as planned.
Discovering and learning about the mysteries of nature in mathematics, physics, and computer science is utterly rewarding and satisfying; why else would one spend years until it finally clicks.