
Blog
loopback through leaf / single nic iperf test
When bringing up new servers, especially with new or untested network interface cards (NICs) or cables, you might want to test their raw network throughput. While standard iperf tests between two different machines are common, sometimes you want to isolate the test to a single NIC, pushing traffic through it and back, effectively looping it at a nearby switch. This verifies the physical path, the SFP, and the NIC's transmit/receive capabilities without involving another server's NIC or network stack.
invalid elf header magic / rook-ceph / k8s
Yesterday, we noticed some previously unseen messages in the logs: Invalid ELF header magic: != \x7fELF. Of course an ELF binary should start with a valid header. Who/what is trying to run something else? These messages appeared on Kubernetes nodes that were recently installed and not yet fully in use for production workloads. Coincidentally, these particular nodes were running Ubuntu 24.04 (Noble Numbat), while the rest of this cluster was still on older versions.
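To make the log message concrete: an ELF file starts with the four bytes `\x7f`, `E`, `L`, `F`, and the message appears when something that lacks them is handed over for execution. The following is a minimal Python sketch of that check, not code from the post; the name `has_elf_magic` is hypothetical.

```python
ELF_MAGIC = b"\x7fELF"  # the four bytes every ELF file starts with


def has_elf_magic(data: bytes) -> bool:
    """Return True if the data starts with the 4-byte ELF magic."""
    return data[:4] == ELF_MAGIC


if __name__ == "__main__":
    # A truncated ELF header versus a shell script.
    print(has_elf_magic(b"\x7fELF\x02\x01\x01"))  # True
    print(has_elf_magic(b"#!/bin/sh\n"))          # False
```

Running such a check over the first bytes of a suspect file is one quick way to see whether it is an ELF binary at all.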
nginx / no realip in logs / bug
Recently we were doing a vulnerability scan on an external endpoint of a website. During this time, we noticed simultaneous suspicious activity coming from inside our internal network. What's the deal? To make a long story short: we use a proxy (like HAProxy) in front of our webservers, and it speaks the Proxy Protocol to pass along the original client's IP address. That way, when a request hits our internal nginx server, the logs show the real client's IP, not the IP of the reverse proxy that actually made the connection.
supermicro / ikvm / expired certificate
Supermicro computers generally work very well. At our company we've been using them for ages. One thing that does keep giving us the occasional trouble however is the BMC. Today, we'll be looking at expired certificates of the iKVM interface. Introduction First, some definitions, which we generally use interchangeably: The Baseboard Management Controller (BMC) is the chipset core of the Intelligent Platform Management Interface (IPMI) implementation, which we mostly use to remotely attach a Keyboard, Video and Mouse (KVM, over IP).
pveproxy / systemd journald / perl buffering
Because I was debugging — what later turned out to be — an application firewall that was messing with my connections to Proxmox pveproxy on port 8006, I wanted more debug logs. Telling pveproxy to give me more info gave me a bit of a hard time. Connections from a remote peer would get disconnected, and it initially looked like pveproxy was to blame. Step one: get more info from pveproxy.
sed / regular expressions / optimizing
Last week, we were looking at using sed(1) to remove line feeds from CSV files. The regular expressions in that post could use some further inspection. I'm sorry that this is a rather dry post. I wanted to get the numbers out there because the differences are significant. But without the example runs you're just left with numbers. Feel free to skip right to the conclusions. To recap: we were working on multiline CSV files.
sed / remove line feeds / csv
Can you use sed(1) to remove line feeds? Yes you can. Even though sed, the stream editor, is line based, you can use it to remove line feeds from output. In this post we'll explore how that is done and look into how a slight refactoring step can provide a big speed gain. The task at hand is removing line feeds from a CSV file with multiline strings. Why? Because I want to use grep and awk on it, and multiline strings would complicate that a lot.
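For comparison, here is a rough Python sketch of the same task. It is not the sed approach the post describes: it swaps in Python's `csv` module, and the function name `flatten_csv` and the choice of a space as replacement are assumptions made for illustration.

```python
import csv
import io


def flatten_csv(text: str, replacement: str = " ") -> str:
    """Replace line feeds inside CSV fields so every record fits on one line."""
    # newline="" keeps embedded line endings intact for the csv module.
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow(
            [field.replace("\r\n", replacement).replace("\n", replacement)
             for field in row]
        )
    return out.getvalue()


if __name__ == "__main__":
    sample = 'id,comment\n1,"two\nlines"\n2,plain\n'
    print(flatten_csv(sample), end="")
```

After this step each record occupies exactly one line, which is what line-oriented tools such as grep and awk expect.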
recap 2024 - updates at OSSO
2024 – a story in four acts The end of the year. A good time to reflect on what happened the past twelve months. The ups, the downs. What did we achieve? And what can we look forward to? Some of you have asked to be updated on what we’re working on at OSSO. We’ll gladly share the gist here in this recap, which is in English for once (after four Dutch editions in 2020, 2021, 2022 and 2023).
mysql binlog replay / max_allowed_packet
Trying to replay MySQL binlogs? And running into max_allowed_packet errors? Fear not. Likely this is not corruption, but packets that really are that big. In 2021 I wrote about mariabackup and selective table restore. That blog entry shows how one might restore a mariabackup-saved snapshot of a MariaDB (MySQL) database. Specifically, it focuses on recovering only one or two tables selectively and it details optional GPG decryption and decompression. Here's a recap of the mandatory steps for full recovery and additionally how to replay binlogs to a specific point in time.
mariadb check table / galera locking
After upgrading some database nodes from MariaDB 10.3 to 10.6 we encountered some issues with tables not being fully correct. We'd like to CHECK (and maybe REPAIR) TABLE the entire database. But the database must not block queries on other nodes of the cluster. The corruption we saw might have been a small corruption that had crept in during even earlier upgrades, we don't know. But we do know that we'd like to get this sorted before the corruption gets worse after further upgrades.
systemd-networkd-wait-online / stalling and failing
systemd-networkd-wait-online.service failing? Maybe it's IPv6. Do you have a list of failed systemd units that looks like this?

  # systemctl list-units --failed
  UNIT LOAD ACTIVE SUB DESCRIPTION
  ● systemd-networkd-wait-online.service loaded failed failed Wait for Network to be Configured

Then that could be due to IPv6 networking not being set up properly. Check the interfaces list:

  # ip -br a
  lo UNKNOWN 127.0.0.1/8 ::1/128
  ens18 UP 10.20.30.41/31 fe80::a124:092f:fa18:109e/64
  ens19 UP 192.168.1.3/31 fe80::a124:092f:fa15:01d5/64

Proper IPv4 IPs, and only link-local IPs for IPv6?
nmap ssl-enum-ciphers / haproxy / tls / no results
When doing an nmap ssl-enum-ciphers scan on a Haproxy machine, I got zero working ciphers. But connecting with TLS worked just fine. What is up? While updating TLS ciphers to disable certain unsafe ciphers, we wanted to test the current offers. nmap with the ssl-enum-ciphers script should work fine for this, and it is reasonably fast: $ nmap -Pn --script=ssl-enum-ciphers -p 443 google.com ... 443/tcp open https | ssl-enum-ciphers: | TLSv1.
gpg-agent / ssh / ed25519 / agent refused
After putting in all the work to get ED25519 OpenPGP keys in my Yubikey smart card, I was slightly frustrated that the SSH support "sometimes" didn't work. I thought I had tested that it worked, but today it didn't. $ ssh my_server sign_and_send_pubkey: signing failed for ED25519 "cardno:000612345678" from agent: agent refused operation walter@my_server: Permission denied (publickey). That's odd. Adding some debug to the gpg-agent — debug 1024 in gpg-agent.conf — got me this:
recap 2023 - updates from OSSO
2023 – better environments, better security Where last year we were busy cleaning up legacy, this year was more a year of expansion: ArgoCD, NetworkPolicies, Cilium, Loki/Mimir… ContainerDay Security Last March, the entire OSSO team went to Hamburg, to ContainerDay Security 2023. There we caught up on the current state of security within Kubernetes. This gave us the final push we needed to start trying out Cilium for Kubernetes networking.
mplayer / screen saver / wayland
I don't usually watch movies from my laptop, but when I do, I don't want the screen saver to kick in. Occasionally, I notice that my old time movie player mplayer does not inhibit the screen saver or screen blanking. That means that 5 minutes into a movie, the screen turns black. I don't think this used to be the case. Maybe it's because I'm running the Wayland display server in GNOME now.
bash / postfix health check / dev tcp
Using /dev/tcp in bash for a health check? Here's an example. I had a script that used netcat to connect to a Postfix email daemon to check its health status. To avoid pipelining errors I had it sleep between each write. The core looked somewhat like this: messages=$(for x in \ 'EHLO localhost' \ 'MAIL FROM:<healthz@localhost>' \ 'RCPT TO:<postmaster@example.com>' \ RSET \ QUIT do sleep 0.
gpgv / can't allocate lock for
gpgv prints out a warning that it cannot allocate a lock. This looks like something that should be fixable, but it isn't. Observed with gpg version 2.2.27-3ubuntu2.1: $ gpgv </dev/null gpgv: can't allocate lock for '/home/walter/.gnupg/trustedkeys.gpg' gpgv: verify signatures failed: Unknown system error The issue at hand here is the “gpgv: can't allocate lock for '/home/walter/.gnupg/trustedkeys.gpg'”. Can we fix something to suppress that? TL;DR: no Investigation Fire up the debugger gdb and break at keybox_lock.
etckeeper / git / pack-objects died of signal 9
Is your etckeeper dying on the git gc? On several machines I have now seen git run out of memory when trying to do repo optimization and garbage collection. Usually this happened in /etc where we like to have etckeeper. It might look like this: ... warning: The last gc run reported the following. Please correct the root cause and remove .git/gc.log Automatic cleanup will not be performed until the file is removed.
segfault in library / addr2line / objdump
Yesterday, we spotted some SEGFAULTs on an Ubuntu/Focal server. We did not have core dumps, but the kernel message in dmesg was sufficient to find a culprit. The observed messages were these: nginx[854]: segfault at 6d702e746379 ip 00007ff40dc2f5a3 sp 00007fffd51c8420 error 4 in libperl.so.5.30.0[7ff40dbc7000+166000] Code: 48 89 43 10 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e 41 5f c3 0f 1f 40 00 0f b6 7f 30 48 c1 e8 03 48 29 f8 48 89 c3 74 89 48 8b 02 <4c> 8b 68 10 4d 85 ed 0f 84 28 01 00 00 0f b6 40 30 49 c1 ed 03 49 nginx[951947]: segfault at 10 ip 00007fba4a1645a3 sp 00007ffe57b0f8a0 error 4 in libperl.
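A kernel segfault line like the first one above carries enough information to compute where inside the library the crash happened: subtract the mapping address in square brackets from the instruction pointer. Below is a minimal Python sketch of that arithmetic, assuming the bracketed address is the load base of the library. The names `SEGFAULT_RE` and `library_offset` are hypothetical and not from the post.

```python
import re

# Matches the kernel's "segfault at ... ip ... in lib[base+size]" message.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<error>\d+) in (?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def library_offset(line: str):
    """Return (library name, offset of the faulting instruction in the mapping)."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        raise ValueError("not a recognizable segfault line")
    ip = int(match["ip"], 16)
    base = int(match["base"], 16)
    return match["lib"], ip - base


if __name__ == "__main__":
    line = (
        "nginx[854]: segfault at 6d702e746379 ip 00007ff40dc2f5a3 "
        "sp 00007fffd51c8420 error 4 in libperl.so.5.30.0[7ff40dbc7000+166000]"
    )
    lib, offset = library_offset(line)
    print(lib, hex(offset))  # libperl.so.5.30.0 0x685a3
```

An offset like this is the kind of value one would then look up in the library with tools such as addr2line or objdump.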
qpress / qz1 extension
This is a quick note to my future self. As we're using qpress less in favor of lz4 that has been available on Ubuntu Focal and above, we're inclined to forget what the .qz1 extension means. Wondering what files with the .qz1 extension do? This concerns single stream qpress compressed files. Historically qpress uses the .qp extension, but that concerns multifile archives. The qpress binary can write compressed streams to stdout, but it will not decompress them to stdout.
dirmngr / keyserver / disable-ipv6 / bionic
This morning a build pipeline failed. dirmngr called by apt-key tried to use IPv6, even though it was disabled. The build logs had this to say: 21.40 + apt-key adv --recv-keys --keyserver hkp://keyserver.ubuntu.com:80 0xF1656F24C74CD1D8 21.50 Warning: apt-key output should not be parsed (stdout is not a terminal) 21.57 Executing: /tmp/apt-key-gpghome.KzTTOZjgZP/gpg.1.sh --recv-keys --keyserver hkp://keyserver.ubuntu.com:80 0xF1656F24C74CD1D8 21.65 gpg: keyserver receive failed: Cannot assign requested address This was strange for a number of reasons:
mariadb / gdb / debugging shutdown deadlock / part 2
We were looking at a core dump of a MariaDB instance that was in a deadlocked state. It was killed with SIGABRT. We would now like to get MySQL thread info from it. Python support in gdb is here to help. To recap: we were dissecting the core dump with gdb (gdb /usr/sbin/mariadbd core.208161) and found out that it was waiting on one of the threads to stop before stopping completely. We now want to know what exactly it was waiting for.
mariadb / gdb / debugging shutdown deadlock / part 1
I was asked to look into a MariaDB deadlock during shutdown. We had a core dump (luckily). Now it's time to dissect it. Ensure you can get core dumps For starters, you want a nice core dump whenever you hit an issue. We do. Here are some tips to ensure you do too. You need to have a couple of parameters set correctly: the equivalent of the systemd LimitCORE=infinity and the assurance that the working directory is writable (maybe WorkingDirectory=/var/lib/mysql).
laptop battery discharge / logging
I recently got a new Framework Laptop. It is generally nice. Apart from the reflective screen, and the excessive battery consumption. In suspend mode, it draws too much battery: more than a Watt. This is a known issue. The worst offenders are the expansion cards. For instance the USB-A cards consume about 350mW each, just by being plugged in. To do some testing, I whipped up a script allowing easy access to battery usage logs: discharge-log
viewing unencrypted traffic / ltrace / bpftrace
Can we view TLS-encrypted traffic on the originating or terminating host, without having to decode the data from the wire? This is a question that comes up every now and then when trying to debug a service by looking at how it communicates. For the most insight, we should capture the encrypted traffic and use the (logged!) pre-master secret keys. See example details in make-master-secret-log discussing how to have HAProxy log them.
removing auditd / disabling logging
After installing auditd for testing purposes and removing it again, my kernel logs got flooded with messages. How do I disable them? If you happened to have installed auditd, it is likely that the kernel audit subsystem was enabled. Even when there are no rules left (auditctl -l) you can still get more messages in your kernel logs than before. For instance, after uninstalling auditd, I still get the following ones:
netplan / docker0 / bind on 172.17.0.1
If you want to bind your host service to the docker IP, exposing it to docker instances, then that IP needs to exist first. If it doesn't, your log might look like this: LOG: listening on IPv4 address "127.0.0.1", port 5432 LOG: could not bind IPv4 address "172.17.0.1": Cannot assign requested address WARNING: could not create listen socket for "172.17.0.1" LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432" As you probably know, you cannot bind to an IP that is not configured on an interface anywhere, barring the net.
ansible / ipv6 addresses / without link local
Trying to get the IPv6 addresses as ansible fact, but getting unwanted link_local scope addresses? Maybe you're landing here because you did not see a solution at “all_ipv6_addresses includes link local addresses”. (I find it most odd that they locked the GitHub ticket, eliminating the possibility for anyone to reply with a fix.) The ansible_all_ipv6_addresses | reject("ansible.utils.in_network", "fe80::/10") construct works for me. For example: local-address=127.0.0.1, ::1, {{ ( ansible_all_ipv4_addresses + ( ansible_all_ipv6_addresses|reject("ansible.
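The filtering idea itself (drop everything inside fe80::/10) can be illustrated outside Ansible. This is a small Python sketch using the standard `ipaddress` module; it is not the Ansible filter implementation, and the name `without_link_local` is hypothetical.

```python
import ipaddress

# The IPv6 link-local range that the reject() filter above excludes.
LINK_LOCAL = ipaddress.ip_network("fe80::/10")


def without_link_local(addresses):
    """Return the IPv6 addresses that are not in the link-local range."""
    return [a for a in addresses if ipaddress.ip_address(a) not in LINK_LOCAL]


if __name__ == "__main__":
    addrs = ["2001:db8::1", "fe80::a124:092f:fa18:109e", "::1"]
    print(without_link_local(addrs))
```

The loopback address `::1` passes through untouched, because only the fe80::/10 range is rejected.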
zabbix server / jammy upgrade / missing font
The other day, we upgraded the host OS for our Zabbix Server from Ubuntu/Focal to Ubuntu/Jammy. This caused all text to go missing from the rendered graphs. The php (uwsgi) logs had the following to say: PHP Warning: imagettfbbox(): Could not find/open font in /usr/share/zabbix/include/graphs.inc.php on line 600 At least that's a pretty clear message. Through a quick php live hack we learned that it tried to open /usr/share/zabbix/assets/fonts/graphfont.ttf. This file was a symlink to /etc/alternatives/zabbix-frontend-font and that was a symlink to /usr/share/fonts/truetype/ttf-dejavu/DejaVuSans.
postfix / no system resources / proxy protocol
Connecting to Postfix and getting a "421 4.3.2 No system resources"? Maybe you forgot you're using the (HAProxy) Proxy Protocol... If you're trying to connect to your Postfix mail daemon, and it looks like this: $ nc localhost 25 ... wait for 5 seconds ... 421 4.3.2 No system resources Then I bet you're using HAProxy as reverse proxy to your mailserver and you have the following configured: $ postconf | grep ^postscreen_upstream postscreen_upstream_proxy_protocol = haproxy postscreen_upstream_proxy_timeout = 5s To test a direct connection, you'll need to prefix your traffic with the proxy protocol v1 handshake.
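For reference, the proxy protocol v1 handshake is a single ASCII line of the form `PROXY TCP4 <source ip> <destination ip> <source port> <destination port>` terminated by CRLF. A minimal Python sketch that builds such a line follows; the function name `proxy_v1_header` and the example addresses and ports are assumptions for illustration.

```python
def proxy_v1_header(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> bytes:
    """Build a PROXY protocol v1 header line for a TCP connection."""
    # A colon in the address means IPv6; otherwise treat it as IPv4.
    family = "TCP6" if ":" in src_ip else "TCP4"
    line = f"PROXY {family} {src_ip} {dst_ip} {src_port} {dst_port}\r\n"
    return line.encode("ascii")


if __name__ == "__main__":
    # Hypothetical values: a local client on port 12345 talking to port 25.
    print(proxy_v1_header("127.0.0.1", "127.0.0.1", 12345, 25))
```

These bytes would have to be the very first thing sent on the connection, before any SMTP traffic.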
oneliner / finding fixed kernel bugs
Recently we were bitten by an old kernel bug on Ubuntu that only rarely triggers. Finding out where the problem was is easier if you know where to look. We had no kernel logs to go on. Only a hanging machine with no output. And a hunch that the running kernel version linux-image-5.4.0-122 was the likely culprit. Was there another way than meticulously reading all changelogs and changes to find out which bug we're dealing with?
CephFS EINVAL specified for ceph.dir.subvolume
What could be the case when creating a subvolume in CephFS throws an EINVAL VolumeException? A client of ours recently got errors when creating a 24GiB subvolume: ceph fs subvolume create cephfs-filesystem \ example1 --group_name=thegroup \ --size=25769803776 But, instead of silence and a new subvolume, they got this in their face: a big Python backtrace. Error EINVAL: Traceback (most recent call last): File "ceph/mgr/volumes/fs/operations/versions/subvolume_base.py", line 271, in discover self.fs.stat(self.base_path) File "cephfs.
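The `--size` value in that command is simply 24 GiB expressed in bytes. A tiny Python sketch of the arithmetic, where the helper name `gib_to_bytes` is hypothetical:

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def gib_to_bytes(gib: int) -> int:
    """Convert a size in GiB to bytes."""
    return gib * GIB


if __name__ == "__main__":
    print(gib_to_bytes(24))  # 25769803776
```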
windows openvpn / unexpected default route
The other day, I was looking into a VPN client issue. The user could connect, they would get their routes pushed, but they would then proceed to use the VPN for all traffic instead of just the routes we provided them. We did not push a default route, because this VPN server exposed a small internal network only. Any regular internet surfing should be done directly. So, when I looked at a tcpdump I was baffled when I saw that DNS lookups were attempted through the OpenVPN tunnel:
Kubernetes CRL support with the front-proxy-client and Haproxy
“Why does kube-apiserver not take a CRL file?” Kubernetes clusters at OSSO are usually set up using a PKI, from which we create client certificates for users as well. Unfortunately, users can sometimes be a little careless [citation needed] and sometimes they manage to share their keys with the world. PKI caters for lost certificates through the issuance of periodic (or ad-hoc) Certificate Revocation Lists (CRLs), in which the PKI admins can place certificates that have been compromised: if a certificate is listed in the CRL, it is revoked, and will not be accepted by the TLS agent.
django 1.8 / python 3.10
After upgrading a machine to Ubuntu/Jammy there was an old Django 1.8 project that refused to run with the newer Python 3.10. ... File "django/db/models/sql/query.py", line 11, in <module> from collections import Iterator, Mapping, OrderedDict ImportError: cannot import name 'Iterator' from 'collections' (/usr/lib/python3.10/collections/__init__.py) This was relatively straightforward to fix, by using the following patch. Some parts were stolen from a stackoverflow response by Elias Prado. --- a/django/core/paginator.py 2023-01-11 14:09:04.
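The ImportError arises because the abstract base classes such as `Iterator` and `Mapping` live in `collections.abc`, and Python 3.10 no longer exposes them from `collections` itself. A minimal sketch of the usual kind of compatibility import follows. It illustrates the idea only and is not necessarily what the (truncated) patch does.

```python
# Abstract base classes moved to collections.abc; OrderedDict stayed put.
try:
    from collections.abc import Iterator, Mapping
except ImportError:  # very old Pythons without collections.abc
    from collections import Iterator, Mapping
from collections import OrderedDict

if __name__ == "__main__":
    # OrderedDict is still a Mapping, and iterators are still Iterators.
    print(issubclass(OrderedDict, Mapping))
    print(isinstance(iter([]), Iterator))
```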
sysctl / modules / load order / nf_conntrack
Recently we ran into an issue where connections were unexpectedly aborted. Connections from a NAT-ed client (a K8S pod) to a server would suddenly get an old packet (according to the sequence number) in the middle of the data. This triggered the Linux NAT-box to issue a reset packet (RST). Setting the kernel flag to mitigate this behaviour required some knowledge of module load order during boot. Spurious retransmits causing connection teardown To start off: we observed that traffic from a pod to a server got disconnected.
recap 2022 - part 2 / stable diffusion
2022 – generating images by computer If you have followed tech news even a little, you cannot have missed the many leaps made in artificial intelligence (AI) this past year. Machine learning (ML) has of course been around for longer. But in 2022 the ball really started rolling for computer generated images. And not only images. GPT-3 and ChatGPT show that computer-generated text can also be astonishingly good.
recap 2022 - part 1
2022 – closed in good company At OSSO we closed the year with a nice Christmas dinner. Mood lighting on 💡 Good food 🥩 Family along 👪 That said, we are all too aware that the Russian aggression in Ukraine is causing enormous suffering. We hope that politicians can show enough will to keep supporting David in his fight to push the invaders back beyond the country's borders.
avoiding 255 / 31-bit prefixes
At OSSO, we've been using a spine-leaf architecture in the datacenter, using BGP and Layer 3 to the host. This means that we can have any IP address of ours just pop up anywhere in our network, simply by adding a prefix on a leaf switch. We sacrifice half of our IP space for this. But we gain simplicity by avoiding all Layer 2 tricks. TL;DR: Avoid IP addresses ending in .
supermicro / x9drw / quest for kvm
I'm connected to an “ancient” Supermicro machine — according to today's standards — that saw the light somewhere around 2013. I'm looking for a way to access the KVM module (Keyboard, Video, Mouse) so I can update it safely. You know, to be able to fix boot issues if they arise. Unfortunately, the firmware is rather old and I cannot get the iKVM application to run, like I'm used to.
chromium browser / without ubuntu snap / linux mint
In 2019, Clement "Clem" Lefebvre of Linux Mint wrote these prophetic words: “As long as snap is a solution to a problem, it’s great. Just like Flatpak, it can solve some of the real issues we have with frozen package bases. It can provide us with software we couldn’t otherwise run as packages. When it starts replacing packages for no good reason though, when it starts harming our interaction with upstream projects and software vendors and reducing our choice, it becomes a threat.
falco helm upgrade / labelselector field immutable
Today I got this unusual error when upgrading the Falco helm chart from 1.19.4 to 2.0+. Error: UPGRADE FAILED: cannot patch "falco" with kind DaemonSet: DaemonSet.apps "falco" is invalid: spec.selector: Invalid value: v1.LabelSelector{ MatchLabels:map[string]string{"app.kubernetes.io/instance":"falco", "app.kubernetes.io/name":"falco"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil) }: field is immutable The explanation is here as given by Stackoverflow user misha2048: You cannot update selectors for [...] ReplicasSets, Deployments, DaemonSets [...] from my-app: ABC to my-app: XYZ and then simply [apply the changes].
flipper zero multi-tool / developing
Here are some pointers on how to get started editing/developing plugins for the Flipper Zero multi-tool. (When writing this, the stable version was at 0.63.3. Things are moving fast, so some of the next bits may be outdated when you read them.) Starting Starting the Flipper Zero and adding an SD-card is documented in Flipper Zero first-start. Now you can use all the nice pentest features already included. The SD-card is necessary to unlock some features.
ubuntu jammy / ssh / rsa keys
With the new Ubuntu/Jammy we also get tighter security settings. Here are some aliases that will let you connect to older ssh servers. For access to old Cisco routers, we already had the first two options in this alias; we now add two more:

  # Alias on Ubuntu/Jammy with ssh 8.9p1-3+ to access old routers/switches:
  alias ssholdhw="ssh \
    -oKexAlgorithms=+diffie-hellman-group1-sha1,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1 \
    -oCiphers=+aes128-cbc,aes192-cbc,aes256-cbc,3des-cbc \
    -oHostkeyAlgorithms=+ssh-rsa \
    -oPubkeyAcceptedKeyTypes=+ssh-rsa"

That fixes things so that we can connect to old Cisco and old HP equipment.
thunderbird / opening links / ubuntu
For some reason, opening links from Thunderbird stopped working. When clicking a URL, I expected Chromium to open the website, but nothing happened. After visiting a few bug reports and the Thunderbird advanced configuration, I turned my attention to xdg-open: $ xdg-open 'https://wjd.nu' ERROR: not connected to the gnome-3-38-2004 content interface. Okay. So it wasn't a Thunderbird problem at all. The culprit was that I had been doing some housekeeping in snap.
dnssec validation / authoritative server
The delv(1) tool is the standard way to validate DNSSEC signatures. By default it will validate up to the DNS root zone, for which it knows and trusts the DNSKEY. If you want to validate only a part of a chain, you'll need to know a few things. Regular DNSSEC validation Using delv is normally as simple as this: $ delv -t A @1.1.1.1 dnssec.works. ; fully validated dnssec.works. 3600 IN A 5.
nvme drive refusing efi boot
UEFI is the current boot standard. Instead of fighting it, we've adopted it as the default for all hardware machines we install. We've had some issues in the past, but they could all be attributed to a lack of knowledge on the operator's part, not to a problem with EFI itself. But this time we couldn't figure out why the Supermicro machine refused to boot from these newly installed EFI partitions: no bootable UEFI device found.
fat16 filesystem layout
First there was FAT, then FAT12, FAT16 and finally FAT32. Inferior filesystems nowadays, but nevertheless both ubiquitous and mandatory for some uses. And sometimes you need to be aware of the differences. A short breakdown of FAT16 follows — we'll skip the older FAT as well as various uncommon settings, because those are not in active use. Sector size The storage device defines (logical) sector sizes. This used to be 512 bytes per sector for a long time (we're skipping pre-hard disk tech), but this is now rapidly moving to 4096 bytes per sector on newer SSD and NVMe drives.
reading matryoshka elf / dirtypipez
While looking at the clever dirtypipez.c exploit, I became curious how this elfcode was constructed. On March 7 2022, Max Kellermann disclosed a vulnerability he found in Linux kernel 5.8 and above called The Dirty Pipe Vulnerability. Peter (blasty) at haxx.in quickly created a SUID binary exploit for it, called dirtypipez.c. This code contains a tiny ELF binary which writes another binary to /tmp/sh: the ELF Matryoshka doll. I was wondering how one parses this code, to ensure it does what it says it does, and just because.
rst tables with htmldjango / emoji two columns wide
For a project, we're using Django to generate a textual report. For readability, it is in monospace text. And we've done it in reStructuredText (RST) so we can generate an HTML document from it as well. A table in RST might look like this:

+-----------+-------+
| car brand | users |
+===========+=======+
| Peugeot   | 2     |
+-----------+-------+
| Saab      | 1     |
+-----------+-------+
| Volvo     | 4     |
+-----------+-------+

Transforming this to HTML with rst2html(1) generates a table similar to this: