May 26, 2025
Fixing two copy-on-write bugs in the NBD server.
What is What?
NBD is a network block device protocol. It has some overlap with iSCSI,
and a little with NFS. The protocol is much simpler than either, and has
one extra feature: copy-on-write.
Copy-on-write allows sharing the same file with multiple machines, and
only writing changes back to disk. This can save a lot of storage space
in certain situations.
There are two bugs, and two one-line fixes presented here.
The Bugs
Sequential Read after Write
For the first case, it’s enough to read and write more than one
sequential block. The second and subsequent blocks read will read into
the wrong offset of the buffer, and copy invalid data to the client.
I use a 4096 block size in this example, but I’ve used others. I did
that to match the filesystem, but for the test I don’t even need a filesystem.
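The tests below assume /dev/nbd0 is already attached to a copy-on-write export. My exact setup isn't shown here, but a minimal sketch could look like this (the export name, backing file path and hostname are examples, and the nbd-client invocation varies a little between versions):
# /etc/nbd-server/config (sketch)
[generic]
[cowtest]
    exportname = /srv/nbd/backing-file
    copyonwrite = true
    # sparse_cow = true    # for the second bug below
# on the client
nbd-client -N cowtest server.example.com /dev/nbd0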
The Test
export OFFSET=0
export COUNT=3 # anything >= 1
dd if=/dev/urandom of=testdata bs=4096 count=$COUNT # random data
dd if=testdata of=/dev/nbd0 bs=4096 seek=$OFFSET count=$COUNT
dd if=/dev/nbd0 of=compdata bs=4096 skip=$OFFSET count=$COUNT
sum testdata compdata
The data, testdata and compdata, will be different.
If the kernel does a partition check when /dev/nbd0 is mounted, this
test will fail with COUNT=1 as well.
Sparse Write at the Wrong Offset
For the second case, with sparse_cow=true, we need to repeat the test
with an offset > 0. expwrite() calls write() instead of pwrite().
The Test
export OFFSET=100
export COUNT=3 # anything >= 1
dd if=/dev/urandom of=testdata bs=4096 count=$COUNT # random data
dd if=testdata of=/dev/nbd0 bs=4096 seek=$OFFSET count=$COUNT
dd if=/dev/nbd0 of=compdata bs=4096 skip=$OFFSET count=$COUNT
sum testdata compdata
The first time it’s run, it will result in an Input/Output error.
The second time it’s run, it will work.
The Patches
diff --git a/nbd-orig/nbd-server.c b/nbd-patched/nbd-server.c
index 92fd141..18e5ddd 100644
--- a/nbd-orig/nbd-server.c
+++ b/nbd-patched/nbd-server.c
@@ -1582,6 +1582,7 @@ int expread(READ_CTX *ctx, CLIENT *client) {
if (pread(client->difffile, buf, rdlen, client->difmap[mapcnt]*DIFFPAGESIZE+offset) != rdlen) {
goto fail;
}
+ ctx->current_offset += rdlen;
confirm_read(client, ctx, rdlen);
} else { /* the block is not there */
if ((client->server->flags & F_WAIT) &&
(client->export == NULL)){
diff --git a/nbd-orig/nbd-server.c b/nbd-patched/nbd-server.c
index 92fd141..9a57ad5 100644
--- a/nbd-orig/nbd-server.c
+++ b/nbd-patched/nbd-server.c
@@ -1669,7 +1669,7 @@ int expwrite(off_t a, char *buf, size_t len,
CLIENT *client, int fua) {
if(ret < 0 ) goto fail;
}
memcpy(pagebuf+offset,buf,wrlen) ;
- if (write(client->difffile, pagebuf, DIFFPAGESIZE) != DIFFPAGESIZE)
+ if (pwrite(client->difffile, pagebuf, DIFFPAGESIZE, client->difmap[mapcnt]*DIFFPAGESIZE) != DIFFPAGESIZE)
goto fail;
}
if (!(client->server->flags & F_COPYONWRITE))
May 25, 2025
A number of programs will not log to syslog, only logging to a file
they control, or to a custom log server. This post describes how to
get one such program to write to syslog.
TL;DR
Setup a FIFO
/etc/systemd/system/journal-pipe-modsecurity.socket
[Unit]
Description=Journal Pipe
Documentation=man:systemd-journald.service(8) man:journald.conf(5)
DefaultDependencies=no
Before=sockets.target
IgnoreOnIsolate=yes
[Socket]
ListenFIFO=/run/systemd/journal/pipes/modsecurity
ReceiveBuffer=8M
Accept=no
Service=journal-pipe-modsecurity.service
SocketMode=0660
Timestamping=us
# Access control for the FIFO
SocketUser=some-user
SocketGroup=some-group
Setup a Service
/etc/systemd/system/journal-pipe-modsecurity.service
[Unit]
Description=Journal Pipe for Modsecurity
After=network.target journal-pipe-modsecurity.socket
Requires=journal-pipe-modsecurity.socket
[Service]
Type=simple
StandardInput=fd:journal-pipe-modsecurity.socket
ExecStart=/usr/bin/systemd-cat --identifier=modsecurity
TimeoutStopSec=5
Group=some-group
PrivateTmp=yes
DynamicUser=yes
ProtectHome=yes
[Install]
WantedBy=default.target
modsecurity.conf
SecAuditLogType Serial
SecAuditLog /run/systemd/journal/pipes/modsecurity
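To put these in place, reload systemd, start the socket, and check that entries arrive. The identifier matches the --identifier passed to systemd-cat above:
systemctl daemon-reload
systemctl enable --now journal-pipe-modsecurity.socket
journalctl -t modsecurity -f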
Problem Description
Some programs will only log to a file, or to a custom logger. There is
no way to configure them to write to syslog, or a standard logger.
Good system administrators want to use syslog, or another standard
logger. In this post, we’re going to get Modsecurity to log to syslog,
which is an often requested feature.
Assumptions
We have an application that can write logs, and we are interested
in those logs.
That application insists on writing logs to a file, or to a custom log
manager. A custom log manager means yet another application installed,
and configured, and storage allocated.
When the application is configured to write logs to a file, the application
must be configured to rotate the logs based on time or size, and must
manage the log size.
What is Syslog?
Syslog is a Unix-y standardised logger. For this post, it’s only important
that it’s standardised across the OS. In fact, we’re actually going to
use journald, and not syslog.
Syslog accepts logs on a number of interfaces.
- A file interface at /dev/log
- A socket interface
- A UDP network interface
Syslog then writes the logs using a timestamp, a hostname, and an
application and PID where possible.
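A quick way to see that format, and to confirm your logger accepts messages, is the standard logger(1) tool (the tag here is arbitrary):
logger -t example "hello, syslog"
journalctl -t example -n 1   # or tail /var/log/syslog on a classic syslog setup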
Various syslog (and journal) servers exist that can
- Limit diskspace for logs
- Send logs to another drive, or another server, or a printer
- Manage logs before the filesystem is read-write, reducing lost logs
- Rotate logs without losing logs
- Throttle logs (unfortunately losing some) to prevent DoS attacks
What is Modsecurity?
Modsecurity is a web application firewall (WAF). It follows rules, and
prevents or allows HTTP requests.
Logically, it sits between the web server and a web application. Usually
the web server that runs Modsecurity acts as a proxy in front of a Java
or PHP web application.
Think of it as an anti-virus or firewall for the web.
Why Do We Want Syslog?
Standardising the logs gives a number of benefits.
Viewing Multiple Logs, Ordered
It’s often useful to view multiple logs, ordered sequentially, to track
bugs or security problems. This is doubly true for a program like
Modsecurity, that hooks into a webserver — usually as a proxy — and
an application server.
When something goes wrong, we can get the HTTP context from the webserver
logs, the application context from the application server logs, and
the security logs from the WAF, all ordered.
We also have timestamps in the same format.
Space Allocation
If the logs are all in a specific system, we can decide to allocate space
on a specific drive, optimised for writes, separate from any database.
Legal Data Retention
We may need to keep certain logs for legal reasons.
Separate Secure Storage
Syslog servers can send their data to a separate server. In the best
cases, to a server without an IP number. This means that any attackers
cannot delete the logs.
That’s a major security help.
Compartementilisation
The syslog server does not have to run as the same user as the application.
That means that when an attacker breaches security, and has access rights
similar to the application, the attacker does not have permissions to
delete the logs.
That’s also a major security help.
Separate Backups
If your application writes and manages your logs, chances are your backup
system has to backup and restore those logs. It’s almost always better
to manage the backups of logs, and the backups of the application data separate.
Pet Peeve
Applications that do their own logging is a pet peeve of mine. There
are almost always bugs.
Logging is harder than you think, and mistakes are common.
For example, Tomcat — in it’s recommended configuration — will still
claim diskspace of long-since deleted logs.
Apr 19, 2025
A description of a firmware bug in external USB storage that
causes disk error reports, and how to avoid them.
Problem Description
When you connect an external USB drive you may see:
sd 0:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
sd 0:0:0:0: [sda] tag#0 Sense Key : Illegal Request [current]
sd 0:0:0:0: [sda] tag#0 Add. Sense: Invalid command operation code
sd 0:0:0:0: [sda] tag#0 CDB: Write same(16) 93 08 00 00 00 00 00 00 00 22 00 00 00 06 00 00
critical target error, dev sda, sector 34 op 0x3:(DISCARD) flags 0x0 phys_seg 1 prio class 0
critical target error, dev sda, sector 40 op 0x3:(DISCARD) flags 0x800 phys_seg 1 prio class 2
If the following are true, you have the same problem:
- You see DISCARD or UNMAP
- You see Write same(16), and
- The external storage is a spinning disk
- You delete a lot of data, or reformat the disk
Explanation
These commands are all variants of what is also known as trim,
which tells an SSD to mark a data area as re-usable.
Spinning disks do not support trim, and so reject it.
The kernel is attempting to run the command because the USB enclosure
told the kernel that it supports the commands writesame16 or unmap.
The error is reported because that command fails.
Root Cause
The root cause is that the USB enclosure supports the command in firmware,
and the spinning disk does not. The USB enclosure fails to negotiate
this command with the harddrive before negotiating it with the OS.
This happens when manufacturers use the same USB chips for SSD and
spinning disk drives on the cheap.
Solution
The solution is to tell the kernel that this does not work, for this
device, by using a udev rule, eg.
/etc/udev/rules.d/99-cheap-disk.rules
ACTION=="add", SUBSYSTEM=="scsi_disk", SUBSYSTEMS=="scsi",
ATTRS{vendor}=="WD", ATTRS{model}=="My Passport *",
OPTIONS="log_level=debug",
PROGRAM="/usr/bin/logger -t udev/99-cheap-disk Found cheap disk",
ATTR{provisioning_mode}="disabled"
In my case, the bad hard drive was a “Western Digital My Passport 2626”,
with revision 1034.
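After adding the rule, you can apply it without rebooting and confirm it took effect. The commands below assume a standard udev and sysfs layout:
udevadm control --reload
udevadm trigger --action=add --subsystem-match=scsi_disk
cat /sys/class/scsi_disk/*/provisioning_mode   # should now read "disabled" for the USB disk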
Thanks
For the exact same problem and solution for SAN/NAS, see:
Chris Hofstaedtler
Feb 09, 2025
A deep dive on firmware bugs that prevent Garmin Index S2
scales from connecting to encrypted Wifi networks. I describe
some of the problems, a solution of sorts if you’re a network
administrator, and a guess as to the root cause. I do not,
in this article, reverse engineer the firmware.
Problem Description
Garmin Index S2 scales are notorious for not connecting reliably to
a number of Wifi networks. For example, read the
Garmin Forums. Quite often the scale won't connect, or will connect and not sync,
and its display doesn't make the fault clear, nor is it well documented.
For other frustrations, you can read my previous blog post
Garmin Index Scale Firmware Problems
Official requirements
The official requirements are:
- 2.4 GHz (no 5.0 or 6.0)
- 802.11 ac, b, g or n (no 802.1x)
- Channels 1-11 only
- No hidden SSIDs
- Security: Unencrypted, WPA and WPA2
- Passwords must be at least 8 characters
Some of these requirements are simply due to the age and power of the
embedded controller. It was never going to support 5.0 GHz, for instance.
What we (customers) want
Compliant Wifi
- Encrypted, at least WPA2
- Channels 1-13
A Wifi access point can typically be configured for a specific channel, or
for all channels. No Wifi access points allow you to specify a channel range.
If you are outside the US, this means:
- Exactly one channel, or
- Channels 1-13.
So you are reduced to running one channel. Luckily the scale does actually
connect on channels 1-13. It’s doubtful the chip would be certified in
the countries it’s sold, if it doesn’t. So we will ignore the first problem.
What worked yesterday, to work today
The scale frequently gets stuck, and the usual support response is to
reset the scale.
Resetting the scale is difficult
- It requires tapping a button on the bottom, and simultaneously
viewing the top display.
- Testing requires putting significant weight on top of the scale,
while your finger is still tapping the button on the bottom.
Resetting the scale is unnecessary
Frequently the scale is actually still working. The problem is that
the display isn’t communicating to you, the user, what’s happening.
Quite often it’s busy, and you should simply wait. It does this by
flashing an hourglass icon and then switching the display
off. To the average user, this looks like the scale fails to switch on.
The correct thing to do is wait 5 minutes.
Updating Garmin Documentation
Let's first update the documentation. Let's create a useful Wifi manual
for the Garmin Index S2 Wifi connection status.
There are three icons:
- Wifi (a signal-strength icon)
- Sync (a spinning sync icon)
- Done (a check mark)
Wifi connecting
While the Wifi icon blinks, the scale is connecting to Wifi. If it stops blinking,
it has connected to Wifi, and your WPA2 password works.
As soon as this happens, you no longer need to reset your scale.
Data Syncing
While data is syncing, the sync icon is animating. At this point, the scale is
talking over the network to Garmin servers.
Under certain conditions this can take a long time, and the display
will switch off. It will still be syncing in the background, though.
Note: If the display switches off in this state, it's only power-saving
the display; the scale has not switched off. If you power on the scale,
you will see an hourglass. Wait 5 minutes. Be patient.
Done
It’s done.
Actual Syncing Problems
If the data never syncs, read on.
If the data never syncs, you may have:
- Firewall issues (not if you didn’t have them yesterday)
- ISP issues (ping connect.garmin.com)
- Hit a firmware or controller bug (the rest of this blog post)
Note: At this point the Wifi is connected. The scale found
your Wifi, your SSID, and has negotiated encryption.
Firewall Issues
This is simple.
If you don’t know what a firewall is, you didn’t break it.
If you haven’t modified your router settings —- if your ISP allows it —-
you didn’t break it.
If you have edited your firewall, roll back, try again.
ISP Issues
Do the normal tests. Connect to any other site
and check that Garmin is up.
If these work, your ISP is probably not down.
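Concretely, any generic reachability check will do, for example:
ping -c 3 connect.garmin.com
curl -sI https://connect.garmin.com | head -n 1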
Scale Sync Network Activity
How does the scale sync with Garmin Connect?
I’m glad you asked.
In order, the scale uses the following protocols.
- DHCP
- DNS
- NTP
- HTTP
- HTTPS
These steps are fairly normal and expected. What is so unexpected is
that step 5 fails when all the other steps work, and how it fails.
DHCP
This is how the scale gets an IP, and part of the Wifi negotiation.
It’s fairly standard.
The most notable is that the client identifier is GarminIntern
and the host name is WINC-00-00.
DNS
It then does a DNS lookup, to do an NTP sync. It looks up two machines:
- time.google.com
- time.garmin.com
This is notable because time.garmin.com does not exist.
The scale will do more DNS lookups as we go along. They tend to work fine.
NTP
The scale syncs it’s clock. This, again, is normal. It’s necessary because
it will later use HTTPS, and that requires a valid clock.
time.garmin.com doesn’t exist, but it doesn’t stop the clock
from syncing. The scale also does normal NTP on time.google.com
and another NTP call on clock.garmin.com on port 4123.
HTTP
The scale proceeds to send POST /OBN/OBNServlet to
gold.garmin.com. This seems to be mainly to get a Cloudflare
response, for example CF-RAY.
This doesn’t usually fail, but the errors start here. The reason it
doesn’t fail outright is that it will retry, and eventually a retry
will work before the clock runs out.
HTTPS
Now the scale starts sending data to:
- services.garmin.com
- api.gcs.garmin.com
- connectapi.garmin.com
- omt.garmin.com
At this point the errors accumulate, and eventually the clock does run
out. The errors slow down the connection to the point where the scale
fails to send it’s data inside the 5 minute timeout.
Accumulated Problems
So what exactly fails? Once data flows, the scale fails to receive
the server's TCP ACK packets about 80% of the time. If too many packets
are missed, the server closes the connection, and the scale tries again.
Once the scale retries too many times, it gives up.
Since this happens ±80% of the time, and multiple connections are made,
the scale fails very often. Every now and then it works.
Problem Details
TCP 101
TCP was created to transfer data without having the application worry
about reliability. Data gets chopped up, usually in ±1500
bytes. If it needs to get chopped up further, the
OS gets notified.
TCP also takes care of putting the data back together again. This can
be more difficult than just Packet 1 + Packet 2.
- Packet 2 can arrive before Packet 1
- Packet 2 can get lost, requiring retransmission
Let’s look at this in a bit more detail. Let’s say the scale wants
to send 5000 bytes of data. It gets chopped up into 1500-byte chunks.
| Time  | Sender | Recipient | Sequence | Length | Acknowledgement |
|-------|--------|-----------|----------|--------|-----------------|
| 00:01 | Scale  | Server    | 0        | 1500   | 0               |
| 00:02 | Server | Scale     | 0        | 0      | 1500            |
| 00:03 | Scale  | Server    | 1500     | 1500   | 0               |
| 00:04 | Server | Scale     | 0        | 0      | 3000            |
| 00:05 | Scale  | Server    | 3000     | 1500   | 0               |
| 00:06 | Server | Scale     | 0        | 0      | 4500            |
| 00:07 | Scale  | Server    | 4500     | 500    | 0               |
| 00:08 | Server | Scale     | 0        | 0      | 5000            |
As you can see, the acknowledgements tell the scale how much data
the server has received, and from where to continue.
TCP 102
Of course, you can’t just start sending data. You have to tell the
server you want to send data, and the server must accept, so it goes
something like:
- Handshake
  - SYN client → server
  - SYN/ACK server → client
  - ACK client → server
- Data, as above.
  - ACK client → server
  - ACK server → client
  - …
- Stop
  - FIN/ACK client → server
  - FIN/ACK server → client
  - ACK client → server
What is Observed
The scale starts sending data, but the server’s packets aren’t received.
The scale then eventually resends packets. Something like:
| Time  | Sender | Recipient | Sequence | Length | Acknowledgement |
|-------|--------|-----------|----------|--------|-----------------|
| 00:01 | Scale  | Server    | 0        | 1500   | 0               |
| 00:02 | Server | Scale     | 0        | 0      | 1500            |
| 00:03 | Scale  | Server    | 1500     | 1500   | 0               |
| 00:04 | Server | Scale     | 0        | 0      | 3000            |
| 00:06 | Server | Scale     | 0        | 0      | 3000            |
| 00:10 | Server | Scale     | 0        | 0      | 3000            |
| 00:18 | Server | Scale     | 0        | 0      | 3000            |
| 00:34 | Server | Scale     | 0        | 0      | 3000            |
| 00:35 | Scale  | Server    | 3000     | 1500   | 0               |
| 00:36 | Server | Scale     | 0        | 0      | 4500            |
Note: The time between the server's retries increases exponentially as the server asks for more
data. If this happens too often, the scale times out, and no data is
sent.
This means the scale doesn’t receive the ACK packets from the server.
What Else is Dropped?
At the start, nothing. DHCP, DNS and NTP all work. Once the scale
starts using HTTP (over TCP), packet drops start. This is usually about
15 seconds after the scale connects to the Wifi network.
However, once the packet drops start, other packets are dropped too.
- Ping
  - ICMP packets to check if the scale is up.
- ARP
  - Ethernet packets to match an IP to a MAC address.
ARP being dropped is very interesting. When ARP goes, everything
stops until ARP is answered. This would indicate that Wifi encryption
updates might get dropped too.
TCP 201
The retransmitted acknowledgements need not come from the actual
server, although they look like they do. They can and often do come
from a router or firewall in the middle as a performance optimization.
Workarounds that Don’t Work
Different Wifi
I am in the lucky position to try multiple Wifi routers, so I did.
I tried 3 different ones.
All Alone
Because I tried multiple routers, the scale was the only device on the
network. There was no traffic congestion, and no competition.
This did not help.
Different Encryption on Wifi
Encrypted Wifi can be WPA or WPA2, and support different
authentication methods and encryption standards. WPA uses TKIP,
WPA2 uses CCMP. Nope, this did not help.
Note: Different router hardware might not let you set some
protocols, since it may be handled in the Wifi network hardware.
Quality of Service
Boost Garmin IP networks. Boost the scale. Boost empty ACKs and
retransmitted ACKs. Nope.
Different Channel on Wifi
The Garmin Index S2 Scale officially only supports channels 1-11.
Since I’m not in the US, Wifi equipment usually uses channels 1-13, and
uses different frequency blocks.
Note: To get certified for sale in non-US countries (like the EU),
the scale would be tested by the local regulator, and must pass local
wireless regulations. Therefore I do not believe the official
documentation: the scale would be illegal for sale.
However, I did try hard coding channels 1, 6, 11, and different country
regulations on the Wifi router. This did not make a difference.
Different Garmin
The various Garmin services resolve to multiple IP addresses. This is
likely for load balancing.
Modify the DNS on the router to supply specific ones. This didn’t help.
Note: This might not make as much of a difference as you think,
since they are behind Cloudflare, and the CDN will intercept and
reroute these connections.
Computer Captcha
Since it’s behind Cloudflare, the scale might be hitting Cloudflare’s
captcha protection. Maybe HTTP and HTTPS don’t work unless the scale
can prove it’s human.
Network dumps prove this is not what’s happening. They do reveal
Garmin’s ID and other Cloudflare details, that I will not post in
this blog.
Adjusting the MTU
A common problem in network stacks is that they don’t notice when the
MTU needs adjusting. This can and does happen when the transport changes,
for example when it changes from Wifi to Cable.
Yes, I did try adjusting this, both larger and smaller.
And I did check for the Do Not Fragment (DF) flag and Fragmentation Needed messages.
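For reference, lowering the MTU on a Linux-based router looks something like this (the interface name is an example):
ip link set dev wlan0 mtu 1400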
Hardcode ARP
This is easy, and it removes some network packets from the equation.
It does not fix the problem, though.
Workarounds that Work
No Encryption on Wifi
This works perfectly, but it’s not acceptable in my household.
No encryption does result in no retransmissions, though.
Hack the RTO
The retransmissions are also called TCP RTO (retransmission timeout).
We can reduce this, and increase the number of RTO packets that the router
sends, to greatly increase our chances of sending a retransmission at
the moment the scale is listening again.
This does work, but it requires a few things:
- A custom kernel
- Enable custom kernel
- A proxy
Custom Kernel
We need a custom kernel because we’re going to move these values out
of the TCP specification. Since this network isn’t used for anything
else, let’s do it.
There are a number of queries on mailing lists for this. However,
there aren’t a great many final solutions. As I said, it’s outside spec.
I’m including a diff here, in case other people want it. This
- Increases the number of retries, so it doesn’t stop before the
scale times out.
- Decreases the highest timeout value, so we will retry much quicker.
- Sets every connection to thin, meaning that the timeouts increase
linearly instead of exponentially.
This diff is for Linux 6.12, and should work on Debian Stable.
diff --git a/include/net/tcp.h b/include/net/tcp.h
index b3917af30..cce7a5350 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -90,14 +90,14 @@ void tcp_time_wait(struct sock *sk, int state, int timeo);
#define TCP_URG_NOTYET 0x0200
#define TCP_URG_READ 0x0400
-#define TCP_RETR1 3 /*
+#define TCP_RETR1 8 /*
* This is how many retries it does before it
* tries to figure out if the gateway is
* down. Minimal RFC value is 3; it corresponds
* to ~3sec-8min depending on RTO.
*/
-#define TCP_RETR2 15 /*
+#define TCP_RETR2 30 /*
* This should take at least
* 90 minutes to time out.
* RFC1122 says that the limit is 100 sec.
@@ -138,8 +138,8 @@ void tcp_time_wait(struct sock *sk, int state, int timeo);
#define TCP_DELACK_MIN 4U
#define TCP_ATO_MIN 4U
#endif
-#define TCP_RTO_MAX ((unsigned)(120*HZ))
-#define TCP_RTO_MIN ((unsigned)(HZ/5))
+#define TCP_RTO_MAX ((unsigned)(5*HZ))
+#define TCP_RTO_MIN ((unsigned)(HZ/10))
#define TCP_TIMEOUT_MIN (2U) /* Min timeout for TCP timers in jiffies */
#define TCP_TIMEOUT_MIN_US (2*USEC_PER_MSEC) /* Min TCP timeout in microsecs */
@@ -226,7 +226,7 @@ void tcp_time_wait(struct sock *sk, int state, int timeo);
#define TCP_NAGLE_PUSH 4 /* Cork is overridden for already queued data */
/* TCP thin-stream limits */
-#define TCP_THIN_LINEAR_RETRIES 6 /* After 6 linear retries, do exp. backoff */
+#define TCP_THIN_LINEAR_RETRIES 60 /* After 6 linear retries, do exp. backoff */
/* TCP initial congestion window as per rfc6928 */
#define TCP_INIT_CWND 10
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index b65cd417b..e5656e919 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -639,7 +639,7 @@ void tcp_retransmit_timer(struct sock *sk)
*/
if (sk->sk_state == TCP_ESTABLISHED &&
(tp->thin_lto || READ_ONCE(net->ipv4.sysctl_tcp_thin_linear_timeouts)) &&
- tcp_stream_is_thin(tp) &&
+ //tcp_stream_is_thin(tp) &&
icsk->icsk_retransmits <= TCP_THIN_LINEAR_RETRIES) {
icsk->icsk_backoff = 0;
icsk->icsk_rto = clamp(__tcp_set_rto(tp),
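For completeness, building and installing a patched kernel on Debian might look roughly like this. The sources can come from the archive or kernel.org, and the file names are examples:
# in the unpacked linux-6.12 source tree
patch -p1 < tcp-rto-hack.diff
make olddefconfig
make -j"$(nproc)" bindeb-pkg
dpkg -i ../linux-image-*.deb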
Enable custom kernel
To get the full effect, you may need to enable thin streams on some
Linux distributions.
echo 1 > /proc/sys/net/ipv4/tcp_thin_linear_timeouts
Add it to /etc/sysctl.conf or it’s subdirectories
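For example, to make it persistent (the file name is arbitrary):
echo 'net.ipv4.tcp_thin_linear_timeouts = 1' > /etc/sysctl.d/99-tcp-thin.conf
sysctl --system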
Proxy
We only control TCP RTO on local sockets and some NAT-ed connections, so
we need to setup a proxy. I tried this first with HTTP, using Squid, and
it worked. However I also needed to do HTTPS, and then encryption
certificates make it hard work.
However, I’m not looking inside the packets. I don’t need to decrypt them,
just forward them. I’m just interested in modifying TCP RTO, so I can
treat HTTPS like any other TCP Socket.
The easiest way to do this is with a systemd socket or xinetd redirect
and NAT. You will need one per Garmin IP address.
Example xinetd config:
service garmin-gold_garmin_com
{
type = UNLISTED
socket_type = stream
protocol = tcp
wait = no
user = nobody
bind = 0.0.0.0
port = 3129
only_from = **my_network**
redirect = gold.garmin.com 80
}
and a nat rule:
table inet filter {
    chain prerouting {
        type nat hook prerouting priority -100; policy accept;
        ip saddr **scaleip** ip daddr { gold.garmin.com } tcp dport { 80 } dnat to 10.43.0.1:3129
    }
}
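Loading the pieces then looks something like this (the file name is an example; repeat the xinetd service and the nft rule once per Garmin IP address and port):
nft -f garmin-redirect.nft
systemctl restart xinetd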
Probable Cause
This only happens when the network is encrypted. The scale is not
recording the server’s data, and the scale is also not sending errors back.
This could happen if:
- the controller is too slow to decrypt
- the firmware decides the packets are corrupt
- an interrupt goes missing
- not enough RAM, hence forced to drop
Which exactly it is is unknown, and where you draw the line between
controller and firmware can be a grey area. For example, in most
computers, the network card will perform some calculations on the
packets instead of the OS. This is called hardware off-loading,
but in practice it happens in firmware.
Dec 07, 2024
tl;dr
It used to be possible to use a DSLR as a webcam, or in OBS, on Linux,
and with the move to pipewire this broke.
Here’s the code to make it work again:
gst-launch-1.0 \
clockselect \
v4l2src \
device=/dev/video0 \
! queue ! videoconvert \
! pipewiresink mode=provide stream-properties="properties,media.class=Video/Source,media.role=Camera" \
client-name=DSLR
background
On Linux, you can use your DSLR as a webcam, and have any application use
it. What you need is:
* gphoto2
* ffmpeg
* v4l2loopback (kernel module)
The recipe can be slightly different based on camera functionality like
frame rate and resolution, but it basically follows
gphoto2 --set-config liveviewsize=2 \
--stdout --capture-movie \
| ffmpeg -i - \
-re \
-vcodec rawvideo \
-pix_fmt yuv420p \
-threads 2 \
-f v4l2 /dev/video0
Most DSLRs are supported.
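The recipe assumes /dev/video0 is a v4l2loopback device; loading the module might look like this (the options shown are common but optional):
sudo modprobe v4l2loopback video_nr=0 card_label="DSLR" exclusive_caps=1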
With the move to pipewire, another step is needed. This blog post
describes that step.
You may need to change:
- liveviewsize=2
  - Different cameras will have different options
- -re
  - -re should make ffmpeg match the native framerate, which may save CPU cycles
clock bug
The output of the gst command should have an advancing clock. If the
clock remains stuck on 00:00:00, you will need to add clockselect
to the gst-launch
command at the top.
This is related to Pipewire Regression 4389
when
Right now, December 2024. Firefox, OBS and Chrome are moving to pipewire.
The move has happened in some distributions, and is in progress in others.
firefox
Look in about:config
for media.webrtc.camera.allow-pipewire
chrome
Look in chrome://flags
for Pipewire Camera Support
who
pipewire aims to improve the handling of audio
and video on Linux, and it’s pretty good at that.
You’ll need version 1.2.6 or later for this to work. Look for
libpw-v4l2.so
.
what
A DSLR camera, attached via USB, that you previously used with gphoto2
and v4l2loopback.
where
Look for a section like:
Video
 ├─ Devices:
 │      56. ...
 │
 ├─ Sinks:
 │
 ├─ Sources:
 │  *   64. USB2.0 FHD UVC WebCam (V4L2)
 │      93. DSLR
You want your DSLR to show up under Sources.
You can then get more information using pw-cli info 93
why
Update to pipewire. It really is better.
how
After running gphoto2 the way you normally do, run the command
at the top. You can name your DSLR something else if you like.
Test it using wpctl status and pw-cli info.
Apr 05, 2024
xz and openssh
There was a briefly successful attempt to add a backdoor to /usr/sbin/sshd
on Linux.
There are a lot of discussions of how, when, what and why already on the
Internet. I’ll include a few links below.
Most posts fall in one of two camps:
- brief description
- detailed description, starting with a .m4 file.
This post instead will try to turn this into a story. What are the
attacker’s goals, and how can they best be achieved?
Hopefully this different perspective will make it understandable to
a different audience, which will help prevent this in the future.
goals
The attacker's goal is to add a backdoor to some software that is both
commonly installed, and has privileged access.
Some criteria for a good back door:
- The door is hidden
- The door can be used unnoticed
- The door has a lock, so others can’t use it
- Because it’s open source, the door plans are hidden too.
location of the plans
The obvious location is OpenSSH. However, that project is very well
run. That means hiding the plans becomes difficult.
Well, where else can we hide it? Let’s look:
/lib64/ld-linux-x86-64.so.2
libcrypt.so.1 => /lib64/libcrypt.so.1
libaudit.so.1 => /lib64/libaudit.so.1
libpam.so.0 => /lib64/libpam.so.0
...
liblzma.so.5 => /lib64/glibc-hwcaps/x86-64-v3/liblzma.so.1.2.3
libzstd.so.1 => /lib64/glibc-hwcaps/x86-64-v3/libzstd.so.4.5.6
...
It turned out that liblzma, from xz utils, is a good choice. On
a different day, or a different project, one of the others could
have worked.
chapter 1: place the door
front door
If we modify liblzma, how does that open a door in /usr/sbin/sshd?
For that, we need ifunc(). This behaves a little like $LD_PRELOAD,
in that we can override functions. It has advantages for the attack:
- It’s less well known. Most setuid binaries filter
$LD_PRELOAD
- It’s better hidden.
Let’s give a quick ifunc() example:
int add_numbers_fast(int x, int y) {
return x + y; /* add */
}
int add_numbers_slow(int x, int y) {
return x + y; /* add */
}
int add_numbers(int x, int y)
__attribute__((ifunc ("resolve_add_numbers")));
static void *resolve_add_numbers(void)
{
if (1)
return add_numbers_fast;
else
return add_numbers_slow;
}
Compile with
gcc -fPIC -shared add_numbers.c -o add_numbers.so
This has two implementations of add_numbers(), and based on
some criteria — like CPU — we pick one.
This is fairly common for performance-critical functions. This
is also true for some encryption functions in OpenSSH.
We would then use it with
#include <stddef.h>
#include <stdio.h>
extern int add_numbers(int x, int y);
int main(int argc, char *argv[]) {
int x, y;
x = y = 3;
printf("add_numbers(%d, %d) = %d\n",
x, y,
add_numbers(x, y));
return 0;
}
Output:
add_numbers(3, 3) = 6
backdoor
Suppose the library includes another implementation, like so:
extern int add_numbers(int x, int y);
int add_numbers_fake(int x, int y) {
return x * y; /* multiply instead! */
}
int add_numbers(int x, int y)
__attribute__((ifunc ("resolve_add_numbers")));
static void *resolve_add_numbers(void)
{
return add_numbers_fake;
}
Now we get a modified output:
add_numbers(3, 3) = 9
a note on compiling
To compile the examples, it’s best to split compilation and linking, like
gcc -fPIC -shared add_numbers.c -o add_numbers.so
gcc -fPIC -shared add_numbers_backdoor.c -o add_numbers_backdoor.so
gcc -c ifunc_example.c -o ifunc_example.o
ld /usr/lib64/crti.o /usr/lib64/crtn.o /usr/lib64/crt1.o \
-lc ifunc_example.o \
add_numbers_backdoor.so add_numbers.so \
-dynamic-linker /lib64/ld-linux-x86-64.so.2 \
-o ifunc_example
Then you can swap implementations by swapping the order of
add_numbers.so and add_numbers_backdoor.so
chapter 2: hiding the door
Now that we have something to include, we need to hide it. We cannot
add random function names to xz utils: it will get noticed.
A number of systems do automatic checks, like:
$ file *
add_numbers_backdoor.c: C source, ASCII text
add_numbers_backdoor.so: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, BuildID[sha1]=6883795af4a391d0f9cb256aea233498a37ba668, with debug_info, not stripped
add_numbers.o: ELF 64-bit LSB relocatable, x86-64, version 1 (GNU/Linux), not stripped
ifunc_example.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
Adding a binary blob that matches an ELF object will be suspicious.
What we can do is add the file encrypted. In xz this is easy, because
it includes binary test data.
A very simple encryption scheme that shifts a -> b, b -> c, c -> d, …
# encrypt
echo 'hello world' | tr '[a-z]' '[b-z]a'
ifmmp xpsme
To decrypt
# decrypt
echo ifmmp xpsme | tr '[a-z]' 'z[a-y]'
hello world
Now we simply need to decrypt it before compiling. We can add the
following to either the .m4 file or the Makefile. Our choice.
cat myfile | tr '[a-z]' 'z[a-y]' > myfile.c
gcc myfile.c
chapter 3: hiding the use
Now that we can include some code in OpenSSH, where is the best place
to put it?
- before login, so we’re still root
- before sandboxing, so we have full access
- after encryption started, so the commands are encrypted
This is why the exploit overrides RSA_public_decrypt.
links
Links to the actual files, and diagnosis of the actual files:
Mar 01, 2024
Finger and Webfinger
Finger and Webfinger answer the same question: “what information is
available about this user?”
This blog was written because both were harder to get going than necessary.
Webfinger
Webfinger does something similar. Query my mastodon handle,
@berend@emptybox.deschouwer.co.za, and you should get something like
{
"aliases": [
"https://emptybox.deschouwer.co.za/nextcloud/index.php/index.php/apps/social/@berend",
"https://emptybox.deschouwer.co.za/nextcloud/index.php/u/berend"
],
"links": [
{
"href": "https://emptybox.deschouwer.co.za/nextcloud/index.php/u/berend",
"rel": "http://webfinger.net/rel/profile-page",
"type": "text/html"
}
],
"subject": "berend@emptybox.deschouwer.co.za"
}
It’s a good API to find more information about a specific user.
Finger
Finger is old. It’s from 1991, 5 years before http.
You can see it in action by running
finger berend@berend.deschouwer.co.za
The original finger server was horribly insecure. This server is
not running that.
Webfinger Back
The Webfinger backend is provided by the Social app on Nextcloud.
I’m going to document some of the pitfalls here, since:
- An installation step isn’t documented
- Some of the error reporting is misleading.
- All of the official help is “configure your webserver/proxy”, even
when the answer isn’t that.
- google-ing this doesn’t help, since everything redirects you back
to webserver configuration help.
Once, for Nextcloud
Yes, you do need to configure your webserver.
Under your nextcloud instance, when logged in as admin, navigate to security.
If Nextcloud Admin Security Check complains with a similar message
Your web server is not configured correctly to resolve “/.well-known/caldav”.
More information can be found on our documentation.
Your web server is not configured correctly to resolve “/.well-known/carddav”.
More information can be found on our documentation.
You need to fix your webserver or proxy. Follow Google. The solutions are complete.
Configuring Nextcloud
Twice, for the Social App in Nextcloud
Nope, you don’t need to do it twice.
If the Social App complains with a similar message
.well-known/webfinger isn't properly set up!
Social needs the .well-known automatic discovery to be properly set up.
If Nextcloud is not installed in the root of the domain, it is often the
case that Nextcloud can't configure this automatically
The problem is not your webserver. It’s the Social App
Configuring the App (the missing installation step)
You will need the CLI occ from Nextcloud.
You may have used it to perform backups or upgrades. It’s in the webroot,
and you will usually run it something like:
sudo -u www /var/www/nextcloud/occ backup
First, look at the config. Run:
cd /var/www/nextcloud/
sudo -u www php ./occ config:list
Look for the Social app.
"social": {
"address": "https:\/\/emptybox.deschouwer.co.za\/",
"enabled": "yes",
"url": "https:\/\/emptybox.deschouwer.co.za\/nextcloud\/index.php\/apps\/social\/",
"social_url": "https:\/\/emptybox.deschouwer.co.za\/nextcloud\/index.php\/index.php\/apps\/social\/",
"cloud_url": "https:\/\/emptybox.deschouwer.co.za\/nextcloud\/"
},
Ensure that these values are OK. If they are not, run
sudo -u www php ./occ config:app:set --value https://emptybox.deschouwer.co.za/overthere social url
Three, misleading errors
Webfinger not supported
Now you get to try:
http --follow 'https://emptybox.deschouwer.co.za/.well-known/webfinger?resource=acct%3Aberend'
If you get 404 and
{
"message": "webfinger not supported"
}
Don’t panic! It doesn’t mean webfinger isn’t supported, it means the account specified isn’t a valid account.
You didn’t specify a domain. Add a domain.
Webfinger is empty and 404
http --follow 'https://emptybox.deschouwer.co.za/.well-known/webfinger?resource=acct%3Aberend%40example.com'
If you get a 404 and an empty body, don't panic! It means that the
specified account (berend@example.com) is not on this server.
Your Nextcloud instance isn't example.com, so you should specify an
account you are authoritative for.
Finger Back
The backend for 1991 finger is a simple script that serves my whoami page.
Since the actual finger servers are insecure, I decided to re-implement
it. Since a re-implementation risks being even more insecure, I simplified everything.
systemd to the rescue
This is the reason for writing this section. systemd can really help us
lock it down.
[Unit]
Description=Finger Per-Connection Server
[Service]
ExecStart=/usr/.../.py # Keep some things secret
StandardInput=socket
# Send errors to the logs, instead of to the caller
StandardError=journal
SyslogIdentifier=finger
# Don't run as root
DynamicUser=yes
# Don't read /tmp, /dev, /home
PrivateTmp=true
PrivateDevices=true
ProtectHome=true
# Restrict for DoS
MemoryMax=100M
Nice=19
IOSchedulingClass=best-effort
IOSchedulingPriority=7
CPUQuota=50%
IOWeight=25
# Restrict OS calls
SystemCallFilter=@system-service
SystemCallErrorNumber=EPERM
ProtectSystem=strict
RestrictSUIDSGID=true
MemoryDenyWriteExecute=true
# Don't load calls that are frequently used by exploits
InaccessiblePaths=/usr/bin/at
InaccessiblePaths=/usr/bin/bash
InaccessiblePaths=/usr/bin/sh
InaccessiblePaths=/usr/bin/wget
InaccessiblePaths=/usr/bin/curl
InaccessiblePaths=/usr/bin/ssh
InaccessiblePaths=/usr/bin/scp
InaccessiblePaths=/usr/bin/perl
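The service above expects systemd to hand it one connection on stdin. The matching socket unit isn't shown in this post; a minimal sketch (my assumption, not the exact unit I run) would be something like the following, with the service installed as a template (finger@.service), since Accept=yes spawns one instance per connection:
[Unit]
Description=Finger Socket
[Socket]
ListenStream=79
Accept=yes
[Install]
WantedBy=sockets.target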
User
The first level of security is to not run as root.
No public directories
No public /tmp or any such directory. If you do hack it, you're in a sandbox.
Limited Access
Then we lock it down a bit for DoS reasons. We run it low priority,
with limited RAM, so you don’t take down the entire machine.
Too many details
Then we block access to some other programs. Some of these are for example purposes.
It’s really, really nice that systemd allows us to lock it down
like this.
Multiple users? Nope
The original finger backend allows you to query any user that exists,
and changes the output based on whether they are logged in.
Not for me. I give you the same answer, no matter what.
Redirects? Nope
You could query users on another server. Connecting to example.com,
you could ask it about joe@smith.com. Not on my server.
Memory Leaks? Nope
You could send very large user names. On my server, you get 512 bytes,
then I disconnect.
Since we don’t care about the data, we discard it.
discard = os.read(0, 512)
DoS? Nope
You can’t keep the finger socket open for very long. If it doesn’t receive
data very soon, it will disconnect.
def timeout(signum, frame):
raise Exception("Timed out")
signal.signal(signal.SIGALRM, timeout)
signal.alarm(3)
DDoS? Yep
DDoS is always a “yep”.
More details
I’d give you more details, but here aren’t. We timeout after 3 seconds,
we read a maximum of 512 bytes of data, and we always give the exact same response.
Feb 22, 2024
What Question Am I Answering?
A question came up in a coding class, about why in Rust there’s a borrow,
and there’s a an error when borrowing.
You may have seen it like (copied from the rust book):
error[E0499]: cannot borrow `s` as mutable more than once at a time
--> src/main.rs:5:14
|
4 | let r1 = &mut s;
| ------ first mutable borrow occurs here
5 | let r2 = &mut s;
| ^^^^^^ second mutable borrow occurs here
6 |
7 | println!("{}, {}", r1, r2);
| -- first borrow later used here
For more information about this error, try `rustc --explain E0499`.
error: could not compile `ownership` due to previous error
The Rust book does explain what borrow is, and the theory behind
why it’s allowed and not allowed.
I’m showing the risks practical example.
How Am I Answering It?
With as little computer theory as possible.
- No assembler
- No diagrams
- No multiple languages
- short, simple examples
- No theory until after the bug is shown
Caveat
The answer is in C. It’s in C because Rust won’t let me break
borrow, not even in an unsafe {} block.
It’s all in plain C, though. No assembler, and no C++. As
straightforward C as I can make it, specifically to allow for
plain examples. As few shortcuts as possible are taken.
I’m not trying to write a C-programmer’s C code. I’m trying to
write an example that shows the problem, for non-C programmers.
It should not be difficult to follow coming from Rust.
Inspiration
The inspiration comes from the C max() function, which is actually
a macro.
That is great for explaining the differences between functions and macros,
and it’s great for explaining pass-by-code instead of value or reference.
It’s also great for explaining surprises.
First Examples
First, we give example code for pass-by-value and pass-by-reference.
Pass-by-reference is also called borrow in Rust. The examples are simple,
and we do not yet discuss the difference.
For now, it’s simply about C syntax.
#include <stdio.h>
int double_by_value(int x) {
x = x * 2;
return x;
}
int double_by_reference(int *x) {
*x = *x * 2;
return *x;
}
int main() {
int x;
x = 5;
printf("double_by_value(x) = %d\n", double_by_value(x));
/* double_by_value(x) = 10 */
x = 5;
printf("double_by_reference(x) = %d\n", double_by_reference(&x));
/* double_by_reference(x) = 10 */
return 0;
}
The code is almost identical. You can see the syntax difference, but no
functional difference yet.
The logic is the same, the input is the same, and the output is the same.
Let’s make it a tiny bit more challenging, and add a second variable.
#include <stdio.h>
int add_by_value(int x, int y) {
x = x + y;
return x;
}
int add_by_reference(int *x, int *y) {
*x = *x + *y;
return *x;
}
int main() {
int x, y;
x = y = 5;
printf("add_by_value(x, y) = %d\n", add_by_value(x, y));
/* add_by_value(x, y) = 10 */
x = y = 5;
printf("add_by_reference(x, y) = %d\n", add_by_reference(&x, &y));
/* add_by_reference(x, y) = 10 */
return 0;
}
The logic is the same, the input is the same, and the output is the same.
Put it together
Add the two and we get…
#include <stdio.h>
/* Previous example code here */
int add_and_double_by_value(int x, int y) {
x = double_by_value(x);
y = double_by_value(y);
x = x + y;
return x;
}
int add_and_double_by_reference(int *x, int *y) {
*x = double_by_reference(x);
*y = double_by_reference(y);
*x = *x + *y;
return *x;
}
int main() {
int x, y;
x = y = 5;
printf("add_and_double_by_value(x, y) = %d\n", add_and_double_by_value(x, y));
/* add_and_double_by_value(x, y) = 20 */
x = y = 5;
printf("add_and_double_by_reference(x, y) = %d\n", add_and_double_by_reference(&x, &y));
/* add_and_double_by_reference(x, y) = 20 */
return 0;
}
No surprises yet. Both examples print the same, correct answer.
The logic is the same, the input is the same, and the output is the same.
Surprise
Now let’s make it simpler…
#include <stdio.h>
/* Previous example code here */
int main() {
int x;
x = 5;
printf("add_and_double_by_value(x, x) = %d\n", add_and_double_by_value(x, x));
/* add_and_double_by_value(x, x) = 20 */
x = 5;
printf("add_and_double_by_reference(x, x) = %d\n", add_and_double_by_reference(&x, &x));
/* add_and_double_by_reference(x, x) = 40 */
return 0;
}
The logic is the same, the input is the same, but the output is different.
Why is it 40?
Double borrow
So what happened? Why did removing a variable, y, result in the “wrong” answer?
Pass by value copies the value, and passes the value, not the variable. The original is
never modified.
Pass by reference (or borrow), passes a reference, not a copy. The original is modified.
Look at
int add_and_double_by_reference(int *x, int *y) {
*x = double_by_reference(x);
*y = double_by_reference(y);
*x = *x + *y;
return *x;
}
When x is doubled, at *x = double_by_reference(x);, and *x is also *y,
y is also doubled. x is now 10, as expected, but y is also 10.
Then y is doubled. And since *y is also *x, x is doubled again. Now both
variables are 20.
Tada!
This is (one of the reasons) why Rust won’t let you borrow the same variable twice.
What do you do instead?
Borrow once, or .clone() /* copy */.
Bonus time
So why have pass by reference or borrow at all?
- It’s faster, when the data is large.
- It’s faster for streamed data.
- It’s not possible in assembler pass complicated variables by value. Manual copies
must be made, and the language you use might not implement implicit copies.
For extra points, do the same with threads.
Jan 25, 2023
Who Is This Post For?
Anyone who has tried to connect a Garmin smart device to the
cloud with Garmin Connect.
Anyone who has tried to connect any smart device that has no
keyboard to any cloud anywhere.
What Device
A Garmin Index S2 scale. This is a human weight scale that can
upload your weight to the cloud.
The scale connects to the cloud directly via wifi. It can connect
to a phone via bluetooth, but only for wifi configuration. Everything
after that goes via wifi.
Even small things like 12/24 hour clock configuration go via wifi.
The scale has no keyboard, and a limited screen, so configuration
is via a phone.
Problem Experienced
The scale does not send any weight data to the cloud. The
scale does not receive any configuration changes from the cloud.
The scale does display weight data, and a wifi signal strength icon.
The scale does not display any errors. There is no indication
that anything is wrong.
Garmin Connect App on the phone does not display any errors.
It claims everything is working.
More About Garmin Connect
- Garmin Connect means two different things:
- A Phone App, used to configure various devices
- A Web App, used to view the data
More About Errors Not Displayed
On the Scale
The scale doesn’t show wifi errors. It can display some text,
like your name, but doesn’t even show E123, or any other error
code. The user believes it works.
The scale does show a wifi signal strength icon (triangle,
4 lines for strength), and displays a sync icon (twirling circle).
Neither icon changes to an error (a red X, for example). The user
is led to believe it works.
On the Phone
The phone does not show any errors, wifi or otherwise.
The phone does display setup complete.
The phone app let’s you test wifi. It’s a bit complicated,
but you can. It will then display Wifi Connected OK.
Suspected Error
Eventually, after re-configuring hundreds of times, I suspect
a network error. I suspect the scale is trying to connect
to garmin.com on a non-standard port.
It’s not that, but it gives…
The Only Error Displayed
Hours later, I finally look on the router. I’m looking
at the network traffic, and see:
4-Way handshake failed for ifindex: 3, reason: 15
KEY_SEQ not returned in GET_KEY reply
So I know wifi isn’t working. The WPA2 handshake
is not completing even though the phone app thinks
the scale’s wifi is OK.
Non-specified Requirements
Bluetooth or Wifi? Bluetooth and Wifi
When you configure the scale, you should eventually
see a message on Garmin Connect on the phone that
sync completed, along with a green icon.
If you do not see this, the scale is on a different
wifi network than the phone.
Why would that be? In my case the phone is on a 5G
network (because it can) and the scale is on a 2.4G
network (because it can’t)
Until you put the phone on the same network as the
scale, the last message you will see is:
configuration completed, not sync completed,
and this means that the scale is not working.
At this point, the scale isn’t syncing to the
Garmin cloud. It’s just synced it’s configuration
with the phone.
WPA? WPA2? WPA1.5
The scale claims to support WPA2 on a 2.4G network.
For a device that was first announced in 2020, this is
atrocious, but that's true for a lot of these
smart devices.
Even then, it doesn’t work with the entire gamut
of WPA2 protocols.
In my case, I had to swap from iwd to
wpa_supplicant to connect the scale.
Questions for Garmin
- Why do you display that the wifi network tested OK
when the 4-way handshake failed? Couldn’t you add
a connection to https://test.garmin.com or something?
- Why do you not display a sync error message on either,
but preferably both the scale and phone app?
- Why do you not check that the phone and scale wifi
are the same? The setup fails silently otherwise.
- Why do you display configuration complete, when
it isn’t yet complete?
Dec 27, 2022
Who Is This Post For?
Rust programmers tracking down strange behaviours that doesn’t
always show up in debuggers or tracers
What Was Rust Used For?
I wanted to connect browsers to terminal programs. Think
running a terminal in a browser like hterm or ajaxterm.
One side of the websocket runs a program that may at any
time send data. In between there are pauses that stretch
from milliseconds to hours.
The other side is the same.
This is a perfect fit for asynchronous programming. It’s also
a candidate for memory leaks over time.
Both problems were tackled using Rust.
Problem Experienced
Sometimes the Rust program would stop. The program would
still run in the background, running epoll(7), indicating
that an async wait was running.
The program would not crash, and would not run away on the
CPU.
The last statement executed:
debug!("This runs");
Err("This does not run!")
}
Which is strange, to say the least.
This would only happen on single-core machines. On
machines with two or more cores, it would run fine.
This would happen on multiple target architectures,
and multiple OSes.
It would go into an infinite loop on Err(“…”) on
single core machines.
More About the Program
The program runs two parallel asynchronous threads, and waits
for either thread to stop.
It does that because the network side or the terminal side
could stop and close the connection. So it basically runs:
task::spawn(tty_to_websocket());
task::spawn(websocket_to_tty());
try_join!(tty_to_websocket, websocket_to_tty);
try_join! should wait for either task to stop with an error.
I’ve setup both tasks to throw an error even on successful
completion. This is because join! might wait for both
to stop, and it’s possible for either side to stop without the
other noticing.
try_join! never completes, because Err() never completes,
which is strange.
What Does Err() Do?
Err() ends the function. In Rust that also runs the
destructors, like an object-oriented program might. Let's
say you have a function like:
fn error() -> Result<(), &'static str> {
    let number: i32 = 2;
    Err("error")
}
When Err() runs, Rust de-allocates number. The memory
is returned.
For an int this is simple, but it can be more complicated.
What Is A TTY?
A TTY, for a program, is a file descriptor on a character device.
It’s a bi-directional stream of data, typically keyboard in and
text out.
The C API to use one uses a file descriptor. One way to get
such a file descriptor is forkpty(3), which has a Rust crate.
Most Rust code wants a std::fs::File, not a raw file
descriptor, so it needs to be converted:
use std::os::unix::io::FromRawFd;
let rust_fd = unsafe { std::fs::File::from_raw_fd(c_fd) };
The first bell is unsafe {}. The code is indeed unsafe
because we’re working with a raw file descriptor.
The second bell is in the documentation for from_raw_fd.
The documentation is in bold in the original:
This function consumes ownership of the specified file
descriptor. The returned object will take responsibility
for closing it when the object goes out of scope.
Where Is The Bug?
The bug happens because both tasks need a std::fs::File.
One to read the TTY, and one to write to it.
Both tasks consume ownership, and both tasks take responsibility
for closing it.
Both destroy the rust_fd and hence close the c_fd, when
the tasks run Err().
Expected Bug
The expected bug is that the second task to close won’t be able
to close. The second task should get EBADF (bad file descriptor).
However, this is not the bug experienced.
Experienced Bug
The experienced bug is that on single core machines the program
just stops, and keeps calling epoll(), which is something
Rust does at a low level for async functions.
This makes it harder to debug, since there is no panic!, no crash.
Real Bug
The real bug is that on machines with two or more cores, the
program continues fine. It should not continue.
On two or more cores, it should behave the same as on single
core machines.
Solution
The solution is to skip running the destructor.
When it’s just a file descriptor, it can be enough to run
Now we have stopped the crash. We need to take responsibility
and run
try_join!(tty_to_websocket, websocket_to_tty);
close(c_fd);
to prevent leaking file descriptors.
If you wrap the fd in a buffer, don’t forget to de-allocate
the buffer by running:
let buffer = reader.buffer();
drop(buffer);
to prevent a memory leak.