anhedonic

dynamic soap bubbles

Apr 19, 2025

External Storage Mistakes

Berend De Schouwer

A description of a firmware bug in external USB storage that causes disk error reports, and a way to avoid them.

Problem Description

When you connect an external USB drive you may see:

sd 0:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
sd 0:0:0:0: [sda] tag#0 Sense Key : Illegal Request [current] 
sd 0:0:0:0: [sda] tag#0 Add. Sense: Invalid command operation code
sd 0:0:0:0: [sda] tag#0 CDB: Write same(16) 93 08 00 00 00 00 00 00 00 22 00 00 00 06 00 00
critical target error, dev sda, sector 34 op 0x3:(DISCARD) flags 0x0 phys_seg 1 prio class 0
critical target error, dev sda, sector 40 op 0x3:(DISCARD) flags 0x800 phys_seg 1 prio class 2

If all of the following are true, you have the same problem:

  • You see DISCARD or UNMAP
  • You see Write same(16)
  • The external storage is a spinning disk
  • You delete a lot of data, or reformat the disk
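The first two symptoms can be checked mechanically. The following is a minimal sketch that scans kernel log text for them; the patterns are copied from the excerpt above, and the function names are made up for illustration. Other kernel versions may word these messages differently.

```python
import re

# Patterns taken from the log excerpt above; treat them as assumptions
# for other kernel versions.
DISCARD_ERROR = re.compile(
    r"critical target error, dev (\w+), sector (\d+) op 0x3:\(DISCARD\)"
)
WRITE_SAME_CDB = re.compile(r"CDB: Write same\(16\)", re.IGNORECASE)


def find_discard_failures(log_text):
    """Return (device, sector) pairs for every failed DISCARD in log_text."""
    return [(m.group(1), int(m.group(2))) for m in DISCARD_ERROR.finditer(log_text)]


def mentions_write_same(log_text):
    """True if the log shows a Write same(16) command in a failure report."""
    return WRITE_SAME_CDB.search(log_text) is not None


sample = """\
sd 0:0:0:0: [sda] tag#0 CDB: Write same(16) 93 08 00 00 00 00 00 00 00 22 00 00 00 06 00 00
critical target error, dev sda, sector 34 op 0x3:(DISCARD) flags 0x0 phys_seg 1 prio class 0
critical target error, dev sda, sector 40 op 0x3:(DISCARD) flags 0x800 phys_seg 1 prio class 2
"""
print(find_discard_failures(sample))  # [('sda', 34), ('sda', 40)]
print(mentions_write_same(sample))    # True
```

Feeding it the output of dmesg or journalctl -k would be the obvious use. Whether the disk is a spinning one still has to be checked separately.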

Explanation

These commands all do what is commonly known as trim: they tell an SSD that a data area can be marked as re-usable. Spinning disks generally do not support trim, and so reject it.

The kernel is attempting to run the command because the USB enclosure told the kernel that it supports the commands writesame16 or unmap. The error is reported because that command fails.
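What the enclosure advertised can be read back from sysfs. This is a sketch, assuming the usual Linux layout where /sys/block/DISK/queue/rotational holds 1 for spinning disks and /sys/block/DISK/device/scsi_disk/*/provisioning_mode holds the mode the kernel picked (values such as unmap, writesame_16 or disabled). The function names are illustrative.

```python
from pathlib import Path


def read_disk_flags(disk, sys_root="/sys"):
    """Return (rotational, provisioning_modes) for a block device such as 'sda'."""
    root = Path(sys_root)
    rotational_file = root / "block" / disk / "queue" / "rotational"
    rotational = rotational_file.read_text().strip() == "1"
    scsi_disk_dir = root / "block" / disk / "device" / "scsi_disk"
    modes = [
        path.read_text().strip()
        for path in sorted(scsi_disk_dir.glob("*/provisioning_mode"))
    ]
    return rotational, modes


def is_suspicious(rotational, modes):
    """A spinning disk that advertises unmap or write same matches this bug."""
    advertised = {"unmap", "writesame_16", "writesame_10"}
    return rotational and any(mode in advertised for mode in modes)
```

Calling read_disk_flags("sda") and passing the result to is_suspicious gives a quick yes or no. Note that some USB bridges misreport the rotational flag, so treat the answer as a hint only.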

Root Cause

The root cause is that the USB enclosure supports the command in firmware, and the spinning disk does not. The USB enclosure fails to negotiate this command with the hard drive before negotiating it with the OS.

This happens when manufacturers use the same USB chips for SSD and spinning disk drives on the cheap.

Solution

The solution is to tell the kernel that this does not work for this device, using a udev rule, e.g.

/etc/udev/rules.d/99-cheap-disk.rules
ACTION=="add", SUBSYSTEM=="scsi_disk", SUBSYSTEMS=="scsi", \
ATTRS{vendor}=="WD", ATTRS{model}=="My Passport *", \
OPTIONS="log_level=debug", \
PROGRAM="/usr/bin/logger -t udev/99-cheap-disk Found cheap disk", \
ATTR{provisioning_mode}="disabled"

In my case, the bad hard drive was a “Western Digital My Passport 2626”, with revision 1034.

Thanks

For the exact same problem and solution for SAN/NAS, see: Chris Hofstaedtler

Feb 09, 2025

Garmin Index S2 Scale and Encryption

Berend De Schouwer

A deep dive on firmware bugs that prevent Garmin Index S2 scales from connecting to encrypted Wifi networks. I describe some of the problems, a solution of sorts if you’re a network administrator, and a guess as to the root cause. I do not, in this article, reverse engineer the firmware.

Problem Description

Garmin Index S2 scales are notorious for not connecting reliably to a number of Wifi networks. For an example, read the Garmin Forums. Quite often the scale won’t connect, or will connect and not sync, and its display is neither clear enough nor documented well enough to find the fault.

For other frustrations, you can read my previous blog post, Garmin Index Scale Firmware Problems.

Official requirements

The official requirements are:

  • 2.4 GHz (no 5.0 or 6.0)
  • 802.11 ac, b, g or n (no 802.1x)
  • Channels 1-11 only
  • No hidden SSIDs
  • Security: Unencrypted, WPA and WPA2
  • Passwords must be at least 8 characters

Some of these requirements are simply due to the age and power of the embedded controller. It was never going to support 5.0 GHz, for instance.

What we (customers) want

Compliant Wifi

  • Encrypted, at least WPA2
  • Channels 1-13

A Wifi access point can typically be configured for a specific channel, or for all channels. No Wifi access points allow you to specify a channel range.

If you are outside the US, this means:

  • Exactly one channel, or
  • Channels 1-13.

So you are reduced to running one channel. Luckily the scale does actually connect on channels 1-13. It’s doubtful the chip would be certified in the countries where it’s sold if it didn’t. So we will ignore the first problem.

What worked yesterday, to work today

The scale frequently gets stuck, and the usual support response is to reset the scale.

Resetting the scale is difficult

  • It requires tapping a button on the bottom, and simultaneously viewing the top display.
  • Testing requires putting significant weight on top of the scale, but your finger is still tapping the bottom.

Resetting the scale is unnecessary

Frequently the scale is actually still working. The problem is that the display isn’t communicating to you, the user, what’s happening.

Quite often it’s busy, and you should simply wait. It shows this by flashing an hourglass (⌛) and then switching the display off. To the average user, this looks like the scale fails to switch on.

The correct thing to do is wait 5 minutes.

Updating Garmin Documentation

Let’s first update the documentation. Let’s create a useful manual for the Garmin Index S2 Wifi connection status.

There are three icons:

  • Wifi 🛜
  • Sync 🔁
  • Done ✅

Wifi connecting 🛜

While 🛜 blinks, the scale is connecting to Wifi. If it stops blinking, it has connected to Wifi, and your WPA2 password works.

As soon as this happens, you no longer need to reset your scale.

Data Syncing 🔁

While data is syncing, 🔁 is animating. At this point, the scale is talking over the network to Garmin servers.

Under certain conditions this can take a long time, and the display will switch off. It will still be syncing in the background, though.

Note: If the display switches off in this state, it’s power-saving the display. It has not switched off. If you power on the scale, you will see an hourglass ⌛. Wait 5 minutes. Be patient.

Done ✅

It’s done.

Actual Syncing Problems

If the data never syncs, read on.

If the data never syncs, you may have:

  • Firewall issues (not if you didn’t have them yesterday)
  • ISP issues (ping connect.garmin.com)
  • Hit a firmware or controller bug (the rest of this blog post)

Note: At this point the Wifi is connected. The scale found your Wifi, your SSID, and has negotiated encryption.

Firewall Issues

This is simple.

If you don’t know what a firewall is, you didn’t break it.

If you haven’t modified your router settings (assuming your ISP even allows it), you didn’t break it.

If you have edited your firewall, roll back, try again.

ISP Issues

Do the normal tests. Connect to any other site and check that Garmin is up.

If these work, your ISP is probably not down.
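For a check that goes a little further than ping, a plain TCP connection test does the job. This is a minimal sketch; the function name is invented here, and port 443 is assumed because the scale ends up speaking HTTPS.

```python
import socket


def can_connect(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts alike.
        return False
```

Running can_connect("connect.garmin.com") from a machine on the same network as the scale tells you whether DNS and routing to Garmin work from there. It says nothing about the scale itself.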

Scale Sync Network Activity

How does the scale sync with Garmin Connect? I’m glad you asked.

In order, the scale uses the following protocols.

  1. DHCP
  2. DNS
  3. NTP
  4. HTTP
  5. HTTPS

These steps are fairly normal and expected. What is so unexpected is that step 5 fails when all the other steps work, and how it fails.
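When reading a packet capture, it helps to label each packet with the step it belongs to. A small sketch, assuming the standard well-known ports for each protocol (the table and function name are illustrative, and the scale may use other ports too):

```python
# Well-known ports for the five steps; assumed, not taken from the firmware.
STEP_PORTS = {
    ("udp", 67): "DHCP",
    ("udp", 68): "DHCP",
    ("udp", 53): "DNS",
    ("udp", 123): "NTP",
    ("tcp", 80): "HTTP",
    ("tcp", 443): "HTTPS",
}


def classify(protocol, dst_port):
    """Map a packet's transport protocol and destination port to a sync step."""
    return STEP_PORTS.get((protocol.lower(), dst_port), "other")


print(classify("UDP", 67))   # DHCP
print(classify("tcp", 443))  # HTTPS
print(classify("tcp", 22))   # other
```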

DHCP

This is how the scale gets an IP, and part of the Wifi negotiation. It’s fairly standard.

The most notable detail is that the client identifier is GarminIntern and the host name is WINC-00-00.

DNS

It then does a DNS lookup, to do an NTP sync. It looks up two machines:

  • time.google.com
  • time.garmin.com

This is notable because time.garmin.com does not exist.

The scale will do more DNS lookups as we go along. They tend to work fine.

NTP

The scale syncs its clock. This, again, is normal. It’s necessary because it will later use HTTPS, and that requires a valid clock.

time.garmin.com doesn’t exist, but it doesn’t stop the clock from syncing. The scale also does normal NTP on time.google.com and another NTP call on clock.garmin.com on port 4123.
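For reference, a basic time request is tiny. The sketch below builds a standard SNTP client packet and decodes the transmit timestamp from a reply. It shows the generic protocol, not the scale's actual implementation, and the helper names are made up.

```python
import struct

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_OFFSET = 2208988800


def build_sntp_request():
    """48-byte client request: LI=0, version=3, mode=3 (client), rest zero."""
    return b"\x1b" + 47 * b"\x00"


def transmit_time(response):
    """Extract the server's transmit timestamp as Unix seconds."""
    if len(response) < 48:
        raise ValueError("short NTP packet")
    # Bytes 40..47: 32-bit seconds and 32-bit fraction, big-endian.
    seconds, fraction = struct.unpack("!II", response[40:48])
    return seconds - NTP_UNIX_OFFSET + fraction / 2**32
```

Sending build_sntp_request() over UDP to port 123 of time.google.com and passing the reply to transmit_time should give the current time.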

HTTP

The scale proceeds to send POST /OBN/OBNServlet to gold.garmin.com. This seems to be mainly to get a Cloudflare response, for example CF-RAY.

This doesn’t usually fail, but the errors start here. The reason it doesn’t fail outright is that it will retry, and eventually a retry will work before the clock runs out.

HTTPS

Now the scale starts sending data to:

  • services.garmin.com
  • api.gcs.garmin.com
  • connectapi.garmin.com
  • omt.garmin.com

At this point the errors accumulate, and eventually the clock does run out. The errors slow down the connection to the point where the scale fails to send its data inside the 5 minute timeout.

Accumulated Problems

So what exactly fails? Once data flows, the scale fails to receive the server’s TCP ACK packets about 80% of the time. If too many packets are missed, the server closes the connection, and the scale tries again. Once the scale retries too many times, it gives up.

Since this happens roughly 80% of the time, and multiple connections are made, the scale fails very often. Every now and then it works.
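To get a feel for the numbers, here is a toy model. It assumes every ACK is lost independently with the same probability, which is my simplification and not something measured. Under that assumption, the chance of losing n ACKs in a row is the loss rate to the power n.

```python
def p_all_lost(loss_rate, attempts):
    """Probability that every one of `attempts` independent ACKs is lost."""
    return loss_rate ** attempts


def attempts_needed(loss_rate, target):
    """Smallest number of attempts so at least one arrives with probability >= target."""
    n = 1
    while 1 - loss_rate ** n < target:
        n += 1
    return n


print(round(p_all_lost(0.8, 5), 3))  # 0.328
print(attempts_needed(0.8, 0.95))    # 14
```

So with an 80% loss rate, about a third of the time five ACKs in a row go missing, and it takes 14 attempts to be 95% sure one gets through. This is why the number and spacing of retransmissions matter so much later on.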

Problem Details

TCP 101

TCP was created to transfer data without having the application worry about reliability. Data gets chopped up, usually into chunks of about 1500 bytes[1][2]. If it needs to get chopped up further, the OS gets notified.

TCP also takes care of putting the data back together again. This can be more difficult than just Packet 1 + Packet 2.

  • Packet 2 can arrive before Packet 1
  • Packet 2 can get lost, requiring retransmission

Let’s look at this in a bit more detail. Let’s say the scale wants to send 5000 bytes of data. It gets chopped up into chunks of up to 1500 bytes.

Time   Sender  Recipient  Sequence  Length  Acknowledgement
00:01  Scale   Server     0         1500    0
00:02  Server  Scale      0         0       1500
00:03  Scale   Server     1500      1500    0
00:04  Server  Scale      0         0       3000
00:05  Scale   Server     3000      1500    0
00:06  Server  Scale      0         0       4500
00:07  Scale   Server     4500      500     0
00:08  Server  Scale      0         0       5000

As you can see, the acknowledgements tell the scale how much data the server has received, and from where to continue.[3][4]
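The same exchange can be generated with a few lines of code. This sketch uses the simplified model from the table: one segment at a time, sequence numbers starting at 0, and 1500 bytes per segment. The function name is illustrative.

```python
def segment(total_bytes, chunk=1500):
    """Yield (sequence, length, ack) for each chunk of a transfer.

    `ack` is what the receiver would acknowledge after this chunk arrives.
    """
    seq = 0
    while seq < total_bytes:
        length = min(chunk, total_bytes - seq)
        yield seq, length, seq + length
        seq += length


for seq, length, ack in segment(5000):
    print(f"Scale  -> Server  seq={seq} len={length}")
    print(f"Server -> Scale   ack={ack}")
```

The last acknowledgement printed is 5000, matching the final row of the table.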

TCP 102

Of course, you can’t just start sending data. You have to tell the server you want to send data, and the server must accept, so it goes something like:

  • Handshake
    • SYN client → server
    • SYN/ACK server → client
    • ACK client → server
  • Data, as above.
    • ACK client → server
    • ACK server → client
  • Stop[5]
    • FIN/ACK client → server
    • FIN/ACK server → client
    • ACK client → server

What is Observed

The scale starts sending data, but the server’s packets aren’t received. The scale then eventually resends packets. Something like:

Time   Sender  Recipient  Sequence  Length  Acknowledgement
00:01  Scale   Server     0         1500    0
00:02  Server  Scale      0         0       1500
00:03  Scale   Server     1500      1500    0
00:04  Server  Scale      0         0       3000
00:06  Server  Scale      0         0       3000
00:10  Server  Scale      0         0       3000
00:18  Server  Scale      0         0       3000
00:34  Server  Scale      0         0       3000
00:35  Scale   Server     3000      1500    0
00:36  Server  Scale      0         0       4500

Note: The gap between retransmissions doubles each time, as the server keeps asking for more data. If this happens too often, the scale times out, and no data is sent.[6]

This means the scale doesn’t receive the ACK packets from the server.
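The timestamps of the repeated acknowledgements in the table follow a simple doubling rule. A sketch, using the illustrative numbers from the table (first sent at second 4, first gap of 2 seconds), not real kernel values:

```python
def retransmit_times(first_sent, initial_gap, retries):
    """Send times when every retransmission waits twice as long as the last."""
    times = [first_sent]
    gap = initial_gap
    for _ in range(retries):
        times.append(times[-1] + gap)
        gap *= 2
    return times


print(retransmit_times(4, 2, 4))  # [4, 6, 10, 18, 34]
```

After only four retries the gap has already grown to 16 seconds, so a few more misses eat up the scale's whole time budget.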

What Else is Dropped?

At the start, nothing. DHCP, DNS and NTP all work. Once the scale starts using HTTP (over TCP), packet drops start. This is usually about 15 seconds after the scale connects to the Wifi network.

However, once the packet drops start, other packets are dropped too.

  • Ping: ICMP packets to check if the scale is up.
  • ARP: Ethernet packets to match an IP address to a MAC address.

ARP being dropped is very interesting. When ARP goes, everything stops until ARP is answered. This would indicate that Wifi encryption updates might get dropped too.

TCP 201

The retransmitted acknowledgements need not come from the actual server, although they look like they do. They can and often do come from a router or firewall in the middle as a performance optimization.

Workarounds that Don’t Work

Different Wifi

I am in the lucky position to try multiple Wifi routers, so I did. I tried 3 different ones.

All Alone

Because I tried multiple routers, the scale was the only device on the network. There was no traffic congestion, and no competition.

This did not help.

Different Encryption on Wifi

Encrypted Wifi can be WPA or WPA2[7], and support different authentication methods and encryption standards. WPA uses TKIP, WPA2 uses CCMP. Nope, this did not help.

Note: Different router hardware might not let you set some protocols, since it may be handled in the Wifi network hardware.

Quality of Service

Boost Garmin IP networks. Boost the scale. Boost empty ACKs and retransmitted ACKs. Nope.

Different Channel on Wifi

The Garmin Index S2 Scale officially only supports channels 1-11. Since I’m not in the US, Wifi equipment usually uses channels 1-13, and uses different frequency blocks.

Note: To get certified for sale in non-US markets (like the EU), the scale would be tested by the local regulator, and must pass local wireless regulations. Therefore I do not believe the official documentation: if it were accurate, the scale would be illegal to sell.

However, I did try hard coding channels 1, 6, 11, and different country regulations on the Wifi router. This did not make a difference.

Different Garmin

The various Garmin services resolve to multiple IP addresses. This is likely for load balancing.

Modify the DNS on the router to supply specific ones. This didn’t help.

Note: This might not make as much of a difference as you think, since they are behind Cloudflare, and the CDN will intercept and reroute these connections.

Computer Captcha

Since it’s behind Cloudflare, the scale might be hitting Cloudflare’s captcha protection. Maybe HTTP and HTTPS don’t work unless the scale can prove it’s human.

Network dumps prove this is not what’s happening. They do reveal Garmin’s ID and other Cloudflare details that I will not post in this blog.

Adjusting the MTU

A common problem in network stacks is that they don’t notice when the MTU needs adjusting. This can and does happen when the transport changes, for example when it changes from Wifi to Cable.

Yes, I did try adjusting this, both larger and smaller.

And I did check for the DF (Don’t Fragment) flag and ICMP Fragmentation Needed messages.

Hardcode ARP

This is easy, and it removes some network packets from the equation. It does not fix the problem, though.

Workarounds that Work

No Encryption on Wifi

This works perfectly, but it’s not acceptable in my household.

No encryption does result in no retransmissions, though.

Hack the RTO

The retransmissions are also called TCP RTO (retransmission timeout). We can reduce this, and increase the number of RTO packets that the router sends, to greatly increase our chances of sending a retransmission at the moment the scale is listening again.

This does work, but it requires three things:

  1. A custom kernel
  2. Enable custom kernel
  3. A proxy

Custom Kernel

We need a custom kernel because we’re going to move these values outside the TCP specification. Since this network isn’t used for anything else, let’s do it.

There are a number of queries on mailing lists for this. However, there aren’t a great many final solutions. As I said, it’s outside spec.

I’m including a diff here, in case other people want it. The diff:

  • Increases the number of retries, so it doesn’t stop before the scale times out.
  • Decreases the highest timeout value, so we will retry much quicker.
  • Sets every connection to thin, meaning that the timeouts increase linearly instead of exponentially.

This diff is for Linux 6.12, and should work on Debian Stable.

diff --git a/include/net/tcp.h b/include/net/tcp.h
index b3917af30..cce7a5350 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -90,14 +90,14 @@ void tcp_time_wait(struct sock *sk, int state, int timeo);
 #define TCP_URG_NOTYET 0x0200
 #define TCP_URG_READ   0x0400

-#define TCP_RETR1  3   /*
+#define TCP_RETR1  8   /*
                 * This is how many retries it does before it
                 * tries to figure out if the gateway is
                 * down. Minimal RFC value is 3; it corresponds
                 * to ~3sec-8min depending on RTO.
                 */

-#define TCP_RETR2  15  /*
+#define TCP_RETR2  30  /*
                 * This should take at least
                 * 90 minutes to time out.
                 * RFC1122 says that the limit is 100 sec.
@@ -138,8 +138,8 @@ void tcp_time_wait(struct sock *sk, int state, int timeo);
 #define TCP_DELACK_MIN 4U
 #define TCP_ATO_MIN    4U
 #endif
-#define TCP_RTO_MAX    ((unsigned)(120*HZ))
-#define TCP_RTO_MIN    ((unsigned)(HZ/5))
+#define TCP_RTO_MAX    ((unsigned)(5*HZ))
+#define TCP_RTO_MIN    ((unsigned)(HZ/10))
 #define TCP_TIMEOUT_MIN    (2U) /* Min timeout for TCP timers in jiffies */

 #define TCP_TIMEOUT_MIN_US (2*USEC_PER_MSEC) /* Min TCP timeout in microsecs */
@@ -226,7 +226,7 @@ void tcp_time_wait(struct sock *sk, int state, int timeo);
 #define TCP_NAGLE_PUSH     4   /* Cork is overridden for already queued data */

 /* TCP thin-stream limits */
-#define TCP_THIN_LINEAR_RETRIES 6       /* After 6 linear retries, do exp. backoff */
+#define TCP_THIN_LINEAR_RETRIES 60      /* After 6 linear retries, do exp. backoff */

 /* TCP initial congestion window as per rfc6928 */
 #define TCP_INIT_CWND      10
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index b65cd417b..e5656e919 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -639,7 +639,7 @@ void tcp_retransmit_timer(struct sock *sk)
     */
    if (sk->sk_state == TCP_ESTABLISHED &&
        (tp->thin_lto || READ_ONCE(net->ipv4.sysctl_tcp_thin_linear_timeouts)) &&
-       tcp_stream_is_thin(tp) &&
+       //tcp_stream_is_thin(tp) &&
        icsk->icsk_retransmits <= TCP_THIN_LINEAR_RETRIES) {
        icsk->icsk_backoff = 0;
        icsk->icsk_rto = clamp(__tcp_set_rto(tp),
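To see roughly what these constants buy, here is a simplified model of the retransmission timer. It assumes an initial RTO of 1 second, a fixed gap while the connection counts as thin, and doubling up to the maximum afterwards. The real kernel derives the RTO from measured round-trip times, so the absolute numbers are only illustrative.

```python
def retransmits_within(window, rto, rto_max, linear_retries):
    """Count retransmissions sent within `window` seconds of the first send.

    The gap stays at `rto` for the first `linear_retries` retransmissions,
    then doubles each time, capped at `rto_max`.
    """
    elapsed, count, gap = 0.0, 0, rto
    while True:
        elapsed += gap
        if elapsed > window:
            return count
        count += 1
        if count > linear_retries:
            gap = min(gap * 2, rto_max)


# Stock values: exponential from the start, capped at 120 seconds.
print(retransmits_within(300, 1, 120, 0))  # 8
# Patched values: 60 linear retries, then capped at 5 seconds.
print(retransmits_within(300, 1, 5, 60))   # 109
```

In this model the patched kernel gets over ten times as many attempts inside the scale's 5 minute window, which is the whole point of the change.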

Enable custom kernel

To get the full effect, you may need to enable thin streams on some Linux distributions.

echo 1 > /proc/sys/net/ipv4/tcp_thin_linear_timeouts

To make this persistent, add net.ipv4.tcp_thin_linear_timeouts = 1 to /etc/sysctl.conf or to a file in /etc/sysctl.d/.

Proxy

We only control TCP RTO on local sockets and some NAT-ed connections, so we need to set up a proxy. I tried this first with HTTP, using Squid, and it worked. However, I also needed to do HTTPS, and then encryption certificates make it hard work.

However, I’m not looking inside the packets. I don’t need to decrypt them, just forward them. I’m just interested in modifying TCP RTO, so I can treat HTTPS like any other TCP Socket.

The easiest way to do this is with a systemd socket or xinetd redirect and NAT. You will need one per Garmin IP address.

Example xinetd config:

service garmin-gold_garmin_com
{
    type = UNLISTED
    socket_type = stream
    protocol = tcp
    wait = no
    user = nobody
    bind = 0.0.0.0
    port = 3129
    only_from = **my_network**
    redirect = gold.garmin.com 80
}

and a nat rule:

table inet filter {
    chain prerouting {
        type nat hook prerouting priority -100;
        policy accept;
        ip saddr **scaleip** \
            ip daddr { gold.garmin.com } tcp dport { 80 } \
            dnat to 10.43.0.1:3129
    }
}
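If xinetd is not available, the redirect can be sketched as a small TCP forwarder. This is a minimal illustration with invented names, not a hardened proxy. Because it terminates the connection on a local socket and never looks at the payload, the kernel's RTO settings apply to the side facing the scale, and HTTPS passes through untouched.

```python
import asyncio


async def pipe(reader, writer):
    """Copy bytes from reader to writer until EOF, then close the writer."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
    finally:
        writer.close()


async def serve_forwarder(listen_host, listen_port, target_host, target_port):
    """Accept connections and relay each one to target_host:target_port."""

    async def handle(client_reader, client_writer):
        try:
            upstream_reader, upstream_writer = await asyncio.open_connection(
                target_host, target_port
            )
        except OSError:
            client_writer.close()
            return
        # Relay both directions without inspecting the payload.
        await asyncio.gather(
            pipe(client_reader, upstream_writer),
            pipe(upstream_reader, client_writer),
            return_exceptions=True,
        )

    return await asyncio.start_server(handle, listen_host, listen_port)
```

To mirror the xinetd example, await serve_forwarder("0.0.0.0", 3129, "gold.garmin.com", 80) inside an asyncio.run() entry point and then call serve_forever() on the returned server. You would still need one listener per Garmin address, plus the NAT rule.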

Probable Cause

This only happens when the network is encrypted. The scale is not recording the server’s data, and the scale is also not sending errors back.

This could happen if:

  • the controller is too slow to decrypt
  • the firmware decides the packets are corrupt
  • an interrupt goes missing
  • not enough RAM, hence forced to drop

Which of these it is remains unknown, and where you draw the line between controller and firmware can be a grey area. For example, in most computers, the network card will perform some calculations on the packets instead of the OS. This is called hardware off-loading, but in practice it happens in firmware.


  1. This can be smaller when data reaches a different carrier, like a VPN.

  2. New networks allow for much bigger Jumbo frames. 

  3. TCP will usually send a few packets in a row, for performance. 

  4. The sequence numbers and time are not realistic. 

  5. There are more types of packets, like PSH and RST, but it’s not necessary to describe this problem. 

  6. This time increase depends on a number of TCP network stack settings, and can be both set manually and auto-adjusted based on network conditions, so these timestamps are purely for illustration. 

  7. And now WPA3, but not for the scale. 

posted at 16:00  ·   ·  garmin  firmware  wifi  iot

Jan 25, 2023

garmin connect doesn’t connect firmware to users

Berend De Schouwer

Who Is This Post For?

Anyone who has tried to connect a Garmin smart device to the cloud with Garmin Connect.

Anyone who has tried to connect any smart device that has no keyboard to any cloud anywhere.

What Device

A Garmin Index S2 scale. This is a human weight scale that can upload your weight to the cloud.

The scale connects to the cloud directly via wifi. It can connect to a phone via bluetooth, but only for wifi configuration. Everything after that goes via wifi.

Even small things like 12/24 hour clock configuration go via wifi.

The scale has no keyboard, and a limited screen, so configuration is via a phone.

Problem Experienced

The scale does not send any weight data to the cloud. The scale does not receive any configuration changes from the cloud.

The scale does display weight data, and a wifi signal strength icon.

The scale does not display any errors. There is no indication that anything is wrong.

Garmin Connect App on the phone does not display any errors. It claims everything is working.

More About Garmin Connect

Garmin Connect means two different things:
  • A Phone App, used to configure various devices
  • A Web App, used to view the data

More About Errors Not Displayed

On the Scale

The scale doesn’t show wifi errors. It can display some text, like your name, but doesn’t even show E123, or any other error code. The user believes it works.

The scale does show a wifi signal strength icon (triangle, 4 lines for strength), and displays a sync icon (twirling circle).

Neither icon changes to an error (a red X, for example). The user is led to believe it works.

On the Phone

The phone does not show any errors, wifi or otherwise.

The phone does display setup complete.

The phone app lets you test wifi. It’s a bit complicated, but you can. It will then display Wifi Connected OK.

Suspected Error

Eventually, after re-configuring hundreds of times, I suspect a network error. I suspect the scale is trying to connect to garmin.com on a non-standard port.

It’s not that, but it gives…

The Only Error Displayed

Hours later, I finally look on the router. I’m looking at the network traffic, and see:

4-Way handshake failed for ifindex: 3, reason: 15
KEY_SEQ not returned in GET_KEY reply

So I know wifi isn’t working. The WPA2 handshake is not completing even though the phone app thinks the scale’s wifi is OK.
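Since this log line is the only error anywhere, it is worth watching for. A sketch that pulls handshake failures out of router log text follows. The pattern is copied from the message above. The mapping of reason 15 to a handshake timeout follows the IEEE 802.11 reason codes, and the other codes are deliberately left out.

```python
import re

HANDSHAKE_FAILED = re.compile(
    r"4-Way handshake failed for ifindex: (\d+), reason: (\d+)"
)

# IEEE 802.11 reason code 15 is "4-Way Handshake timeout".
REASONS = {15: "4-way handshake timeout"}


def handshake_failures(log_text):
    """Return (ifindex, reason_code, description) for each failed handshake."""
    return [
        (int(m.group(1)), int(m.group(2)), REASONS.get(int(m.group(2)), "unknown"))
        for m in HANDSHAKE_FAILED.finditer(log_text)
    ]


sample = """\
4-Way handshake failed for ifindex: 3, reason: 15
KEY_SEQ not returned in GET_KEY reply
"""
print(handshake_failures(sample))  # [(3, 15, '4-way handshake timeout')]
```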

Non-specified Requirements

Bluetooth or Wifi? Bluetooth and Wifi

When you configure the scale, you should eventually see a message on Garmin Connect on the phone that sync completed, along with a green icon.

If you do not see this, the scale is on a different wifi network than the phone.

Why would that be? In my case the phone is on a 5 GHz network (because it can) and the scale is on a 2.4 GHz network (because it can’t use 5 GHz).

Until you put the phone on the same network as the scale, the last message you will see is: configuration completed, not sync completed, and this means that the scale is not working.

At this point, the scale isn’t syncing to the Garmin cloud. It’s just synced its configuration with the phone.

WPA? WPA2? WPA1.5

The scale claims to support WPA2 on a 2.4 GHz network.

For a device that was first announced in 2020, this is atrocious, but that’s true for a lot of these smart devices.

Even then, it doesn’t work with the entire gamut of WPA2 protocols.

In my case, I had to swap from iwd to wpa_supplicant to connect the scale.

Questions for Garmin

  1. Why do you display that the wifi network tested OK when the 4-way handshake failed? Couldn’t you add a connection to https://test.garmin.com or something?
  2. Why do you not display a sync error message on either the scale or the phone app, and preferably both?
  3. Why do you not check that the phone and scale wifi are the same? The setup fails silently otherwise.
  4. Why do you display configuration complete, when it isn’t yet complete?