A deep dive on firmware bugs that prevent Garmin Index S2
scales from connecting to encrypted Wifi networks. I describe
some of the problems, a solution of sorts if you’re a network
administrator, and a guess as to the root cause. I do not,
in this article, reverse engineer the firmware.
Problem Description
Garmin Index S2 scales are notorious for not connecting reliably to
a number of Wifi networks. For example, read the
Garmin Forums. Quite often the scale won’t connect, or will connect and not sync,
and its display is unclear and poorly documented, which makes the fault hard to find.
For other frustrations, you can read my previous blog post
Garmin Index Scale Firmware Problems
Official requirements
The official requirements are:
- 2.4 GHz (no 5.0 or 6.0)
- 802.11 ac, b, g or n (no 802.1x)
- Channels 1-11 only
- No hidden SSIDs
- Security: Unencrypted, WPA and WPA2
- Passwords must be at least 8 characters
Some of these requirements are simply due to the age and power of the
embedded controller. It was never going to support 5.0 GHz, for instance.
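The official list can be turned into a quick sanity check for an access point configuration. This is only a sketch: the `WifiConfig` fields and the `violations` helper are names I made up for illustration, not anything Garmin publishes.

```python
from dataclasses import dataclass


@dataclass
class WifiConfig:
    """Hypothetical description of one access point configuration."""
    band_ghz: float      # 2.4, 5.0 or 6.0
    channel: int
    hidden_ssid: bool
    security: str        # "open", "WPA", "WPA2", "WPA3", ...
    password: str = ""


def violations(cfg: WifiConfig) -> list[str]:
    """Return the official requirements that this configuration breaks."""
    problems = []
    if cfg.band_ghz != 2.4:
        problems.append("band must be 2.4 GHz")
    if not 1 <= cfg.channel <= 11:
        problems.append("channel must be 1-11")
    if cfg.hidden_ssid:
        problems.append("SSID must not be hidden")
    if cfg.security not in ("open", "WPA", "WPA2"):
        problems.append("security must be open, WPA or WPA2")
    if cfg.security != "open" and len(cfg.password) < 8:
        problems.append("password must be at least 8 characters")
    return problems
```

An empty list means the configuration matches the documented requirements, which, as the rest of this post shows, is no guarantee that the scale will actually sync.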
What we (customers) want
Compliant Wifi
- Encrypted, at least WPA2
- Channels 1-13
A Wifi access point can typically be configured for a specific channel, or
for all channels. No Wifi access points allow you to specify a channel range.
If you are outside the US, this means:
- Exactly one channel, or
- Channels 1-13.
So you are reduced to running one channel. Luckily the scale does actually
connect on channels 1-13. It’s doubtful the chip would be certified in
the countries where it’s sold if it didn’t. So we will ignore the first problem.
What worked yesterday, to work today
The scale frequently gets stuck, and the usual support response is to
reset the scale.
Resetting the scale is difficult
- It requires tapping a button on the bottom, and simultaneously
viewing the top display.
- Testing requires putting significant weight on top of the scale,
but your finger is still tapping the bottom.
Resetting the scale is unnecessary
Frequently the scale is actually still working. The problem is that
the display isn’t communicating to you, the user, what’s happening.
Quite often it’s busy, and you should simply wait. It signals this by
flashing an hourglass (⌛) and then switching the display
off. To the average user, this looks like the scale fails to switch on.
The correct thing to do is wait 5 minutes.
Updating Garmin Documentation
Let’s first update the documentation by creating a useful manual
for the Garmin Index S2 Wifi connection status.
There are three icons:
Wifi connecting 🛜
While 🛜 blinks, the scale is connecting to Wifi. If it stops blinking,
it has connected to Wifi, and your WPA2 password works.
As soon as this happens, you no longer need to reset your scale.
Data Syncing 🔁
While data is syncing, 🔁 is animating. At this point, the scale is
talking over the network to Garmin servers.
Under certain conditions this can take a long time, and the display
will switch off. It will still be syncing in the background, though.
Note: If the display switches off in this state, the scale is only
power-saving the display. The scale itself has not switched off. If you power on the scale,
you will see an hourglass ⌛. Wait 5 minutes. Be patient.
Done ✅
It’s done.
Actual Syncing Problems
If the data never syncs, read on.
If the data never syncs, you may have:
- Firewall issues (not if you didn’t have them yesterday)
- ISP issues (ping connect.garmin.com)
- Hit a firmware or controller bug (the rest of this blog post)
Note: At this point the Wifi is connected. The scale found
your Wifi, your SSID, and has negotiated encryption.
Firewall Issues
This is simple.
If you don’t know what a firewall is, you didn’t break it.
If you haven’t modified your router settings (assuming your ISP even allows it),
you didn’t break it.
If you have edited your firewall, roll back, try again.
ISP Issues
Do the normal tests. Connect to any other site
and check that Garmin is up.
If these work, your ISP is probably not down.
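The "normal tests" can also be scripted. A minimal sketch, assuming that being able to open a TCP connection on port 443 is a good enough stand-in for "the site is up"; `tcp_reachable` is my own helper name.

```python
import socket


def tcp_reachable(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False
```

Calling `tcp_reachable("connect.garmin.com")` next to a check of any unrelated site tells you whether the problem is your ISP, Garmin, or neither.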
Scale Sync Network Activity
How does the scale sync with Garmin Connect?
I’m glad you asked.
In order, the scale uses the following protocols.
- DHCP
- DNS
- NTP
- HTTP
- HTTPS
These steps are fairly normal and expected. What is so unexpected is
that the final step, HTTPS, fails when all the other steps work, and how it fails.
DHCP
This is how the scale gets an IP address, and it is part of the Wifi negotiation.
It’s fairly standard.
The most notable detail is that the client identifier is GarminIntern
and the host name is WINC-00-00.
DNS
It then does a DNS lookup, to do an NTP sync. It looks up two machines:
- time.google.com
- time.garmin.com
This is notable because time.garmin.com does not exist.
The scale will do more DNS lookups as we go along. They tend to work fine.
NTP
The scale syncs its clock. This, again, is normal. It’s necessary because
it will later use HTTPS, and that requires a valid clock.
time.garmin.com doesn’t exist, but it doesn’t stop the clock
from syncing. The scale also does normal NTP on time.google.com
and another NTP call on clock.garmin.com on port 4123.
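For readers who haven't looked at NTP before, here is roughly what a simple (SNTP) client exchange involves. This is a generic sketch of the standard packet layout, not a reconstruction of the scale's own client, and it says nothing about what is spoken on port 4123.

```python
import struct

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def build_sntp_request() -> bytes:
    """Build a 48-byte SNTP client request (LI=0, version 3, mode 3)."""
    return bytes([0x1B]) + bytes(47)


def parse_transmit_time(reply: bytes) -> float:
    """Extract the server's transmit timestamp as Unix time in seconds."""
    if len(reply) < 48:
        raise ValueError("NTP reply too short")
    # Bytes 40-47 hold the transmit timestamp: 32-bit seconds since 1900
    # followed by a 32-bit binary fraction of a second.
    seconds, fraction = struct.unpack("!II", reply[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32
```

Sending the request over UDP to port 123 of a public server such as time.google.com and feeding the reply to `parse_transmit_time` yields the server's clock. All of this runs over UDP, which matters later: the trouble only starts once TCP is involved.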
HTTP
The scale proceeds to send POST /OBN/OBNServlet to
gold.garmin.com. This seems to be mainly to get a Cloudflare
response, for example CF-RAY.
This doesn’t usually fail, but the errors start here. The reason it
doesn’t fail outright is that it will retry, and eventually a retry
will work before the clock runs out.
HTTPS
Now the scale starts sending data to:
- services.garmin.com
- api.gcs.garmin.com
- connectapi.garmin.com
- omt.garmin.com
At this point the errors accumulate, and eventually the clock does run
out. The errors slow down the connection to the point where the scale
fails to send its data inside the 5 minute timeout.
Accumulated Problems
So what exactly fails? Once data flows, the scale fails to react to
the server’s TCP ACK packets about 80% of the time. If too many packets
are missed, the server closes the connection, and the scale tries again.
Once the scale retries too many times, it gives up.
Since this happens roughly 80% of the time, and multiple connections are made,
the scale fails very often. Every now and then it works.
Problem Details
TCP 101
TCP was created to transfer data without having the application worry
about reliability. Data gets chopped up, usually into pieces of about 1500
bytes. If it needs to get chopped up further, the
OS gets notified.
TCP also takes care of putting the data back together again. This can
be more difficult than just Packet 1 + Packet 2.
- Packet 2 can arrive before Packet 1
- Packet 2 can get lost, requiring retransmission
Let’s look at this in a bit more detail. Let’s say the scale wants
to send 5000 bytes of data. It gets chopped up into chunks of 1500 bytes.
Time  | Sender | Recipient | Sequence | Length | Acknowledgement
00:01 | Scale  | Server    | 0        | 1500   | 0
00:02 | Server | Scale     | 0        | 0      | 1500
00:03 | Scale  | Server    | 1500     | 1500   | 0
00:04 | Server | Scale     | 0        | 0      | 3000
00:05 | Scale  | Server    | 3000     | 1500   | 0
00:06 | Server | Scale     | 0        | 0      | 4500
00:07 | Scale  | Server    | 4500     | 500    | 0
00:08 | Server | Scale     | 0        | 0      | 5000
As you can see, the acknowledgements tell the scale how much data
the server has received, and from where to continue.
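The bookkeeping in that exchange is simple enough to write down. A small sketch that reproduces the sequence and acknowledgement numbers for a 5000-byte transfer; `segment` is an illustrative name, and 1500 is this article's round number (real payloads are a little smaller, because headers take up part of a 1500-byte packet).

```python
def segment(total_bytes: int, chunk_size: int = 1500):
    """Yield (sequence, length, expected_ack) for each chunk of a transfer."""
    sequence = 0
    while sequence < total_bytes:
        length = min(chunk_size, total_bytes - sequence)
        # The receiver acknowledges the next byte it expects to see.
        yield sequence, length, sequence + length
        sequence += length


for seq, length, ack in segment(5000):
    print(f"scale  -> server  seq={seq:<5} len={length}")
    print(f"server -> scale   ack={ack}")
```

Each acknowledgement is just the previous sequence number plus the length received, which is why a sender that never sees an ACK has no idea whether to continue or resend.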
TCP 102
Of course, you can’t just start sending data. You have to tell the
server you want to send data, and the server must accept, so it goes
something like:
- Handshake
  - SYN client → server
  - SYN/ACK server → client
  - ACK client → server
- Data, as above.
  - ACK client → server
  - ACK server → client
  - …
- Stop
  - FIN/ACK client → server
  - FIN/ACK server → client
  - ACK client → server
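Applications never see these packets; the operating system produces them when a program connects, sends, and closes. A minimal loopback sketch (function names are mine) showing which socket call triggers which phase. It is meant for small payloads only, since the client sends everything before it starts reading.

```python
import socket
import threading


def run_echo_once(server: socket.socket) -> None:
    """Accept one connection, echo what arrives, then close."""
    conn, _ = server.accept()           # handshake is completed by the kernel
    with conn:
        while data := conn.recv(4096):  # empty bytes means the peer sent FIN
            conn.sendall(data)


def round_trip(payload: bytes) -> bytes:
    """Send payload to a local echo server and return what comes back."""
    server = socket.create_server(("127.0.0.1", 0))
    port = server.getsockname()[1]
    worker = threading.Thread(target=run_echo_once, args=(server,))
    worker.start()

    # connect: SYN, SYN/ACK, ACK
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(payload)          # data segments, ACKed by the peer
        client.shutdown(socket.SHUT_WR)  # FIN: this side has nothing more to send
        received = b""
        while chunk := client.recv(4096):
            received += chunk
    worker.join()
    server.close()
    return received
```

Running `round_trip(b"x" * 5000)` while capturing on the loopback interface shows the full handshake, data, and FIN sequence from the list above.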
What is Observed
The scale starts sending data, but the server’s packets aren’t received.
The scale then eventually resends packets. Something like:
Time  | Sender | Recipient | Sequence | Length | Acknowledgement
00:01 | Scale  | Server    | 0        | 1500   | 0
00:02 | Server | Scale     | 0        | 0      | 1500
00:03 | Scale  | Server    | 1500     | 1500   | 0
00:04 | Server | Scale     | 0        | 0      | 3000
00:06 | Server | Scale     | 0        | 0      | 3000
00:10 | Server | Scale     | 0        | 0      | 3000
00:18 | Server | Scale     | 0        | 0      | 3000
00:34 | Server | Scale     | 0        | 0      | 3000
00:35 | Scale  | Server    | 3000     | 1500   | 0
00:36 | Server | Scale     | 0        | 0      | 4500
Note: The gap between the server’s repeated packets doubles each time
(2, 4, 8, then 16 seconds), as the server keeps asking for more
data. If this happens too often, the scale times out, and no data is
sent.
This means the scale doesn’t receive the ACK packets from the server.
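The doubling is easy to reproduce. A sketch that regenerates the repeated timestamps from the trace above, assuming (as the trace suggests) a first gap of 2 seconds after the packet at 00:04; the function name is mine.

```python
def retransmission_times(first_sent: int, first_gap: int, retries: int) -> list[int]:
    """Times (in seconds) at which a packet is resent with doubling gaps."""
    times = []
    now, gap = first_sent, first_gap
    for _ in range(retries):
        now += gap
        times.append(now)
        gap *= 2  # exponential backoff: wait twice as long before the next try
    return times


print(retransmission_times(first_sent=4, first_gap=2, retries=4))  # [6, 10, 18, 34]
```

Four missed packets already cost half a minute, so it does not take many of these episodes to use up a 5 minute budget.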
What Else is Dropped?
At the start, nothing. DHCP, DNS and NTP all work. Once the scale
starts using HTTP (over TCP), packet drops start. This is usually about
15 seconds after the scale connects to the Wifi network.
However, once the packet drops start, other packets are dropped too.
- Ping
  - ICMP packets to check if the scale is up.
- ARP
  - Ethernet packets to match an IP address to a MAC address.
ARP being dropped is very interesting. When ARP goes, everything
stops until ARP is answered. This would indicate that Wifi encryption
updates might get dropped too.
TCP 201
The retransmitted acknowledgements need not come from the actual
server, although they look like they do. They can and often do come
from a router or firewall in the middle as a performance optimization.
Workarounds that Don’t Work
Different Wifi
I am in the lucky position to try multiple Wifi routers, so I did.
I tried 3 different ones.
All Alone
Because I tried multiple routers, the scale was the only device on the
network. There was no traffic congestion, and no competition.
This did not help.
Different Encryption on Wifi
Encrypted Wifi can be WPA or WPA2, and support different
authentication methods and encryption standards. WPA uses TKIP,
WPA2 uses CCMP. Nope, this did not help.
Note: Different router hardware might not let you set some
protocols, since it may be handled in the Wifi network hardware.
Quality of Service
Boost Garmin IP networks. Boost the scale. Boost empty ACKs and
retransmitted ACKs. Nope.
Different Channel on Wifi
The Garmin Index S2 Scale officially only supports channels 1-11.
Since I’m not in the US, Wifi equipment usually uses channels 1-13, and
uses different frequency blocks.
Note: To get certified for sale in non-US countries (like the EU),
the scale would be tested by the local regulator, and must pass local
wireless regulations. Therefore I do not believe the official
documentation: the scale would be illegal for sale.
However, I did try hard coding channels 1, 6, 11, and different country
regulations on the Wifi router. This did not make a difference.
Different Garmin
The various Garmin services resolve to multiple IP addresses. This is
likely for load balancing.
I modified the DNS on the router to return specific addresses. This didn’t help.
Note: This might not make as much of a difference as you think,
since they are behind Cloudflare, and the CDN will intercept and
reroute these connections.
Computer Captcha
Since it’s behind Cloudflare, the scale might be hitting Cloudflare’s
captcha protection. Maybe HTTP and HTTPS don’t work unless the scale
can prove it’s human.
Network dumps prove this is not what’s happening. They do reveal
Garmin’s ID and other Cloudflare details that I will not post in
this blog.
Adjusting the MTU
A common problem in network stacks is that they don’t notice when the
MTU needs adjusting. This can and does happen when the transport changes,
for example when it changes from Wifi to Cable.
Yes, I did try adjusting this, both larger and smaller.
And I did check for the DF (Don’t Fragment) flag and ICMP Fragmentation Needed messages.
Hardcode ARP
This is easy, and it removes some network packets from the equation.
It does not fix the problem, though.
Workarounds that Work
No Encryption on Wifi
This works perfectly, but it’s not acceptable in my household.
No encryption does result in no retransmissions, though.
Hack the RTO
The retransmissions are governed by the TCP RTO (retransmission timeout).
We can reduce this, and increase the number of retransmissions that the router
sends, to greatly increase our chances of sending a retransmission at
the moment the scale is listening again.
This does work, but it requires two things:
- A custom kernel
  - Enable custom kernel
- A proxy
Custom Kernel
We need a custom kernel because we’re going to move these values out
of the TCP specification. Since this network isn’t used for anything
else, let’s do it.
There are a number of queries on mailing lists for this. However,
there aren’t a great many final solutions. As I said, it’s outside spec.
I’m including a diff here, in case other people want it. The diff:
- Increases the number of retries, so it doesn’t stop before the
scale times out.
- Decreases the highest timeout value, so we will retry much quicker.
- Sets every connection to thin, meaning that the retransmission interval
stays constant (linear timeouts) instead of doubling (exponential backoff).
This diff is for Linux 6.12, and should work on Debian Stable.
diff --git a/include/net/tcp.h b/include/net/tcp.h
index b3917af30..cce7a5350 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -90,14 +90,14 @@ void tcp_time_wait(struct sock *sk, int state, int timeo);
#define TCP_URG_NOTYET 0x0200
#define TCP_URG_READ 0x0400
-#define TCP_RETR1 3 /*
+#define TCP_RETR1 8 /*
* This is how many retries it does before it
* tries to figure out if the gateway is
* down. Minimal RFC value is 3; it corresponds
* to ~3sec-8min depending on RTO.
*/
-#define TCP_RETR2 15 /*
+#define TCP_RETR2 30 /*
* This should take at least
* 90 minutes to time out.
* RFC1122 says that the limit is 100 sec.
@@ -138,8 +138,8 @@ void tcp_time_wait(struct sock *sk, int state, int timeo);
#define TCP_DELACK_MIN 4U
#define TCP_ATO_MIN 4U
#endif
-#define TCP_RTO_MAX ((unsigned)(120*HZ))
-#define TCP_RTO_MIN ((unsigned)(HZ/5))
+#define TCP_RTO_MAX ((unsigned)(5*HZ))
+#define TCP_RTO_MIN ((unsigned)(HZ/10))
#define TCP_TIMEOUT_MIN (2U) /* Min timeout for TCP timers in jiffies */
#define TCP_TIMEOUT_MIN_US (2*USEC_PER_MSEC) /* Min TCP timeout in microsecs */
@@ -226,7 +226,7 @@ void tcp_time_wait(struct sock *sk, int state, int timeo);
#define TCP_NAGLE_PUSH 4 /* Cork is overridden for already queued data */
/* TCP thin-stream limits */
-#define TCP_THIN_LINEAR_RETRIES 6 /* After 6 linear retries, do exp. backoff */
+#define TCP_THIN_LINEAR_RETRIES 60 /* After 6 linear retries, do exp. backoff */
/* TCP initial congestion window as per rfc6928 */
#define TCP_INIT_CWND 10
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index b65cd417b..e5656e919 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -639,7 +639,7 @@ void tcp_retransmit_timer(struct sock *sk)
*/
if (sk->sk_state == TCP_ESTABLISHED &&
(tp->thin_lto || READ_ONCE(net->ipv4.sysctl_tcp_thin_linear_timeouts)) &&
- tcp_stream_is_thin(tp) &&
+ //tcp_stream_is_thin(tp) &&
icsk->icsk_retransmits <= TCP_THIN_LINEAR_RETRIES) {
icsk->icsk_backoff = 0;
icsk->icsk_rto = clamp(__tcp_set_rto(tp),
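To get a feel for what these constants change, here is a rough model of the retransmission schedule. It is a deliberate simplification and not the kernel's algorithm: it ignores round-trip-time estimation and retry limits, and it assumes a base RTO of 1 second, which is my own placeholder. The 300 second window matches the scale's 5 minute timeout.

```python
def retransmit_schedule(base_rto, rto_max, linear_retries, window):
    """Return the times (s) of retransmissions that fit inside `window`.

    Simplified model: the first `linear_retries` retransmissions keep the
    base interval; after that the interval doubles, capped at `rto_max`.
    """
    times = []
    now = 0.0
    interval = base_rto
    while True:
        now += min(interval, rto_max)
        if now > window:
            return times
        times.append(now)
        if len(times) > linear_retries:
            interval *= 2  # exponential backoff once the linear phase is over


# Values loosely modelled on the unpatched and patched constants in the diff.
stock = retransmit_schedule(base_rto=1.0, rto_max=120.0, linear_retries=0, window=300.0)
patched = retransmit_schedule(base_rto=1.0, rto_max=5.0, linear_retries=60, window=300.0)
print(len(stock), "retransmissions with stock-like values")
print(len(patched), "retransmissions with patched-like values")
```

Under these assumptions the stock-like schedule gets only a handful of attempts into five minutes, while the patched-like schedule gets many more chances to hit a moment when the scale is listening.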
Enable custom kernel
To get the full effect, you may need to enable thin streams on some
Linux distributions.
echo 1 > /proc/sys/net/ipv4/tcp_thin_linear_timeouts
To make this persistent, add net.ipv4.tcp_thin_linear_timeouts = 1 to
/etc/sysctl.conf or to a file under /etc/sysctl.d/.
Proxy
We only control TCP RTO on local sockets and some NAT-ed connections, so
we need to set up a proxy. I tried this first with HTTP, using Squid, and
it worked. However, I also needed to do HTTPS, and then encryption
certificates make it hard work.
However, I’m not looking inside the packets. I don’t need to decrypt them,
just forward them. I’m just interested in modifying TCP RTO, so I can
treat HTTPS like any other TCP Socket.
The easiest way to do this is with a systemd socket or xinetd redirect
and NAT. You will need one per Garmin IP address.
Example xinetd config:
service garmin-gold_garmin_com
{
    type        = UNLISTED
    socket_type = stream
    protocol    = tcp
    wait        = no
    user        = nobody
    bind        = 0.0.0.0
    port        = 3129
    only_from   = **my_network**
    redirect    = gold.garmin.com 80
}
and a nat rule:
table inet filter {
    chain prerouting {
        type nat hook prerouting priority -100;
        policy accept
        ip saddr **scaleip** \
            ip daddr { gold.garmin.com } tcp dport { 80 } \
            dnat to 10.43.0.1:3129
    }
}
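If xinetd isn't available, the same idea (accept the connection locally, then relay the bytes untouched) can be sketched in a few lines of Python. This is an illustration of the forwarding technique, not what I ran in production; `pipe` and `forward` are my own names, and error handling is minimal.

```python
import asyncio


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes one way until EOF, then pass the EOF on."""
    while data := await reader.read(65536):
        writer.write(data)
        await writer.drain()
    if writer.can_write_eof():
        writer.write_eof()  # forward the FIN, keep the other direction open


async def forward(listen_port: int, target_host: str, target_port: int,
                  listen_host: str = "0.0.0.0"):
    """Listen locally and relay every connection, byte for byte, to the target."""

    async def handle(client_reader, client_writer):
        try:
            upstream_reader, upstream_writer = await asyncio.open_connection(
                target_host, target_port)
        except OSError:
            client_writer.close()
            return
        # Relay both directions at once; nothing is decrypted or inspected.
        await asyncio.gather(
            pipe(client_reader, upstream_writer),
            pipe(upstream_reader, client_writer),
            return_exceptions=True,
        )
        for writer in (client_writer, upstream_writer):
            writer.close()

    return await asyncio.start_server(handle, listen_host, listen_port)
```

Awaiting `forward(3129, "gold.garmin.com", 80)` and then calling `serve_forever()` on the returned server would stand in for the xinetd entry above. Because the socket facing the scale is now a local one, its retransmissions follow the patched kernel's timers, and HTTPS passes through unchanged because only TCP payload is copied.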
Probable Cause
This only happens when the network is encrypted. The scale is not
recording the server’s data, and the scale is also not sending errors back.
This could happen if:
- the controller is too slow to decrypt
- the firmware decides the packets are corrupt
- an interrupt goes missing
- not enough RAM, hence forced to drop
Exactly which of these it is remains unknown, and where you draw the line between
controller and firmware can be a grey area. For example, in most
computers, the network card will perform some calculations on the
packets instead of the OS. This is called hardware off-loading,
but in practice it happens in firmware.