macos tahoe / ecn / slow downloads

  • Written by
    Walter Doekes

Recently, a customer reported an intermittent but frustrating issue: since upgrading to macOS 26 (Tahoe), downloads from our servers were occasionally crawling. Not always, and not for everyone. The culprit turned out to be a combination of Explicit Congestion Notification (ECN), NIC offloading limitations, and the way classic congestion control algorithms react to "well-intentioned" signals.

The macOS Tahoe ECN lottery

The first mystery was the intermittency. We noticed that some connections were fast, while others to the same server appeared throttled.

The customer had already done a ton of investigating for us — this was great:

  • Change: the issues started after upgrading their Macs to version 26 (Tahoe).
  • Toggle: they told us that disabling ECN using net.inet.tcp.ecn_initiate_out=0 restored full speed.
  • Debug: they set up a while loop which downloaded the same file every 15 seconds — and provided us with the timestamps when the downloads were slow.

Looking at the traffic with tcpdump revealed that Explicit Congestion Notification (ECN) was indeed the trigger for the slowness. It also revealed that ECN was not applied to all outgoing connections, only to about 5% of them.

Connections with ECN and connections without

When a connection was using ECN, it looked like this in tcpdump:

14:24:46.778055 IP 1.1.1.1.51939 > 2.2.2.2.443: Flags [SEW], seq 276182108, win 65535, options [mss 1460,nop,wscale 6,nop,nop,TS val 936789772 ecr 0,sackOK,eol], length 0
14:24:46.778088 IP 2.2.2.2.443 > 1.1.1.1.51939: Flags [S.E], seq 1390193927, ack 276182109, win 65160, options [mss 1460,sackOK,TS val 1827830359 ecr 936789772,nop,wscale 7], length 0
14:24:46.781656 IP 1.1.1.1.51939 > 2.2.2.2.443: Flags [.], ack 1, win 2059, options [nop,nop,TS val 936789775 ecr 1827830359], length 0
14:24:46.782139 IP 1.1.1.1.51939 > 2.2.2.2.443: Flags [P.], seq 1:334, ack 1, win 2059, options [nop,nop,TS val 936789775 ecr 1827830359], length 333
14:24:46.782155 IP 2.2.2.2.443 > 1.1.1.1.51939: Flags [.E], ack 334, win 507, options [nop,nop,TS val 1827830363 ecr 936789775], length 0
14:24:46.784860 IP 2.2.2.2.443 > 1.1.1.1.51939: Flags [P.E], seq 1:3168, ack 334, win 507, options [nop,nop,TS val 1827830365 ecr 936789775], length 3167
...
14:24:50.112764 IP 2.2.2.2.443 > 1.1.1.1.51939: Flags [.EW], seq 3071982:3073430, ack 615, win 506, options [nop,nop,TS val 1827833693 ecr 936793106], length 1448
14:24:50.112782 IP 1.1.1.1.51939 > 2.2.2.2.443: Flags [.E], ack 3071982, win 2004, options [nop,nop,TS val 936793106 ecr 1827833691], length 0
14:24:50.112793 IP 2.2.2.2.443 > 1.1.1.1.51939: Flags [.E], seq 3073430:3074878, ack 615, win 506, options [nop,nop,TS val 1827833693 ecr 936793106], length 1448
14:24:50.115426 IP 1.1.1.1.51939 > 2.2.2.2.443: Flags [.E], ack 3073430, win 2026, options [nop,nop,TS val 936793109 ecr 1827833693], length 0

The E and W in the initial packet refer to the ECN-Echo (ECE) and Congestion Window Reduced (CWR) bits. In the handshake, these flags are used to negotiate whether both ends support the feature.
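
If you want to fish these handshakes out of a capture yourself, a filter on the TCP flags byte (offset 13) does the trick: SYN is 0x02, ECE is 0x40 and CWR is 0x80. A small sketch, with eth0 as a placeholder interface:

# SYN packets that also carry ECE and CWR, i.e. ECN-requesting handshakes
tcpdump -nni eth0 '(tcp[13] & 0xc2) = 0xc2'

# any packet with the ECE bit set (the same test is used in the stats below)
tcpdump -nni eth0 '(tcp[13] & 0x40) != 0'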

When a connection was not using ECN, we saw this:

14:25:05.190884 IP 1.1.1.1.52002 > 2.2.2.2.443: Flags [S], seq 1706341773, win 65535, options [mss 1460,nop,wscale 6,nop,nop,TS val 1632358482 ecr 0,sackOK,eol], length 0
14:25:05.190949 IP 2.2.2.2.443 > 1.1.1.1.52002: Flags [S.], seq 3548265341, ack 1706341774, win 65160, options [mss 1460,sackOK,TS val 1827848772 ecr 1632358482,nop,wscale 7], length 0
14:25:05.194954 IP 1.1.1.1.52002 > 2.2.2.2.443: Flags [.], ack 1, win 2059, options [nop,nop,TS val 1632358485 ecr 1827848772], length 0
14:25:05.196046 IP 1.1.1.1.52002 > 2.2.2.2.443: Flags [P.], seq 1:334, ack 1, win 2059, options [nop,nop,TS val 1632358486 ecr 1827848772], length 333
...
14:25:05.321982 IP 2.2.2.2.443 > 1.1.1.1.52002: Flags [P.], seq 2993200:3014920, ack 615, win 506, options [nop,nop,TS val 1827848903 ecr 1632358612], length 21720
14:25:05.322002 IP 2.2.2.2.443 > 1.1.1.1.52002: Flags [P.], seq 3014920:3059808, ack 615, win 506, options [nop,nop,TS val 1827848903 ecr 1632358612], length 44888
14:25:05.322031 IP 2.2.2.2.443 > 1.1.1.1.52002: Flags [P.], seq 3059808:3080080, ack 615, win 506, options [nop,nop,TS val 1827848903 ecr 1632358612], length 20272
14:25:05.322045 IP 1.1.1.1.52002 > 2.2.2.2.443: Flags [.], ack 2884600, win 6777, options [nop,nop,TS val 1632358613 ecr 1827848897], length 0

Three things stood out here:

  • The ECN connections required about 4000 packets in total (a quick back-of-the-envelope check below shows that this is roughly what you'd expect);
  • the packets in those connections appeared to be limited to 1448 octets in length;
  • the total download time was significantly longer.
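
Those two observations fit together. Assuming the roughly 3 MiB test file is sent as individual 1448-byte segments and the client ACKs nearly every segment on such a throttled flow, the packet count lands right where we saw it:

# ~3 MiB split into 1448-byte segments
echo $((3 * 1024 * 1024 / 1448))        # 2172 data packets
# plus roughly one ACK per data segment on a throttled flow
echo $((2 * 3 * 1024 * 1024 / 1448))    # ~4344, close to the observed ~4000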

Analyzing a larger bunch of traffic

Because we were sniffing live data, and everything was encrypted with TLS, it was slightly more work to get the pcaps sorted. Luckily, the hostname used for the tests was rarely accessed outside of the test loop. This hostname is sent unencrypted as part of Server Name Indication (SNI), so that was a convenient way to filter out the necessary packets.

# Collect the remote ports used: the sed keeps only the client's ephemeral
# port, i.e. whatever follows the client IP 1.1.1.1.
tcpdump -nnr big.pcap |
  sed -e 's/.*1[.]1[.]1[.]1[.]//;s/:\? .*//' |
  sort -u >remote-ports

# Split the pcaps, if they contain the hostname (in SNI)
for port in $(cat remote-ports); do
  tcpdump -Annr big.pcap port $port 2>/dev/null |
    grep -q THE_UNIQUE_HOSTNAME &&
  tcpdump -nnr big.pcap port $port -w samples/$port.pcap 2>/dev/null
done

Now I had a bunch of pcaps, all sized 3 MiB (because the test download was about that large):

# ls -lh samples/
total 155M
-rw-r--r-- 1 tcpdump tcpdump 3.0M Feb  4 14:22 52323.pcap
-rw-r--r-- 1 tcpdump tcpdump 3.0M Feb  4 14:22 52325.pcap
-rw-r--r-- 1 tcpdump tcpdump 3.0M Feb  4 14:22 52328.pcap
...

Running some stats on these was easy:

# for fn in *.pcap; do
    packets=$(tcpdump -nnr $fn 2>/dev/null | wc -l);
    has_ecn=$(tcpdump -nnr $fn 'tcp[13] & 64 != 0' 2>/dev/null | grep -q '' && echo X || echo -);
    # wtimediff (a small helper) prefixes lines with the time offset to the previous line; tail keeps the last packet's line
    lastline=$(tcpdump -nnr $fn 2>/dev/null | sed -ne '1p;$p' | wtimediff | tail -n1);
    duration=$(echo $lastline | awk '{print $1}');
    when=$(echo $lastline | awk '{print $2}');
  echo $duration $when $fn ecn=$has_ecn packets=$packets
done | sort -k2V

That yielded the following (truncated) output:

+0.116948 14:21:59.860106 51642.pcap ecn=- packets=234
+0.135041 14:22:15.031269 51646.pcap ecn=- packets=237
...
+0.170701 14:24:16.486446 51698.pcap ecn=- packets=168
+0.229468 14:24:31.737016 51760.pcap ecn=- packets=639
+3.355901 14:24:50.133956 51939.pcap ecn=X packets=4211
+0.142125 14:25:05.333009 52002.pcap ecn=- packets=271
+0.168694 14:25:20.534368 52073.pcap ecn=- packets=357
...
+0.137638 14:29:23.340260 52303.pcap ecn=- packets=240
+0.122912 14:29:38.504557 52305.pcap ecn=- packets=266
+3.066842 14:29:56.607659 52308.pcap ecn=X packets=4055
+0.194032 14:30:11.863645 52314.pcap ecn=- packets=496
+0.128605 14:30:27.062803 52320.pcap ecn=- packets=229

Like clockwork, for every 20 opened connections — or once every 5 minutes — one connection was tried with ECN. That one would last for 3 seconds or more, and require about 4000 IP packets instead of the usual 100-500.

In fact, once we checked the sysctl values configured on their systems, this was entirely explainable. The connection count was not relevant, but the 5 minutes were:

# Max ECN setup percentage
net.inet.tcp.ecn_setup_percentage = 100

# Initial minutes to wait before re-trying ECN
net.inet.tcp.ecn_timeout = 5
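
For completeness, these knobs can be inspected and toggled on the macOS client with nothing more than sysctl; this sketch only uses the sysctls already mentioned above:

# On the macOS client: show the current ECN behaviour
sysctl net.inet.tcp.ecn_initiate_out \
       net.inet.tcp.ecn_setup_percentage \
       net.inet.tcp.ecn_timeout

# Stop requesting ECN on outgoing connections (the workaround the
# customer had already found)
sudo sysctl -w net.inet.tcp.ecn_initiate_out=0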

Rather than enabling ECN unconditionally for every flow (which could lead to connectivity issues on broken middleboxes), macOS Tahoe negotiates ECN per connection and backs off when a flow looks problematic, retrying only after the ecn_timeout has passed. With a 5 minute timeout and a download every 15 seconds, that is exactly the one-in-twenty pattern we saw.

"Middleboxes" are the firewalls, load balancers, and routers that sit between the client and server. Historically, some of these devices were configured to drop any packet with "unknown" or "reserved" bits set in the IP header — which included ECN. If a device simply discards ECN-marked packets, the connection hangs or times out.

Now that the cause for the flakiness was explained, on to the real problem: why did enabling ECN cause slowness?

Hardware offloading and the 1448 cap

When ECN was negotiated, we observed that the server stopped sending large, coalesced segments. Usually, when we run a tcpdump locally on the server, we see "super-packets" — massive chunks of data (often 20KB to 60KB) being pushed toward the NIC. This is the result of Generic Segmentation Offload (GSO) or TCP Segmentation Offload (TSO).

Checking the NIC capabilities with ethtool:

# ethtool -k enp2s0np0 | grep segmentation
tcp-segmentation-offload: on
    tx-tcp-segmentation: on
    tx-tcp-ecn-segmentation: off [fixed]
    tx-tcp-mangleid-segmentation: on
    tx-tcp6-segmentation: on
generic-segmentation-offload: on

The hardware supports standard offloading, but not for ECN. The tx-tcp-ecn-segmentation: off [fixed] flag tells the Linux kernel: "I don't know how to handle ECN bits in giant chunks; for those, you'll have to do the segmentation yourself."

In theory, the kernel should still use GSO to handle this in software, keeping the packets large until the very last moment (after tcpdump has seen them). But in practice, our captures showed the server was sending individual 1448-byte segments: the 1500-byte MTU minus 20 bytes of IP header and 32 bytes of TCP header including the timestamp option.

Disabling TSO and GSO using ethtool -K iface gso off tso off did not change any symptoms: normal traffic was still fast and showed up as large packets in tcpdump, ECN traffic was still slow and capped at 1448 octets.
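
For the record, toggling and verifying these offloads looks like this (enp2s0np0 as in the ethtool output above; re-enabling is the same command with on):

# Turn off TCP segmentation offload and generic segmentation offload
ethtool -K enp2s0np0 tso off gso off

# Verify the result
ethtool -k enp2s0np0 | grep -E 'tcp-segmentation-offload|generic-segmentation-offload'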

Firmware upgrades

Since we had run into issues with offloading and VLANs in 2025, it seemed worth checking whether firmware updates were available.

It probably was worth checking: firmware 14.17.2020 dates from before 2017, so it is almost 10 years old by now.

# ./mlxup
Querying Mellanox devices firmware ...

Device #1:
----------

  Device Type:      ConnectX4LX
  Part Number:      Super_Micro_AOC-C25G-m1S
  Description:      ConnectX-4 Lx EN adapter card; 25GbE single-port SFP28; PCIe3.0 x8
  PSID:             SM_2011000xxxxxx
  PCI Device Name:  0000:02:00.0
  Base MAC:         0cc47axxxxxx
  Versions:         Current        Available
     FW             14.17.2020     N/A
     PXE            3.4.0903       N/A
     UEFI           14.11.0028     N/A

  Status:           No matching image found

Those firmware versions follow the same numbering scheme as the official Mellanox firmware releases, where the latest version for a ConnectX4LX device is 14.32.1908. But for this SuperMicro-specific card, with its Micro-LP form factor, no updates could be found.

On the SuperMicro site, there is a "web download" (wdl) link to an empty directory for the Super_Micro_AOC-C25G-I2S. But for the Super_Micro_AOC-C25G-m1S there's not even that.

Getting a new firmware for this device was not going to be the easy route.

Disabling ECN on the server

We could disable ECN on the server side, using the net.ipv4.tcp_ecn=0 sysctl. With that in place, every connection initiated by the macOS client (once the ecn_timeout had passed) still tried ECN, but the server ignored it. This did solve the problem for them.
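
On Linux, net.ipv4.tcp_ecn takes three values: 0 disables ECN entirely, 1 also requests it on outgoing connections, and 2 (the usual default) only honours ECN when the peer requests it. A minimal sketch of the test, including the revert:

# Stop honouring ECN requests from clients (test only)
sysctl -w net.ipv4.tcp_ecn=0

# Back to the default: accept ECN when the client asks for it
sysctl -w net.ipv4.tcp_ecn=2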

While this was a solution to the problem, it would be a regression in network tech, so we're not doing that.

Changing congestion control algorithm

After we had exhausted all other options, Google Gemini suggested setting net.ipv4.tcp_congestion_control=bbr. On a generic Ubuntu or Debian system this defaults to cubic.
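
Switching is a one-liner, plus loading the module on kernels where BBR is not built in. Roughly what that looks like:

# Check which algorithms the kernel currently offers
sysctl net.ipv4.tcp_available_congestion_control

# Load the BBR module if needed, then switch
modprobe tcp_bbr
sysctl -w net.ipv4.tcp_congestion_control=bbr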

This is not a setting I generally touch, but it sounded safe enough to try. And it did exactly what we wanted. The packet summaries now looked like this:

...
+0.100778 14:53:34.630292 52764.pcap ecn=- packets=154
+0.126388 14:53:49.789472 52781.pcap ecn=- packets=308
+0.114749 14:54:04.954907 52786.pcap ecn=X packets=388
+0.105739 14:54:20.126535 52789.pcap ecn=- packets=168
+0.109139 14:54:35.287600 52792.pcap ecn=- packets=159
...
+0.147384 14:58:37.988716 52866.pcap ecn=- packets=166
+0.112264 14:58:53.137022 52881.pcap ecn=- packets=174
+0.107939 14:59:08.281076 52890.pcap ecn=X packets=417
+0.105624 14:59:23.453946 52900.pcap ecn=- packets=161
+0.154969 14:59:38.667600 52901.pcap ecn=- packets=218
...

A single download with ECN was now possible with fewer than 500 packets. The packet captures with and without ECN now looked roughly the same.

A few notes:

  • The packet counts of the ECN flows were consistently over 350, whereas the non-ECN flows could go as low as 150.
  • The macOS client still only tried ECN once every 5 minutes, even though the ECN flow was no longer slower than its non-ECN counterpart. I speculate that it only marks an endpoint for consistent ECN use if the flow measurably improves with it.
  • It appears that hardware offloading had no real effect here. In fact, one could even speculate that offloaded ECN flows might suffer under a poorly matched default congestion control as well. Maybe we'll need to disable hardware ECN offloading at some point, once we have a NIC that does support it.

The root cause

The root cause of the throughput collapse was the way CUBIC interprets network signals. In classic congestion control, an ECN-Echo (ECE) mark is treated as a "hard" congestion signal, identical to a dropped packet.

Our servers were caught in a "death spiral":

  • Because the NIC could not offload ECN-marked traffic (TSO off), the kernel had to process all packets in software.
  • Every individual packet marked with Congestion Experienced (CE) by a router surfaced immediately in the TCP stack as a distinct ECN-Echo signal.
  • CUBIC reacts to these marks by aggressively reducing its Congestion Window (cwnd), treating each CE mark as a loss-equivalent congestion signal.
  • Generic Segmentation Offload (GSO) could still have coalesced packets, but the constant "braking" from the shrinking cwnd prevented sufficient batching, resulting in a stream of single-segment (1448-byte) drips instead of high-speed bursts.
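
This is also observable per connection on the Linux server: ss -ti prints the congestion control algorithm in use, the current cwnd and, when negotiated, an ecn/ecnseen flag, so a collapsed window on an ECN flow shows up directly. A sketch, filtering on the HTTPS port:

# Per-connection TCP internals: algorithm, cwnd, pacing/delivery rate, ECN flags
ss -tin '( sport = :443 )'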

BBR congestion control as the fix

Switching to BBR (Bottleneck Bandwidth and Round-trip propagation time) resolved the issue by changing the server's reaction to those ECN marks. Unlike CUBIC, BBR is model-based. It ignores the "panic" of individual ECN marks and focuses instead on the actual measured delivery rate and RTT of the path.

By maintaining a high-speed flow based on the actual capacity of the "pipe," BBR allowed the Linux kernel to effectively use GSO. Even though the NIC hardware still couldn't do the segmentation, the CPU was now able to bundle data into large "super-packets" before handing them to the driver.

This explains the dramatic drop from 4000+ packets to under 500: BBR restored the flow efficiency that CUBIC had traded away for caution.

Conclusion

If you run into slow downloads and see ECN in the mix, look at these options:

  • sysctl -w net.inet.tcp.ecn_initiate_out=0 on a macOS client (or the Linux equivalent);
  • sysctl -w net.ipv4.tcp_ecn=0 on the server (to confirm the hypothesis);
  • ethtool -K iface tx-tcp-ecn-segmentation on on the server (if supported by your hardware);
  • sysctl -w net.ipv4.tcp_congestion_control=bbr on the server (as the robust, modern fix). You may need to modprobe tcp_bbr.
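
To make the server-side settings survive a reboot on a Debian/Ubuntu-style system, a drop-in along these lines would do (file names are just suggestions):

# /etc/sysctl.d/90-tcp-bbr.conf
net.ipv4.tcp_congestion_control = bbr

# /etc/modules-load.d/bbr.conf -- make sure tcp_bbr is loaded at boot
tcp_bbr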

One question remains: why doesn't the macOS client settle on ECN after it finds that it works equally well as the non-ECN flow?

