A few weeks ago at Oxide, we encountered a bug where a particular, somewhat large, data structure was panicking on serialization to JSON via serde. The problem was that JSON only supports map keys that are strings or numbers, and the data structure had an infrequently-populated map with keys that were more complex than that1.

We fixed the bug, but a concern still remained: what if some other map that was empty most of the time had a complex key in it? The easiest way to guard against this is by generating random instances of the data structure and attempting to serialize them, checking that this operation doesn’t panic. The most straightforward way to do this is with property-based testing, where you define:

  • a way to generate random instances of a particular type, and
  • given a failing input, a way to shrink it down to a minimal failing value.

Modern property-based testing frameworks like proptest, which we use at Oxide, combine these two algorithms into a single strategy, through a technique known as integrated shrinking. (For a more detailed overview, see my monad tutorial, where I talk about the undesirable performance characteristics of monadic composition when it comes to integrated shrinking.)

The proptest library has a notion of a canonical strategy for a type, expressed via the Arbitrary trait. The easiest way to define Arbitrary instances for large, complex types is to use a derive macro. Annotate your type with the macro:

use std::collections::BTreeMap;
use test_strategy::Arbitrary;

#[derive(Arbitrary)]
struct MyType {
    id: String,
    data: BTreeMap<String, MyInnerType>,
}

#[derive(Arbitrary)]
struct MyInnerType {
    value: usize,
    // ...
}

As long as all the fields have Arbitrary defined for them—and the proptest library defines the trait for most types in the standard library—your type has a working random generator and shrinker associated with it. It’s pretty neat!
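
Under the hood, the derive is what makes any::<MyType>() available, so you can ask for the canonical strategy anywhere proptest expects one. Here’s a quick sketch, reusing MyType from above but with proptest’s own macros rather than test_strategy’s:

use proptest::prelude::*;

proptest! {
    #[test]
    fn my_type_generates(value in any::<MyType>()) {
        // `value` is a freshly generated MyType; if anything in this body
        // panics or a prop_assert! fails, proptest shrinks `value` toward
        // a minimal counterexample before reporting it.
        let _ = value;
    }
}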

I put together an Arbitrary implementation for our very complex type, then wrote a property-based test to ensure that it serializes properly:

use test_strategy::proptest;

// The outermost struct is called PlanningReport.
#[proptest]
fn planning_report_json_serialize(planning_report: PlanningReport) {
    serde_json::to_string(&planning_report).unwrap();
}

And, running it:

% cargo nextest run -p nexus-types planning_report
        PASS [   4.879s] nexus-types deployment::planning_report::tests::planning_report_json_serialize
────────────
     Summary [   4.880s] 1 test run: 1 passed, 24 skipped

The test passed!

But while we’re here, surely we should also be able to deserialize a PlanningReport, and then ensure that we get the same value back, right? We’ve already done the hard part, so let’s go ahead and add this test:

use proptest::prop_assert_eq;

#[proptest]
fn planning_report_json_roundtrip(planning_report: PlanningReport) {
    let json = serde_json::to_string(&planning_report).unwrap();
    let deserialized: PlanningReport = serde_json::from_str(&json).unwrap();
    prop_assert_eq!(
        planning_report,
        deserialized,
        "input and output are equal"
    );
}

And…

% cargo nextest run -p nexus-types planning_report
        FAIL [   3.688s] nexus-types deployment::planning_report::tests::planning_report_json_roundtrip
  stderr ───
  minimal failing input: input = _PlanningReportJsonRoundtripArgs {
      [... many fields omitted for brevity]
      mgs_updates: PlanningMgsUpdatesStepReport {
          blocked_mgs_updates: [],
          pending_mgs_updates: PendingMgsUpdates {
              by_baseboard: {
                  BaseboardId {
                      part_number: "",
                      serial_number: "",
                  }: PendingMgsUpdate {
                      sp_type: Sled,
                      slot_id: 0,
                      details: HostPhase1(
                          PendingMgsUpdateHostPhase1Details {
                              expected_active_phase_1_slot: A,
                              expected_boot_disk: A,
                              sled_agent_address: [::ffff:0.0.0.0]:0,
                          },
                      ),
                  },
              },
          },
      },
      [...]
  }

The roundtrip test failed!

Why in the world did the test fail? My first idea was to do a textual diff of the Debug outputs of the two data structures. In this case, I tried out the pretty_assertions library, with something like:

#[proptest]
fn planning_report_json_roundtrip(planning_report: PlanningReport) {
    // ...
    pretty_assertions::assert_eq!(planning_report, deserialized);
}

And the output I got was:

% cargo nextest run -p nexus-types planning_report
[...]
Test failed: assertion failed: `(left == right)`

Diff < left / right > :
 PlanningReport {
     [...]
     mgs_updates: PlanningMgsUpdatesStepReport {
         blocked_mgs_updates: [],
         pending_mgs_updates: PendingMgsUpdates {
             by_baseboard: {
                 BaseboardId {
                     part_number: "",
                     serial_number: "",
                 }: PendingMgsUpdate {
                     baseboard_id: BaseboardId {
                         part_number: "",
                         serial_number: "",
                     },
                     sp_type: Sled,
                     slot_id: 0,
                     details: HostPhase1(
                         PendingMgsUpdateHostPhase1Details {
                             expected_active_phase_1_slot: A,
                             expected_boot_disk: A,
                             sled_agent_address: [::ffff:0.0.0.0]:0,
                         },
                     ),
                 },
             },
         },
     },
     [...]
 }

There’s nothing in the output! No < or > as would typically be printed. It’s as if there wasn’t a difference at all, and yet the failing assertion indicated that the before and after values weren’t the same.

What is going on?

We have one clue to go by: the integrated shrinking algorithm in proptest tries to shrink maps down to empty ones, and yet the minimal failing input has a non-empty pending_mgs_updates map. That means something in either the BaseboardId key or the PendingMgsUpdate value must be responsible.

A PendingMgsUpdate is defined as:

// These types had many more fields -- most have been omitted for brevity.
pub struct PendingMgsUpdate {
    pub baseboard_id: Arc<BaseboardId>,
    pub sp_type: SpType,
    pub slot_id: u16,
    pub details: PendingMgsUpdateDetails,
}

pub enum PendingMgsUpdateDetails {
    // ...
    HostPhase1(PendingMgsUpdateHostPhase1Details),
}

pub struct PendingMgsUpdateHostPhase1Details {
    pub expected_active_phase_1_slot: M2Slot,
    pub expected_boot_disk: M2Slot,
    pub sled_agent_address: SocketAddrV6,
}

Most of these types were pretty simple. The only one that looked even remotely suspicious was the SocketAddrV6, which ostensibly represents an IPv6 address plus a port number.

What’s going on with the SocketAddrV6? Does the Arbitrary implementation for it do something weird? Well, let’s look at it:

arbitrary!(SocketAddrV6, SMapped<(Ipv6Addr, u16, u32, u32), Self>;
    static_map(any::<(Ipv6Addr, u16, u32, u32)>(),
        |(a, b, c, d)| Self::new(a, b, c, d))
);

Like a lot of abstracted-out library code, it looks a bit strange, but at its core it seems to be simple enough:

  • generate four values: an Ipv6Addr, a u16, a u32, and another u32
  • then pass them to SocketAddrV6::new.
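
De-macro’d, it amounts to roughly the following sketch (static_map is essentially prop_map with a plain function instead of a closure):

use proptest::prelude::*;
use std::net::{Ipv6Addr, SocketAddrV6};

// Roughly what the arbitrary! invocation above expands to.
fn socket_addr_v6_strategy() -> impl Strategy<Value = SocketAddrV6> {
    any::<(Ipv6Addr, u16, u32, u32)>()
        .prop_map(|(a, b, c, d)| SocketAddrV6::new(a, b, c, d))
}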

The Ipv6Addr is self-explanatory, and the u16 is probably the port number. But what are these last two values? Let’s look at the SocketAddrV6::new constructor:

pub const fn new(
    ip: Ipv6Addr,
    port: u16,
    flowinfo: u32,
    scope_id: u32,
) -> SocketAddrV6 {
    SocketAddrV6 { ip, port, flowinfo, scope_id }
}

What in the world are these two flowinfo and scope_id values? They look mighty suspicious.

A thing that caught my eye was the “Textual representation” section of the SocketAddrV6 documentation, which defines the representation as:

  • A left square bracket ([)
  • The textual representation of an IPv6 address
  • Optionally, a percent sign (%) followed by the scope identifier encoded as a decimal integer
  • A right square bracket (])
  • A colon (:)
  • The port, encoded as a decimal integer.

Note what’s missing from this representation: the flowinfo field!

We finally have a theory for what’s going on:

  • proptest generated a SocketAddrV6 with a non-zero flowinfo field.
  • When we went to serialize this field as JSON, we used the textual representation, which dropped the flowinfo field.
  • When we deserialized it, the flowinfo field was set to zero.
  • As a result, the before and after values were no longer equal.
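
Before going back to the property test, we can check the serialization half of this theory in isolation. serde’s built-in impls for std::net types serialize a SocketAddrV6 through its textual form in human-readable formats like JSON, so a non-zero flowinfo should silently vanish on the way back. A minimal sketch:

use std::net::{Ipv6Addr, SocketAddrV6};

fn main() {
    let before = SocketAddrV6::new(Ipv6Addr::LOCALHOST, 80, 7, 0);
    let json = serde_json::to_string(&before).unwrap();
    assert_eq!(json, r#""[::1]:80""#); // no trace of flowinfo = 7

    let after: SocketAddrV6 = serde_json::from_str(&json).unwrap();
    assert_eq!(after.flowinfo(), 0); // reset to zero on deserialization
    assert_ne!(before, after); // so the roundtrip is lossy
}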

Why did this not show up in the textual diff of the Debug values? For most types in Rust, the Debug representation breaks out all the fields and their values. But for SocketAddrV6, the Debug implementation (quite reasonably) forwards to the Display implementation. So the flowinfo field is completely hidden, and the only way to look at it is through the flowinfo method. Whoops.
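
You can see the trap in miniature with this sketch: two addresses that differ only in flowinfo are unequal, yet print identically under both Display and Debug.

use std::net::{Ipv6Addr, SocketAddrV6};

fn main() {
    let a = SocketAddrV6::new(Ipv6Addr::LOCALHOST, 80, 0, 0);
    let b = SocketAddrV6::new(Ipv6Addr::LOCALHOST, 80, 1, 0);

    assert_ne!(a, b); // flowinfo differs...
    assert_eq!(format!("{a:?}"), format!("{b:?}")); // ...but Debug can't tell
    assert_eq!(b.flowinfo(), 1); // the accessor is the only way to see it
}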

How can we confirm this against the full PlanningReport? The easiest way is to generate random values of SocketAddrV6 where flowinfo is always set to zero, and see if that passes our roundtrip tests. The proptest ecosystem has pretty good support for generating and using this kind of non-canonical strategy. Let’s try it out:

use proptest::prelude::*;
use std::net::{Ipv6Addr, SocketAddrV6};

// This defines a strategy where flowinfo is always 0.
fn socket_addr_v6_without_flowinfo() -> impl Strategy<Value = SocketAddrV6> {
    any::<(Ipv6Addr, u16, u32)>().prop_map(
        |(addr, port, scope_id)| SocketAddrV6::new(addr, port, 0, scope_id),
    )
}

// Then, we can use this function like so.
#[derive(Arbitrary)]
pub struct PendingMgsUpdateHostPhase1Details {
    pub expected_active_phase_1_slot: M2Slot,
    pub expected_boot_disk: M2Slot,
    #[strategy(socket_addr_v6_without_flowinfo())]
    pub sled_agent_address: SocketAddrV6,
}

Pretty straightforward, and similar to how serde lets you provide custom implementations through #[serde(with = ...)].
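
As an aside, that same serde mechanism is how you’d provide a lossless SocketAddrV6 encoding if you wanted one. Here’s a sketch (the module name socket_addr_v6_full is made up) that serializes the address as its four raw parts instead of going through Display:

use serde::{Deserialize, Serialize};
use std::net::SocketAddrV6;

mod socket_addr_v6_full {
    use super::*;
    use serde::{Deserializer, Serializer};
    use std::net::Ipv6Addr;

    pub fn serialize<S: Serializer>(addr: &SocketAddrV6, s: S) -> Result<S::Ok, S::Error> {
        // Keep flowinfo and scope_id instead of using the textual form.
        (*addr.ip(), addr.port(), addr.flowinfo(), addr.scope_id()).serialize(s)
    }

    pub fn deserialize<'de, D: Deserializer<'de>>(d: D) -> Result<SocketAddrV6, D::Error> {
        let (ip, port, flowinfo, scope_id) = <(Ipv6Addr, u16, u32, u32)>::deserialize(d)?;
        Ok(SocketAddrV6::new(ip, port, flowinfo, scope_id))
    }
}

#[derive(Serialize, Deserialize)]
struct Example {
    #[serde(with = "socket_addr_v6_full")]
    addr: SocketAddrV6,
}

Back to our flowinfo-zeroing strategy, though.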

Let’s test it out again:

% cargo nextest run -p nexus-types planning_report
        PASS [   4.828s] nexus-types deployment::planning_report::tests::planning_report_json_roundtrip
────────────
     Summary [   4.829s] 1 test run: 1 passed, 24 skipped

All right, looks like our theory is confirmed! We can now merrily be on our way… right?

This little adventure left us with more questions than answers, though:

  • What does this flowinfo field mean?
  • A SocketAddrV4 is just an Ipv4Addr plus a port; why is a SocketAddrV6 different?
  • Why is the flowinfo not part of the textual representation? Ipv4Addr, Ipv6Addr, and SocketAddrV4 are all roundtrip serializable. Why is SocketAddrV6 not?
  • Also: what is the scope_id field?

But what is flowinfo, anyway?

The best place to start looking is in the IETF Requests for Comments (RFCs)2 that specify IPv6. The Rust documentation for flowinfo helpfully links to RFC 2460, section 6 and section 7.

The flowinfo field is actually a combination of two fields that are part of every IPv6 packet:

  • a 20-bit Flow Label, and
  • an 8-bit Traffic Class3.

Section 6 of the RFC says:

Flow Labels

The 20-bit Flow Label field in the IPv6 header may be used by a source to label sequences of packets for which it requests special handling by the IPv6 routers, such as non-default quality of service or “real-time” service. This aspect of IPv6 is, at the time of writing, still experimental and subject to change as the requirements for flow support in the Internet become clearer. […]

And section 7:

Traffic Classes

The 8-bit Traffic Class field in the IPv6 header is available for use by originating nodes and/or forwarding routers to identify and distinguish between different classes or priorities of IPv6 packets. At the point in time at which this specification is being written, there are a number of experiments underway in the use of the IPv4 Type of Service and/or Precedence bits to provide various forms of “differentiated service” for IP packets […].

Traffic Classes

Let’s look at the Traffic Class field first. This field is similar to IPv4’s differentiated services code point (DSCP), and is meant to provide quality of service (QoS) over the network. (For example, prioritizing low-latency gaming and video conferencing packets over bulk downloads.)

The DSCP field in IPv4 is not part of a SocketAddrV4, but the Traffic Class—through the flowinfo field—is part of a SocketAddrV6. Why is that the case? Rust’s definition of SocketAddrV6 mirrors the sockaddr_in6 defined by RFC 2553, section 3.3:

struct sockaddr_in6 {
    sa_family_t     sin6_family;    /* AF_INET6 */
    in_port_t       sin6_port;      /* transport layer port # */
    uint32_t        sin6_flowinfo;  /* IPv6 traffic class & flow info */
    struct in6_addr sin6_addr;      /* IPv6 address */
    uint32_t        sin6_scope_id;  /* set of interfaces for a scope */
};

Similarly, Rust’s SocketAddrV4 mirrors the sockaddr_in struct. There isn’t a similar RFC for sockaddr_in; the de facto standard is Berkeley sockets, designed in 1983. The Linux man page for sockaddr_in defines it as:

struct sockaddr_in {
    sa_family_t     sin_family;     /* AF_INET */
    in_port_t       sin_port;       /* Port number */
    struct in_addr  sin_addr;       /* IPv4 address */
};

So sin6_flowinfo, which includes the Traffic Class, is part of sockaddr_in6, but the very similar DSCP field is not part of sockaddr_in. Why? I’m not entirely sure about this, but here’s an attempt to reconstruct a history:

  • QoS was not originally part of the 1980s Berkeley sockets specification.
  • DSCP came about much later (RFC 2474, 1998).
  • Because C structs do not provide encapsulation, the sockaddr_in definition was set in stone and couldn’t be changed.
  • So instead, the DSCP field is set as an option on the socket, via setsockopt.
  • By the time IPv6 came around, it was pretty clear that QoS was important, so the Traffic Class was baked into the sockaddr_in6 struct.

(Even if sockaddr_in could be extended to have this field, would it be a good idea to do so? Put a pin in this for now.)
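
To make the socket-option route concrete, here’s a Linux-only sketch using the libc crate. DSCP lives in the top six bits of the IPv4 ToS byte, set via the standard IP_TOS option (the helper name and the choice of a UDP socket are mine):

use std::net::UdpSocket;
use std::os::fd::AsRawFd;

fn set_dscp(sock: &UdpSocket, dscp: u8) -> std::io::Result<()> {
    let tos = libc::c_int::from(dscp << 2); // DSCP is the top 6 bits of ToS
    let ret = unsafe {
        libc::setsockopt(
            sock.as_raw_fd(),
            libc::IPPROTO_IP,
            libc::IP_TOS,
            &tos as *const libc::c_int as *const libc::c_void,
            std::mem::size_of::<libc::c_int>() as libc::socklen_t,
        )
    };
    if ret == 0 {
        Ok(())
    } else {
        Err(std::io::Error::last_os_error())
    }
}

fn main() -> std::io::Result<()> {
    let sock = UdpSocket::bind("0.0.0.0:0")?;
    set_dscp(&sock, 46) // 46 = Expedited Forwarding
}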

Flow Labels

RFC 2460 says that the Flow Label is “experimental and subject to change”. The RFC was written back in 1998, over a quarter-century ago—has anyone found a use for it since then?

RFC 6437, published in 2011, attempts to specify semantics for IPv6 Flow Labels. Section 2 of the RFC says:

The 20-bit Flow Label field in the IPv6 header [RFC2460] is used by a node to label packets of a flow. […] Packet classifiers can use the triplet of Flow Label, Source Address, and Destination Address fields to identify the flow to which a particular packet belongs.

The RFC says that Flow Labels can potentially be used by routers for load balancing, where they can use the triplet source address, destination address, flow label to figure out that a series of packets are all associated with each other. But this is an internal implementation detail generated by the source program, and not something IPv6 users copy/pasting an address generally have to think about. So it makes sense that it isn’t part of the textual representation.
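
To make the load-balancing idea concrete, here’s a toy sketch (the function and its parameters are mine, not from any RFC) of the kind of flow hashing an ECMP router might do:

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::net::Ipv6Addr;

// Hash (source, destination, flow label) to pick one of several equal-cost
// paths, without having to parse transport-layer headers for ports.
fn pick_path(src: Ipv6Addr, dst: Ipv6Addr, flow_label: u32, n_paths: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    (src, dst, flow_label & 0x000f_ffff).hash(&mut hasher); // label is 20 bits
    hasher.finish() % n_paths
}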

RFC 6294 surveys Flow Label use cases, and some of the ones mentioned are:

  • as a pseudo-random value that can be used as part of a hash key for load balancing, or
  • as extra QoS bits on top of the 8 bits provided by the Traffic Class field.

But this Stack Exchange answer by Andrei Korshikov says:

Nowadays […] there [are] no clear advantages of additional 20-bit QoS field over existent Traffic Class (Differentiated Class of Service) field. So “Flow Label” is still waiting for its meaningful usage.

In my view, putting flowinfo in sockaddr_in6 was an understandable choice given the optimism around QoS in 1998, but it was a bit of a mistake in hindsight. The Flow Label field never found widespread adoption, and the Traffic Class field is more of an application-level concern. In general, I think there should be a separation between types that are losslessly serializable and types that are not, and sockaddr_in6 violates this expectation. Making the Traffic Class (QoS) a socket option, like in IPv4, avoids these serialization issues.

What is scope_id?

What about the other additional field, scope_id? What does it mean, and why does it not have to be zeroed out?

The documentation for a SocketAddrV6 says that in its textual representation, the scope identifier is included after the IPv6 address and a % character, within square brackets. So, for example, the following code sample:

use std::net::SocketAddrV6;

let addr = SocketAddrV6::new("::1".parse().unwrap(), 80, 0, 42);
println!("{}", addr);

prints out [::1%42]:80. What does this field mean?

The reason scope_id exists has to do with link-local addressing. Imagine you connect two computers directly to each other via, say, an Ethernet cable. There isn’t a central server telling the computers which addresses to use, or anything similar—in this situation, how can the two computers talk to each other?

To address this issue, OS vendors came up with the idea to just assign random addresses on each end of the link. The behavior is defined in RFC 3927, section 2.1:

When a host wishes to configure an IPv4 Link-Local address, it selects an address using a pseudo-random number generator with a uniform distribution in the range from 169.254.1.0 to 169.254.254.255 inclusive.

(You might have seen these 169.254 addresses on your home computers if your router is down. Those are link-local addresses.)

Sounds simple enough, right? But there is a pretty big problem with this approach: what if a computer has more than one interface on which a link-local address has been established? When a program tries to send some data over the network, the computer has to know which interface to send the data out on. But with multiple link-local interfaces, the outbound one becomes ambiguous. This is described in section 6.3 of the RFC:

Address Ambiguity

Application software run on a multi-homed host that supports IPv4 Link-Local address configuration on more than one interface may fail.

This is because application software assumes that an IPv4 address is unambiguous, that it can refer to only one host. IPv4 Link-Local addresses are unique only on a single link. A host attached to multiple links can easily encounter a situation where the same address is present on more than one interface, or first on one interface, later on another; in any case associated with more than one host. […]

The IPv6 protocol designers took this lesson to heart. Every time an IPv6-capable computer connects to a network, it establishes a link-local address starting with fe80::. (You should be able to see this address via ip addr on Linux, or your OS’s equivalent.) But if you’re connected to multiple networks, all of them will have addresses beginning with fe80::. Now if an application wants to establish a connection to a computer in this fe80:: range, how can it tell the OS which interface to use?

That’s exactly where scope_id comes in: it allows the SocketAddrV6 to specify which network interface to use. Each interface has an index associated with it, which you can see on Linux with ip addr. When I run that command, I see:

% ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
2: enp4s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
    [...]
    inet6 fe80::11fe:b754:2233:afb9/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
3: enp4s0f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
4: wlp13s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000

The 1, 2, 3, and 4 listed here are the interface indexes that can be used as the scope ID. Let’s try pinging our address:

% ping6 fe80::11fe:b754:2233:afb9
ping6: Warning: IPv6 link-local address on ICMP datagram socket may require ifname or scope-id => use: address%<ifname|scope-id>
PING fe80::11fe:b754:2233:afb9 (fe80::11fe:b754:2233:afb9) 56 data bytes

Aha! The warning tells us that for a link-local address, the scope ID needs to be specified. Let’s try that using the % syntax:

% ping6 fe80::11fe:b754:2233:afb9%2
PING fe80::11fe:b754:2233:afb9%2 (fe80::11fe:b754:2233:afb9%enp4s0f0) 56 data bytes
64 bytes from fe80::11fe:b754:2233:afb9%enp4s0f0: icmp_seq=1 ttl=64 time=0.050 ms
64 bytes from fe80::11fe:b754:2233:afb9%enp4s0f0: icmp_seq=2 ttl=64 time=0.052 ms

Success! What if we try a different scope ID?

% ping6 fe80::11fe:b754:2233:afb9%3
PING fe80::11fe:b754:2233:afb9%3 (fe80::11fe:b754:2233:afb9%enp4s0f1) 56 data bytes
^C
--- fe80::11fe:b754:2233:afb9%3 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2080ms

This makes sense: the address is only valid for scope ID 2 (the enp4s0f0 interface). When we told ping6 to use a different scope, 3, the address was no longer reachable. This neatly solves the 169.254 problem with IPv4 addresses.

Since scope IDs can help disambiguate the interface on which a connection ought to be made, it does make sense to include this field in SocketAddrV6, as well as in its textual representation.
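
In code, that might look like the following sketch (the address and interface index 2 come from the ip addr output above; the port is made up):

use std::net::{Ipv6Addr, SocketAddrV6, UdpSocket};

fn main() -> std::io::Result<()> {
    let ip: Ipv6Addr = "fe80::11fe:b754:2233:afb9".parse().unwrap();
    // scope_id = 2 tells the OS to use interface 2 (enp4s0f0).
    let peer = SocketAddrV6::new(ip, 7777, 0, 2);

    let sock = UdpSocket::bind("[::]:0")?;
    sock.connect(peer)?;
    sock.send(b"hello")?;
    Ok(())
}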

Scope ID portability

The keen-eyed among you may have noticed that the ping6 commands above printed out an alternate representation: fe80::11fe:b754:2233:afb9%enp4s0f0. The enp4s0f0 at the end is the network interface that corresponds to the numeric scope ID. Many programs can handle this representation, but Rust’s SocketAddrV6 can’t.
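
That’s easy to check (a sketch, reusing the address from above):

use std::net::SocketAddrV6;

fn main() {
    // Numeric scope IDs parse just fine...
    let ok: SocketAddrV6 = "[fe80::11fe:b754:2233:afb9%2]:80".parse().unwrap();
    assert_eq!(ok.scope_id(), 2);

    // ...but the interface-name form is rejected.
    assert!("[fe80::11fe:b754:2233:afb9%enp4s0f0]:80"
        .parse::<SocketAddrV6>()
        .is_err());
}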

Another thing you might have noticed is that the scope ID only makes sense on a particular computer. A scope ID such as 2 means different things on different computers. So the scope ID is roundtrip serializable, but not portable across machines.

Conclusion

In this post we started off by looking at a somewhat strange inconsistency and ended up deep in the IPv6 specification. In our case, the SocketAddrV6 instances were always for internal services talking to each other without any QoS considerations, so flowinfo was always zero. Given that knowledge, we were okay adjusting the property-based tests to always generate instances where flowinfo was set to zero. (Here’s the PR as landed.)

Still, it raises questions: Should we wrap SocketAddrV6 in a newtype that enforces this constraint? Should serde provide a non-standard alternate serializer that also includes the flowinfo field? Should Debug not forward to Display when Display hides fields? Should Rust have had separate types from the start? (Probably too late now.) And should sockaddr_in6 not have included flowinfo at all, given that it makes the type impossible to represent as text without loss?
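
On that first question, a wrapper along these lines (purely a sketch; the name is made up) would turn the flowinfo pitfall into a constructor-time error:

use std::net::SocketAddrV6;

/// A SocketAddrV6 that is guaranteed to roundtrip through its textual form.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct TextSafeSocketAddrV6(SocketAddrV6);

impl TextSafeSocketAddrV6 {
    pub fn new(addr: SocketAddrV6) -> Result<Self, &'static str> {
        if addr.flowinfo() != 0 {
            return Err("non-zero flowinfo would be lost in serialization");
        }
        Ok(Self(addr))
    }

    pub fn get(&self) -> SocketAddrV6 {
        self.0
    }
}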

The lesson it really drives home for me is how important the principle of least surprise can be. Both Ipv4Addr and Ipv6Addr have lossless textual representations, and SocketAddrV4 does as well. By analogy it would seem like SocketAddrV6 would, too, and yet it does not!

IPv6 learned so much from IPv4’s mistakes, and yet its designers couldn’t help but make some mistakes of their own. This makes sense: the designers could only see the problems they were solving then, just as we can only see those we’re solving now—and just as we encounter problems with their solutions, future generations will encounter problems with ours.


Thanks to Fiona, and several of my colleagues at Oxide, for reviewing drafts of this post.

Discuss on Hacker News and Lobsters.


  1. This is why iddqd, our Rust map crate where keys are borrowed from values, serializes its maps as lists or sequences. ↩︎

  2. The Requests for Discussion we use at Oxide are inspired by RFCs, though we use a slightly different term (RFD) to convey the fact that our documents are less set in stone than IETF RFCs are. ↩︎

  3. The two fields add up to 28 bits, and the flowinfo field is a u32, so there are four bits remaining. I couldn’t find documentation for these four bits anywhere—they appear to be unused padding in the u32. If you know about these bits, please let me know! ↩︎