An Open vSwitch security feature causes a security problem. Here’s how to prevent it.

Jan 5, 2024 | Blog, Europe

By Vašek Šraier

Vašek Šraier is a software engineer at Guardsquare working on the security analysis tool AppSweep. He completed his Master’s thesis, “Performance of Open vSwitch-based Kubernetes Cluster in Pathological Cases,” at Charles University in Prague under the supervision of Jiri Benc, Principal Kernel Engineer, Red Hat Czech. For over a decade, Red Hat Czech has actively collaborated with students in the Czech Republic on research, leading to more than 400 theses.

While working on my Master’s thesis as a research intern at Red Hat, I investigated networking performance problems in Open vSwitch (OVS) in Kubernetes. My goal was to determine whether specific traffic patterns could cause performance issues.

Unfortunately for OVS, I was successful.

I experimented by flooding OVS with various types of packets and found one type that stressed the system. The root cause turned out to lie in the design of OVS itself. Even worse, untrusted traffic sources could, in extreme cases, mount effective denial-of-service attacks on whole clusters. In this article, I will dive into the details of the issue and, at the end, talk about the impact and recommendations.

Open vSwitch

Open vSwitch (OVS) is a multi-platform software switch often used as a network element in software-defined networks (SDN), most commonly through the use of Open Virtual Network (OVN). OVS is currently used in the OpenShift container platform through the OVN-Kubernetes CNI plugin.

Internally, OVS is a collection of multiple software components (processes), each taking care of a different level of abstraction. At the highest level, OVS communicates with controllers over the OpenFlow protocol. At the other end, the lowest level is carefully optimized for maximum performance, because that’s where the actual packet switching occurs. The most common installation uses a kernel module for packet switching to keep overhead as low as possible.

The issue described in this article emerges from the interaction of all the components, so for the sake of simplicity, I will mostly ignore the individual components and talk about OVS as a whole. Keep in mind, however, that the focus is mainly on the lowest-level component, the kernel module itself.

The traditional MAC flood attack

Before looking into the details of OVS, let’s look at traditional network infrastructure. Classical network switches maintain a forwarding table that maps media access control (MAC) addresses to physical ports. When a switch receives a packet, the packet’s destination MAC address is matched against the table and the packet is sent out of the appropriate port. Importantly, if the destination MAC address isn’t found in the forwarding table, the packet is broadcast to all physical ports, essentially imitating the operation of a network hub.

When the switch is powered on for the first time, the forwarding table is empty. To learn about the connected devices and fill the table, the source MAC address of every received packet is stored in the forwarding table and bound to the physical port the packet was received on. And this exact behavior can be exploited in the MAC flood attack.

An attacker connected to our physical network can flood the switch with packets carrying random source MAC addresses. The switch responds by filling its forwarding table with those random addresses. Legitimate packets then end up broadcast across all physical ports, because the switch has no free space left in the forwarding table to store the legitimate mappings. The result is that the attacker can listen to most of the traffic.
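The attack can be reproduced with a toy model. The following is a minimal sketch, not a model of any real switch: the `LearningSwitch` class, its `capacity` parameter, and the MAC addresses are all made up for illustration, and the model has no entry aging or eviction.

```python
import random


class LearningSwitch:
    """Toy model of a MAC-learning switch with a bounded forwarding table."""

    def __init__(self, ports, capacity):
        self.ports = list(ports)
        self.capacity = capacity
        self.table = {}  # MAC address -> port

    def receive(self, in_port, src_mac, dst_mac):
        """Learn the source MAC, then return the list of output ports."""
        # Learn only if the address is already known or there is still room.
        if src_mac in self.table or len(self.table) < self.capacity:
            self.table[src_mac] = in_port
        out_port = self.table.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood to every port except the ingress one.
            return [p for p in self.ports if p != in_port]
        return [out_port]


def random_mac(rng):
    return ":".join(f"{rng.randrange(256):02x}" for _ in range(6))


switch = LearningSwitch(ports=[1, 2, 3], capacity=4)
rng = random.Random(0)

# The attacker on port 3 fills the table before the legitimate hosts speak.
for _ in range(100):
    switch.receive(3, random_mac(rng), "ff:ff:ff:ff:ff:ff")

# Host B (port 2) and host A (port 1) talk, but neither address can be learned.
switch.receive(2, "bb:bb:bb:bb:bb:bb", "aa:aa:aa:aa:aa:aa")
leaked_to = switch.receive(1, "aa:aa:aa:aa:aa:aa", "bb:bb:bb:bb:bb:bb")
print(leaked_to)  # the list includes port 3, where the attacker listens
```

Because the table is full of attacker-supplied entries, the frame from host A to host B is flooded instead of being delivered to port 2 only.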

Figure: Diagram of the MAC flood attack

This attack has been known for a long time and has multiple implemented and deployed countermeasures. One effective countermeasure is called port security, which uses a statically configured whitelist of MAC addresses for each port to prevent the use of random MAC addresses.

OVS internals

Similar to classical network switches, OVS uses a generalized forwarding table, called the flow table, to decide what to do with an incoming packet. The flow table is used to look up actions for every packet received by the switch.

Unlike classical forwarding tables, the flow table does not use bare MAC addresses as lookup keys. It uses generalized data structures called flow keys, which contain metadata about the packet including MAC addresses, IP addresses, TCP and UDP ports, VLAN tags, and much more. In addition to the flow key, a row in the flow table also contains a flow mask: a bit mask indicating which bits of the flow key should be checked for a match. This allows rules to match multiple packets with common header values. The following pseudocode describes the rule lookup for a single packet in OVS:

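def lookup_rule(flow_table, packet):
    # extract_flow_key is a placeholder for the header parser; here it is
    # assumed to pack the parsed header fields into a single integer.
    key = extract_flow_key(packet)
    for bitmask in flow_table.bitmasks:
        # Keep only the bits this mask cares about.
        masked_key = key & bitmask
        if masked_key in flow_table.flows:
            return flow_table.flows[masked_key]
    # No rule matched: the packet needs the more expensive processing.
    return None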

As with traditional switches, the flow table starts empty. The learning process, however, is different. When a received packet does not match any rule in the flow table, it is sent to other parts of OVS for more computationally expensive processing. The packet is evaluated against the user-provided configuration, and a new flow rule is inserted into the flow table to handle similar packets in the future.
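The miss-then-learn cycle can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the OVS implementation: flow keys are 16-bit integers, and `FlowTable`, `process`, and `port_based_policy` are hypothetical names standing in for the real components.

```python
from dataclasses import dataclass, field


@dataclass
class FlowTable:
    bitmasks: list = field(default_factory=list)  # masks, tried in order
    flows: dict = field(default_factory=dict)     # masked key -> action


def lookup_rule(table, key):
    """Fast path: the same loop as the lookup pseudocode, on integer keys."""
    for bitmask in table.bitmasks:
        masked_key = key & bitmask
        if masked_key in table.flows:
            return table.flows[masked_key]
    return None


def process(table, key, policy, stats):
    """Look up the key; on a miss, consult the policy and cache the result."""
    action = lookup_rule(table, key)
    if action is None:
        stats["misses"] += 1
        # Slow path: `policy` stands in for evaluating the user configuration.
        bitmask, action = policy(key)
        if bitmask not in table.bitmasks:
            table.bitmasks.append(bitmask)
        table.flows[key & bitmask] = action
    return action


def port_based_policy(key):
    """Hypothetical config: forwarding depends only on the ingress port byte."""
    return 0xFF00, "output:2"


table, stats = FlowTable(), {"misses": 0}
# 16-bit toy keys: high byte = ingress port, low byte = any other header field.
for key in (0x0101, 0x0102, 0x0103):
    process(table, key, port_based_policy, stats)

print(stats["misses"], len(table.flows))  # 1 1
```

Only the first packet takes the slow path. Because the installed rule masks out the low byte, the other two packets hit the cached rule even though their keys differ.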

Security feature backfiring

When you configure your Kubernetes cluster with the OVN-Kubernetes CNI plugin, OVS is automatically set up with port security. No container running in the cluster is allowed to send packets with a source MAC address other than the one generated for its virtual network interface.

But there’s a catch. Consider a scenario where an attacker controls one container in the cluster and starts sending packets with random source MAC addresses. OVS receives the first packet and scans the flow table, but the packet does not match any rule (the table is empty). The computationally more expensive learning procedure kicks in, and OVS tries to generate a new flow rule for the packet. OVS looks at its configuration and notices the enabled port security feature instructing it to drop all packets with non-whitelisted MAC addresses. So it generates a new flow rule that will drop the packet and all other packets with the same MAC address. A second packet with a different MAC address arrives, matches no rule, and causes another drop rule to be generated and inserted into the flow table. Then a third packet, and a fourth packet, and so on, until the flow table fills up.

Why doesn’t OVS generate a generic flow rule that drops all invalid packets and not only the one just received? Because the flow table can only match packets positively. Each packet is checked against a pattern, and if it matches, an action is executed. It is, however, currently impossible to express a negative rule that would execute an action for a non-matching packet.
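The growth of the table under such a flood can be illustrated with a small simulation. This is a sketch under simplifying assumptions: the flow key is reduced to the 48-bit source MAC, `ALLOWED_MAC` is a made-up address, and `handle` is a hypothetical stand-in for the fast-path lookup combined with the slow-path rule generation.

```python
import random

MAC_MASK = (1 << 48) - 1      # match every bit of the source MAC
ALLOWED_MAC = 0x0A580A000005  # hypothetical address assigned to the container

flows = {}                    # masked key -> action
bitmasks = [MAC_MASK]


def handle(src_mac):
    """Return (action, was_miss) for a packet, installing a rule on a miss."""
    for bitmask in bitmasks:
        action = flows.get(src_mac & bitmask)
        if action is not None:
            return action, False
    # Slow path: port security allows exactly one source MAC on this port.
    action = "forward" if src_mac == ALLOWED_MAC else "drop"
    # A positive match can only name this one address, so the new rule is
    # exact and covers nothing beyond the packet's own source MAC.
    flows[src_mac & MAC_MASK] = action
    return action, True


# Legitimate traffic: one miss, then every later packet hits the same rule.
legit = [handle(ALLOWED_MAC) for _ in range(3)]

# Attack traffic: every random source MAC misses and adds one drop rule.
rng = random.Random(1)
misses = 0
for _ in range(10_000):
    _, was_miss = handle(rng.getrandbits(48))
    misses += was_miss

drop_rules = sum(1 for action in flows.values() if action == "drop")
print(len(flows), misses, drop_rules)
```

The legitimate container costs a single rule no matter how many packets it sends, while the spoofing container adds roughly one drop rule per packet. Expressing "drop everything except `ALLOWED_MAC`" as one rule is exactly what positive matching rules out.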

Port security, a feature intended among other things to prevent the overfilling of forwarding tables, ironically causes the overfilling of forwarding tables in OVS.

Practical impact of the problem

OVS has an internal limit of 200k flow rules in the flow table. Whenever this is exceeded, various protection mechanisms kick in and start limiting further growth. Generation of new flow rules is limited and old rules are deleted. All things considered, OVS is capable of handling the full flow table rather well. The only unusual thing happening is increased memory and CPU usage.

However, a problem arises when the system administrator limits CPU and memory usage with cgroups. OVS does not handle these limits well. Memory limits cause crashes, and CPU limits cause ridiculously large network latencies (e.g., 20 s for a packet to pass through the switch).

I have focused on OVS in the context of the OVN-Kubernetes CNI plugin and its configuration. However, the same problem manifests itself in every OVS deployment where OVS is used to block some kind of traffic. I’ve observed the same issue with VMs on top of OpenStack and packets with random VLAN tags.

Generally, every time OVS is configured to block all packets other than some exceptions, it is possible to generate traffic that will fill the flow table. Unfortunately, I’m not aware of any simple fix that would prevent the table from overfilling. I make some suggestions in the full text of the thesis; however, those are only educated guesses about how a fix could be approached.

However, not all is lost. OVS is still a performant piece of software and with some simple precautions, pretty much all of the problems can be avoided.

Recommendation for system administrators

So, what should you as a system administrator do to prevent this issue from impacting your system?

Remove any resource limits placed upon OVS. This completely eliminates the most serious problems, and the remainder is rather minor.

Monitor the output of the ovs-dpctl show command. It prints out the number of flow rules currently in use. If that number gets close to 200k, you are very likely under an attack.

Additionally, monitor OVS’s CPU and memory usage and the number of processed packets. The attack causes a significant increase in all three of these metrics. The memory usage of OVS peaks at around 2-3 GB and does not grow further. That’s still quite a jump from normal operation.

Also, for every attacker CPU thread busy blasting out packets as fast as possible, OVS needs around 2-3 threads to handle the traffic fully. With unlimited resources, a single instance of OVS can easily push tens of thousands of packets per second. The attack uses lots of small packets, so an alert on a sudden increase in the processed packet count to around that level could notify you of an ongoing problem.
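The monitoring advice above could be automated along the following lines. This is a rough sketch, not a tested monitoring tool: the two thresholds are arbitrary assumptions, the sample text is illustrative, and the regular expressions assume that ovs-dpctl show prints `lookups: hit:… missed:…` and `flows: …` lines, which should be checked against the output of your OVS version.

```python
import re

FLOW_LIMIT = 200_000      # flow-table limit quoted above
FLOW_ALERT_RATIO = 0.8    # hypothetical: alert at 80 % of the limit
RATE_ALERT_FACTOR = 10.0  # hypothetical: alert at 10x the usual packet rate


def flow_count(show_output):
    """Largest 'flows: N' value reported for any datapath (0 if none)."""
    counts = re.findall(r"^\s*flows:\s*(\d+)", show_output, re.MULTILINE)
    return max((int(c) for c in counts), default=0)


def processed_packets(show_output):
    """Hit + missed lookups summed over all datapaths."""
    pairs = re.findall(r"lookups:\s*hit:(\d+)\s+missed:(\d+)", show_output)
    return sum(int(hit) + int(missed) for hit, missed in pairs)


def check(before, after, seconds, baseline_rate):
    """Compare two samples taken `seconds` apart; return a list of alerts."""
    alerts = []
    if flow_count(after) >= FLOW_ALERT_RATIO * FLOW_LIMIT:
        alerts.append("flow table nearly full")
    rate = (processed_packets(after) - processed_packets(before)) / seconds
    if rate > RATE_ALERT_FACTOR * baseline_rate:
        alerts.append("packet rate spike")
    return alerts


# Illustrative samples only; the exact layout may differ between OVS versions.
BEFORE = """\
system@ovs-system:
  lookups: hit:1000000 missed:5000 lost:0
  flows: 310
  masks: hit:1500000 total:3 hit/pkt:1.49
"""
AFTER = """\
system@ovs-system:
  lookups: hit:1100000 missed:205000 lost:40
  flows: 187432
  masks: hit:2400000 total:3 hit/pkt:1.84
"""
print(check(BEFORE, AFTER, seconds=10, baseline_rate=2_000))
```

In a real deployment, the two samples would come from running the command periodically, for example with `subprocess.run(["ovs-dpctl", "show"], capture_output=True, text=True).stdout`, and the baseline rate would be measured during normal operation.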
