An Open vSwitch security feature causes a security problem. Here’s how to prevent it.

Jan 5, 2024 | Blog, Europe

By Vašek Šraier

Vašek Šraier is a software engineer at Guardsquare working on the security analysis tool AppSweep. He completed his Master’s thesis, “Performance of Open vSwitch-based Kubernetes Cluster in Pathological Cases,” at Charles University in Prague under the supervision of Jiri Benc, Principal Kernel Engineer, Red Hat Czech. For over a decade, Red Hat Czech has actively collaborated with students in the Czech Republic on research, leading to more than 400 theses.

While working on my Master’s thesis as a research intern at Red Hat, I investigated networking performance problems in Open vSwitch (OVS) in Kubernetes. My goal was to determine whether specific traffic patterns could cause performance issues. 

Unfortunately for OVS, I was successful.

I experimented by flooding OVS with various types of packets and found one type that stressed the system. The root cause turned out to reside in the design of OVS itself. Even worse, untrusted traffic sources could, in extreme cases, cause effective denial-of-service attacks on whole clusters. In this article, I will dive into the details of the issue and, at the end, talk about the impact and recommendations.

Open vSwitch

Open vSwitch (OVS) is a multi-platform software switch often used as a network element in software-defined networks (SDN), most commonly through the use of Open Virtual Network (OVN). OVS is currently used in the OpenShift container platform through the OVN-Kubernetes CNI plugin.

Internally, OVS is a collection of multiple software components (processes), each taking care of a different level of abstraction. At the highest level, OVS communicates over the OpenFlow protocol with controllers. On the other hand, the lowest level is carefully optimized for maximum performance, because that’s where the actual packet switching occurs. The most common installation uses a kernel module for packet switching to get overhead as low as possible.

The issue described in this article emerges from the interaction of all the components, so for the sake of simplicity, I will mostly ignore the individual components and talk about OVS as a whole. Keep in mind, however, that the focus is mainly on the lowest-level component, the kernel module itself.

The traditional MAC flood attack

Before looking into the details of OVS, let’s look at traditional network infrastructure. Classical network switches maintain a forwarding table that maps media access control (MAC) addresses to physical ports. When a switch receives a packet, it looks up the packet’s destination MAC address in the table and sends the packet out of the matching port. Importantly, if the destination MAC address isn’t found in the forwarding table, the packet is broadcast to all physical ports, essentially imitating the operation of a network hub.

When the switch is powered on for the first time, the forwarding table is empty. To learn about the connected devices and fill the table, the source MAC address of every received packet is stored in the forwarding table and bound to the physical port the packet was received on. And this exact behavior can be exploited in the MAC flood attack.

An attacker connected to our physical network can flood the switch with packets carrying random source MAC addresses. The switch responds by filling its forwarding table with those addresses. Legitimate packets then end up broadcast across all physical ports because the switch has no free space left in the forwarding table to store the legitimate entries. As a result, the attacker can listen to most of the traffic.
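The attack can be reproduced in a few lines of Python. The sketch below is a toy model, not the behavior of any particular switch: the LearningSwitch class, the table size of 1,000 entries, and the MAC addresses are all made up for illustration, and real switches also age out old entries, which the model ignores.

```python
import random


class LearningSwitch:
    """Toy model of a MAC-learning switch with a bounded forwarding table."""

    def __init__(self, ports, table_size):
        self.ports = list(ports)
        self.table_size = table_size
        self.table = {}  # MAC address -> port

    def receive(self, src_mac, dst_mac, in_port):
        """Learn the source address, then return the list of output ports."""
        # Learning: remember the port the source was seen on,
        # but only while the table has room for a new entry.
        if src_mac in self.table or len(self.table) < self.table_size:
            self.table[src_mac] = in_port
        # Forwarding: known destination -> one port, unknown -> flood.
        if dst_mac in self.table:
            return [self.table[dst_mac]]
        return [p for p in self.ports if p != in_port]


def random_mac(rng):
    return ":".join(f"{rng.randrange(256):02x}" for _ in range(6))


switch = LearningSwitch(ports=[1, 2, 3], table_size=1000)
rng = random.Random(0)

# The attacker on port 3 fills the table before the victims are learned.
for _ in range(5000):
    switch.receive(random_mac(rng), "ff:ff:ff:ff:ff:ff", in_port=3)

# A host on port 1 sends to a host on port 2. Neither address fits into
# the full table, so the packet is flooded to every other port.
out_ports = switch.receive("aa:aa:aa:aa:aa:01", "aa:aa:aa:aa:aa:02", in_port=1)
print(out_ports)  # [2, 3]
```

The last packet is sent to ports 2 and 3, so the attacker on port 3 sees traffic that was meant only for port 2.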

Diagram of the MAC flood attack

This attack has been known for a long time and has multiple implemented and deployed countermeasures. One effective countermeasure is called port security, which uses a statically configured whitelist of MAC addresses for each port to prevent the use of random MAC addresses.

OVS internals

Similar to classical network switches, OVS uses a generalized forwarding table, called the flow table, to decide what to do with an incoming packet. The flow table is used to look up actions for every packet received by the switch.

Unlike classical forwarding tables, the flow table is not keyed on MAC addresses alone. Its lookup keys are generalized data structures called flow keys, which contain metadata about the packet including MAC addresses, IP addresses, TCP and UDP ports, VLAN tags, and much more. In addition to the flow key, a row in the flow table also contains a flow mask, a bit mask indicating which bits of the flow key should be checked for a match. This allows us to implement rules matching multiple packets with common header values. This is the pseudocode describing the rule lookup for a single packet in OVS:

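def lookup_rule(flow_table, packet):
    # extract_flow_key is a placeholder for parsing the headers into an integer flow key
    key = extract_flow_key(packet)
    for bitmask in flow_table.bitmasks:
        # Keep only the bits this mask cares about
        masked_key = key & bitmask
        if masked_key in flow_table.flows:
            return flow_table.flows[masked_key]
    # No rule matched: the packet needs the more expensive processing
    return None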

As with the traditional switches, the flow table starts empty. The learning process, however, is different. When a received packet does not match any rule in the flow table, it is sent to other parts of OVS for more computationally expensive processing. The packet is evaluated against the user-provided configuration, and a new flow rule is inserted into the flow table to handle similar packets in the future.
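The lookup and the learning step can be combined into a small runnable model. This is a sketch under simplifying assumptions, not the OVS implementation: the flow key is a single integer packing a 16-bit ingress port and a 48-bit source MAC address, rules are stored under the pair (mask, masked key) so that different masks cannot collide, and FlowTable, process, and forward_by_port are hypothetical names.

```python
from dataclasses import dataclass, field

PORT_MASK = 0xFFFF << 48  # match on the ingress port only


def make_key(in_port, src_mac):
    """Toy flow key: 16-bit ingress port followed by 48-bit source MAC."""
    return (in_port << 48) | src_mac


@dataclass
class FlowTable:
    bitmasks: list = field(default_factory=list)  # masks, tried in order
    flows: dict = field(default_factory=dict)     # (mask, masked key) -> action

    def lookup(self, key):
        for bitmask in self.bitmasks:
            action = self.flows.get((bitmask, key & bitmask))
            if action is not None:
                return action
        return None

    def insert(self, key, bitmask, action):
        if bitmask not in self.bitmasks:
            self.bitmasks.append(bitmask)
        self.flows[(bitmask, key & bitmask)] = action


def process(table, key, slow_path):
    """Try the flow table first; on a miss, ask the slow path and cache its answer."""
    action = table.lookup(key)
    if action is not None:
        return action, "hit"
    bitmask, action = slow_path(key)
    table.insert(key, bitmask, action)
    return action, "miss"


def forward_by_port(key):
    # Hypothetical configuration: everything is sent to port 2 regardless
    # of its MAC address, so the new rule only needs the port bits.
    return PORT_MASK, "output:2"


table = FlowTable()
first = process(table, make_key(1, 0x0A0000000001), forward_by_port)
second = process(table, make_key(1, 0x0A0000000002), forward_by_port)
print(first, second, len(table.flows))
# ('output:2', 'miss') ('output:2', 'hit') 1
```

Because the rule installed after the first miss ignores the MAC address, the second packet from the same port is handled by the flow table alone, and the table still holds a single rule.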

Security feature backfiring

When you configure your Kubernetes cluster with the OVN-Kubernetes CNI plugin, OVS is automatically set up with port security. Any container running in the cluster will not be allowed to send packets with source MAC addresses other than the one generated for its virtual network interface.

But there’s a catch. Consider a scenario where an attacker controls one container in the cluster and starts sending packets with random source MAC addresses. OVS receives the first packet and scans the flow table, but the packet does not match any rule (the table is empty). The computationally more expensive learning procedure kicks in, and OVS tries to generate a new flow rule for the packet. OVS looks at its configuration and notices the enabled port security feature instructing it to drop all packets with non-whitelisted MAC addresses. So it generates a new flow rule that will drop the packet and all other packets with the same MAC address. The second packet arrives with a different MAC address, so it does not match any rule either, and another drop rule is generated and inserted into the flow table. Then comes a third packet, and a fourth, and so on, until the flow table fills up.
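The growth of the flow table under this traffic can be simulated with a toy model. The sketch below is an illustration under stated assumptions rather than OVS code: the flow key packs a 16-bit ingress port and a 48-bit source MAC address into one integer, ALLOWED_MAC and port 7 are invented values, and the slow path is reduced to a single port security check.

```python
import random

MAC_BITS = 48
MAC_MASK = (1 << MAC_BITS) - 1
EXACT_MASK = (1 << 64) - 1       # match on port and source MAC together
ALLOWED_MAC = 0x0A58_0A80_0005   # hypothetical MAC assigned to the container


def make_key(in_port, src_mac):
    """Toy flow key: 16-bit ingress port followed by 48-bit source MAC."""
    return (in_port << MAC_BITS) | src_mac


def lookup(flows, masks, key):
    """Positive match only: return the action of the first matching rule."""
    for mask in masks:
        action = flows.get((mask, key & mask))
        if action is not None:
            return action
    return None


def port_security_slow_path(key):
    """Decide what to do with a packet that missed the flow table."""
    src_mac = key & MAC_MASK
    action = "forward" if src_mac == ALLOWED_MAC else "drop"
    # The decision depended on the source MAC, so the new rule matches on it.
    return EXACT_MASK, action


flows, masks = {}, []
rng = random.Random(1)
misses = 0

# The attacker's container (port 7 here) sends 10,000 spoofed packets.
for _ in range(10_000):
    key = make_key(7, rng.getrandbits(MAC_BITS))
    if lookup(flows, masks, key) is None:
        misses += 1
        mask, action = port_security_slow_path(key)
        if mask not in masks:
            masks.append(mask)
        flows[(mask, key & mask)] = action

drop_rules = sum(1 for action in flows.values() if action == "drop")
print(f"{misses} slow-path misses, {drop_rules} drop rules installed")
```

Almost every spoofed packet takes the slow path and leaves one more drop rule behind, which is the growth pattern described above.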

Why doesn’t OVS generate a generic flow rule that will drop all invalid packets and not only the one just received? Because the flow table can only match packets positively. Packets are checked against a pattern, and if they match, an action is executed. It is, however, currently impossible to express a negative rule that would execute an action for a non-matching packet.
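A small brute-force check illustrates the limitation for a single rule. The sketch assumes a toy 4-bit header field instead of a 48-bit MAC address and models a rule as a (mask, masked key) pair, as in the lookup pseudocode above; ALLOWED and the helper names are made up for illustration. It searches for one rule that matches every value except the allowed one.

```python
from itertools import product

BITS = 4
ALLOWED = 0b1010
ALL_VALUES = range(1 << BITS)
everything_else = {v for v in ALL_VALUES if v != ALLOWED}


def matched_values(mask, masked_key):
    """All field values that a single positive rule (mask, masked_key) matches."""
    return {v for v in ALL_VALUES if v & mask == masked_key}


# Enumerate every possible rule and keep those that match exactly
# "everything except ALLOWED".
candidates = {(mask, key & mask) for mask, key in product(ALL_VALUES, repeat=2)}
negative_rules = [rule for rule in candidates if matched_values(*rule) == everything_else]

print(negative_rules)  # []
print(matched_values(0b1111, ALLOWED) == {ALLOWED})  # True
```

No such rule exists: a masked match always covers a power-of-two number of values (1, 2, 4, 8, or 16 here), never 15. The positive counterpart, a rule matching only the allowed value, is trivial to express.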

Port security, a feature intended among other things to prevent the overfilling of forwarding tables, ironically causes the overfilling of forwarding tables in OVS.

Practical impact of the problem

OVS has an internal limit of 200k flow rules in the flow table. Whenever this limit is exceeded, various protection mechanisms kick in and start limiting further growth. Generation of new flow rules is limited and old rules are deleted. All things considered, OVS is capable of handling the full flow table rather well. The only visible symptom is increased memory and CPU usage.

However, a problem arises when the system administrator limits the CPU and memory usage with cgroups. OVS does not handle these limits well. Memory limits cause crashes and CPU limits cause ridiculously large network latencies (e.g., 20s for a packet to pass through the switch).

I have focused on OVS in the context of the OVN-Kubernetes CNI plugin and its configuration. However, the same problem manifests itself in every OVS deployment where OVS is used to block some kind of traffic. I’ve observed the same issue with VMs on top of OpenStack and packets with random VLAN tags.

Generally, every time OVS is configured to block all packets other than some exceptions, it is possible to generate traffic that will fill the flow table. And unfortunately, I’m not aware of any simple fix that would prevent the table from overfilling. I make some suggestions in the full text of the thesis, but those are only my educated guesses on how a fix could be approached.

However, not all is lost. OVS is still a performant piece of software and with some simple precautions, pretty much all of the problems can be avoided.

Recommendations for system administrators

So, what should you as a system administrator do to prevent this issue from impacting your system?

Remove any resource limits placed upon OVS. This completely eliminates the most serious problems, and the remainder is rather minor.

Monitor the output of the ovs-dpctl show command. It prints out the number of flow rules currently in use. If that number gets close to 200k, you are very likely under an attack.

Additionally, monitor OVS’s CPU and memory usage and the number of processed packets. The attack causes a significant increase in all three of these metrics. The memory usage of OVS peaks at around 2-3 GB and never grows further. That’s still quite a jump in memory usage from normal operations.

Also, for every attacker’s CPU thread busy with blasting out packets as fast as possible, OVS needs around 2-3 threads to handle the traffic fully. With unlimited resources, a single instance of OVS can easily push tens of thousands of packets per second. The attack uses lots of small packets, so an alert on a sudden increase in the processed packet count to around that level could notify you about an ongoing problem.
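The monitoring advice above could be automated along these lines. This is a rough sketch, not a tested monitoring tool: it assumes that ovs-dpctl show prints a "flows: N" line and a "lookups: hit:N missed:N lost:N" line for the datapath, the sample text is invented for the demonstration, and both alert thresholds are assumptions to tune for your own cluster.

```python
import re
import subprocess

FLOW_LIMIT = 200_000        # flow table limit quoted above
FLOW_ALERT_FRACTION = 0.8   # assumption: alert at 80% of the limit
PPS_ALERT = 10_000          # assumption: packets per second worth a look


def read_ovs_dpctl_show():
    """Run the real command; it needs enough privileges to query the datapath."""
    result = subprocess.run(["ovs-dpctl", "show"], capture_output=True, text=True, check=True)
    return result.stdout


def parse_show(output):
    """Return (flow count, packets seen) for the first datapath in the output."""
    flows = re.search(r"^\s*flows:\s*(\d+)", output, re.MULTILINE)
    lookups = re.search(r"lookups:\s*hit:(\d+)\s+missed:(\d+)", output)
    if flows is None or lookups is None:
        raise ValueError("unexpected ovs-dpctl show output")
    packets = int(lookups.group(1)) + int(lookups.group(2))
    return int(flows.group(1)), packets


def check(previous, current, interval_s):
    """Compare two parsed samples taken interval_s seconds apart."""
    alerts = []
    flows, packets = current
    if flows >= FLOW_LIMIT * FLOW_ALERT_FRACTION:
        alerts.append(f"flow table nearly full: {flows} rules")
    rate = (packets - previous[1]) / interval_s
    if rate >= PPS_ALERT:
        alerts.append(f"packet rate spike: {rate:.0f} packets/s")
    return alerts


# Invented sample output, used here instead of calling read_ovs_dpctl_show().
SAMPLE = """\
system@ovs-system:
  lookups: hit:{hit} missed:{missed} lost:0
  flows: {flows}
  port 0: ovs-system (internal)
"""

before = parse_show(SAMPLE.format(hit=1000, missed=200, flows=150))
after = parse_show(SAMPLE.format(hit=1500, missed=600200, flows=190000))
for alert in check(before, after, interval_s=10):
    print(alert)
```

In a real deployment, the two samples would come from read_ovs_dpctl_show() called at a fixed interval, and the alerts would go to whatever monitoring system you already use.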
