An automated tool developed by researchers aims to decrease the mean time to detection by enabling threat hunters to automate and collaborate within a secure, stable container environment.
The automated security tools in a Security Operations Center (SOC) can handle about 80% of cybersecurity threats, leaving a substantial 20% of more sophisticated threats undetected. These are the threats likely to be most detrimental to business operations, reputation, or survival. Threat hunting is an effective, proactive method to reduce detection time and minimize the impact of a threat, but we lack tools for scaling, automating, and collaborating.
Research at Colorado State University (CSU), in collaboration with Red Hat, the Open Cybersecurity Alliance (OCA), and IBM Security, approaches this problem by developing a team threat-hunting model enhanced by automation. Our approach replaces the laborious, difficult-to-scale practice of one-person threat hunting with multiple threat hunters working largely independently in a “pack” project. It also enables the reuse (not rewriting) of existing hunting knowledge from proprietary and public hunting repositories.
Disrupting the impact timeline
The NIST Cybersecurity Framework 2.0 defines core functions for cybersecurity outcomes at a high level. As seen in Figure 1, Identify and Protect outcomes help prevent and prepare for cybersecurity incidents, while Detect, Respond, and Recover outcomes help discover and remediate cybersecurity incidents. Govern applies to all steps of the process. The impact timeline is critical: the longer the dwell time—the time between when an attack begins and when it is detected—the more damage is done. Sophisticated threats that elude SOC automated security tools avoid detection for up to 280 days.
We hypothesize that we can reduce dwell time by providing a team of threat hunters a container environment that is secure, scalable, persistent, and collaborative, using Kestrel, OpenC2, and associated open source projects at an enterprise scale. A container platform lets us reuse the Kestrel container across platforms, makes it easier to spawn projects, and provides management for threat-hunting teams along with the enterprise capabilities needed to improve and track our metrics.
Our research focuses on a proactive hunt model based on hypothesis-based hunts that can use Indicators of Attack (IoAs) and the tactics, techniques, and procedures (TTPs) of attackers. We align our proactive hunt model to the MITRE ATT&CK framework, a globally accessible knowledge base of adversary tactics and techniques based on real-world observations. The ATT&CK knowledge base is a foundation for developing specific threat models and methodologies in the private sector, government, and the cybersecurity product and service community. Figure 2 is an example of the MITRE ATT&CK matrix for container technologies.
We want to proactively search and examine data for unnoticed security threats and use human intelligence to create hypotheses. The steps in this process can be broken down into a clear workflow.
- Understanding the security measurements in the target environment
- Thinking about potential threats escaping existing defenses
- Obtaining useful observations from system and network activities
- Developing threat hypotheses
- Revising threat hypotheses iteratively with the previous two steps
- Confirming new threats
If we find repeatable patterns in the data, we can use this workflow to automate some of the hunt, potentially also improving DevSecOps pipelines.
Kestrel and critical components
Threat-hunting activities start with answering two questions: what to hunt and how to hunt. Any threat-hunting activity involves both types of questions, and the answers to both questions contain domain-specific knowledge. However, the domain knowledge applicable to these questions is not the same. Answers to the what question contain domain knowledge that is highly creative, mostly abstract, and largely reusable from one hunt to another. Answers to the how question guide the realization of the what and must be replaced when moving from one hunting platform to another.
These questions are both addressed by the Kestrel Project, which provides a layer of abstraction to stop the repetition involved in cyber threat hunting. Kestrel has two main components: a threat-hunting language allowing human threat hunters to express what to hunt in terms of patterns, analytics, and hunt flows, and the runtime, a machine interpreter that deals with how to hunt.
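The split between what and how can be illustrated with a minimal Python sketch. It is not Kestrel's implementation: the `Condition` type and the two backend functions, `to_sql_where` and `to_predicate`, are hypothetical names introduced here only to show one abstract pattern being realized in two different ways.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Condition:
    """One abstract 'what' clause: a field, an operator, and a value."""
    field: str
    op: str  # this sketch supports only "=" and "!="
    value: str


# The "what": a platform-independent description of the hunt.
WHAT = [
    Condition("parent_name", "=", "node"),
    Condition("binary_name", "!=", "node"),
]


def to_sql_where(conditions: List[Condition]) -> str:
    """One 'how': render the pattern as a SQL-style WHERE clause."""
    sql_ops = {"=": "=", "!=": "<>"}
    return " AND ".join(
        f"{c.field} {sql_ops[c.op]} '{c.value}'" for c in conditions
    )


def to_predicate(conditions: List[Condition]) -> Callable[[Dict[str, str]], bool]:
    """Another 'how': build an in-memory filter over dict records."""
    def matches(record: Dict[str, str]) -> bool:
        for c in conditions:
            equal = record.get(c.field) == c.value
            # "=" requires equality; "!=" requires inequality.
            if (c.op == "=") != equal:
                return False
        return True
    return matches


print(to_sql_where(WHAT))
print(to_predicate(WHAT)({"parent_name": "node", "binary_name": "bash"}))
```

The description in `WHAT` stays the same while the backends change, which is the property that makes the what reusable from one hunt to another.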
The Kestrel language lets hunters apply existing public and proprietary detection logic and express their reasoning across heterogeneous data and threat intelligence sources. It also allows composing reusable hunting steps, flows, and hunt books. The Kestrel runtime compiles the expression of what to hunt into instructions for specific hunting platforms and executes the compiled code both locally and remotely. The runtime also assembles raw logs and records into human-friendly abstractions called entities (e.g., malware or a command-and-control attack) to enable human threat hunters to create and develop threat hypotheses.
With Kestrel, we can write a pattern that matches specific tactics, techniques, and procedures (TTPs). For instance, one TTP pattern describes a web service exploit where a worker process of a web service, such as NGINX or Node.js, is associated with a binary that is not the web service. This scenario is the result of an exploit of the worker process, and the binary most commonly executed is a shell, for example, bash.
To provide data to Kestrel, we express the TTP in a STIX pattern (Figure 3) using STIX-Shifter, an open source Python library that allows software to connect to products that house data repositories (e.g., SIEM systems, endpoint management systems, threat intelligence platforms, and others) by using STIX Patterning. STIX-Shifter returns results as STIX Observations. We do this so all security data, regardless of the source, looks and behaves similarly. You may get results like those in Figure 4 if there are logs that match the TTP.
exp_node = GET process FROM stixshifter://linuxserver31
           WHERE [process:parent_ref.name = 'node' AND process:binary_ref.name != 'node']
           START t'2023-04-05T00:00:00Z' STOP t'2023-04-06T00:00:00Z'
Figure 3. TTP Pattern and first hunt step
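To make the logic of the pattern in Figure 3 concrete, the following Python sketch applies the same two conditions and a matching one-day window to a few in-memory process records. The record layout, the sample data, and the `suspicious_children` helper are invented for illustration. In a real hunt, Kestrel and STIX-Shifter retrieve and normalize the data, and the exact boundary handling of the time window may differ from this sketch.

```python
from datetime import datetime, timezone

# Time window mirroring the START/STOP values in Figure 3.
START = datetime(2023, 4, 5, tzinfo=timezone.utc)
STOP = datetime(2023, 4, 6, tzinfo=timezone.utc)

# Hypothetical process records (not real log data).
RECORDS = [
    {"pid": 101, "parent_name": "node", "binary_name": "node",
     "seen": datetime(2023, 4, 5, 8, 0, tzinfo=timezone.utc)},
    {"pid": 202, "parent_name": "node", "binary_name": "bash",
     "seen": datetime(2023, 4, 5, 9, 30, tzinfo=timezone.utc)},
    {"pid": 303, "parent_name": "systemd", "binary_name": "bash",
     "seen": datetime(2023, 4, 5, 10, 0, tzinfo=timezone.utc)},
    {"pid": 404, "parent_name": "node", "binary_name": "sh",
     "seen": datetime(2023, 4, 7, 12, 0, tzinfo=timezone.utc)},
]


def suspicious_children(records, start=START, stop=STOP):
    """Return processes spawned by 'node' whose binary is not 'node'."""
    return [
        r for r in records
        if start <= r["seen"] < stop
        and r["parent_name"] == "node"
        and r["binary_name"] != "node"
    ]


print([r["pid"] for r in suspicious_children(RECORDS)])  # prints [202]
```

Only the bash process spawned by node inside the window matches. The ordinary node worker, the shell started by systemd, and the process seen outside the window are all excluded.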
The OpenC2 threat-hunting actuator profile defines the OpenC2 actions, targets, arguments, and specifiers along with conformance clauses to enable the operation of OpenC2 producers and consumers in the context of cyber threat hunting. It covers invocation of stored hunting processes (e.g., hunt books), passing of hunt parameters, selection of analytics to apply to hunt data, and the expected type(s) and format(s) of information returned by hunting processes. All of these components provide the team with a system for sharing and collaboration.
Tracking MTTD
Our efforts focus on decreasing the Mean Time to Detect (MTTD) metric, a key measure of dwell time. We use a combination of technologies: a Kubernetes distribution plus Keycloak, JupyterHub, and a Docker container. The Docker container contains all the Kestrel components: kestrel-lang, kestrel-runtime, kestrel-analytics, and tutorials.
The container can be run standalone or with the collaboration environment; however, our goal is to realize the benefits of moving from standalone threat hunting to team threat hunting. A critical function of JupyterHub, as seen in Figure 5, is enabling collaboration features so that a threat-hunting team can share a Kestrel container. We are adding the capability to share the hunt steps and flows across users, projects, and organizations. JupyterHub provides the ability to share Kestrel hunt notebooks, and Keycloak provides identity management for users and roles.
Relevant metrics, including the MTTD, mean time to contain (MTTC), and mean time to repair (MTTR), are provided by the Kestrel-as-a-Service (KaaS) dashboard to track hunt project statistics. Historical metrics supplement current ones to show improvements gained by collaboration, compared to a siloed single threat hunter with tools on a local workstation who is not sharing hunt flows and steps.
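As a rough illustration of how such a metric can be derived, the sketch below computes MTTD as the average interval between the start of an attack and its detection. The incident timestamps and the `mean_time_to_detect` helper are hypothetical and are not taken from the KaaS dashboard.

```python
from datetime import datetime, timedelta


def mean_time_to_detect(incidents):
    """Average of (detected - started) over (started, detected) pairs."""
    if not incidents:
        raise ValueError("at least one incident is required")
    total = sum(
        (detected - started for started, detected in incidents),
        timedelta(),
    )
    return total / len(incidents)


# Hypothetical incidents: (attack start, detection time).
INCIDENTS = [
    (datetime(2023, 1, 1), datetime(2023, 1, 11)),  # detected after 10 days
    (datetime(2023, 2, 1), datetime(2023, 2, 21)),  # detected after 20 days
    (datetime(2023, 3, 1), datetime(2023, 3, 31)),  # detected after 30 days
]

print(mean_time_to_detect(INCIDENTS).days)  # prints 20
```

MTTC and MTTR could be computed the same way by swapping in containment or repair timestamps for the detection time.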
The capabilities of KaaS for team threat hunting include persistence, hunt book and hunt step sharing, threat hunt project management, threat hunt pausing/restarting, and threat hunt project statistics. These capabilities reduce the time to incident detection and can be tracked through auditing and history as variables in MTTD. Average MTTD times for a single threat hunter are shown in Figure 6; for example, 39% of cyberattacks are detected within the span of months. Adding more threat hunters and more threat-detecting capabilities significantly reduces that span. Figure 7 shows the impact of additional capability from left to right on the diagram and the impact of different sizes of collaborative threat-hunting packs (2, 5, and 10 hunters).
Get involved
The next milestone for the KaaS project is finishing the threat-hunting project collaboration features, the test and use cases (including GenAI), and compliance as code. We will then compare individual threat hunting with crowd hunting in more depth to determine the impact of collaboration. Previous deployments focused on development environments with Minikube, Kubernetes, and OpenShift AI. We are planning production deployments with users in defense, finance, and other sectors.
Everyone is welcome to participate in the Open Cybersecurity Alliance. Individuals can make technical contributions to KaaS or Kestrel; OCA repositories are on GitHub. Organizations can become OCA sponsors, receive special recognition, and gain a seat on the OCA Project Governance Board. Individuals and organizations can join the Slack channel.
I encourage you to walk through the Kestrel tutorial in a Kubernetes testing environment. Instructions for deploying a KaaS development environment with Minikube can be found in the Open Cybersecurity Alliance repository on GitHub; you can then start the tutorial, located in the Kestrel documentation.
Acknowledgments
I worked with several others to build Kestrel-as-a-Service (KaaS) so teams of threat hunters can collaborate in development and enterprise environments on container platforms. This group includes Professor of System Engineering Dr. Steve Simske (CSU), Open Source Program Manager Claudia Rauch (OASIS/OCA), Security Research Scientist Dr. Xiaokui Shu (IBM), and Head of Hybrid Cloud Platform Adoption Practices Stephane Lefrere (Red Hat), along with others. The project started with the guides and technologies needed to build and deploy to a development environment (Minikube) and a production environment running on a Kubernetes cluster.