Seccerts is a much-needed tool for data scraping and analysis of security certificates, but creating it was harder than expected. Here’s why.
Security certification documents from certification schemes like Common Criteria (CC) and the National Institute of Standards and Technology (NIST) Federal Information Processing Standard (FIPS) contain valuable, detailed information. Most of it, however, is not machine readable. Extracting value from these loosely structured PDF documents is a tedious, labor-intensive, error-prone manual process.
If only the documents were properly structured and automatically processable!
Our laboratory at the Center for Research on Cryptography and Security (CRoCS) aimed to write a suite of processing tools to extract basic information from public certification documents and allow for a more automated search for potentially vulnerable ones. We were soon exposed to the intricate complexities of CC and FIPS certification schemes and realized what we thought would be a quick task would take much longer. We also needed industry expert knowledge and insight.
Functions of the seccerts portal
The CRoCS laboratory first looked into CC certification documents out of necessity rather than curiosity. We had found a serious vulnerability allowing factoring of RSA keys generated by cryptographic smartcards certified to levels as high as CC Evaluation Assurance Level 6+¹ and a timing side-channel attack allowing extraction of private ECDSA (Elliptic Curve Digital Signature Algorithm) keys.² The certification documents were the most detailed source of the information we needed that was available without signing non-disclosure agreements, but they were difficult to use.
The vulnerability disclosure process highlighted another issue: when using a composite product, will users be notified when a single component is found to be vulnerable? The implications of this problem are substantial. For example, the Estonian government learned of the vulnerability in their electronic citizenship cards less than two months before national elections despite the vulnerability in the underlying chip being privately communicated to major customers for half a year.
To mitigate these problems, CRoCS set several goals for developing a suite of tools to analyze security certification documents. Some goals were clear from the beginning; others gained importance as we dug deeper into the ecosystem. These are the most important ones:
Make existing information more available: The portal uses artifacts from existing certification schemes and the National Vulnerability Database. It provides additional insight by processing, connecting, and overlaying these data sources. Processed data is visualized and available in JSON format for further processing. A Python-based API and example Jupyter notebooks are available for instant analysis. The seccerts portal is extensible to other schemas or databases in the future.
Provide deeper insight into certification ecosystem trends: The certification process evolves over time, with different actors adopting potentially different strategies during the certification procedure. The data-driven approach may provide insight into how items are certified, which certification claims are used more or less frequently, the type of items certified, which security and cryptographic mechanisms are used, and other factors. As a more lightweight and agile certification scheme requires us to change or omit some existing steps, understanding the security impact of existing steps is crucial.
Utilize open data and tools for better transparency and accessibility: Open source, freely available tools, and a data-driven approach provide better accessibility for end users by extracting the most relevant data otherwise hidden in the certification documents. Open availability also increases the transparency of the certification process by making it easier to verify claims and compare different certificates.
Facilitate better end user verifiability of the purchased certified product: In many cases, end users have limited options for verifying whether the product is genuine. Standard shallow identifiers, such as serial numbers, can easily be tampered with. To mitigate this, the seccerts portal may instead host authenticated forensic profiles based on harder-to-manipulate behavioral properties, such as detailed performance profiling or power consumption traces of certified devices, created by a trusted authority using open tools. The user can later collect the same properties from the purchased product using the same open tools and compare them to the expected forensic template. Increased end user verifiability should increase product scrutiny, because checks can be repeated over time and performed by many different users. Even new product tests can be performed and shared with others, increasing end user confidence.
Enable faster notification to end users in case of a new potential vulnerability: Pairing certified items and their dependencies (referenced certificates) with the platform identifier in the vulnerability database allows push notification for relevant changes, like the occurrence of a new vulnerability, in the user-selected set of certificates. The seccerts portal provides notifications thanks to an extracted graph of references and National Vulnerability Database (NVD) mapping.
Aid vulnerability research: Information aggregated from multiple sources allows a more exhaustive and efficient search for clues about related and potentially vulnerable (certified) products and provides better insight into a specific vulnerability’s impact. The discoverer of a vulnerability can quickly identify other affected companies for targeted responsible disclosure.
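The notification goal above rests on walking a graph of certificate references. The following is a minimal sketch of that idea, not the portal's actual implementation: the function name, the data layout, and the certificate IDs (CHIP-1, LIB-2, and so on) are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque


def affected_certificates(references, vulnerable_id):
    """Return every certificate that directly or transitively references vulnerable_id.

    references maps a certificate ID to the set of certificate IDs it references.
    """
    # Invert the edges so we can ask "who depends on this certificate?"
    referenced_by = defaultdict(set)
    for cert_id, deps in references.items():
        for dep in deps:
            referenced_by[dep].add(cert_id)

    # Breadth-first walk over dependants; `seen` also guards against cycles.
    seen = {vulnerable_id}
    queue = deque([vulnerable_id])
    while queue:
        current = queue.popleft()
        for dependant in referenced_by.get(current, ()):
            if dependant not in seen:
                seen.add(dependant)
                queue.append(dependant)
    return seen - {vulnerable_id}


# Hypothetical reference graph: a card builds on a library, which builds on a chip.
references = {
    "CHIP-1": set(),
    "LIB-2": {"CHIP-1"},
    "CARD-3": {"LIB-2"},
    "OS-4": set(),
}
watchlist = {"CARD-3", "OS-4"}  # certificates a user chose to follow
to_notify = affected_certificates(references, "CHIP-1") & watchlist
```

Here a vulnerability in the chip propagates to the card two steps away, while the unrelated operating system certificate on the same watchlist is left alone.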
While not all documents and certification artifacts are publicly available, those that are provide a trove of interesting technical data commonly used in the domains listed above, but in a rather ad hoc and often labor-intensive way. Our goal is to systematize these efforts and make them easier while learning something along the way.
Results achieved, issues encountered
To tackle these issues, we had to process available source documents and build large sets of regular expressions for topics of interest (e.g., certificate IDs, cryptographic algorithms, protection profiles, security standards, security functional and assurance claims, defenses used, and many more). We then had to devise heuristic methods to pair related documents, including records from different databases. During the process, we obtained some insight into the pain points of the certification process.
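As a rough illustration of the regular-expression approach described above, the sketch below extracts certificate IDs and algorithm mentions from a piece of text. The patterns are simplified assumptions covering only two ID styles and a handful of algorithm names, the function name is hypothetical, and the sample text is invented; the real rule set is far larger.

```python
import re
from collections import Counter

# Illustrative patterns only; real schemes use many more variants.
CERT_ID_PATTERNS = {
    "DE": re.compile(r"BSI-DSZ-CC-\d{4}(?:-V\d+)?-\d{4}"),
    "FR": re.compile(r"ANSSI-CC-\d{4}/\d{2,3}"),
}
ALGORITHM_PATTERN = re.compile(r"\b(?:AES|RSA|ECDSA|SHA-?(?:1|256|384|512))\b")


def extract_features(text):
    """Return certificate IDs found per scheme and a count of algorithm mentions."""
    cert_ids = {
        scheme: sorted(set(pattern.findall(text)))
        for scheme, pattern in CERT_ID_PATTERNS.items()
    }
    algorithms = Counter(ALGORITHM_PATTERN.findall(text))
    return {"cert_ids": cert_ids, "algorithms": dict(algorithms)}


# Invented sample text standing in for text extracted from a PDF.
sample = (
    "This evaluation builds on BSI-DSZ-CC-1059-V2-2019 and ANSSI-CC-2018/32. "
    "The TOE supports RSA, ECDSA and SHA-256; RSA keys are generated on-card."
)
features = extract_features(sample)
```

Even this toy version shows why per-scheme patterns are needed: the two ID formats share almost no structure, so a single generic expression would either miss IDs or match unrelated strings.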
To determine the accuracy of the automated extraction and analysis, we had to rely on some ground truth and expert knowledge. We utilized many years of our own experience with security analysis of cryptographic smartcards, which constitute a significant fraction of all devices certified under the CC scheme.³ Our collaboration with Red Hat engineers allows us to get deep insight and possibly verify data for several other categories, including operating systems. We also manually labeled large testing and validation datasets to pair certified products and records in the NVD.
These are the most critical systematic issues we identified:
The human element: Certificate reports are written by humans for humans, and it shows. Documents are written in different languages and contain typos. Typos hinder automatic processing, forcing the use of heuristics to correct the errors, which always miss some and may introduce new ones. Even the format of a fundamental piece of certificate information, the unique certificate ID, is left to separate national schemes. Some have well-structured but mutually different naming schemes containing the year of certification, incremental number, and optional revisions, while others adopt only the incremental number. This makes it hard to reference certificates unambiguously and to extract these references automatically.
Fuzzy boundaries of certified items: The Target of Evaluation (ToE), that is, the part of the item that actually undergoes certification, is specified in a somewhat informal way, making it difficult for automatic processing to establish the ToE boundaries. The software equivalent of the Bill of Materials (BOM) was proposed as a partial solution, but the task is more complicated: ToE covers not only the parts of the certified product but also the circumstances and environment of use. Fuzzy boundaries make it difficult for end users to verify whether the product used is in the certified configuration or to properly pair and evaluate the vulnerabilities reported.
Vulnerabilities: Connecting certified items to external sources, especially vulnerability databases, is ambiguous at best and almost impossible at worst. We manually labeled thousands of certificates just to train the heuristic classifier that maps a certified item to the Common Platform Enumeration (CPE) record used in the National Vulnerability Database, and it still reaches only 90% accuracy.
Design for replicability: Evaluation labs function as trusted mediators that allow assessment of certification claims based on proprietary (non-public) documents. The steps taken, tools used, and results obtained by evaluation labs are frequently confidential. Moreover, the level of expertise found in specialized evaluation laboratories is not available to all end users. Public documents, like the maintenance reports, also frequently lack the data used during issuance. As a result, certification is primarily a one-off exercise that cannot be independently verified later. And because vendors sponsor the evaluation of their own products, conflicts of interest may arise.
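To make the vulnerability-mapping issue above more concrete, here is a deliberately naive sketch of pairing a certified product name with a CPE record using plain string similarity from the Python standard library. This is an assumption-laden stand-in, not the trained classifier mentioned above: the function names, the threshold, and the candidate titles are hypothetical.

```python
import re
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase and reduce punctuation to single spaces so formatting noise matters less."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", name.lower()).split())


def best_cpe_match(product_name, cpe_titles, threshold=0.6):
    """Return (cpe, score) for the most similar title, or None if nothing is close enough.

    cpe_titles maps a CPE string to a human-readable title.
    """
    target = normalize(product_name)
    scored = [
        (SequenceMatcher(None, target, normalize(title)).ratio(), cpe)
        for cpe, title in cpe_titles.items()
    ]
    if not scored:
        return None
    score, cpe = max(scored)
    return (cpe, score) if score >= threshold else None


# Hypothetical candidate records; the titles are illustrative.
candidates = {
    "cpe:2.3:o:redhat:enterprise_linux:8.2:*:*:*:*:*:*:*": "Red Hat Enterprise Linux 8.2",
    "cpe:2.3:o:redhat:enterprise_linux:7.6:*:*:*:*:*:*:*": "Red Hat Enterprise Linux 7.6",
    "cpe:2.3:a:openssl:openssl:1.1.1:*:*:*:*:*:*:*": "OpenSSL 1.1.1",
}
match = best_cpe_match("Red Hat Enterprise Linux, Version 8.2", candidates)
```

Note how close the wrong version scores to the right one in this example: the two Red Hat titles differ only in a few characters. That narrow margin is one reason similarity alone is not enough and labeled data is needed.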
Problems for product release
Overall, security certifications help raise the bar of product quality. However, there are at least two ways that they can make product releases more difficult. First, frequent changes in standards and requirements in otherwise rigorous certification processes make keeping certifications up to date challenging, both from a technical perspective and for planning a release timeline. In the case of FIPS 140-2/140-3, the problem is how slow the cryptographic module validation process has become. It’s not uncommon for modules to stay on the so-called Modules in Process (MIP) list for many months, during which many new vulnerabilities may be uncovered while the module sits in the review queue.
For example, Red Hat’s strategy is to keep CC certificates updated with every Extended Update Support (EUS) release. The default lifecycle of the non-EUS release does not align with the certification process, as the release is supported only for six months. A product must receive a certificate within six months; however, testing has to be performed on the final General Availability code level. The standard six-month lifecycle is not long enough to finish the certification within these boundaries. Any delay in the certification process increases the risk that important vulnerabilities emerge in the meantime and can jeopardize the whole project.
The second problem is that many certification schemes essentially require products to contain no known vulnerabilities. The National Information Assurance Partnership (NIAP) requires a vulnerability search no older than 30 days, which may cause last-minute changes in the evaluated product.
Just as significant, many of the CVEs found can be justified in one of several ways:
- CVE is already fixed.
- CVE could be mitigated by the evaluated configuration.
- CVE might not be applicable; for example, affected hardware is not available in the evaluated configuration.
Hence, naive interpretation of this data is nearly impossible and may lead to many false positives. For example, the ToE for a product may be a tiny subset of the standard product offering. But as ToE boundaries and CVE record scope are difficult to establish automatically, the seccerts project outputs some false positives. Similarly, a human evaluator must filter the vulnerability report during the certification period.
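The three justifications listed above can be read as a simple triage order. The sketch below encodes that order under strong assumptions: it presumes someone has already recorded whether a CVE is fixed or mitigated and which component it affects, which is exactly the information that is hard to obtain automatically. All names are hypothetical and the CVE identifiers are placeholders, not real records.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    FIXED = "already fixed"
    MITIGATED = "mitigated by the evaluated configuration"
    NOT_APPLICABLE = "not applicable to the evaluated configuration"
    NEEDS_REVIEW = "needs human review"


@dataclass(frozen=True)
class CveFinding:
    cve_id: str
    component: str  # component the CVE affects
    fixed_in_build: bool = False
    mitigated_by_config: bool = False


def triage(finding, toe_components):
    """Apply the three justifications in order; anything left goes to a human."""
    if finding.fixed_in_build:
        return Verdict.FIXED
    if finding.mitigated_by_config:
        return Verdict.MITIGATED
    if finding.component not in toe_components:
        return Verdict.NOT_APPLICABLE
    return Verdict.NEEDS_REVIEW


# Hypothetical evaluated configuration and placeholder findings.
toe_components = {"kernel", "openssl"}
findings = [
    CveFinding("CVE-0000-0001", "kernel", fixed_in_build=True),
    CveFinding("CVE-0000-0002", "openssl", mitigated_by_config=True),
    CveFinding("CVE-0000-0003", "bluetooth-driver"),
    CveFinding("CVE-0000-0004", "kernel"),
]
report = {f.cve_id: triage(f, toe_components) for f in findings}
```

Only the last finding reaches a human reviewer in this example. A tool without knowledge of the ToE boundary would have to flag all four, which is where the false positives come from.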
Making certification better, faster, and cheaper
Addressing these issues will be difficult but necessary to make certification usable for frequently changing products. We have a few suggestions:
- Provide data in standardized, automatically processable formats—sanitized, normalized, and directly available.
- Document boundaries of certification with a clear, human- and computer-readable structure that enables running and verifying a system in the same environment as the certified one.
- Include a fully automated installation process in guidance documentation to ensure the system is remediated into the appropriate configuration.
- Proactively assign a CPE record in the NVD to every issued certification, using unique and robust identifiers.
- Make available tooling (ideally open source) and complete documentation of the configuration and parameters used by evaluation labs.
Automation of certification testing has already proven to be a viable means of improving the certification processes. FIPS’s Automated Cryptographic Validation Testing System (ACVTS) is one of the first real, production-ready attempts to automate security certifications. It replaced the semi-manual Cryptographic Algorithm Validation System (CAVS) in June 2020. A client-server infrastructure using the JSON-based Automated Cryptographic Validation Protocol (ACVP) has been added on top of the old-style CAVS test harness, and algorithm certificates are issued automatically. ACVP covers only the algorithm-testing part of FIPS 140-2/140-3 validations, but even on its own it can speed up the validation process. Turnaround for the algorithm certificates required for module validation is almost instant; module validation itself remains the biggest portion of the process.
We remain convinced that our goals are worth pursuing further, despite being significantly more challenging than anticipated. Interest in the portal webpage from national certification bodies, industry, and security researchers has grown, resulting in discussions that explain some of the problems and open new questions.
No silver bullets exist for such a complex environment. Still, based on our experience, we believe data analysis provides compelling insights and highlights issues that, when solved, will help improve the studied certification schemes. We believe more transparent and available certification data are helpful to all parties involved, especially vendors, regulators, and end users, despite their different interests. We hope the seccerts project provides useful base data to facilitate improvements to existing certification schemes to make them faster, cheaper, and more accessible. It can also help end users of certified products get more of the promised benefits of security certifications right now.
Finally, and maybe most importantly, certification bodies should conduct periodic assessments of the impact of certification on the security of products certified, just as we planned to do as academic researchers. The recommendations above would then become a natural prerequisite for completing such evaluation and hopefully result in a more transparent certification process with more value added.
1. ROCA vulnerability, CVE-2017-15361. See Matus Nemec, Marek Sys, Petr Švenda, Dusan Klinec, and Vashek Matyas, “The return of Coppersmith’s attack: practical factorization of widely used RSA moduli,” in 24th ACM Conference on Computer and Communications Security (CCS’2017), pp. 1631–1648, 2017.
2. Minerva vulnerability, CVE-2019-15809. See Jan Jancar, Vladimir Sedlacek, Petr Švenda, and Marek Sys, “Minerva: The curse of ECDSA nonces (systematic analysis of lattice attacks on noisy leakage of bit-length of ECDSA nonces),” in IACR Transactions on Cryptographic Hardware and Embedded Systems, 2020(4): 281–308.
3. The category “ICs, Smart Cards, and Smart Card-Related Devices and Systems” contains around 35% (568 out of 1606) of all currently active certificates. Common Criteria, Certified Products List—Statistics, commoncriteriaportal.org/products/stats, June 06, 2022.