Historically, usability studies have looked mostly at end users, relying on focus groups or user testing with customers or the general public. This process often neglected developers, system administrators, and other IT professionals and the systems they use day to day. Our research focuses on the usability of Transport Layer Security (TLS)—specifically, the handling of X.509 certificates—for IT professionals, investigating library APIs, command-line interfaces, manuals, and documentation. In cooperation with the developers of these tools, we aim to make the tools more secure through better usability.
The vast majority of usability studies in security focus on end users who lack extensive IT experience. They revolve mostly around passwords or other forms of authentication, mental models of security, mobile app permissions, or browser warnings. By headcount, these users do form the majority of the user base, by far. However, the impact of their security mishaps usually involves only themselves. System administrators and support engineers, although much smaller in number, have a much greater influence: if they err, tens or hundreds of end users are usually affected. The impact gets even higher when we look at endpoint software developers, and higher still with the decisions (and possible failures) of library or OS developers, which influence millions. Making security usable for these professionals is, therefore, of utmost importance.
An example of unusable security
Let’s look at the API of cURL, the ubiquitous library for transferring data with URLs. When initiating a secure connection with a server, cURL needs to verify the server’s authenticity. This is most commonly done by validating the server’s X.509 certificate. In the cURL API, two flags control the certificate validation process: ‘CURLOPT_SSL_VERIFYPEER’ configures whether the certificate should be validated at all, and ‘CURLOPT_SSL_VERIFYHOST’ specifies how to compare the certificate subject name with the server hostname. The devil, however, hides in the details.
Consider the Payments SDK of PayPal, a major worldwide online payment system. In early 2012, the production code had a bug and—incorrectly and very insecurely—set both ‘CURLOPT_SSL_VERIFYPEER’ and ‘CURLOPT_SSL_VERIFYHOST’ to ‘FALSE’. This meant that secure connections from the PayPal SDK using cURL were not checking the server’s identity, opening the door to many attack vectors. Fortunately, the bug was spotted and “fixed”: on April 27, 2012, the developers set both these cURL flags to ‘TRUE’.
Where is the problem, then? While ‘CURLOPT_SSL_VERIFYPEER’ is indeed a boolean, ‘CURLOPT_SSL_VERIFYHOST’ is an integer: zero disables hostname verification, one is a non-enforcing debug option, and two enables full hostname verification. And since cURL is written in C, the value of ‘TRUE’ is silently converted to one, effectively disabling hostname verification and allowing connections to servers with valid, but possibly stolen, certificates.1
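We have not seen the PayPal code itself, but the essence of the pitfall can be sketched in plain libcurl C (the endpoint URL below is a made-up placeholder). The crucial detail is that the safe value of ‘CURLOPT_SSL_VERIFYHOST’ is 2, not a boolean:

    #include <stdio.h>
    #include <curl/curl.h>

    int main(void) {
        CURL *curl = curl_easy_init();
        if (!curl)
            return 1;

        /* Hypothetical endpoint, for illustration only. */
        curl_easy_setopt(curl, CURLOPT_URL, "https://api.example.com/");

        /* Verify the server certificate chain against trusted roots. */
        curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, 1L);

        /* Enforce that the certificate matches the hostname: the safe value
           is 2L. In the libcurl versions of that era, passing TRUE (i.e., 1)
           selected the non-enforcing debug mode instead. */
        curl_easy_setopt(curl, CURLOPT_SSL_VERIFYHOST, 2L);

        CURLcode res = curl_easy_perform(curl);
        if (res != CURLE_OK)
            fprintf(stderr, "transfer failed: %s\n", curl_easy_strerror(res));

        curl_easy_cleanup(curl);
        return res == CURLE_OK ? 0 : 1;
    }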
Who is to blame? Is it the PayPal developers who made the mistake? They were definitely not alone: at the time, similar bugs were present in ZenCart, Amazon Flexible Payments, Apache HttpClient, and Trillian. Is it the cURL developers for the inconsistent (and slightly counterintuitive) interface? The documentation clearly stated that the flags work this way. Is it the designers of the C language for allowing silent type coercion? In fact, probably all of them share a bit of the responsibility. Nevertheless, this and other similar examples have shown the developer world the extent of the security consequences that lousy usability for developers can cause.
The world of certificate validation
Our usable-security research revolves around X.509 certificates: their generation, validation, and understanding. Why? Nowadays, most developers need secure network connections somewhere in their products. That mostly means using TLS, which, in turn, most likely means validating the authenticity of the server by validating its certificate.
Furthermore, it turns out that understanding all the various quirks and corners of certificate validation is far from straightforward. OpenSSL, one of the most widely used libraries for TLS, has almost 80 distinct error states related solely to certificate validation. Managing such an error landscape gets complicated, and thus not all the errors convey their meaning and security consequences well enough.
Proper understanding of errors is, however, essential. Imagine you are attempting a TLS connection and the certificate validation fails with the code ‘X509_V_ERR_PERMITTED_VIOLATION’. You look it up in the documentation only to learn that “the permitted subtree was violated.” If you wave it off as something unimportant, you risk connecting to a malicious server. What does the error mean? The issuing CA was constrained to issue certificates only for a given (sub-)namespace (“subtree”), and this particular certificate violates that restriction. Thus, the error may even indicate suspicious activity at the CA level!
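To make the situation concrete, this is roughly where such an error surfaces in OpenSSL’s C API (a sketch only; the connection setup and cleanup around it are omitted):

    #include <stdio.h>
    #include <openssl/ssl.h>
    #include <openssl/x509.h>

    /* Sketch: inspect the certificate verification outcome after the
       TLS handshake on an already established SSL handle. */
    static int check_peer_certificate(SSL *ssl) {
        long result = SSL_get_verify_result(ssl);
        if (result == X509_V_OK)
            return 0;

        /* For X509_V_ERR_PERMITTED_VIOLATION, this prints only the
           library's terse one-line description; the developer is left to
           work out the security consequences on their own. */
        fprintf(stderr, "certificate verification failed: %s\n",
                X509_verify_cert_error_string(result));
        return -1;
    }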
Of course, you might claim that developers would not continue connecting after a certificate error. In that, however, you would be wrong. Multiple studies (including our own, described below) show that users, including developers, routinely bypass certificate warnings and errors whenever possible. And to decide which errors should offer the user a clickthrough option and which should not, developers need to understand the errors in the first place.
Usability of certificate handling tools
Our first study took place in 2017, among the developers attending DevConf.CZ, an open source community conference in Brno, Czech Republic, organized by Red Hat. We set up a booth and asked developers, administrators, and other IT professionals passing by to generate and validate a handful of certificates using command-line OpenSSL. While they worked, we watched where they struggled and what resources they used.
The usability of OpenSSL turned out to be far from ideal. This is supported by the participants’ subjective opinions—many openly said they hate interacting with OpenSSL—as well as by the objective task results. For example, 44% of the participants were unsuccessful in generating a self-signed certificate, yet thought they had succeeded. In the validation task, 71% of the participants misconfigured or omitted the operating system’s root trust store, even though they were explicitly instructed to consider those roots as trusted.
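The study tasks used the command-line tool, but the same trust-store pitfall exists when validating programmatically. Below is a minimal C sketch of the validation step, assuming the certificate to check sits in a placeholder file named cert.pem:

    #include <stdio.h>
    #include <openssl/pem.h>
    #include <openssl/x509.h>
    #include <openssl/x509_vfy.h>

    int main(void) {
        /* Load the certificate to validate ("cert.pem" is a placeholder). */
        FILE *fp = fopen("cert.pem", "r");
        X509 *cert = fp ? PEM_read_X509(fp, NULL, NULL, NULL) : NULL;
        if (fp)
            fclose(fp);
        if (!cert)
            return 1;

        /* The step many participants missed: trust the operating system's
           default root store. */
        X509_STORE *store = X509_STORE_new();
        X509_STORE_set_default_paths(store);

        X509_STORE_CTX *ctx = X509_STORE_CTX_new();
        X509_STORE_CTX_init(ctx, store, cert, NULL);

        int ok = X509_verify_cert(ctx);
        if (ok != 1)
            fprintf(stderr, "validation failed: %s\n",
                    X509_verify_cert_error_string(X509_STORE_CTX_get_error(ctx)));

        X509_STORE_CTX_free(ctx);
        X509_STORE_free(store);
        X509_free(cert);
        return ok == 1 ? 0 : 1;
    }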
Documentation, such as manuals, tutorials, or Q&A forums, appears to matter a lot. The majority of participants used both online sources and manual pages to solve the tasks. Stack Overflow was a repeatedly used resource (73% of our participants used it), but it was not the only one. It seems that any well-written tutorial can be widely used: in our tasks, the most visited tutorial was a semi-random page in the knowledge base of the University of Wisconsin (visited by 40% of the participants), simply because it covered one of the tasks well and scored high in search results. The importance of tutorials becomes even more prominent when we realize that even developers tend to copy-paste the suggested commands without further adjustments (in our study, only 9% of the participants altered the copy-pasted command).2
Understanding and trusting certificates
Our follow-up experiment at DevConf.CZ 2018 investigated how much developers trust flawed TLS certificates. Participants were put in a scenario of improving the conference website to allow registration using federated identities. However, the connection to authentication servers failed with certificate validation errors. We then asked the participants to investigate the issue, assess the connection’s trustworthiness on a given scale, and describe the problem in their own words.
The results clearly show that trust decisions are not binary. Even IT professionals do not entirely refuse a certificate just because its validation check fails. In the case of an expired certificate, the expiry duration plays an important role: certificates that expired yesterday were mostly considered “looking OK.” In contrast, a certificate that expired two weeks ago “looks suspicious,” and one that expired a year ago seems “outright untrustworthy.” Certificates of different subjects were also regarded differently: flaws were less likely to be tolerated for big, established companies.
Even more importantly, some certificate cases were overtrusted. For example, 21% of the participants considered the self-signed certificate as “looking OK” or better, and 20% saw the certificate with violated name constraints as “looking OK” or better. The mean trust in both cases was comparable to that of an expired certificate. We find this quite concerning: the self-signed certificate does not have any identity assurances (literally anyone could have created it), and name constraints violation hints at misconfiguration or even malicious activity at the subauthority level.
In the spirit of positive change, we were curious to find out whether better error messages and documentation would improve understanding and trust perception. For our next step, half of the participants interacted with the real OpenSSL errors and the other half with our redesigned versions. With our reworded errors and documentation, both the self-signed and the name-constrained cases were rated as significantly less trustworthy and required less time and less online browsing to understand. These results confirm once more that usable documentation is a crucial part of software design.
When investigating how to direct programmers to a useful documentation source right from the error, we experimentally included a documentation URL directly in the CLI error message. To our surprise, 71% of the participants clicked the link. Unusual as the approach is, it suggests a viable way of directing developers to a helpful resource recommended by the library designers.3
Usable errors and documentation
In light of our research results, we decided to make certificate validation errors and the corresponding documentation more usable. Currently, many different libraries are used for handling TLS connections and validating certificates. Plurality is welcome, but the differences among these tools complicate knowledge transfer and transitioning a project from one library to another. In the long term, we aim to simplify and unify the ecosystem by standardizing the validation errors and providing reliable, developer-tested documentation. Our work in progress is already available at https://x509errors.org.
First, we are mapping the landscape of certificate validation errors in multiple libraries, starting with OpenSSL (openssl.org), GnuTLS (gnutls.org), Botan (botan.randombit.net), and mbedTLS (tls.mbed.org). Their errors vary vastly in number, granularity, and documentation. To ease debugging for software developers, we started generating and publishing example certificates exhibiting every individual error. As of now, we have 34 errors covered by automatically generated certificates for public use in software development.
Second, we are trying to identify the corresponding errors in different libraries. For example, a certificate with the aforementioned OpenSSL error ‘X509_V_ERR_PERMITTED_VIOLATION’ will get a ‘CERT_SIGNER_CONSTRAINTS_FAILURE’ in GnuTLS and a rather general ‘X509_BADCERT_NOT_TRUSTED’ in mbedTLS.
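To sketch how this mapping plays out in code (the prose above drops the libraries’ constant prefixes), here is a rough GnuTLS counterpart of the check, assuming an already established session; the OpenSSL cross-reference in the comment follows the mapping described above:

    #include <stdio.h>
    #include <gnutls/gnutls.h>

    /* Sketch: check the peer's certificate after the handshake on an
       established gnutls_session_t. */
    static int check_peer_certificate(gnutls_session_t session,
                                      const char *hostname) {
        unsigned int status = 0;
        if (gnutls_certificate_verify_peers3(session, hostname, &status) < 0)
            return -1;  /* verification could not be carried out */

        if (status & GNUTLS_CERT_SIGNER_CONSTRAINTS_FAILURE)
            fprintf(stderr, "signer constraints violated (OpenSSL would "
                            "report X509_V_ERR_PERMITTED_VIOLATION)\n");

        return status == 0 ? 0 : -1;  /* any set flag means the chain failed */
    }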
Third, seeing all the errors and the corresponding pieces of documentation in one place will enable us to design a unified taxonomy of certificate flaws. To be able to add reliable documentation to this taxonomy, an active discussion with developers is needed. In January 2020, we conducted another study with participants of DevConf to look into the matter. We designed alternative documentation for three errors and asked the IT professionals for feedback regarding its content and structure. After we finish analyzing the results, we aim to propose a draft of the new documentation.
Making the world a bit more usable
Usability is, in general, difficult to achieve in systems as complex as TLS. Furthermore, upstream changes are complicated by the need to preserve compatibility. Nevertheless, we propose at least smaller, ready-to-adopt changes. In cooperation with the OpenSSL developer community, we have already gotten several documentation patches accepted upstream and plan more for the future.
Notes:
1 Many other examples of usability flaws of SSL APIs can be found in the article “The Most Dangerous Code in the World: Validating SSL Certificates in Non-Browser Software,” by M. Georgiev et al., published in the Proceedings of the 2012 ACM Conference on Computer and Communications Security.
2 Research from this section was published in the paper “Why Johnny the Developer Can’t Work with Public Key Certificates” at the RSA Conference 2018 Cryptographer’s track.
3 Research from this section was published in the paper “Will You Trust This TLS Certificate? Perceptions of People Working in IT” at the Annual Computer Security Applications Conference (ACSAC) in 2019.
Acknowledgements:
Thanks are due to Vashek Matyáš for supervision, Pavel Žáčik for his dedicated work, Red Hatters Jan Pazdziora and Nikos Mavrogiannopoulos for the initial support, and Milan Brož for long-term facilitation of the academic cooperation of Red Hat Czech and the Faculty of Informatics at Masaryk University.
More detailed results of this research project can be found on the Red Hat Research website at https://red.ht/3dxBnpa.