Student research spotlight: Jakub Suchánek studies authentication in public open source repositories

Jun 5, 2024 | Blog

Understanding user perception and behavior is often neglected in open source software (OSS) security. Jakub Suchánek, a student of the Faculty of Informatics at Masaryk University, collaborated with Red Hat Research on a project investigating authentication in public open source projects. In this blog post, Jakub explains how he utilized data from Red Hat’s Project Aspen (Sandiego) to analyze user behaviors and trends when contributing to open source projects, uncovering insights into behavior that could lead to security breaches.

What are the motivation and goals of this research project?

The research is focused on authentication in public open source projects used by commercial companies as a source for their internally maintained repositories. Rather than focusing on security purely through cryptography, it seeks to analyze users’ perceptions and behavior on GitHub, the largest open source code platform. It attempts to bridge the gap between what is theoretically secure and what provides usable mechanisms for users to behave securely. Since it’s pointless to have a cryptographically secure system in which every user chooses the password “password,” users’ behavior has consequences for the security of projects. GitHub accounts have long been a target of malicious attackers, as in the 2022 Dropbox hack. They are lucrative targets for gaining access to private repositories; however, attackers can also inject malicious code into public repositories through a trusted account, possibly one with permissions in that repository.

How did you approach this problem?

There have been two main approaches: the first is through objective data, and the second is through subjective responses from GitHub users. For the objective approach, we used the data collected by Project Aspen (Sandiego), a Red Hat Open Source Program Office (OSPO) project built on the dataset generated by the Augur Project.

[Learn more about Project Aspen and the Augur Project in “Measuring open source success: developing analysis for actionable insights,” RHRQ 4:4, Feb. 2023.]

Since the dataset does not contain all GitHub projects, we decided to focus on Red Hat-affiliated projects, meaning either projects with contributors who have a Red Hat email or projects with Red Hat publicly listed as a company. While this approach did not allow us to analyze users’ perceptions or usage of authentication as such, it allowed us to analyze users’ behaviors and trends when contributing to open source projects.

Can you give us a sneak peek into the results?

I would love to. As mentioned, when it comes to the more objective approach, we utilized the Augur dataset, which at the time was at an earlier stage of development. At the prototyping stage, we mostly focused on visualizing trends in open source projects. Since we could only focus on publicly available information, we could not analyze the main objective, which is user authentication and security of open source projects. However, we could analyze other related information. For example, we have looked at the distribution of contributors within projects, which showed an exponential distribution.

Another example is the rate of pull requests (PRs) made by members (or the owner) of a project compared to nonmember contributors. The following graph shows the ratio of PRs made by members of the given project. We can see that many projects have only member PRs, but there is a lot of variety. There are also some projects without any PRs made by project members. These, however, could also be projects that allow pushing commits directly to the project without making a pull request.
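As an illustration only (this is not the project's analysis code), the member-PR ratio described above could be computed along these lines. The sketch assumes each PR is available as a (project, author association) pair; the project names and sample records are made up, and treating only the labels "OWNER" and "MEMBER" as project members is an assumption modeled on the association labels GitHub attaches to pull requests, not on the Augur schema.

```python
from collections import defaultdict

# Assumption: these two labels mark PRs opened by the owner or a member.
MEMBER_ROLES = {"OWNER", "MEMBER"}


def member_pr_ratio(prs):
    """Return {project: share of PRs opened by members or the owner}."""
    totals = defaultdict(int)
    by_members = defaultdict(int)
    for project, association in prs:
        totals[project] += 1
        if association in MEMBER_ROLES:
            by_members[project] += 1
    # Projects with no PRs never appear in `totals`, so no division by zero.
    return {project: by_members[project] / totals[project] for project in totals}


# Hypothetical sample data, for illustration only.
sample_prs = [
    ("project-a", "MEMBER"),
    ("project-a", "OWNER"),
    ("project-b", "CONTRIBUTOR"),
    ("project-b", "MEMBER"),
    ("project-c", "NONE"),
]

ratios = member_pr_ratio(sample_prs)
for project, ratio in sorted(ratios.items()):
    print(f"{project}: {ratio:.2f}")
```

A ratio of 0 here only means that no PRs came from members. As noted above, members of such a project may simply push commits directly.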

Can you give an example of what this might mean?

In the example of the number of contributions per contributor in a project, a certain ratio could suggest that there are people who have rights in the project but have used them to contribute only a few times. These accounts may have unnecessary rights, which could lead to a security breach: if such an account were compromised, the project would be compromised as well.
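A minimal sketch of this kind of check follows, under loud assumptions. The list of accounts with rights is supplied by hand (it is generally not public information and was not part of the dataset described here), the threshold of two contributions is arbitrary, and all account names are hypothetical.

```python
from collections import Counter


def rarely_used_rights(contribution_authors, accounts_with_rights, max_contributions=2):
    """Return accounts that hold rights but contributed at most `max_contributions` times."""
    counts = Counter(contribution_authors)
    # Counter returns 0 for accounts that never contributed.
    return sorted(
        account
        for account in accounts_with_rights
        if counts[account] <= max_contributions
    )


# Hypothetical data: one entry per contribution, keyed by author account.
contributions = ["alice"] * 40 + ["bob"] * 2 + ["carol"] * 15 + ["dave"]
# Hypothetical list of accounts with rights in the project.
with_rights = {"alice", "bob", "erin"}

flagged = rarely_used_rights(contributions, with_rights)
print(flagged)
```

A flagged account is only a candidate for review. A low contribution count alone does not show that the rights are unnecessary.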

In the second example, I don’t believe there is a clear interpretation of the data. It does, however, serve the purpose of giving insight into the behavior in a variety of projects. In the earlier stages of this research, gaining insight was the main goal. 

You also mentioned a second approach: investigating developers’ attitudes. Could you elaborate on that?

Yes, the other approach was about users’ perceptions and behavior rather than objective statistics. We focused on GitHub as the platform of choice, as most developers are familiar with GitHub and have an account there. We conducted a survey of 110 participants at DevConf.CZ 2023, an in-person open source conference. Since the data was gathered at DevConf, the study sample consisted of IT professionals, who have been shown to have a more significant impact than regular users. The survey focused on two-factor authentication, which GitHub was about to start enforcing around that time. Users’ perception of this enforcement was mostly positive. One of the goals of the study was to find out how users’ perception affects their behavior and security-related decisions. A paper on the study is forthcoming at the 2024 TrustBus conference, so if you’re interested in more details, make sure to check it out.

How did you get to participate in this research?

I took part in the Competition for Talented Students organized by the Association of Industrial Partners at the Faculty of Informatics, Masaryk University. The competition allows students to participate in research projects throughout the university and with its industrial partners and to receive scholarships for their efforts. There were several possible tasks to solve, each tied to a position that first-year students had a chance to compete for. I wrote an essay on the importance of open source, since it has been my interest for a long time, and I have contributed to some open source projects. After the essay passed the first round, I had an interview with members of the CRoCS laboratory and we talked about research. I was chosen for the research position focused on authentication in open source projects. While the research topic itself was discovered by pure chance, it aligned with my interests and became a great fit.
