Understanding user perception and behavior is often neglected in open source software (OSS) security. Jakub Suchánek, a student of the Faculty of Informatics at Masaryk University, collaborated with Red Hat Research on a project investigating authentication in public open source projects. In this blog post, Jakub explains how he utilized data from Red Hat’s Project Aspen (Sandiego) to analyze user behaviors and trends when contributing to open source projects, uncovering insights into behavior that could lead to security breaches.
What are the motivation and goals of this research project?
The research is focused on authentication in public open source projects used by commercial companies as a source for their internally maintained repositories. Rather than focusing on security purely through cryptography, it seeks to analyze users’ perceptions and behavior on GitHub, the largest open source code platform. It attempts to bridge the gap between what is theoretically secure and what provides usable mechanisms for users to behave securely. Since it’s pointless to have a cryptographically secure system in which every user chooses the password “password,” users’ behavior has consequences for the security of projects. GitHub accounts have been a target of malicious attackers for quite some time, as in the 2022 Dropbox hack. They are lucrative targets for gaining access to private repositories; however, attackers can also inject malicious code into public repositories through a trusted account, possibly one with permissions in that repository.
How did you approach this problem?
There have been two main approaches: the first is through objective data, and the second is through subjective responses from GitHub users. For the objective approach, we used the data collected by Project Aspen (Sandiego), a Red Hat Open Source Program Office (OSPO) project built on the dataset generated by the Augur Project.
[Learn more about Project Aspen and the Augur Project in “Measuring open source success: developing analysis for actionable insights,” RHRQ 4:4, Feb. 2023.]
Since the dataset does not contain all GitHub projects, we decided to focus on Red Hat-affiliated projects, meaning either projects with contributors who have a Red Hat email or projects with Red Hat publicly listed as a company. While this approach did not allow us to analyze users’ perceptions or usage of authentication as such, it allowed us to analyze users’ behaviors and trends when contributing to open source projects.
Can you give us a sneak peek into the results?
I would love to. As mentioned, for the more objective approach we used the Augur dataset, which at the time was at an earlier stage of development. At the prototyping stage, we mostly focused on visualizing trends in open source projects. Since we could only work with publicly available information, we could not directly analyze the main subject of the research, which is user authentication and the security of open source projects. However, we could analyze other related information. For example, we looked at the distribution of contributors within projects, which showed an exponential distribution.
Another example is the share of pull requests (PRs) made by members (or the owner) of a project compared to nonmember contributors. The following graph shows the ratio of PRs made by members of the given project. We can see that many projects have only member PRs; however, there is a lot of variety. There are also some projects without any PRs made by project members. These, however, could also be projects that allow members to push commits directly without making a pull request.
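To make the member PR ratio concrete, here is a minimal sketch of how such a ratio could be computed per project. It is not the code used in the study. The records are made up for illustration, and the `author_association` field with the values `OWNER` and `MEMBER` is an assumption modeled on the field GitHub reports for pull requests; the Augur dataset may store this information differently.

```python
from collections import defaultdict

# Hypothetical sample records for illustration only; not data from the study.
SAMPLE_PRS = [
    {"repo": "org/alpha", "author_association": "MEMBER"},
    {"repo": "org/alpha", "author_association": "OWNER"},
    {"repo": "org/alpha", "author_association": "CONTRIBUTOR"},
    {"repo": "org/alpha", "author_association": "NONE"},
    {"repo": "org/beta", "author_association": "MEMBER"},
    {"repo": "org/gamma", "author_association": "CONTRIBUTOR"},
]

# Assumption: these two roles count as "members (or the owner)".
MEMBER_ROLES = {"OWNER", "MEMBER"}


def member_pr_ratio(prs):
    """Return {repo: fraction of PRs opened by members or the owner}."""
    totals = defaultdict(int)
    member_counts = defaultdict(int)
    for pr in prs:
        totals[pr["repo"]] += 1
        if pr["author_association"] in MEMBER_ROLES:
            member_counts[pr["repo"]] += 1
    # Only repos with at least one PR appear, so there is no division by zero.
    return {repo: member_counts[repo] / totals[repo] for repo in totals}


ratios = member_pr_ratio(SAMPLE_PRS)
for repo, ratio in sorted(ratios.items()):
    print(f"{repo}: {ratio:.2f}")
```

In this toy data, a ratio of 1.0 corresponds to a project with only member PRs and 0.0 to a project with none. A project with no PRs at all would simply not appear, which matches the caveat that direct pushes are invisible to this measure.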
Can you give an example of what this might mean?
In the first example, the number of contributions per contributor in a project, a certain ratio could suggest that some people hold rights in the project but have used them to contribute only a few times. These accounts may have unnecessary rights, which could lead to a security breach: if such an account were compromised, the project would be compromised as well.
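As a rough illustration of this idea, the sketch below flags accounts that hold write rights but have contributed only a few times. Everything in it is hypothetical: the function name, the threshold, and the sample logins are made up, and it assumes a maintainer has a list of accounts with write access, which is not part of the publicly available data the study relied on.

```python
from collections import Counter


def rarely_used_write_access(contributions, write_access, max_contributions=2):
    """Return accounts with write access and at most `max_contributions` contributions.

    contributions: iterable of account logins, one entry per contribution
    write_access: set of logins that hold write rights (assumed to be known)
    """
    counts = Counter(contributions)
    # Counter returns 0 for accounts that never contributed.
    return sorted(
        login for login in write_access if counts[login] <= max_contributions
    )


# Hypothetical sample data for illustration only.
contributions = ["alice"] * 40 + ["bob"] * 2 + ["carol"] * 5 + ["dave"]
write_access = {"alice", "bob", "erin"}

flagged = rarely_used_write_access(contributions, write_access)
print(flagged)
```

Here `bob` and `erin` are flagged, while `dave` is not, because he contributes rarely but holds no rights. A flagged account is only a candidate for review, not evidence of a problem.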
In the second example, I don’t believe there is a clear interpretation of the data. It does, however, serve the purpose of giving insight into the behavior in a variety of projects. In the earlier stages of this research, gaining insight was the main goal.
You also mentioned a second approach: investigating developers’ attitudes. Could you elaborate on that?
Yes, the other approach was about users’ perceptions and behavior rather than objective statistics. We focused on GitHub as the platform of choice, as most developers are familiar with it and have an account there. We conducted a survey of 110 participants at DevConf.CZ 2023, an in-person open source conference. Since the data gathering took place at DevConf, the study sample consisted of IT professionals, who have been shown to have a more significant impact than regular users. The survey focused on two-factor authentication, which GitHub was about to start enforcing around that time. Users’ perception of this enforcement was mostly positive. One of the goals of the study was to find out how users’ perception affects their behavior and security-related decisions. A paper on the study is forthcoming at the 2024 TrustBus conference, so if you’re interested in more details, make sure to check it out.
How did you get to participate in this research?
I took part in the Competition for Talented Students organized by the Association of Industrial Partners at the Faculty of Informatics, Masaryk University. The competition allows students to participate in research projects throughout the university and with its industrial partners and to receive scholarships for their efforts. There were several possible tasks to solve, each tied to a position that first-year students had a chance to compete for. I wrote an essay on the importance of open source, since it has been an interest of mine for a long time and I have contributed to some open source projects. After the essay passed the first round, I had an interview with members of the CRoCS laboratory, and we talked about research. I was chosen for the research position focused on authentication in open source projects. Although I came across the research topic by pure chance, it aligned with my interests and became a great fit.