Red Hat Research Quarterly

Making machine learning accessible across disciplines

Marek Grác

Marek Grác operates at the intersection of open source development, academic research, and AI regulation. As a member of Red Hat Research and academia, he pushes the boundaries of what AI can and should do by investigating challenges of AI security, including SBOM/AIBOM.

Article featured in

Red Hat Research Quarterly

November 2021

Download PDF

Subscribe now

Machine learning has been driving research breakthroughs in many fields. Now there is an open source curriculum designed to help non-specialists build the skills they need to use it.

Machine learning is an increasingly important competency in a growing number of fields. Biochemists are using it to create models for protein engineering. Economists are using it to predict the price of cryptocurrencies. And linguists have applied it to the problem of fake news. Computer science is not traditional domain knowledge for these disciplines, but massive data processing is often a requirement for cutting-edge research in natural sciences, social sciences, and the humanities.

That is why we created the Applied Machine Learning course at Masaryk University, in Brno, Czech Republic: to help students outside the field of computer science learn more about machine learning without making them into computer scientists first.

Focusing on what students need to know

The field of machine learning is changing on a daily basis. Keeping up with trends is difficult, even for computer science professionals. Interested laypeople have two basic choices. They can join one of many massive online courses, but they will have support only from peers for guidance. They will likely also be learning in isolation, separate from their local community. For those who are still students, they can join one of the courses provided by their university’s computer science department. There is a catch, however. University courses are good for learning about theory and background. Modern technologies and tools, however, often develop too quickly to be added to syllabi while they are still new. In the context of machine learning, “modern” might be just a few months old.

We designed the Applied Machine Learning course to fill the gap between a theoretical and a practical approach. We wanted to show students that they can work with machine learning without the ability to rewrite the complex methods currently used by software professionals. For the majority of learners, understanding basic concepts and being able to use machine learning methods is enough for their needs.

Our target audience for the course is not students of computer science but future experts in different disciplines. To advance in their fields of study, it is important for these students to understand the capabilities of machine learning and be able to communicate with machine learning experts. Because machine learning is about correlation, not causation, domain knowledge helps us to distinguish between them correctly. In our first run, we were able to attract students from computational linguistics, computer science, chemistry, biology, and economics. In fact, the use cases mentioned above were taken directly from these students’ projects.

Advantages of an interdisciplinary classroom

Putting all of these students together created a special environment. Our online, synchronous sessions were interactive, in small groups that were randomly shuffled before each task. By working in frequently changing groups, students were forced to talk differently so students from other disciplinary backgrounds could understand them, whether they were talking about machine learning or their domain expertise. For their projects, we put students in groups of three, and each group purposely contained students from at least two different disciplines. Because the students had different backgrounds and goals, they could bring different views to the problem they were solving.

We wanted to give students an opportunity to learn new things without getting into a routine of simply opening an editor and coding something, so we had students creating annotation guidelines, fixing bugs in existing source code, trying to create better rules for simple games, creating chatbots, or trying to figure out which parameters might influence whether a person will get a mortgage. The different types of tasks allowed students to play multiple roles, so they were not experts all the time. They needed to help each other in order to understand and complete the tasks.

Slides from the applied machine learning course illustrate a process of working with data from chocolate bar ratings. — ^{Student task slides}

Practicing routine coding and gaining knowledge of the libraries (e.g., Scikit, PyTorch, Keras) are an important part of the course, but those skills do not have to be exercised in groups. Thanks to our partner, DataCamp, we were able to give students access to relevant routine exercises. These exercises allow students who don’t fully understand the science of how coding works to learn the craft of writing in a programming language. And students made good use of them—on average they have spent more than 40 hours on this platform.

This course environment, along with COVID constraints, led to an approach where the teacher was more curator than creator. We put together a set of videos for relevant topics from existing resources, including not only prestigious universities but also people from industry or even amateur science bloggers. Thanks to this approach, every week was presented by a different speaker, which helped us avoid the stereotypical instructor-centered—or “sage on the stage”—type of class.

Opening the course to any university

Our pilot group had only fifteen students, but it is not efficient to create such complex scenarios for a small group if there will be major changes every year. That is one reason why we open sourced all our materials on GitHub, with links to videos, our group tasks, and a methodological explanation of why they were selected. And, of course, feedback from students should lead to improvements in future courses.

We would like to open our “franchise” to other students, so if you believe that your local university might be interested, we will be happy to support them and work together on these study materials to make them better, just as we have done with open source software for so long.

Acknowledgements

I would like to thank Z. Hladká and D. Hlaváčková, who offered to let us create such a course at the Faculty of Arts. We would also like to thank Hugh Brock, who supported us from Red Hat Research. It would not be possible without their belief in our idea.

SHARE THIS ARTICLE

Efficient runtime verification for the Linux kernel

Daniel Bristot de Oliveira

If safety-critical systems fail, they can cause significant damage, including loss of life. In this article we consider methods to verify their behavior in production.

Feature

Optimizing Kubernetes service selection

Daniel Bachar

Is there a way to implement load balancing in multicluster environments that won’t increase resource usage? New research suggests the answer is yes. Multicloud providers and microservice-based applications across clouds are becoming increasingly popular. Organizations that use them enjoy the benefits of high availability, performance improvements, and cost effectiveness. However, as microservices communicate with each […]

Feature

Don’t blame the developers: making security usable for IT professionals

Martin Ukrop

Historically, usability studies have looked mostly at end users, doing focus groups or user testing with customers or the general public. This process often neglected developers, system administrators, and other IT professionals and the systems they use day to day.

Feature

Yuga: A tool to help Rust developers write unsafe code more safely

Sanjay Arora

Baishakhi Ray

Vikram Nitin

Some bugs in unsafe Rust arise from errors that are so easy to make that they are easily overlooked. Researchers have developed a new analyzer to find them. By Vikram Nitin, Anne Mulhern, Baishakhi Ray, and Sanjay Arora Rust, a programming language that did not exist just 10 years ago, is now well known and […]

Feature

Look to the Horizon: Europe’s increased focus on funding open source research is creating new opportunities

Luis Tomás Bolivar

Carlos Camacho

Josh Salomon

Three principal software engineers and sought-after research collaborators share their insights on this critical EU innovation incubator. In February 2021, the European Union launched Horizon Europe, the next phase in its flagship Framework Programme for Research and Innovation. Horizon Europe, which will fund research from 2021 to 2027, was created to drive innovation, research, and […]

Feature

Demystifying real-time Linux scheduling latency

Daniel Bristot de Oliveira

This is the third of a series of three articles about the formal analysis and verification of the real-time Linux® kernel. Read the first article in RHRQ 2:3 and the second article in RHRQ 2:4.

Feature

A thread model for the real-time Linux kernel

Daniel Bristot de Oliveira

The recent advances in AI and telecommunications are enabling a new set of complex cyber-physical systems, including those for safety-critical applications.

Feature

Mental models: Qualitative research to design for Red Hat OpenShift users

Carl Pearson

Brian Dellascio

Sarahjane Clark

To design effectively for our users, we need to learn more about them. If we don’t, we may make a product that our users can’t be efficient in, or worse, a product that our users have no need for in the first place.

Feature

Open source authentication exposed: how open source developers perceive user authentication

Agáta Kružíková

Ensuring security in open source software starts before a line of code is written. What role should communities and developers play? Open source projects are used in commercial products by many companies, from Microsoft and Google to Red Hat. The developers behind these projects and their user accounts are the first element in the supply […]