Intern Spotlight: Christina Xu, Red Hat Research Boston

Sep 20, 2023 | Blog, Internship, North America

At Red Hat Research, we hire creative, passionate students ready to work and learn with a global leader in open source solutions. Our interns bring fresh ideas and new connections to challenging problems in the open source community, unlocking their own potential while contributing to the innovative power of open development.

This month, we highlight the work of Christina Xu, a 2023 graduate of Boston University with a BA in statistics and a minor in computer science. Her work at Red Hat Research has focused on data science, machine learning, and differential privacy. In May 2023, Christina was part of the team presenting the workshop AI/ML Intelligent Applications at the Edge at the Red Hat Summit in Boston. We spoke with her about her interest in large language models (LLMs), why open source is the best space for AI/ML research, and why action is better than perfection.

Check out Christina’s Red Hat Research people profile to see her final presentation, “Ask Project Nexodus Docs / Project Aspen.” Her presentation is in two parts: a demonstration of how LLMs can be leveraged to answer questions based on Project Nexodus customer documentation assets, and an analysis of the health of open source projects according to bus factor, the minimum number of contributors a project can lose before it stalls. You can also find some of Christina’s work in her GitHub repository and see more of her background on LinkedIn.

What have you been working on during your time as a Red Hat Research intern?

Over the summer, I led Ask Project Nexodus Docs, an exploratory research project that leveraged LLMs for question-answering to improve the customer and engineer troubleshooting experience for Project Nexodus documentation. I experimented with extractive, abstractive, and generative question-answering strategies, as well as fine-tuning a flan-t5-base model with LoRA (Low-Rank Adaptation), with the goal of understanding the capabilities and limitations of LLMs. For my final deliverable, I developed a web application with Streamlit in which a user enters a question and the selected strategy returns an answer.

Additionally, I contributed to Project Aspen, which aims to enable open source stakeholders and participants to make data-driven community and business decisions by measuring the health of open source projects. I implemented and developed metrics to better understand the risk to a project should the most active people leave.
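Editor's illustration: the bus factor idea mentioned above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Project Aspen's actual metric. It assumes contribution is measured by commit count alone and that a project "stalls" once the departed contributors account for more than a chosen share of commits. The function name `bus_factor`, the `threshold` parameter, and the sample author names are all hypothetical.

```python
from collections import Counter


def bus_factor(commit_authors, threshold=0.5):
    """Return the smallest number of top contributors whose combined
    commits exceed `threshold` of all commits (an assumed heuristic)."""
    counts = Counter(commit_authors)
    total = sum(counts.values())
    if total == 0:
        return 0  # no commits, so no contributors to lose

    covered = 0
    # Walk contributors from most to least active, accumulating their share.
    for rank, (_, n_commits) in enumerate(counts.most_common(), start=1):
        covered += n_commits
        if covered / total > threshold:
            return rank
    return len(counts)


# Hypothetical commit log: one entry per commit, labeled by author.
authors = ["ana"] * 6 + ["bo"] * 3 + ["cy"] * 1
print(bus_factor(authors))                 # share needed: more than 50%
print(bus_factor(authors, threshold=0.8))  # share needed: more than 80%
```

A low result suggests that a project depends heavily on a few people. Real health metrics would likely weigh other signals as well, such as reviews, issues, and recency of activity.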

What makes you interested in open source research?

Open source research makes better machine learning models. Not only does it democratize the AI/ML space, but it also helps mitigate bias that could lead to detrimental consequences for users. We rarely have access to the training data of proprietary models. While proprietary models aren’t inherently harmful, the historical data they are trained on can be biased towards a particular group or demographic. As a result, the model can become a feedback loop that worsens discrimination. Open source models address this problem because they are publicly accessible and because they are often released alongside their source code and publicly available training data. While they are not guaranteed to be unbiased, users have the ability to audit them for bias.

Who were your mentors? How have your project mentors helped you?

I’ve had several mentors at Red Hat, all of whom I admire and am incredibly grateful for. Last summer, when I worked on an anomaly detection/prediction project, Audrey Guidera (Senior Principal Software Engineer) was my primary mentor. Despite not having any prior experience in databases, I successfully developed connection-pooling methods with her guidance. She’s so inspiring—not only is she a woman in a traditionally male-dominated field, but she paves the way for other women to succeed by leading the Red Hat Women in Data Science group.

This summer, Sanjay Arora (Senior Principal Software Engineer) mentored me on Ask Project Nexodus Docs. Every time I talk to him, I feel so inspired yet defeated at the same time. It reminds me that I have so much more to learn. He is incredibly knowledgeable about AI/ML. If one looks inside his brain, I imagine they’ll find a universe. He’s also offered me a lot of guidance throughout my project. There were periods when I felt overwhelmed with the different avenues I could pursue on top of keeping up to date with the most relevant research on LLMs. Sanjay helped me to clarify goals as well as the action steps needed to achieve them.

What are your longer-term career and research goals, and how will this internship help you?

I want to make better machine learning models with the goal of protecting user privacy. This internship has helped me better understand the risks of using LLMs, as they can hallucinate or produce nonsensical or incorrect outputs. You can imagine how harmful the consequences would be in a business decision or healthcare context, which is why I want to develop algorithms that can detect and/or predict anomalous outputs. I’m happy to say that I’ll be continuing my internship with Red Hat into December to undertake this research.

What advice would you give to a new Red Hat Research intern?

Have a bias towards action. I think this is especially important in research: when you’re attempting to solve novel problems, it’s really easy to get caught up trying to devise the perfect roadmap to a solution. However, research rarely goes according to plan, which is why action is better than perfection. When facing challenges such as an experiment gone awry, take responsibility, reframe it as a learning experience, and find solutions, which can include asking your mentor(s) or others for help.

Embrace a generalist and diverse mindset. The most brilliant people I have met in both academia and industry are multidisciplinary. Make the most of your internship experience by taking on tasks and challenges outside your comfort zone, job description, or school major.

You are not an imposter. Sometimes you’ll be working in teams where you are the go-to expert on a particular subject matter. You did not fool everyone into thinking you are smarter than you actually are. Rather, you are there because your team believes in your capabilities and expertise. But keep in mind that it’s impossible to know everything, even if it is your area of expertise, so embrace being a life-long learner.

What are your talents or hobbies?

Here are some things we can nerd out about: AI ethics, philosophy, literature, and visual culture. I’ve also been getting back into running lately. It’s such a great hack for my mental health and preventing burnout.
