At Red Hat Research, we hire creative, passionate students ready to work and learn with a global leader in open source solutions. Our interns bring fresh ideas and new connections to challenging problems in the open source community, unlocking their own potential while contributing to the innovative power of open development.
This month, we highlight the work of Christina Xu, a 2023 graduate of Boston University with a BA in statistics and a minor in computer science. Her work at Red Hat Research has focused on data science, machine learning, and differential privacy. In May 2023, Christina was part of the team presenting the workshop AI/ML Intelligent Applications at the Edge at the Red Hat Summit in Boston. We spoke with her about her interest in large language models (LLMs), why open source is the best space for AI/ML research, and why action is better than perfection.
Check out Christina’s Red Hat Research people profile to see her final presentation, “Ask Project Nexodus Docs / Project Aspen.” Her presentation is in two parts: a demonstration of how LLMs can be leveraged to answer questions based on Project Nexodus customer documentation assets, and an analysis of the health of open source projects according to bus factor, the minimum number of contributors a project can lose before it stalls. You can also find some of Christina’s work in her GitHub repository and see more of her background on LinkedIn.
What have you been working on during your time as a Red Hat Research intern?
Over the summer, I led Ask Project Nexodus Docs, an exploratory research project that leveraged LLMs for question-answering to improve the customer and engineer troubleshooting experience for Project Nexodus documentation. I experimented with extractive, abstractive, and generative question-answering strategies, and I fine-tuned a flan-t5-base model with LoRA (Low-Rank Adaptation), with the goal of understanding the capabilities and limitations of LLMs. For my final deliverable, I developed a Streamlit web application in which a user enters a question and the selected strategy returns an answer.
Additionally, I contributed to Project Aspen, which aims to enable open source stakeholders and participants to make data-driven community and business decisions by measuring the health of open source projects. I implemented and developed metrics to better understand the risk to a project should the most active people leave.
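To make the bus factor idea concrete, here is a minimal sketch of one common way to estimate it. It is not Project Aspen's actual metric: it assumes a project "stalls" once the contributors responsible for at least half of all commits are gone, and the function name, the 50% threshold, and the sample commit log are all hypothetical choices made for illustration.

```python
from collections import Counter


def bus_factor(commit_authors, threshold=0.5):
    """Estimate bus factor from a list with one author name per commit.

    Assumption (not from the interview): the bus factor is the smallest
    number of top contributors whose combined commits reach `threshold`
    of all commits.
    """
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")

    counts = Counter(commit_authors)
    total = sum(counts.values())
    if total == 0:
        return 0  # no commits, so no contributors to lose

    covered = 0
    # Walk contributors from most to least active, accumulating commits.
    for rank, (_, n_commits) in enumerate(counts.most_common(), start=1):
        covered += n_commits
        if covered >= threshold * total:
            return rank
    return len(counts)


# Hypothetical commit log: "ana" wrote 6 of 10 commits.
commits = ["ana"] * 6 + ["ben"] * 3 + ["chi"] * 1
print(bus_factor(commits))
```

A low value means the project depends heavily on a few people. Commit counts are only a rough proxy for activity, so a real metric might also weigh reviews, issues, or how recent the contributions are.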
What makes you interested in open source research?
Open source research makes better machine learning models. Not only does it democratize the AI/ML space, but it also helps mitigate bias that could have detrimental consequences for users. We rarely have access to the training data of proprietary models. While proprietary models aren’t inherently harmful, the historical data they are trained on can be biased towards a particular group or demographic. As a result, the model can become part of a feedback loop that worsens discrimination. Open source models address this problem because they are publicly accessible and are often released alongside their source code and training data. While they are not guaranteed to be unbiased, users have the ability to audit them for bias.
Who were your mentors? How have your project mentors helped you?
I’ve had several mentors at Red Hat, all of whom I admire and am incredibly grateful for. Last summer, when I worked on an anomaly detection/prediction project, Audrey Guidera (Senior Principal Software Engineer) was my primary mentor. Despite not having any prior experience in databases, I successfully developed connection-pooling methods with her guidance. She’s so inspiring—not only is she a woman in a traditionally male-dominated field, but she paves the way for other women to succeed by leading the Red Hat Women in Data Science group.
This summer, Sanjay Arora (Senior Principal Software Engineer) mentored me on Ask Project Nexodus Docs. Every time I talk to him, I feel so inspired yet defeated at the same time. It reminds me that I have so much more to learn. He is incredibly knowledgeable about AI/ML. If one looks inside his brain, I imagine they’ll find a universe. He’s also offered me a lot of guidance throughout my project. There were periods when I felt overwhelmed with the different avenues I could pursue on top of keeping up to date with the most relevant research on LLMs. Sanjay helped me to clarify goals as well as the action steps needed to achieve them.
What are your longer-term career and research goals, and how will this internship help you?
I want to make better machine learning models with the goal of protecting user privacy. This internship has helped me better understand the risks of using LLMs, as they can hallucinate, that is, produce nonsensical or incorrect outputs. You can imagine how harmful the consequences would be in a business decision or healthcare context, which is why I want to develop algorithms that can detect and/or predict anomalous outputs. I’m happy to say that I’ll be continuing my internship with Red Hat into December to undertake this research.
What advice would you give to a new Red Hat Research intern?
Have a bias towards action. I think this is especially important in research: when you’re attempting to solve novel problems, it’s really easy to get caught up trying to devise the perfect roadmap to a solution. However, research rarely goes according to plan, which is why action is better than perfection. When facing challenges such as an experiment gone awry, take responsibility, reframe it as a learning experience, and find solutions, which can include asking your mentor(s) or others for help.
Embrace a generalist and diverse mindset. The most brilliant people I have met in both academia and industry are multidisciplinary. Make the most of your internship experience by taking on tasks and challenges outside your comfort zone, job description, or school major.
You are not an imposter. Sometimes you’ll be working in teams where you are the go-to expert on a particular subject matter. You did not fool everyone into thinking you are smarter than you actually are. Rather, you are there because your team believes in your capabilities and expertise. But keep in mind that it’s impossible to know everything, even if it is your area of expertise, so embrace being a life-long learner.
What are your talents or hobbies?
Here are some things we can nerd out about: AI ethics, philosophy, literature, and visual culture. I’ve also been getting back into running lately. It’s such a great hack for my mental health and preventing burnout.