Red Hat Research Quarterly

Where will we find the data scientists?

about the author

Jennifer Wood

 Jen Wood is the manager of the Centre for Doctoral Training in Cloud Computing for Big Data at Newcastle University. She is also the University Liaison Manager for Newcastle University’s partnership with the Alan Turing Institute in London. She has been working in data science for over six years and has a master’s degree in Culture and Difference from Durham University.

Universities play a primary role in developing data skills, but traditional education alone can’t close the skills gap fast enough.

The mismatch between the widespread need for strong data skills and the current workforce is an obstacle for nearly every sector of the economy, which means no single sector can solve it. Collaborative partnerships among universities, businesses, and government agencies have the potential to make the changes needed to bring data science into the mainstream. In this article, I’ll describe how the data science program at Newcastle University has worked with government and industry support to find multiple solutions for the data skills gap.

Addressing the data science shortage

Vast amounts of data are now collected across all areas of academia, research, industry, and the public sector. The challenge has shifted from collecting data to extracting value from it. 

Everyone needs certain skills to succeed in a data-rich environment, starting with basic data literacy. A shortage of these skills threatens the ability of businesses, and even countries, to drive the innovation that creates economic growth. A 2020 World Economic Forum report found that only 61% of the working-age population in the United Kingdom has digital skills. The UK's skills gap runs across the entire learning journey, from early years to senior positions in academia and industry.

The COVID-19 pandemic highlighted this data skills gap by revealing our reliance on technology. While this reliance created problems for workers who lacked the necessary skills, it also created the impetus to accelerate digital transformation and demonstrated how technology can lead to success. It's not controversial to say that developing data skills is fundamental to a thriving economy. Industry can't close this gap alone. Universities have a critical role to play, in partnership with both industry and government. The Newcastle Data community (part of Newcastle University) was formed in part to respond to these challenges. It focuses on three main goals:

  • Using data to transform research across the university
  • Training the next generation of leaders in data science
  • Exchanging data expertise with those outside the university, for the benefit of society, the economy, and the university’s own research and teaching

The Newcastle Data community coordinates work across research, teaching, and engagement to create a virtuous circle in which research and work with external organizations—from startups to government and public sector organizations to FTSE 500 companies—keep our teaching up to date. These connections generate a pipeline of talent that feeds both academic research and external organizations.

A cohort model helps create a talent pipeline

PhD research can be a solitary experience, quite different from how real-world practice usually works. The Centre for Doctoral Training (CDT) model established in the United Kingdom was explicitly designed to encourage collaborative, interdisciplinary research that addresses global problems, supported by both government and industry funding. Newcastle's CDT in Cloud Computing for Big Data works with partners toward the same goal. Rather than funding single studentships, the CDT funds eight to twelve students per year and trains them as a cohort. Funding initially allowed us to recruit five cohorts, but thanks to generous industrial sponsors, we've been able to recruit eight. We now have a dynamic community for both students and staff that prevents learning from being a passive and isolated experience.

Students and industry representatives meet in the new Catalyst space

Solving real problems with data requires people with deep knowledge and skills in both computing science and statistics. Practical experience in cloud computing and handling real data sets is also required. Traditional single-subject programs don’t provide graduates with this combination of skills. The great advantage of CDTs is that they encourage interdisciplinary work, which gives us the freedom to design our CDT to target these areas.

Students gain a great deal academically from this collaborative approach. Working on real-world problems gives an authentic purpose to their research and opens it to a broader audience. Because students are co-supervised by academics in computing, maths, statistics, and other disciplines, they gain expertise and perspective from fields other than their own. A cross-disciplinary view enhances students' research in many ways, enabling creativity and innovation but also helping them understand the limits of their knowledge.

Throughout their training and research, students share a dedicated office space that lets them take advantage of the range of backgrounds, knowledge, and skills across all cohorts. Students also take several modules that strongly emphasize group projects, working on a current problem for one of our industry partners. This program design paves the way for future collaborations and working partnerships, and it helps the students develop robust conflict management and relationship-building skills.

It’s not just about technical skills

The CDT also gives universities the flexibility to help students build business leadership and entrepreneurial skills, which, again, is uncommon in standard graduate research programs. These abilities are much needed. The recent Quantifying the UK Data Skills Gap report found that around a quarter of businesses said graduates who work with data need to develop their leadership and communication skills.

We designed our CDT with this in mind. When we began in 2014, we wanted to develop a program that would produce future leaders in data analytics. This requires not just technical knowledge but also the ability to generate and pursue new business opportunities, either through start-ups or in existing companies. 

The Newcastle Helix serves as a hub for collaborative public-private research in data science.

Universities are seeing a demand for graduates with a skill set that includes core professional skills such as critical analysis, communication, and creativity. We developed a successful collaboration between Newcastle University, the National Innovation Centre for Data (NICD), and AkzoNobel (a Dutch multinational company) on a data-driven innovation module. The module equips students with commercial awareness around the use of data and AI through a ten-day incubator where students are immersed in an industry setting, collaborating on real business problems. Using the business model canvas (an entrepreneurial technique) as the foundation for problem solving, the students work through several iterations of a solution to offer fresh, creative perspectives on a traditional company.

At the end of the two weeks, students pitch their solutions to stakeholders within AkzoNobel. According to Mo Chowdhury, AkzoNobel Innovation Incubator Project Lead, “The dedicated and sprint-like mentality provided us with business models that would have taken much longer to produce. Each idea was truly transformative.” Given the potential of entrepreneurial techniques to foster innovation, this program has also been adapted for undergraduate students from non-technical programs.

Widening participation 

The skills gap creates excellent opportunities to diversify the field and support students from groups historically underrepresented in tech. Over 70% of the 1.5 million roles at risk of automation by artificial intelligence and other emerging technologies are held by women. The inequitable impact of COVID-19 on women and Black, Asian, and minority ethnic (BAME) communities has slowed progress on diversity and inclusion in all sectors. Meanwhile, progression into postgraduate training is as low as 11% for Black students and 8.4% for disabled students.

To help correct this, Newcastle used funding from the UK government's Department for Digital, Culture, Media, and Sport (DCMS) and the Office for AI (via the Office for Students) for a project to widen participation in data science and AI. The project includes forty-five Master of Science scholarships for groups historically underrepresented in the field, with a focus on female, Black, and registered disabled students; students from POLAR Q1 and Q2 areas (a UK measure of participation in higher education by locale); care leavers (people who have spent time in foster or residential care); estranged students; Gypsy, Roma, and Traveler students; refugees; children from military families; veterans; and partners of military personnel. Dr. Matt Forshaw, a senior lecturer in data science at Newcastle, drew on what we've learned from combining computing and statistics to develop the suite of MSc courses in data science.

Unlocking the potential of the current workforce

Universities play a vital role in creating a talent pipeline of graduates, but with 80% of 2030’s workforce already in employment, we need different solutions for short-term change. Reskilling and upskilling the existing workforce is imperative. One difficulty is the widespread belief that there is only one pathway to working in fields like data science. A 2020 Europe-wide YouGov survey (commissioned by Red Hat) highlighted the misconception that only those with data-related qualifications can pursue a career in data or tech. 

Bridging the skills gap among people already employed in other fields requires industry, academia, and government to build new pathways and make them achievable. In the United Kingdom, for example, the Apprenticeship Levy funds employer-based apprenticeships that teach employees of any age and career stage new skills, from data literacy to data analysis and AI/machine learning.

One of the successful strategies developed at Newcastle is helping organizations' existing workforces gain the skills and knowledge that have traditionally been the purview of academic experts. The NICD is capitalizing on the new Catalyst facility, located in the Newcastle Helix, which was specifically designed to bring together researchers and businesses to share the wealth of skills and knowledge currently locked within universities.

A technical team from the NICD, including several CDT PhD and data science MSc graduates, works alongside organizations facing data science challenges, addressing specific needs or data problems. Unlike a traditional consultancy, the NICD technical team works both to find tangible data-driven solutions for clients and to upskill employees of clients’ organizations. As a result, the organization’s workforce will be able to tackle the next data project themselves. 

The future of data skills training

A great deal remains to be done to solve the skills gap in the United Kingdom and elsewhere, but partnerships like those created via Newcastle's data science program will play an essential role in meeting the need for data skills at all levels. We've seen the benefit of a cohort model and, with the development of the Europe Research Interest Group (RIG), we expect our relationships with partners like Red Hat and other businesses to grow.

Mark Little, who is both Vice President of Middleware Engineering at Red Hat and a visiting professor at Newcastle University, leads the Research Centre at Newcastle. Recently named a Fellow of the Royal Academy of Engineering, Professor Little points to a history of success achieved by the joined forces of academia, industry, and government: “Red Hat and Newcastle University have worked together for many years with a track record of successes including five-star rated EU projects, PhDs, upstream open source projects that have been adopted by various companies and academic institutions, and creating new leaders in R&D for Red Hat and other organizations. As hybrid cloud, edge/IoT, and data science research opportunities continue to grow, it is these kinds of successes which we should build upon and strengthen our partnership.”

