Red Hat Research Quarterly
Clouds on the horizon: shared cloud computing resources make research more accessible and more powerful
Dr. Michael Zink is Professor of Electrical and Computer Engineering at the University of Massachusetts, Amherst. In addition to publishing and teaching, Dr. Zink has participated in several projects providing distributed systems and virtual networks for research and education, including GENI and ExoGENI (2007-2021), Cloud Lab (2014-2021), and now the Open Cloud Testbed (OCT) since 2019. The OCT, a collaboration among researchers from Boston University, Northeastern University, and UMass Amherst, was recently awarded $5 million from the National Science Foundation (NSF) to develop a testbed for new cloud computing platforms, combining research and production cloud capabilities in shared testbeds, as well as new features such as programmable FPGAs for cloud developers.
Heidi Picher Dempsey and Dr. Zink are long-time colleagues who originally met working on the NSF GENI project. She interviewed Dr. Zink about his research and teaching career, the promise of making computing more accessible and efficient, and the social impact of science.
Heidi Picher Dempsey: I want to start with your interest in sensors and your secret history as an electrical engineer, which some may not know. How did you start there, and how did that grow into cloud-related projects?
Michael Zink: After graduating from high school, not knowing what to do, I thought electrical engineering seemed like fun, so I started as an undergrad in electrical engineering. Actually, all my degrees are in electrical engineering. But electrical engineering and computer science are not so far apart. For example, when I was an undergrad in Darmstadt, Germany, we had a lab where we operated a radar—I think that was my first foray into sensors. When I graduated with my PhD, I looked for a postdoc position. I got one here at UMass, in the NSF Engineering Research Center for Collaborative Adaptive Sensing of the Atmosphere (CASA) that does low atmospheric weather observations.
We built a closed-loop weather observation system. At that time, I had worked on multimedia streaming for my PhD, so I got a lot more expertise in distributed systems and networking, but I also understood the engineering side. I was considered someone who fits in nicely because I knew about atmospheric sensors, and I knew how data was shipped around and processed. That was a beautiful experience for me because I learned to work with many principal scientists.
That’s what I still do today: make those connections between scientists’ work and their compute needs. I just came back from lunch with two professors from the physics department speaking about the cloud and using ESI (Elastic Secure Infrastructure). The BU folks and the UMass folks are providing an upgrade for the ATLAS NET2 node, which does the data processing and distribution for the Large Hadron Collider in Switzerland. The idea is that we will use ESI to make this hardware available not only for their compute needs, but for other researchers also.
Heidi Picher Dempsey: Let’s back up and explain ESI for those unfamiliar with it. ESI is one of the collaborative projects on the OCT. It aims to make it possible for datacenters and research groups to share bare metal machines. As demand fluctuates in different research centers, you can share resources. You won’t find researchers running out of compute time, memory, or network to do their work, which unfortunately happens now.
Michael Zink: Exactly.
Learn more about the ESI project in Gagan Kumar’s article “The elastic bare metal cloud is here,” RHRQ 3:3 (November 2021).
Heidi Picher Dempsey: OK—back to your work in radar. You looked at the amount of land you could cover with one of these radars, and you had to guess where to put them or move them based on the data you were getting. Are you still looking at those kinds of problems? How involved do you stay in that research while you’re starting all this other work?
Michael Zink: The new weather-observation area I’ve been looking into is drones. Unpiloted aerial vehicles are becoming more and more popular. What we hear from people working in this environment is, “When there’s weather, we don’t fly.” I’m not an economist, but I think that’s not a good business model. We’ve been spending a lot of time in path planning based on weather observations with our systems. We’re working with folks from the business school to discover if, when you have certain stochastic information, you can carry out a flight successfully.
Information has to become available much more quickly because drones often fly shorter distances. You don’t have the luxury of flying large diversions, as you can with a Boeing 747. We’ve been looking into that issue, until recently, using a lot of the ExoGENI infrastructure. Now we are looking into the Chameleon testbed, an OpenStack-based community testbed hosted by the University of Chicago and the Texas Advanced Computing Center at UT Austin. Especially with Chameleon at the edge, we hope to see how edge-to-cloud architectures, with the compute resources provided on that spectrum, can make these applications possible.
Heidi Picher Dempsey: Edge computing has gotten a lot of attention recently, but you’ve been working on it for a very long time.
Michael Zink: We were forced to do this with radars because we had high data rates. We were compressing this data then sending it from a remote radar to a location where we could do more processing. Now, with the Internet of Things (IoT) and connected autonomous devices, we see that some processing has to happen at the edge, because there’s no way you can ship all this data back to a central location. That will be even more important in the future.
Heidi Picher Dempsey: You mentioned ExoGENI and the NSF GENI program, which started more than ten years ago. So you had the experience of being involved in a project at the beginning and building an infrastructure that didn’t yet exist, to provide high data processing rates, very flexible networking, high throughput, and dynamic composition. All of these things are now in the commercial cloud. With the recent shutdown of ExoGENI, you’ve seen the end of an infrastructure project, but all of the research topics have not been solved. How does it feel to spend that many years on a research effort but then not be able to finish?
Michael Zink: That’s life, I guess. <laughs>
Heidi Picher Dempsey: Philosophy in computer science!
Michael Zink: In most sciences, right? You always have very high goals, which is good because that motivates people. If we achieve all these goals, that’s more than perfect—we will not always be able to do that. But if many good discoveries come out along the path, that’s wonderful.
For me, it has been an incredible experience, being there from the beginning, seeing these crazy ideas. Some didn’t materialize, but others, which you wouldn’t think were possible, did. And the human network we built around it may be the biggest outcome. The research is important, but so are the students who had the experience. If I look at my students who are now at Apple or Akamai, bringing their expertise into companies that are not necessarily cloud companies but use all the same approaches, that’s an incredible impact. Those are just two examples from maybe thousands of students who went through that project at the time. To a certain extent, that’s why we see some of these technologies now as commercial offerings.
Heidi Picher Dempsey: And if you’re coming up with questions that generate more questions, you’re almost guaranteed never to get to the end, if it’s a nice rich project with a lot of meat to it. A lot of the value is in the questions, which is hard to get across to new students.
Michael Zink: Oh, yes. Sometimes it’s hard to keep students focused on their initial question because, once they get going, they have so many more questions. Sometimes you have to say, “You have to answer at least one question before you go to the next one.”
Heidi Picher Dempsey: That brings us to another thing I wanted to ask you about. I’ve always admired your teaching style and how you get students working on real systems. As people are experimenting with infrastructure that gets more and more complex, have you found ways of dealing with that complexity and getting people started more easily? Or is it still one-on-one mentoring that makes the difference?
Michael Zink: Thank you—I wish all my students would’ve listened to that first part! That’s a struggle. There’s so much knowledge about certain software, systems, and technologies that we have accumulated over time. Even if they are the smartest people in the world, students can have a hard time catching up. The approach I take is to limit the set of tools they need. It’s not totally under my control, but if possible, I’d have them use a single testbed throughout their career as a graduate student. If they use five testbeds, they have to learn all the details about all five and spend all their time on that.
But what do you do when things change, like ExoGENI going away? The funding agencies have to help by ensuring that the new testbeds are not radically new. This is a little bit of an oxymoron, right? We always want to have the latest and greatest.
But from a grad student perspective, some continuity is good. We all have to be a little bit creative. For example, FABRIC is a new large-scale research infrastructure that some see as a follow-up to GENI. Could there have been virtualized GENI slices in FABRIC that work like the GENI technology we had in the past? If we had done that, students would have had an easier life, and they could have transitioned over time.
Heidi Picher Dempsey: Something that might help, too, is if the tool they interface with stays the same. You are one of the early users of Jupyter Notebooks to help students interact with the infrastructure. Do you think that kind of tool is helpful and likely to stay around?
Michael Zink: I’ve often been proven wrong, so I hate to make predictions, but these tools are amazing. At UMass, we run a cluster that has Jupyter Notebook as a front end. You can put an undergrad in front of that cluster and have them use multiple GPUs, whatever they want, to do computing they could never do in the past.
This is a unifying thing. I teach a sophomore class right now and if they need more resources than they have on their laptops, they can use Jupyter Notebook and do stuff there. They can go on our cluster. That is a standard that becomes more and more important. We see people from the social sciences now using that cluster and using Jupyter Notebook because it’s more accessible. People who you’d never think of using a cluster ten years ago are using it because it comes with an interface that makes it much easier to use.
Heidi Picher Dempsey: That’s a really exciting trend. We’re starting to see that in Boston University projects, too: We can reach out beyond computer science, math, and physics to people in other fields who can use the compute power. Do you think this will make students less interested in Linux in the long run because it’s now further down the stack?
Michael Zink: I hope not, because we still need those students!
I hope it will just cause more curiosity. I’m sure there are already high schools using Jupyter Notebooks. After getting them in touch with this technology, maybe there’s five or ten percent who want to know what the underlying mechanisms are and get into Unix and Linux, even down to architecture in some cases.
Heidi Picher Dempsey: You can send them all our way!
Michael Zink: No, no, no! I need them myself.
Heidi Picher Dempsey: Let’s talk about your recent award from NSF for the OCT project. Why do you think that project’s important, what do you think attracted NSF’s interest, and what do you hope to do?
Michael Zink: We have to make clear that this is for cloud computing research, right?
A researcher with a need for compute resources can often use those on campus, or they can get it through the three big providers—that’s perfect. But those are closed environments. We don’t know the inner workings of Amazon cloud or Google cloud, for example. We want to provide testbeds that allow researchers to run their own software, all the way down to the operating system, and have much more freedom on the networking side. In some instances, we also have technology that’s not been made openly available. For example, the field-programmable gate arrays (FPGAs) we put in the OCT are something we see a need for in the research community.
The sharing aspect is vital. We cannot afford to give every researcher their own testbed. My goal is to make a testbed available where researchers can go down to the hardware and, without damaging anything, do what they need to do to perform the research, have control, and make it reproducible. This is a kind of stigma we have in our sciences compared to the natural sciences: it’s hard to repeat an experiment. OCT is providing these mechanisms and helping the community to do the work they want to do.
Heidi Picher Dempsey: When you’re trying to give people as much flexibility as possible, how do you also consider security?
Michael Zink: For a research testbed, you have to have a certain level of freedom, but you don’t want to end up on the front page of the Boston Globe because someone used it to mine Bitcoins. One way to achieve that is through authentication and identity management. The community, with the help of Internet2 and industry, has done a great job working on this.
I think we have reasonable mechanisms in place. Thanks to the work Orran Krieger did at BU, this is on our minds. If we share, for example, bare metal servers, how can we make sure a new researcher gets an uncompromised system? We’ve been forced to think about this already with the FPGAs. If you reboot the host, you don’t necessarily reboot the FPGA, so you might leave stuff behind for the next user. We’re working hard on ensuring that every time a new user comes on, we’ve scrubbed the FPGAs, so there’s nothing that either shouldn’t be exposed to someone else or could cause any problems. That’s important because if people don’t feel a certain level of security is provided, they won’t use it. They need to feel confident.
Another struggle is preserving the data. You’re working hard on an experiment, collecting all this data, then suddenly it’s gone. You have to provide mechanisms or at least awareness about where the data resides, what you have to do to secure data, because that upsets people a lot.
Heidi Picher Dempsey: Exactly—and upset is a nice word some days! You and I have both lost data. It’s a delicate balancing act to allow people to change the things they need to change, to explore things and ask new questions and answer them, but still make sure that all the people sharing the testbed don’t stomp on each other.
Michael Zink: It comes back to education. When students use the testbed for the first time, we have to tell them that their actions can have consequences. They need to think a little bit before pushing a button and shutting down the internet.
Heidi Picher Dempsey: One of the things we are trying to do in the Mass Open Cloud is to make it possible for people to collect and share data about how the systems are working—basically, open telemetry. Is that one of your goals for the OCT too?
Michael Zink: One thing we did already for the Cloud Lab project is measuring the power consumption of systems. Datacenters use an incredible amount of resources. This has a societal impact, for example contributing to climate change. We have to continue to make computing more power efficient. By making data available, when a researcher runs an experiment, they can say, “If I run it this way, I know I use so many kilo or megawatt hours. If I run it that way, I use thirty percent less.”
We want to make this available in the OCT also, and it’s relatively straightforward to implement. We don’t have much influence over how computation is happening, but we can observe and make data available as much as possible.
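As a rough illustration of the comparison Dr. Zink describes, the sketch below integrates power readings over time to estimate the energy used by two runs of the same experiment and reports the relative saving. It is not OCT or Cloud Lab code: the function names, the sample format (timestamps in seconds, power in watts), and the readings themselves are assumptions made up for this example.

```python
from typing import Sequence


def energy_kwh(timestamps_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Estimate energy in kWh from power samples using the trapezoidal rule.

    Hypothetical helper: assumes timestamps are in seconds and increasing,
    and that each power reading is in watts.
    """
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power readings must have equal length")
    joules = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        # Average power over the interval times its duration gives joules.
        joules += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return joules / 3.6e6  # 1 kWh = 3.6 million joules


def percent_saving(baseline_kwh: float, candidate_kwh: float) -> float:
    """Relative energy saving of the candidate run versus the baseline, in percent."""
    return 100.0 * (baseline_kwh - candidate_kwh) / baseline_kwh


# Invented readings for two one-hour runs of the same experiment.
times = [0.0, 1800.0, 3600.0]
run_a = energy_kwh(times, [200.0, 200.0, 200.0])  # steady 200 W
run_b = energy_kwh(times, [140.0, 140.0, 140.0])  # steady 140 W

print(f"run A: {run_a:.2f} kWh, run B: {run_b:.2f} kWh")
print(f"saving: {percent_saving(run_a, run_b):.0f} percent")
```

With readings like these, a researcher could state the saving of one configuration over another in concrete terms, which is the kind of statement the quote above has in mind.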
Heidi Picher Dempsey: How do you deal with recording the hundreds of variables that might be affecting that data at the time that it ran? For example, the versions of all of the pieces of the stack that were running at the time?
Michael Zink: It’s a nightmare!
Heidi Picher Dempsey: Are you going to solve a nightmare for us?
Michael Zink: This has been a topic forever. It’s easy to say, “This server uses so much power at this instance.” OK, but what was it running? What operating system? What CPU? How do you collect all this information? That’s a tough problem that—I’m sorry, don’t tell anyone, but I’m not going to solve it.
Heidi Picher Dempsey: Well, we have to leave a few challenges for everyone else.
Michael Zink: Yes. Let’s let someone else do that!
Heidi Picher Dempsey: I agree, though, especially when you include the application levels of the stack, where the data is private, it’s incredibly challenging.
Michael Zink: I would add one small thing: It’s important to think about the minimum set of information I can live with to do my analysis. We often think the other way: I’ll take as much data as I can. That’s not good. Other sciences work so hard on measuring just one variable. We should be smart and consider when less is more. Getting the data is always easy; maintaining it and making sense of it is the biggest problem.
Heidi Picher Dempsey: I have a fun fact about you: you’re one of the few computer scientists I know who’s been in a commercial.
Michael Zink: What?!
Heidi Picher Dempsey: There were commercials advertising UMass, and they included a view of what your students were doing and a few words from you. You don’t remember this at all?
Michael Zink: I don’t remember this! I’m so sorry.
Heidi Picher Dempsey: Well, we can see that Professor Zink is completely dedicated to his academic goals and cares not for the rest of the world.
But you do have to promote your work to attract students. I used to work at the Woods Hole Oceanographic Institution, and Bob Ballard, a famous scientist at Woods Hole, had a unique point of view on this. He said, “I love science. I’ve dedicated my life to science. But I don’t get a kid sitting in their bedroom excited about science by discussing it on an abstract level.” So he made a great effort to make the underwater remotely operated vehicles something students could interact with. Think of how huge a challenge that was in the 1990s. Making sure to capture the imagination of others was just as important as the science he was doing.
Michael Zink: For a big child like me, that’s super exciting. In the Massachusetts Green High Performance Computing Center (MGHPCC), our mission is to provide compute resources for scientists coming up with new discoveries. For example, today I met with the guys in physics who crash particles into each other to make discoveries all the time. There are structural biologists who have crazy microscopes to see molecule structures.
You can find shipwrecks at the bottom of the ocean at depths that used to be unimaginable, find new forms of life. That’s the essence of it. The Resource Reservation Protocol (RSVP) work I did many moons ago was researching allocation to stream video in a better way. Now we have Netflix and Disney+ and all this streaming content. It’s for the greater good, I hope, because it entertains a lot of people.
Heidi Picher Dempsey: It’s like we built the matchstick. You need that other person to light it.
Michael Zink: That leads to another important point. I’m talking to two women for this interview today [RHRQ editor Shaun Strohmer also attended the interview], and that’s awesome. But the demography of our field is not that diverse. Studies have shown, and we hear on campus a lot, especially from minorities, that students want to contribute something for their peers.
If they become engineers, they want to be able to do something that benefits their community as well. We often miss that in our message. Yes, it’s important to figure out how black holes are composed, but that can be too abstract. It’s more important that people in a food desert can find a store with good, affordable food.
The work we do provides tools and resources that have a far-ranging societal impact. That can only grow as we can make them available to more types of researchers and more communities.