Red Hat Research Quarterly

Future vision: on the internet, technopanic, and the limits of AI

About the author

Jason Schlessman

Jason Schlessman is a Principal Software Engineer at Red Hat Research, focused on novel artificial intelligence and machine learning innovations that lead to pragmatic and feasible solutions. He especially targets projects that serve the well-being of humanity, fostering ethical uses of technology. He can be found online @ EldritchJS.

Article featured in Red Hat Research Quarterly, February 2024

Everyone has an opinion on misinformation and AI these days, but few are as qualified to share it as computer vision expert and technology ethicist Walter Scheirer. Scheirer is the Dennis O. Doughty Collegiate Associate Professor of Computer Science and Engineering at the University of Notre Dame and a faculty affiliate of Notre Dame’s Technology Ethics Center. In December 2023, he published A History of Fake Things on the Internet (Stanford UP), an exploration of the history of “fake news” and the technical advances that make new forms of deception possible. Professor Scheirer is also the longtime friend and research partner of Red Hat engineer Jason Schlessman. Together, they worked on the Red Hat Research-supported project “Disinformation detection at scale,” which used image forensics and machine learning tools to develop a toolkit for identifying image manipulation in a scalable way. Schlessman interviewed Professor Scheirer for RHRQ about his work in computing, where ethics come into play, and why AI and the internet aren’t the bad guys.

Jason Schlessman: I want to start with technology ethics, which is highly relevant in our industry. Could you talk about your work in that space?

Walter Scheirer: I think we can agree that the use of technology has generated all sorts of dilemmas. For example, the internet has become pervasive and opened up a lot of interesting avenues, but it’s opened up some dangerous ones as well. That drives a lot of the discussion around fake stuff on the internet, which is the major topic of my book. Some of that fake stuff is problematic. But the trouble I see in technology ethics is everybody just wants to talk about it through a partisan political lens, and you have to think more broadly. It’s a much broader topic than just politics.

Jason Schlessman: Historically, it seems like the focus on ethical usage has had to do with data. Would you agree?

Walter Scheirer: That’s a big driving issue. There are lots of legitimate things to be concerned with: Who’s collecting this data? What is the harvesting process? How is that data being used? Is it being sold without the user knowing?  

Jason Schlessman: As you point out in your book, the ethical use of data didn't get much attention until around 2016, but people in our industry knew data was being mined and used well before that. In the past year or two, it seems there's been a push to focus on AI as the ethical problem.

Walter Scheirer: These things are not mutually exclusive, especially when we think about contemporary AI. Machine learning requires data. We're moving to this paradigm of programming with data, solving problems we can't solve with procedural programming. But where did the data come from? Do the users know their stuff is ending up in ChatGPT or whatever big model is out there?

Jason Schlessman: That’s a good point, but I’m wondering whether people are just focused on AI as a concept, so these important issues are getting overshadowed by a general panic. 

Walter Scheirer: Sure, one way to read this is that the panic is a coordinated strategy by a handful of powerful tech companies to distract people from real problems. That’s a narrative out there in the news, whether or not it’s true. There’s a huge degree of information asymmetry: it requires a lot of technical knowledge to understand how AI works these days, and an average user is not going to have that level of understanding. That’s a problem. 

A lot of companies rely on data harvesting to make money, and if that’s your business, of course you want to protect it. But is that in the public’s best interest? These are big questions serious people in technology ethics are asking.

Jason Schlessman: You mentioned the problem of political partisanship in technology, but does legality get into this, with regulations and legislation?

Walter Scheirer: I get this every time I do an interview for my book: what do we do about regulations? But when we're talking about fake things on the internet, regulations are not feasible. That may seem surprising, but think about what this is really asking for: control of speech on the internet. In the United States, specifically, there are strong guarantees for freedom of speech. I don't see a feasible path going through the government. There's talk of antitrust maneuvers against big platforms, and that's probably more feasible because those regulations already exist.

There’s a huge degree of information asymmetry… that’s a problem. 

Jason Schlessman: What about openness? Is there talk about the openness of generative AI and LLM models, for example, with respect to provenance and stewardship? I would think ethically, it just makes sense to have things as transparent as possible.

Walter Scheirer: There is a vigorous debate right now about the open source aspect of this. Some companies are very much in favor of openness. For instance, Meta has been vocal about the need for open foundation models. The AI Alliance, which Notre Dame and Red Hat are part of, along with Meta and several other businesses and universities, is also trying to move towards a more open source environment. Others don’t want to release as much, and they argue that these things are their intellectual property. There will be some interesting court cases in the near future to try to sort this out.

Jason Schlessman: What made things click for me with open source was understanding how important observability and traceability can be. If you can rebuild an entire application, you can know at any given point what’s happening in that application. It’s the same with open source models: I could make this model if I had money to train it.

Walter Scheirer: That’s a good point too—the scale of hardware required. Even if you had all the pieces, you can’t, in many cases, actually replicate the setup. Some organizations are trying to address this, like EleutherAI. On the other hand, for example in the computer vision community, there’s an emphasis on small models. How far can you get using something you can train on a single GPU? There’s a lot of interest in that for low-power, smaller-compute applications. There’s this myth that you need enormous cloud infrastructure to train one foundation model to solve these problems. I’m seeing a growing number of papers showing that’s not the case. In some cases, you only need a fraction of those parameters to get the job done, and it’s gratuitous to go beyond that.

Jason Schlessman: Does your work in ethics look into this idea of carbon footprint? Is it ethical to impact the environment even further by having my enormous datacenter churning for months to make this model that might be outdated six weeks after it’s released? 

Walter Scheirer: I don’t think anybody’s got a good handle on what that footprint really is. When you look at the carbon footprint of different industries, the datacenter is tiny. The big problems are cars and aircraft and power plants, which just dwarf datacenters. I think the good outweighs the bad in this case. That said, I could be completely wrong because no one knows how big these datacenters are. No one knows how big Amazon’s or Google’s cloud infrastructure is. These things are trade secrets, so even trying to guess is difficult.

The interdisciplinary advantage

Jason Schlessman: We’ll get back to your book, but first let’s talk about how you got into this field. What got you interested in working in technology and computation?

Walter Scheirer: It’s always been a hobby, since middle school.

Jason Schlessman: It’s worth pointing out that when you say “middle school hobby,” you’re talking about trying to teach yourself networking. When I was in middle school, I was just playing video games.

Walter Scheirer: The web was very new and websites about technology were pretty thin in terms of their technical concepts. My mom would take me to this technical bookstore, and the store people would eye me suspiciously: why does this kid want these books on TCP/IP networking?

You could also go online to try to ask people, but they were horrible. Back then in tech, there was kind of an aggressive culture where you had to prove yourself before somebody would help you. All of this has become more accessible with Stack Overflow and now even large language models. 

Jason Schlessman: Is it safe to say this exploration led you to working with Linux?

Walter Scheirer: Linux was on my radar right away. I knew from reading all of this material that Unix was the workhorse of the internet—that was the serious operating system. Windows was popular on the desktop, but if you really wanted to understand the internet, you needed Unix. And Linux was free. You didn’t need a fancy Unix workstation to run it; you could install it on your PC.

Jason Schlessman: Since then, what areas have you worked in, and what are you focusing on now?

Walter Scheirer: When I started, I was a systems person interested in networking security. Then I was thinking about how to combine that with machine learning and computer vision, so I drifted into human biometrics. That led to more questions about core computer vision research areas like object recognition, scene analysis, and segmentation. Then I was interested in understanding biological vision, but for AI. Basically, could you build biologically inspired algorithms that correspond more closely with what happens in the brain? My lab is still doing a bit of that work, especially models that incorporate elements of visual psychophysics.

I was also interested in media forensics as an application—combining security and stuff from systems and visual problems. When I came to Notre Dame, I was doing all of this plus getting into the history of technology and technology ethics. It’s a lot of stuff, but it all fits together.

Computer vision tools can help make old texts searchable for scholars.

Jason Schlessman: You left out one thing: your work in digital humanities.

Walter Scheirer: How could I forget? My work also has applications in classics, medieval studies, and text analysis for Latin poetry. How do you find allusions across different texts? If you have high-resolution digital images of old documents, how do you transcribe them into Unicode? That would be really useful for scholars so they can do searches, and it’s a challenging computer vision problem. It allows us to bring to bear a lot of state-of-the-art tools. You’re not going to see that kind of work in many places—there’s no money in it—so it’s an interesting niche.
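As a rough illustration of what transcribing old documents into Unicode can look like in code, here is a generic off-the-shelf sketch, not Scheirer's actual pipeline. It assumes a hypothetical scan file and an installed Latin language pack for Tesseract.

```python
# Minimal sketch: off-the-shelf OCR turning a scanned page into searchable
# Unicode text. Historical manuscripts typically need far more specialized
# models than this; the point is only to show the shape of the task.
from PIL import Image
import pytesseract

page = Image.open("manuscript_page.png")               # hypothetical scan of an old document
text = pytesseract.image_to_string(page, lang="lat")   # assumes the Latin traineddata is installed
print(text)
```

Irregular handwriting, abbreviations, and degraded pages are exactly why this remains a hard computer vision problem for real manuscripts.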

Jason Schlessman: And your undergraduate background is as much in the humanities as in computer science.

Walter Scheirer: I was interested in international relations, things like war and politics.

Jason Schlessman: Did that humanities background shape your research? Your book has a lot of references to classics and poetry.

A lot of people in computing think they have a solution to all the world’s problems, but they don’t have a good understanding of even how to ask questions in another discipline.

Walter Scheirer: Absolutely. As an undergrad, I did a liberal arts degree, and computer science was a secondary major in a liberal arts course of study. I took several philosophy courses, which shaped my thinking. In digital humanities, a lot of times somebody wants to do computing, but they just don’t have the background. But the opposite tends to be more problematic. A lot of people in computing think they have a solution to all the world’s problems, but they don’t have a good understanding of even how to ask questions in another discipline. Because of my background, I’ve been able to work across these boundaries and do it credibly. I can speak the language of the various disciplines I’m working with because I have a bit more training than the average computer scientist.

I’m seeing this becoming a bit more common: at Notre Dame, we have a new BA degree in computer science and engineering that requires you to pick up another discipline as a focus area. I’ve advised students who have been doing computer science and English, for example. To solve some of the big problems out there these days, you need training in two areas—not superficial training, but very deep training.

Don’t fear the fake

Jason Schlessman: Your most recent book is called A History of Fake Things on the Internet. How does that come out of the work we’ve talked about so far today?

Walter Scheirer: The book talks about specific technologies implicated in fakery in some way. AI is a key theme, and it talks a bit about the dilemmas we’re facing on social media and elsewhere related to misinformation. I try to treat this in the most realistic way possible. Coming back to my interdisciplinary background: I can bring to bear the tools of a number of different fields in terms of analysis. Some of the chapters are ethnographies: I interviewed some of the most interesting and strange people out there, especially from the early internet period, people who were faking the news, computer hackers, people involved in digital art, people involved in the early days of media forensics, to understand their thinking at that time and how that corresponds to our thinking today. 

Popular and creative internet memes from A History of Fake Things on the Internet

The major idea in the book is that the internet is filled with fake things, but that’s mostly good. The internet is a creative space and was created to be that. That’s why people like it. But there’s this mainstream media narrative from the 90s that the internet is this “information superhighway,” or that it’s a database of facts that got polluted with all this fake stuff. 

Jason Schlessman: We both know that, even going back to the BBS (Bulletin Board System) days, text files of fabricated information were being traded. This has been going on for a while. You say it took a flood of real-life content on social media to put the field of media forensics into the spotlight. Could that be generalized to say it took a flood of people being inconvenienced for media forensics to come into the spotlight?

Walter Scheirer: Definitely. The history of the internet is fascinating. Those of us in tech have been told a story that in the late 60s the US government, through the Defense Advanced Research Projects Agency (DARPA), created this distributed network to withstand a nuclear assault from the Soviet Union. That was the purpose of the internet. Then large tech companies realized it was useful for commerce. It would be useful for education. That’s where you get this idea of the information superhighway in the 90s, during the handoff from the defense world to the corporate world.

But that is not really where the internet came from. If you go back a bit further, you have the ideas of the media theorist Marshall McLuhan, a famous professor at the University of Toronto associated with the 60s counterculture. He’s talking about information networks in a radically different way, saying this is going to create a global village and let the users of this infrastructure project their imaginations to other users.

He anticipates all these creative software tools we have now, everything from facial filters and tools like Photoshop that rework visual information to generative AI and tools like Midjourney that allow us to hallucinate interesting scenes by combining visual information in novel and surrealistic ways. That's the internet McLuhan is telling engineers to build. And if you look at who built the internet, you have these hippie figures at Bell Labs working on Unix, interested in multi-user operating systems because they bring people together. There's this communal aspect to the internet completely missed in the conventional narrative.

The internet is filled with fake things, but that’s mostly good.

By the time you get to the so-called information superhighway era of the internet, engineers are still thinking about McLuhan. In the first issue of Wired magazine, the editors dub McLuhan the patron saint of the internet. As you pointed out, as soon as computer networks became popular, there were fake text files and computer hackers telling stories through these technical manuals. But along the way, it wasn’t all fun storytelling and culture building. You see malicious actors moving in, and when something goes wrong, that’s when media forensics feels relevant.

Jason Schlessman: You say in the book that we continue to blame technology for longstanding social problems instead of confronting the unethical behavior that nourishes them. For example, ChatGPT made generative AI more accessible to a large number of users. You wouldn't want to starve that creativity. You want to starve the bad intent that's under the surface of that particular field and has been for a while.

Walter Scheirer: This is a complicated issue. A lot of social problems in the physical world have simply moved over to the internet. I mentioned political polarization: that’s not small, and technology is being used to perpetuate it. A lot of people don’t want to confront the root issue because they can’t seem to resist arguing about politics on the internet. 

I have been writing about dialogue as a response to this. This is an idea going back to Plato: instead of having a heated debate about a controversial topic, we talk through it in a reasonable way, bringing diverse ideas that are perhaps in conflict with one another but not arguing about them—instead, trying to understand them. What we've done is develop a framework for this type of dialogue on the internet specifically. I have another book coming out later this year called Virtue and Virtual Spaces that makes this case.

Jason Schlessman: My favorite part of the book was where you talked about a study on predicting the future of climate change. Could you discuss that?

Walter Scheirer: A chapter of the book talks about a generative AI model that will give you visual depictions of the future. It's designed to predict what a certain geographical location will look like 50 years in the future after climate change has taken effect. This seems kind of scientific and useful on its surface. We know climate change is a problem. We know there are lots of sophisticated models to predict what the climate is going to look like. But then the reasonable person steps back and says, wait a minute, this oracle is showing me the distant future. Is that really possible? I interrogate this idea and look at this long-standing human belief in predicting the future and why in nearly all cases that is impossible. We'd all be rich and successful if we could do that.

A simulation of a possible climate change outcome

Jason Schlessman: That was also the section with my favorite quote from the book: “Ascientific work is being cast as science.” How can we address this?

Walter Scheirer: What's interesting about this specific AI model is it came out of Yoshua Bengio's lab. Bengio is a Turing Award winner, one of the three fathers of deep learning, and one of the most-cited researchers in computer science. Yet a bit of work coming out of his lab is not scientific in any way. We put a lot of faith in experts, but experts get things wrong from time to time. It's a classic problem in thinking: when you've achieved great success, it's not hard to fool yourself into thinking you can solve an impossible problem. This is not the only case of that kind of error, but it was worth writing about because it is well situated in a long-standing misunderstanding about what is possible and impossible.

Building on open source

Jason Schlessman: That gets back to the open source movement. We can all benefit by being open to criticism, making mistakes, and taking other people’s thoughts and input. At least in theory, none of us wants to be the be-all, end-all; we want to be part of a community.

You said Linux and Unix were initially important to you, but here we are decades later and you’re still a strong proponent of open source. What value do you see?

Walter Scheirer: Open source has had a tremendous positive impact on the industry. It’s improved the quality of software drastically. In my own career, if you look at the impact the field of computer vision’s been having, it’s huge. A big piece of its success has been the move to Open Access about a decade ago. Every time we do a paper, there’s a repo associated with it. We want people to be able to use this stuff. It’s not just about replication, it’s about other people who want to use this and build on it. It should be a stepping stone for bigger work.

Jason Schlessman: What impact have you seen in making things more open in academia? Are you seeing more best practices being incorporated in your lab as a result of interacting with folks like us at Red Hat?

Walter Scheirer: Absolutely. The repos are important, and good documentation as well. Academics still have this problem of thinking, “We open sourced it, we’re done.” What am I supposed to do with it? Is there a useful README for this? Can I get it up and running? Is there a process to reach out and get some help? All of that stuff is important. We want people to understand how to run this stuff, and also, just for general awareness, give people a digestible explanation. Technical papers can be challenging to read. We did a Medium post with our code and tried to explain it in a clear way. You have to combine all these elements.

Jason Schlessman: For the sake of readers: you’re referring to a successful upstream contribution in the form of an image forgery detection toolkit Python package. Even with that, I found that maintenance is a big thing: you put the repo out there, but what if people need help or want to contribute? I’ve seen, on the academic side, increasing openness to that mindset. If somebody has a pull request, instead of seeing it as “We already published the paper, what do we care?” it’s more, “Wow, we have people getting involved, and we can revisit this.” You once said that revisiting these problems also opens the doorway to thinking up new research problems and new innovations.

Walter Scheirer: Exactly. It’s not just about the paper. It’s also about improving that code base over time. Some of the most successful research contributions in AI-related fields are those projects where there was a useful software package that became indispensable over time, and new features were being added, bugs were being fixed. Maintenance behind the scenes really made it better.

Jason Schlessman: Where do you see your collaborations with industry heading in the future?

It’s a huge advantage to have computing professionals working with academics.

Walter Scheirer: So many projects we have in the lab right now would benefit from exactly what we’re talking about. It’s a huge advantage to have computing professionals working with academics. A graduate student’s job is basically to come up with cool ideas and implement those ideas in a proof-of-concept way. But if you want to make a project successful, you have to go beyond that. You have to remember there are users on the other side, even if it’s still in an academic context. 

Our work started in media forensics, and we will continue doing that, but there are a number of other really interesting AI areas, perhaps more fundamental stuff. My lab works a lot on open world recognition problems, which cross a lot of different application domains. There’s a big need for open source software for a fundamental operation like that, and other interesting application areas to explore, so there’s a big space for us to continue.
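For readers unfamiliar with the term, open world (or open set) recognition asks a classifier to admit when an input belongs to none of the classes it was trained on. The toy sketch below illustrates only the core idea, using a simple confidence threshold and a hypothetical predict_open_set helper; it is not a method from Scheirer's lab.

```python
# Toy illustration of open-set rejection: a closed-set classifier is wrapped
# so that low-confidence inputs are labeled "unknown" (-1) instead of being
# forced into one of the known classes.
import numpy as np

def predict_open_set(logits: np.ndarray, threshold: float = 0.75) -> int:
    """Return the predicted class index, or -1 for 'unknown'."""
    exp = np.exp(logits - logits.max())   # numerically stable softmax
    probs = exp / exp.sum()
    best = int(probs.argmax())
    return best if probs[best] >= threshold else -1

print(predict_open_set(np.array([4.0, 0.1, 0.2])))  # confident -> class 0
print(predict_open_set(np.array([0.9, 1.0, 1.1])))  # ambiguous -> -1 (unknown)
```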

Jason Schlessman: Very good. Thank you for your time, Walter. I’m looking forward to future collaborations. 
