Episode 4: transcript

“DGW: Some people just want to download the software and make porn with it. And if they don’t know how to program, and there is that restriction, that stops them. That’s a meaningful impediment. It’s a speed bump. It doesn’t stop you going down the road, but it makes it harder, it makes it slower, and it stops some harm. 

Even within the dictates of open source and free software, I think that we can think a little bit more creatively about how we can build restrictions about what uses we think are immoral or unethical into our software, and not see it as black and white.”

[INTRODUCTION]

[00:00:37] SM: Welcome to Deep Dive AI, a podcast from the Open Source Initiative. We’ll be exploring how artificial intelligence impacts free and open-source software, from developers to businesses, to the rest of us. 

Deep Dive AI is supported by our sponsor, GitHub. Open source AI frameworks and models will drive transformational impact into the next era of software, evolving every industry, democratizing knowledge, and lowering barriers to becoming a developer. As this evolution continues, GitHub is excited to engage and support OSI’s Deep Dive Into AI and open source and welcomes everyone to contribute to the conversation.

[00:01:18] ANNOUNCER: No sponsor had any right or opportunity to approve or disapprove the content of this podcast.

[INTERVIEW]

[00:01:23] SM: Today we’re talking with David Widder, a Ph.D. student in the School of Computer Science at Carnegie Mellon University. He’s been investigating AI from an ethical perspective, and specifically is studying the challenges that software engineers face related to trust and ethics in artificial intelligence. He’s conducted this research at Intel Labs, Microsoft, and NASA’s Jet Propulsion Laboratory (JPL). 

David, thanks for joining us today to talk about your research, and what you’ve learned about AI and ethics from the developer’s viewpoint. Welcome, David.

[00:01:55] DGW: Thank you so much, Stefano. I’m excited to be here. And I am grateful for the opportunity.

[00:02:00] SM: Tell us about why you chose ethics in AI as the focus of your research. What do you find in this topic that is so compelling?

[00:02:08] DGW: We might all agree that, for better or worse, AI is changing our world. And as we begin to think about what ethical AI means, especially as a lot of the ethical AI discourse, as I see it, is driven by powerful companies, governments, and elite universities, I think there’s a risk in the way this discourse plays out. 

The problems we study are not those which affect the most marginalized, who are often left out of the decision-making of tech companies and things like that. They’re the problems that are faced by these systems of power, the problems that are most salient to these people. And the solutions we make are barely going to threaten these powerful interests. There are things that are important and make meaningful changes, but don’t levy fundamental critique. 

I like taking a step back to be a little bit more critical of the narratives around AI ethics that emerge and ask, “What are we missing? What is going unsaid?” Based on what we’re focusing on, what isn’t being focused on? And that’s how I like to drive my research.

[00:03:03] SM: You recently presented one of your published papers, the one titled Limits and Possibilities for “Ethical AI” in Open Source, which focused on deep fakes, with your coauthors Dawn Nafus, Laura Dabbish, and James Herbsleb. For those of us who need a little bit of background, what is a deep fake? And give us a few examples of how this technology is used for good or bad reasons.

[00:03:29] DGW: Essentially, a deep fake is a video where the likeness of one person is superimposed, swapped, faked, onto the body of another person. So, you put the face of one person onto the face of another in a video. Sometimes this is for more innocent, fun uses. It can be parody, political parody. You might have seen deep fake Obama or deep fake Tom Cruise. Sometimes it’s for art. There’s actually some overlap between what we think of as a deep fake and computer graphics, like the movie Avatar and things like that. And these are the more interesting uses. 

Now, where it gets a little bit more tricky is in the middle, where we start thinking about what is political parody and what is fake news. What might seem like parody to one person might actually fool other people, if you’re parodying powerful politicians or leaders. 

What I think is actually understudied is the really difficult uses, the really nasty uses, the really damaging uses, which, unfortunately, constitute the vast majority of deep fakes. One study found that the vast majority of deep fakes portray women in non-voluntary pornography. They’re superimposing the likeness of someone you might know, like a celebrity, onto a pornographic actor. And this can lead to anxiety, health issues, and job loss. 

[00:04:43] SM: It’s terrible, because if I understand correctly, also, this technology is becoming so much better that it’s hard to distinguish the original from a fake. So, gullible or inexperienced people might be tricked into believing it’s true.

[00:04:59] DGW: Totally. When we talk about this technology getting better and like becoming more convincing, you’re totally right. It is increasingly easy to use. It’s increasingly accurate in the way it spoofs or fakes the video. And I think that raises an important question. It’s hard to think of how to make the tech better in a way that changes the way it harms because the harm is inherent in the way it’s used. 

A technical improvement to the tool, like reducing bias or fixing a privacy leak, doesn’t really make sense in this context. It’s how it’s used. When the tech is getting better, it’s hard to think of how that also fixes or addresses the ethical issues.

[00:05:33] SM: Okay, let’s talk about this ethical issue. Your paper is focused on open source, but these deep fake technologies are available in both proprietary and open source forms. And the ethical ramifications seem to play out a little bit differently, whether we are in a corporate, proprietary situation versus within the open source community. What do you see as some of the contrasts between these two approaches? Let’s start from the corporate scenario and focus on big tech. What responsibilities do these technology companies have when they develop or use this technology? And what kind of power and control do they have?

[00:06:12] DGW: They have a lot of power and a lot of control, even when they perhaps don’t want to acknowledge that. A good example is Google’s Project Maven, which was a contract they had with the Department of Defense to use, I believe, computer vision to help improve the targeting of drones, warfighting drones. These big tech companies often try to convey that their technology is neutral: we just provide tools for people to use for good or bad. But this is a case where they knew what it was being used for. They knew, even if it wasn’t technically being used to kill, that it could help make this more efficient. And I think a good example of tech workers mobilizing and speaking up was the backlash to this project. 

Now, I’ve since learned that it has, I think, been reinstituted in some form. And we’re seeing this, too, in the ways that technology companies are increasingly investing in responsible AI and ethical AI research: trying to find ways to remove bias from systems, trying to find ways to make these systems more understandable and more interpretable. And I think that’s good. But I think we also have to pay attention to how these systems are ultimately used, rather than just how they are built or how they’re implemented. 

To summarize, there’s a lot of control and a lot of power they have. And I think we have to be careful to investigate where they choose to use that control and where they choose to invest in these kinds of ethical AI questions. 

[00:07:29] SM: To clarify, Project Maven was not a deep fake technology. It was more of a general AI computer vision tool.

[00:07:36] DGW: Yeah, yeah, absolutely. That’s a fair point. And to talk about the difference between corporate or centralized proprietary technology in the deep fake context: there are plenty of closed source, proprietary deep fake services, running as software as a service, where you can upload a video and some source imagery and have a fake provided to you, without being able to access the source code or the intermediate steps there. 

And because it’s centralized, there actually is the opportunity for some more control in that case. They can put in filters for pornography and such in a way that isn’t felt in open source. And I think we’re going to get to that.

[00:08:14] SM: Basically, proprietary systems and corporations have the choice to pick who they partner with, who their customers are, and also to select the possibilities inside the tools themselves, like the kind of output they can produce. 

In contrast, an open source developer community includes a number of volunteer developers. Your research has uncovered a quandary here. It seems that open source licenses limit the developers’ sense of responsibility for and control over how their software is used.

[00:08:46] DGW: There are artifacts of the licenses, and of the way people conceive of open source and free software licenses: the non-discrimination against fields of endeavor, the non-discrimination against persons or groups, and free software’s freedom zero. Licenses that follow these mandates legally dictate that you can’t discriminate against, or control, how downstream developers or downstream users use your software. This is different because, oftentimes, software companies that don’t license their software as open source have that contractual control over how and by whom their software is used, and often even for what. 

Whereas in open source, that control isn’t felt, by virtue of these licensing strategies common in open source software. And this can be good for a lot of things. For example, one of the proprietary deep fake tools was found to have a crypto miner embedded in it that was stealing power and cycles from the people who used it. So with open source, it’s not all bad news here. But it illustrates an important distinction in what kinds of harms can be prevented in open source versus in a proprietary company context. 

And in the open source case, I think it’s important to realize like what starts as a legal dictate in the licensing world often will then filter into the culture. And it’s not just me pointing this out. But it filters into the culture in a way that developers don’t feel like they can or should even be able to control how their software is used by virtue of the licenses that they’re used to using.

[00:10:11] SM: Which is an argument that we often make at the Open Source Initiative: that the legal mandates inside copyright licenses are just one layer, which incorporates social norms and collaboration norms that are cultural inside the projects rather than mandated legally. The licenses are just the tool. But your research also addresses two sentiments from some of your respondents about how they have limited agency over what they produce. The first is the notion of technological inevitability. Like, a sentiment that basically the genie’s out of the bottle.

[00:10:49] DGW: That’s a powerful one. I’ll read a quote from one of my participants, who said, “Technology is like a steam engine. It’s just getting better, faster, and more powerful.” As if this happens naturally. As if this happens without any human making it happen. This idea that technology naturally gets better or naturally improves is what historians of technology and philosophers of technology have critiqued as technological inevitability, a thing that is used to remove, or at least limit, one’s sense of personal agency. 

In the open source case in particular, some of my participants, when they were building this deep fake tool, were saying things like, “Deep fake software will only continue to get better.” And there are competing projects; there are other open source projects trying to do the same thing we are. So, even if I chose to stop working on this, even if I withheld my labor, this algorithm or kind of system will continue to improve. Deep fake realism will continue to improve. Even our project will improve without my labor. 

And I think that’s true to an extent, but it might not improve as fast, it might not improve as much, and there might not be the same kinds of technological innovation. So, I think that we have to be critical of the idea of technological inevitability because of the way it can seem to limit one’s own personal agency. You know as well as I do that a lot of open source projects need more labor. They need more help. So, if developers chose to take that labor, take that volunteer effort, and put it somewhere else, that actually can make a difference.

[00:12:13] SM: This is a very interesting topic, because it also overlaps with my memory of the conversations around PGP, Pretty Good Privacy, the algorithms and tools that were for a long time considered weapons by the United States. Even encryption was considered a weapon, right? For a long time, commerce was not safe outside the boundaries of the United States, because higher-grade encryption schemes were not available for export. 

And as of today, there’s still that dichotomy between people who want encryption on their phones and the threat of terrorists using the same kind of technology. But the other perception that you have identified among developers is that of technological neutrality: the notion that if someone paints something offensive, you can’t blame the paint manufacturer. 

[00:13:11] DGW: Guns don’t kill people. People kill people. These are real cultural issues in the United States, right? The idea of technological neutrality is an idea that looms large in political discourse, and not just in the United States. Guns don’t kill people, people kill people: we’ve heard that before. That’s a nice soundbite, but I think we need to think harder about it. Because guns are designed to do a certain thing very well. So even if they don’t literally kill someone, I think they make certain things easy. 

[00:13:35] SM: Automatically, yeah. Autonomously. Right. 

[00:13:38] DGW: Yeah, exactly. Which, if that happens, we have a whole other set of problems. But they are designed to make certain things easier and make certain things harder. An example I like to give is that you can throw a gun as if it were a Frisbee, but it’s a pretty bad Frisbee, and it’s not going to be very fun. And you can kill someone with a Frisbee, but it’s not designed to do that, and it’s going to be pretty hard. 

The idea of technological neutrality is, I think, in many cases false. I’m not the first to point this out. There have been a lot of scholars in science and technology studies who have asked, as Langdon Winner did, “Do Artifacts Have Politics?” And what it comes down to, at least in my view, is that there’s a connection between the way you design a thing or the way you implement a system and what kind of uses are afforded by that system. 

Affordances are that connection, the glue in the middle: the way you design your system to make certain things easier and certain things harder affects how it is used. And that unpacks and challenges the idea of technological neutrality. 

In the deep fake example, or in the open source example more generally, I heard people argue that even if we were to put, for example, technical pornography restrictions into our software, because it’s open source, anyone could go and just take those out. Anyone could go and remove those. Now, that’s true. In a literal sense, open source is open. And if you have the programming knowledge, you can take out restrictions that are built into code. 

But as one of my participants pointed out, not everyone has that knowledge. Some people just want to download the software and make porn with it. And if they don’t know how to program, and there is that restriction, that stops them. That’s a meaningful impediment. It’s a speed bump. It doesn’t stop you going down the road. But it makes it harder, it makes it slower. And it stops some harm. 

Even within the dictates of open source and free software, we can think a little bit more creatively about how we can build restrictions about uses we think are immoral or unethical into our software, and not see it as black and white. It’s not always going to be we stop all misuse, or we just don’t stop any. But there are shades of grey in the middle. Challenging the idea of technological neutrality is the way we begin to see those shades of grey. 
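As a concrete illustration of that kind of speed bump, here is a minimal, hypothetical sketch of how a restriction could sit in the code path of a face-swap tool. Every name in it (nsfw_probability, swap_faces, the threshold) is a placeholder rather than any real project’s API; the point is only that a non-programmer running the packaged tool cannot easily route around the check, even though a programmer could.

```python
# Hypothetical sketch only: names and thresholds are placeholders, not any real tool's API.

from dataclasses import dataclass

@dataclass
class Frame:
    pixels: bytes  # raw image data (placeholder representation)

def nsfw_probability(frame: Frame) -> float:
    """Stand-in for a content classifier returning a probability in [0, 1]."""
    raise NotImplementedError("plug in an NSFW classifier here")

def swap_faces(target: Frame, source_face: Frame) -> Frame:
    """Stand-in for the actual face-swapping model."""
    raise NotImplementedError("plug in the face-swap model here")

def process_frame(target: Frame, source_face: Frame, threshold: float = 0.8) -> Frame:
    # The "speed bump": refuse to run the swap on frames the classifier flags.
    # A programmer can delete this check from the source; a non-programmer using
    # the packaged tool cannot, which is the point made above.
    if nsfw_probability(target) >= threshold:
        raise ValueError("refusing to process a frame flagged as sexually explicit")
    return swap_faces(target, source_face)
```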

[00:15:42] SM: This is a great point, because so many times I talk to developers who have that black-and-white approach. They’re trained to think in mathematical terms. Like, if A happens, then B is the consequence.

[00:15:54] DGW: And I think the norms will differ in every community. Some communities will be more comfortable taking a more restrictive approach, leaning more into trying to help guide people toward certain socially beneficial uses and away from certain socially concerning uses. And I think that just acknowledging the broad spectrum of gray that is there is going to be really, really important. I think you’re right to root our discussion in concrete examples of harm, concrete examples in the world, because that makes the stakes feel appropriately high. 

If we adopt the attitude of, “Well, we can’t stop all harmful use, so we may as well just leave the ethics up to the user and not try,” I think that is concerning. Because even if we stop a few, or like 10%, of harmful uses, that still appreciably changes the harm that is wrought on individuals. For women who have had non-voluntary pornography made of them, that’s a big number. If that stops you from having a fake porn made about you, and that stops you from losing your job or developing anxiety, that’s a real thing. Thinking of framings that aren’t “we stop all harm or we don’t even try” is probably a good way to develop the conversation in this area and get away from the idea of technological neutrality.

[00:17:08] SM: Your research also reveals an interesting dichotomy: how the transparency and accountability of open source may differ between implementation and use. With respect to ethical AI, why is open source great for implementation purposes, but not so great with respect to use?

[00:17:26] DGW: This is something we’ve all kind of known for a while, but not sort of named. I’m not going to like pretend I thought of it. But I think what we name in our paper is a spectrum between – Or a continuum, as we call it, between implementation-based harms and use-based harms.

Implementation-based harms are things you can fix by building the software differently. And a good example in an open source case is the idea of recidivism prediction algorithms. Algorithms which seek to predict whether someone who’s accused in the criminal justice procedure will recidivate, will recommit a crime. 

And these systems are in many cases, well, most cases I’d say, biased. Whether because of the data they use, or the way they train their algorithm, or the way they’re employed in a certain context. And for these sorts of implementation-based harms, if there are data issues, or implementation issues in the code, or bias issues in the code, making these systems open source can allow more people to scrutinize them. It can allow more eyes to ask questions of the system and find implementation harms. And in wider cases, not just the recidivism case, it can perhaps find privacy leaks, or unchecked bugs, ethical bugs so to speak, that can be fixed by increased scrutiny. 

Because open source is open and freely inspectable, and anyone can, in many cases, submit a pull request or submit a change to fix these kinds of bugs, I think that open source does particularly well for implementation harms, harms from implementation.
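As an illustration of what that kind of scrutiny can look like in the recidivism example, here is a minimal, hypothetical audit sketch. The column names (reoffended, predicted_high_risk, group) are assumptions rather than any particular system’s schema; the point is that with open code and data, anyone can run a check like this and submit a fix.

```python
# Hypothetical audit sketch: column names are assumptions, not a real system's schema.

import pandas as pd

def false_positive_rate(group_df: pd.DataFrame) -> float:
    """Share of people who did not reoffend but were still predicted high risk."""
    did_not_reoffend = group_df[group_df["reoffended"] == 0]
    if len(did_not_reoffend) == 0:
        return float("nan")
    return (did_not_reoffend["predicted_high_risk"] == 1).mean()

def audit_fpr_by_group(predictions: pd.DataFrame, group_col: str = "group") -> pd.Series:
    """Compare false positive rates across groups; large gaps are the kind of
    'ethical bug' that open inspection lets outside contributors find and fix."""
    return predictions.groupby(group_col).apply(false_positive_rate)
```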

[00:18:58] SM: Definitely, this is true. It has been true for a long time when we talked about software in the very clean software sense, like ’90s-style, non-AI software, because we’ve been talking about the inspectability and the reproducibility of code: the fact that you can download the source code, recompile it on your own, and prove that it actually gets the same deterministic results after running it. 

When I talk to developers of AI systems, I get a fuzzier answer to that, because the inspectability itself becomes a little bit more convoluted depending on the model or the algorithms. I totally understand your point. In general, we should be able to inspect and review these systems, and make sure of them, especially before we put them in charge of making decisions for us.
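For the classic, non-AI case, that kind of verification can be as simple as rebuilding from source and comparing checksums. A minimal, hypothetical sketch follows; the file paths are placeholders, and a real reproducible-build check also requires pinning the same toolchain and build flags.

```python
# Hypothetical sketch: verifying that a local rebuild matches the published artifact.
# Paths are placeholders; a real check also pins the compiler version and flags.

import hashlib

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

published = sha256_of("dist/tool-1.0.0-published.bin")  # binary distributed upstream
rebuilt = sha256_of("build/tool-1.0.0-rebuilt.bin")     # binary compiled locally from source
print("reproducible" if published == rebuilt else "outputs differ: investigate")
```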

[00:19:42] DGW: I agree. I think we should maybe contrast this with use-based harms. Sort of the other side of the continuum, open source allows transparency in the source code. And therefore, accountability for implementation harms. You can know who added what feature. And if there’s an issue, you can fix it. 

For use-based harms, where open source is released online for anyone to use for any purpose, there’s not a lot of traceability into who uses it for what. There’s not transparency into uses, and there’s not accountability for those uses that filters back to the people who developed it. 

An example of that would be the deep fake case, right? It’s not a problem with how the tool was built, or how it was implemented. It’s how it’s used. And because it’s open source, anyone can use it for harm. And so therefore, this is where it’s a little bit more problematic, a little bit more concerning. Open source allows some use-based harms to go unchecked without the same level of transparency into how it’s used and the level of accountability into harms arising from those uses.

[00:20:38] SM: Then there are the questions about transparency. Like, we don’t have norms or legal obligations around this yet. There is some research going on. The European Union has already started looking into AI a little bit more, just like the European Union has been looking at data mining and starting to regulate it. These transparency and accountability issues that float around AI and art, and that are the object of your studies too, are starting to be regulated.

[00:21:09] DGW: I’m glad there’s more focus on these because I think that we have a very narrow and rehearsed view of what transparency and accountability might mean.

[BREAK]

[00:21:19] SM: Deep Dive AI is supported by our sponsor, DataStax. DataStax is the real-time data company. With DataStax, any enterprise can mobilize real-time data and quickly build the smart, highly scalable applications required to become a data-driven business and unlock the full potential of AI. With AstraDB and Astra Streaming, DataStax uniquely delivers the power of Apache Cassandra, the world’s most scalable database, with the advanced Apache Pulsar streaming technology in an open data stack available on any cloud. 

DataStax leads the open source cycle of innovation every day in an emerging AI everywhere future. Learn more at datastax.com.

[00:21:59] ANNOUNCER: No sponsor had any right or opportunity to approve or disapprove the content of this podcast.

[INTERVIEW CONTINUED]

[00:22:03] SM: You also shared with me, as a contrast to this research, another paper you wrote with NASA. Comparing the deep fake paper and the NASA paper gave you another perspective on how developers see AI and how they end up trusting it. Tell us a little bit about that.

[00:22:20] DGW: A brief sort of highlight of the paper is I had the privilege of being at a NASA site while they were developing and beginning to use an auto coding tool, a tool to build software automatically for an upcoming space mission. I was a fly on the wall and very grateful for that privilege. 

But there’s a question of trust there, right? Because if you’re trying to use this new tool that automates some programmers’ labor, that does it in an automated way, do you trust it? These people, like me, are space nerds, right? We have pictures of past missions on our walls. We have, you know, statues. And the desire for this mission to be successful is extreme. NASA is what some of the literature calls a high-reliability organization. The stakes are big, you know? 

They were developing this framework, which by the way was open source. And then also, trying to use it for an upcoming space mission. And I think this is useful in contrast to the deep fake case, because in the NASA case, everyone can agree what a harm would be. If the spaceship blows up, if it crashes, if it stops working when it’s outside the orbit of Earth, that’s a bad thing. And that looks bad for me. That looks bad for you. That looks bad whether you’re the one creating the software or the one using it. There’s normative agreement around what is good and what is bad. It helps illustrate the deep fake case because there wasn’t normative agreement around what was good and what was bad. 

The community developing the deep fake tool had set strong norms that you’re not allowed to use it for non-voluntary pornography and other harmful uses. But there’s not normative agreement in every case between the people developing the tool and the users of that tool. They weren’t able to control, or necessarily even engage with, the users of the tool to reach normative agreement. 

Certainly, they took steps to set these norms in a positive direction, to define what is bad for the community, which I think is great and a thing that the wider open source community can learn from. But they weren’t always able to reach normative agreement with the myriad, nameless users, the many users who weren’t in their organization and thus couldn’t be reached in the way that they could be at NASA. And there was that normative agreement at NASA. 

[00:24:33] SM: The fact that they had this control, and they knew who the users were.

[00:24:38] DGW: The fact that they were just like already on the same page about what was a good use and what was a bad use to begin with. Maybe there’s someone way out there. Maybe a different state actor might disagree that an American space mission succeeding is a good thing, and they might seek to damage it. But at least within the organization developing and using the software, there was normative agreement around what was a good use and a bad use, or a good outcome and a bad outcome, in a way that didn’t always exist between the developers and users in the deep fake case.

[00:25:05] SM: And now the community has been talking about assistive coding technologies, like Copilot or CodeWhisperer. NASA already had something that was even more production-ready. It’s actually shipping code that goes into flight missions. Not just a prototype.

[00:25:22] DGW: Absolutely. I know, right? I think that’s super exciting. And that is part of the reason why they really needed to trust it. I mean, I’m sure you and I have experimented with Copilot and written a few lines, and it’s fun. But here the stakes are high. It really needs to work. This trust is especially important in these kinds of contexts, trust in your tools. And that’s what we were trying to study in that paper. 

[00:25:45] SM: For the record, I’m not a developer. My experiments with Copilot are pretty much just fooling around and seeing whatever it spits out. I’m not able to judge.

[00:25:55] DGW: And while we’re on the topic of Copilot, this is a little bit of an aside, but I think Copilot is concerning for open source. I mean, I think it’s exciting in many ways, but there are particular concerns I have. Because is it valid, is it okay, to train a proprietary system on open source code if you don’t then license the output under the terms of the data you are training it on? What I mean by that is, in many cases, Copilot will generate license text verbatim: MIT license, permissive licenses, copyleft licenses like the GPL. It will generate that license text verbatim, which shows that it can spit out licensed code verbatim in a way that may or may not respect the licenses it’s reproducing. This is a legal gray area right now. I’ve talked to lawyers who are much smarter than me and also are actually lawyers. But that’s a much wider conversation about what the idea of Copilot, especially in the code sense, holds for open source when it’s unclear whether it’s following the license restrictions. 

[00:26:58] SM: Those are really important questions that the community is asking about Copilot, CodeWhisperer, and other tools that I’m sure are in development that we don’t know about yet. I guess, fundamentally to me, it is a fairness issue. When a developer wrote code and published it, made it available to the world, they adopted a shared agreement: “I give it to the world with the promise that other users also receive the same rights.” We didn’t know. None of these developers in the past, or anybody who wrote anything and published it on the Internet, had any understanding that their body of text, their creation, would be used to train a machine that would be doing some other things. Whether it’s DALL-E with pictures and images, or Copilot, or GPT-3 spitting out poems and short web pages. It’s a new thing. It’s a new right. There is this new right of data mining that has been codified by the European Commission already. In the US legal system, I don’t think there is an equivalent, but probably there will be something that looks like it. For whatever we have contributed in the past, if we don’t want it to be available for corporations or anyone else to use as training data, we have to take action. 

And what happens with all the pictures that we have uploaded in the past to services like Flickr, using the norms that we thought were fair? Like, with Creative Commons, we said, “Okay, Creative Commons Attribution Share-Alike. I give this picture to you, as long as you attribute it and share it with others under the same rights.” And now that picture of my face is being used to train a system that detects me going shopping or going to a protest in the street. Is that fair? Honestly, I don’t have an answer for that. And I think we, as a society, need to ask ourselves: what have we done? What kind of world do we want to live in? What are the conditions to balance the power of regular citizens with those of developers and other actors?

[00:29:09] DGW: You raise a really interesting question about the difference between laws and norms. Because oftentimes, norms are things we all feel and experience and kind of expect. And that may or may not be in line with the current legal regime, or they’re just, as you raised, may not be a settled matter in law. 

When I first got Facebook, I don’t know, I was 13 or something, and I was uploading pictures of a birthday party or something. Am I okay with Clearview AI using that to build a facial recognition system and sell it to law enforcement and other agencies? I think if you’d asked me then, I would have gone, “What’s facial recognition?” But also, no. I mean, I hadn’t heard of that case then. But I think it’s really scary if the default is going to be set so that companies and governments have the right to scrape your data and use it for whatever, unless you take a specific action not to. Because you and I know about this, right? But think about the average person, who doesn’t have the luxury of free time to talk about this on podcasts.

[00:30:13] SM: But also, it’s almost impossible to exercise that right to opt out, because so many megabytes and gigabytes of pictures have already been uploaded. I lost track of all the places where I put my pictures. And then the services that used to exist now don’t exist anymore. Where did it all go? It’s an interesting world that we live in. 

Going back to yourself at 13, I mean, when you were 13, were you even aware that you were basically training a machine by uploading a picture and adding a tag of your friends, and building a history of your face changing over the years too?

[00:30:48] DGW: Well, was I explicitly aware? Probably not. But I will point out that I was quite a nerd. So, every time I was drawing a box around my face to, like, tell Facebook I was in a photo, every time I was drawing a box around my friends’ faces to let them know that I’d put up a photo of them, there is that kind of question, again, about what kinds of solutions companies will develop and why. There’s that question where it’s like, “Yeah, this is kind of useful to me, in that I can know which pictures I’m in.” But it’s going to be useful for the people developing the feature too. It’s going to be useful for Facebook to know who my social network is, so they can sell ads to me that are more precise. So, maybe not as 13-year-old David. But there’s always that kind of “What’s going on here?” curiosity that I think we all tend to have. 

[00:31:32] SM: That’s why I think it’s important to talk about ethics in AI because the responsibilities of corporations and the control that they have also means that they have power that needs to be balanced out.

[00:31:45] DGW: I think you’re totally right. As we begin to talk about ethical AI, if we let companies only drive this conversation, and we don’t look to open source, and we don’t look to public sector organizations, then I think we’re going to get a very particular idea of what ethical AI is and what kind of problems there are, that is going to be driven by the interests of big tech.

[00:32:06] SM: For the broader open source community, what do you think are the key takeaways? As we frame the discussion around AI and ethics, what are your thoughts about how to bring the best future for AI and helping it become more trustworthy for us?

[00:32:22] DGW: The big question that I hope to raise with these papers is that I think we need to start this conversation. I think we don’t know yet, and I’m not going to pretend that my paper has a definitive answer to your question. But I’d be happy to start. I like to think of this as good news and bad news. And we’ll start with the bad news, because it’s nice to end on an optimistic note. I think we have to realize that putting software that can be used for harm online, and letting anyone use it for anything, can be concerning, can cause harm, in ways that proprietary closed source software does not. And I think we need to talk about that more. I think we need to recognize that. 

And the devil is in what you actually do about it once you recognize that. But I think, again, going back to our earlier discussion about the gray area, it’s not black and white. It’s not like you either make it closed source, write contracts for who can use it for what, and license it that way, or you make it open source. There are areas in the middle. As the community I studied did, you can set norms. Even if you’re completely open source, you can set norms about how your community is okay with the software being used. You can elevate socially beneficial cases and educate about harms arising from harmful cases. 

There’s also the ethical source movement, which is using licenses to bar certain kinds of uses. And there’s a lot of discussion about whether this technically constitutes an open source license or not. But the higher-level takeaway I take from that movement is that you can use licensing in many ways, or you can at least use licenses to influence norms in many ways. And I think that’s something for further discussion. I don’t think it’s something to be cast aside out of hand. 

What are ways that we can find to influence, if not outright control, how the open source software we release in the world can be used? That’s sort of the bad news: acknowledging that there is harm from the way we release software that can be used for harm, freely available online for everyone to use. 

Now, towards the optimistic case. By virtue of focusing on open source in this conversation, I think we haven’t talked about some of the harms from big tech in this case. There’s a profit incentive. And I can cite so many studies. So much great research has shown the difficulty of doing ethical AI when you’re driven by a profit motive. When you’re working with a manager who wants you to do certain things, not others. When you don’t have the ability to address an ethical harm or change norms in a way that you think would be helpful. 

I think open source’s strength in this area is what it’s always been, which is the broad diversity of communities that can set their own norms, that can refashion those norms, as a way to experiment on what ethical AI might mean in a way that is not dependent on the for-profit context of private companies. This radical sense of experimentation in open source is also a promising way to think about what ethical AI means, or what it could mean, in different contexts.

[00:35:09] SM: The experimentation, I’m all about that. I’m all in favor. And I think that we are in the early stages of new things. And if we don’t play, if we don’t play with different variations, if we don’t see ourselves as flexible, then we’re not going to make much progress. Any closing remarks or something that you want to share? Like, what’s something you’re working on for the future?

[00:35:31] DGW: I would love to discuss this research with anyone listening. I want this to start the conversation. I’m under no illusion that I have all the answers; I want to learn from everyone. I would also love to connect on Twitter. That’s where I discuss a lot of my research, my art, my activism. I’m Davidthewid on Twitter. And I’d love to learn from you and engage with you there. 

And as I’m still doing my Ph.D., I don’t have an escape from that yet. I’m beginning to study what AI ethics might look like in a supply chain, acknowledging the fact that software is not developed all at once in one organization. You remix stuff. You take bits from here, modules from there, and that all comes together. And that means that AI ethics has to account for that reality, for the fact that it’s a supply chain problem, the same way ethics has always been a supply chain problem in the physical product space. That’s what I’m working on. If anyone has thoughts on that, too, I’d love to talk.

[00:36:24] SM: Thank you. Thank you, David.

[00:36:26] DGW: I’ve loved this conversation. Thank you.

[OUTRO]

[00:36:27] SM: Thanks for listening. And thanks to our sponsor, Google. Remember to subscribe on your podcast player for more episodes. Please review and share; it helps more people find us. Visit deepdive.opensource.org, where you’ll find more episodes and learn about these issues. And you can donate to become a member. Members are the only reason we can do this work. 

If you have any feedback on this episode, or on Deep Dive AI in general, please email contact@opensource.org. This podcast was produced by the Open Source Initiative, with help from Nicole Martinelli. Music by Jason Shaw of Audionautix.com, under a Creative Commons Attribution 4.0 International License. Links in the episode notes.

[00:37:09] ANNOUNCER: The views expressed in this podcast are the personal views of the speakers and are not the views of their employers, the organizations they are affiliated with, their clients, or their customers. The information provided is not legal advice. No sponsor had any right or opportunity to approve or disapprove the content of this podcast.

[END]
