Security Cryptography Whatever

A Little Bit of Rust Goes a Long Way with Android's Jeff Vander Stoep

• Season 4 • Episode 4

You may not be rewriting the world in Rust, but if you follow the findings of the Android team and our guest Jeff Vander Stoep, you'll drive down your memory-unsafety vulnerabilities more than 2X below the industry average over time! 🎉

Transcript: https://securitycryptographywhatever.com/2024/10/15/a-little-bit-of-rust-goes-a-long-way/

Links:
- https://security.googleblog.com/2024/09/eliminating-memory-safety-vulnerabilities-Android.html
- “Safe Coding”: https://dl.acm.org/doi/10.1145/3651621
- “effectiveness of security design”: https://docs.google.com/presentation/d/16LZ6T-tcjgp3T8_N3m0pa5kNA1DwIsuMcQYDhpMU7uU/edit#slide=id.g3e7cac054a_0_89
- https://security.googleblog.com/2024/02/improving-interoperability-between-rust-and-c.html
- https://github.com/google/crubit
- https://github.com/google/autocxx
- https://en.wikipedia.org/wiki/Stagefright_(bug)
- https://security.googleblog.com/2021/04/rust-in-android-platform.html
- https://chromium.googlesource.com/chromium/src/+/master/docs/security/rule-of-2.md
- https://www.usenix.org/conference/usenixsecurity22/presentation/alexopoulos
- https://kb.meinbergglobal.com/kb/time_sync/ntp/ntp_vulnerabilities_reported_2023-04
- https://blog.isosceles.com/the-legacy-of-stagefright/
- https://research.google/pubs/secure-by-design-googles-perspective-on-memory-safety/
- https://www.youtube.com/watch?v=QrrH2lcl9ew
- https://source.android.com/docs/setup/build/rust/building-rust-modules/overview
- https://github.com/rust-lang/rust-bindgen
- https://security.googleblog.com/2021/06/rustc-interop-in-android-platform.html


"Security Cryptography Whatever" is hosted by Deirdre Connolly (@durumcrustulum), Thomas Ptacek (@tqbf), and David Adrian (@davidcadrian)

Speaker 1:

Hello, welcome to Security Cryptography Whatever. I'm Deirdre.

Speaker 2:

I'm not David Adrian.

Speaker 1:

Thanks, that's Thomas, and we have a special guest today. We have Jeff Vander Stoep, who is from the Android team. Hi, Jeff.

Speaker 3:

Hey everyone.

Speaker 2:

Hi, he sounds really excited to be here.

Speaker 1:

Hi. He's doing us a big favor by joining us late at night his time. We invited Jeff on because he helped co-write a blog post on the Google Security Blog recently entitled "Eliminating Memory Safety Vulnerabilities at the Source," and, TL;DR, it had some very interesting results from deploying and adopting Rust in practice in the Android project: basically, walling off your old memory-unsafe code behind a nice abstraction boundary and only writing new code in memory-safe languages like Rust and others significantly reduces your vulnerabilities over time, in a sort of non-obvious way that does not involve rewriting the world in Rust or rewriting the world in Kotlin. So we just wanted to ask him a whole bunch of questions about this.

Speaker 2:

Jeff, real quick: you guys are still writing some C++ code or whatever, right? If you look at the graphs and stuff, that's one of the counterintuitive things about the results here: despite the increase in memory-unsafe code, you're seeing a sharp decrease in memory-safety vulnerabilities. Is that a reasonable summary of what you're up to?

Speaker 3:

Yeah, I think so. As far as what code we're still writing: yes, we are still writing and touching some C and C++ code, and we expect that to continue to happen over time, but also to decrease over time.

Speaker 1:

Okay, that was not obvious to me when I first went through this. What memory-unsafe parts of your code base are you adding to, and how are you restricting that over time to align with this new perspective on how you approach adding to your code base, to mitigate and minimize vulnerabilities?

Speaker 3:

Yeah, so we have recommendations and best practices for things that teams should be doing, but also we're not trying to be overly pedantic or put too many restrictions in place, in order for teams to be productive. Instead, what I would say is that we are encouraging and incentivizing teams to use memory-safe languages, and of course, as more and more teams switch, the bar and the impedance to switching goes down over time, right? And so, instead of trying to put a bunch of rules and restrictions in place, teams are kind of switching naturally, right? And as more code is in Rust or in Java or Kotlin or whatever, that encourages the same.

Speaker 2:

So can I get a better sense of the threat model here? I have an intuition for what the browser threat model is, and also a sense of what code is getting written there and what the security-hotspot code is there. I feel like I don't have as good a sense of where new code is written in Android, or what kinds of places in the Android code base tend to be implicated in vulnerabilities, that kind of stuff. As a starting point, what does that footprint look like?

Speaker 3:

Your starting point for this is, you know, network interfaces, which, kind of similar to a browser, are an obvious entry point, right? But also we have to be able to run untrusted code: users can install a third-party app, for example. And so we have everything from image parsers to network stacks, even network stacks in firmware, for example, but then on top of that we have other APIs that are then reachable by third-party apps.

Speaker 3:

So one escalation path that we could see is: you exploit a messaging app with, like, a malicious image file, and then from the messaging app you exploit the GPU kernel driver, and then, once you're in the kernel, you basically have access to anything that you want on the device, right? And so what's kind of interesting from our standpoint is that trying to decide which of those things you spend effort on is itself a pretty large cost, and so part of what we want to do is actually spend less time doing that and just be able to say: now everyone can do things that are safe, and we can just eliminate a threat and not have to worry about it. And that's part of what we're seeing, right? We now have different pieces of code where we just don't have this problem anymore, and it's really nice. We just don't worry about it in those areas anymore.

Speaker 2:

So, if I'm following that, and if I'm trying to make a mental model: I hear you saying that one of the benefits you're hoping to get from ramping up the use of Rust code on the Android platform is to not have to have intricate models of where the sensitive code is, where you could tolerate memory corruption versus where you can't, that kind of thing. Just take it off the table is the idea, right? But as I'm listening to you, it's occurring to me that if I think of what Android is, my first thought is, okay, it's the kernel, which is really hard to get Rust into. And then I would have thought, you know, system applications and things like that you're shipping. And it's occurring to me this is also a lot of framework libraries. That isn't the system itself, it's code that's getting pulled into every application that gets shipped on Android, right?

Speaker 2:

But am I right that some of the code we're talking about here is just, for instance, an image parser library? I don't intuitively see that as part of the operating system, unless there's a system app that uses it. Am I right that Android applications that other people write are using that code as well? So we're also thinking basically about what the libc is, and what the image parser is, and your libpng or whatever, right? That stuff.

Speaker 3:

Yes, yeah, exactly. And to give you broader context: for example, we have maybe 100 different HALs that are running on a device. We have all of the system APIs. Anything that requires a permission check has to be done across a security boundary, so that's another process that that API is running in. And so there's, gosh, I wish I had ADB access on my device right now; I could do a ps and you could see that there are, you know, a thousand different processes running right now on my device, only a small subset of which are actually applications.

Speaker 1:

Mm-hmm.

Speaker 2:

Yeah, I mean, one of the things that's occurring to me is I'm wondering if there is a knock-on effect of the operating system work that you're doing, of the Android work that you're doing. It's kind of interesting to think that if you took this to its logical conclusion, you could be swapping out a fair bit of application code for your developers, right, without them having to think about it: keeping the same interfaces, they're still calling into it, and now parts of their application are written, without them having to think about it, in Rust as well.

Speaker 3:

Yeah, yeah, and in Java and Kotlin too. What I would say is that when you install an application on your device, probably in most applications, most of the code that is running actually comes from the operating system and not from what was shipped by the developer. Of course, there are large exceptions to this, browsers being an obvious example.

Speaker 1:

So when you're talking about the security boundary: in an OS, or even in a browser, you already have notions of a security boundary that are kind of baked in, where you're like, oh, all of this stuff over here is trusted, this is in the kernel or, you know, far away, and this other area over here, this is definitely attacker-controlled, supplied data, this is untrusted, and you have to do something between one and the other or vice versa. How does that change, if at all, when you start rewriting parts of these things in a language like Rust that just mitigates a whole class of vulnerabilities that used to be encapsulated as, ah, that stuff is behind that security boundary? Do you have to evolve the existing notions of a security boundary in this sort of world?

Speaker 3:

Yeah, so memory unsafety is kind of a unique example for vulnerabilities, because they lend themselves to being chained together. And so when you have high-risk code, what we often do is isolate that high-risk code by itself. And so as the risk of code changes, we will be making, and already are making, different decisions based on what level of isolation we provide to different things. So if I can replace a memory-unsafe parser with a memory-safe parser, then I'm probably not going to sandbox it, regardless of whether or not it processes untrusted content.

Speaker 1:

I have an instinctual feeling that if I have a parser implemented in Rust, with no unsafe keywords or any weird exceptions under the hood, I have a pretty good feeling that this is a much safer parser, much less likely to have a high or critical vuln, versus one that's written in an unsafe language. Therefore, I feel pretty good about not putting that in a sandbox. But do you have any other ways of evaluating, okay, the security boundary that I need for this component is different from this other component, other than sort of, yeah, seems all right?

Speaker 3:

Yeah, I mean, memory safety doesn't replace good security architecture, right? So we still split things up into logical groups, and this isn't just a security thing, right? This is basic system stability and reliability and other things that we want.

Speaker 1:

Yeah.

Speaker 3:

So yeah, like we're still going to split things up into logical components, we're still going to do things like principle of least privilege for various sandboxes. Yeah, does that answer your question?

Speaker 1:

Yeah, to a degree, which is basically: you kind of have to see each component, or each module as it were, for what it is. Because if it's a memory-safe parser, but the worst thing that component can do after you've parsed some attacker-controlled blob is read, that's different than if you have a parser that's written in Rust, it parses the thing, and the things it can do are read and write and do inter-process whatever. The capability makes a difference there too.

Speaker 3:

Yeah, and what I would say for parsers is that, for parsers in general, and I wouldn't even need to say in general, the vulnerabilities that we have in parsers are memory-safety vulnerabilities. We're not having permission-check vulnerabilities, right? Whereas you could have that in, like, a network stack, if it's screwing up a permission check or encryption; there's lots of stuff that can go wrong in a network stack. But for image parsers, or any other type of format parser, the vulnerabilities are memory-safety vulnerabilities. Cool.

Speaker 2:

I'm looking at the blog post. As usual, we're just going to take a shotgun left turn through this whole thing, right? So there's an interesting graph. Early on in the post you have kind of a simulated model of ramping up memory-safe code versus memory-unsafe code, and then you have the actual empirical data on lines of code from the Android Open Source Project, right? So I'm looking at a graph starting in 2019, where you're about a third memory-safe versus memory-unsafe code, and then in 2024 it's now roughly half and half, maybe a little bit more, maybe closer to 60% memory safe. There are no numbers on the graph, but that's roughly what I'm looking at, right? So I have a couple of questions, right?

Speaker 2:

So, first of all, 2019 feels early when we talk about memory safety. In the zeitgeist, memory safety, that's Rust, right? It's code for Rust, it's like one word for it. I know it's not in your case, but in the zeitgeist, when people hear memory safety, they're thinking, oh, we're talking about rewriting things in Rust. But in 2019, that couldn't have been it. You could not have been a third Rust code, right? So what does memory-safe code there mean in 2019?

Speaker 2:

Is that Java? Just Java? And then it looks like you've roughly doubled the amount of memory-safe code from 2019 to 2024. The ratio has changed, but also, if you look at the chart, it looks like the amount of memory-safe code in the Android Open Source Project is about double. How much of that is the initiative to start doing this, stuff that wouldn't have been memory safe before, and how much of that is, this is Java-slash-Kotlin code and we just wrote more of it because that was part of the same project?

Speaker 3:

So I think there's some of both, right? Like, we actually do have things in place to encourage teams to shift. One of the things that we talked about in a previous blog post was that, even in the past, we would use the rule of two in order to encourage teams. God, how do I describe the rule of two? It's essentially that if you're going to process untrusted content in a memory-unsafe language, then you have to sandbox it. If you use a memory-safe language, you don't have to sandbox it.

Speaker 3:

And so, even a few years ago, we kind of had things in place that would encourage teams to use memory-safe languages. And so, yeah, just imagine things like that and how they shape what teams do. Like, today we have examples where a team says, no, I actually need native code to process something. In the past, what they would have done is they would have used C++, for example, and then we would have forced them to sandbox it. If a team has that decision now, they're going to use Rust, because the sandbox penalty is a penalty they would like to avoid.
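The full Chromium formulation (linked in the show notes) is slightly broader than the paraphrase above: a component should have at most two of untrustworthy inputs, an unsafe implementation language, and high privilege (i.e. no sandbox). Just to make the incentive concrete, here's a tiny sketch of that policy as a check, with made-up names; it's an illustration of the rule, not any actual Android tooling.

```rust
/// Sketch of the Chromium "rule of two": a component may have at most
/// two of these three risky properties.
struct Component {
    handles_untrustworthy_input: bool,
    memory_unsafe_language: bool,
    high_privilege: bool, // i.e. runs unsandboxed
}

impl Component {
    fn violates_rule_of_two(&self) -> bool {
        [
            self.handles_untrustworthy_input,
            self.memory_unsafe_language,
            self.high_privilege,
        ]
        .iter()
        .filter(|&&risky| risky)
        .count()
            > 2
    }
}

fn main() {
    // A C++ parser of untrusted input running unsandboxed hits all three,
    // so it must either be sandboxed (drop privilege) or rewritten.
    let cpp_parser = Component {
        handles_untrustworthy_input: true,
        memory_unsafe_language: true,
        high_privilege: true,
    };
    assert!(cpp_parser.violates_rule_of_two());

    // The same parser in Rust keeps its privilege without a sandbox.
    let rust_parser = Component {
        handles_untrustworthy_input: true,
        memory_unsafe_language: false,
        high_privilege: true,
    };
    assert!(!rust_parser.violates_rule_of_two());
}
```

Under that framing, moving a parser to Rust is what lets a team keep both untrusted inputs and full privilege without paying the sandbox penalty, which is the incentive Jeff describes.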

Speaker 2:

So the rule of two? It sounds like something that's broader than sandboxing.

Speaker 1:

Yeah, they also have it in Chromium, but the implication of it is, yeah...

Speaker 2:

It actually came from Chromium. Okay, so I guess one question I have is, you also have a graph, and we'll come back to it, of the sharply declining number of memory-safety vulnerabilities over time, right? If I'm thinking about where you guys were at in 2019, just to try and get a sense of what this whole trend looks like: were you guys at, like, 100%? 100% is a weird way to put it, right, but were you broadly, kind of categorically, applying that? Was all the memory-unsafe code sandboxed in 2019? Or did that also increase from 2019? As you ticked forward from 2019 to 2020, did you also increase the number of sandboxes and stuff you had as you caught more, or were you at that point already caught up on sandboxing, you were already doing that?

Speaker 3:

Yeah, so one of the things we talked about, and we have written about this, and Chrome has written about this as well, is that sandboxing has limits, and it has system resource limits, and so quite often we couldn't necessarily make the security decisions that we would have wanted to, because the penalty was too high. Yep, and so that's another thing that's nice now, right: we don't have to make security-versus-performance tradeoffs. We can just do both.

Speaker 2:

That makes perfect sense, right? I'm trying to develop an intuition for what it looked like when you guys were sandboxing something. Are we talking about, like, IPC overhead, we're running multiple processes? Or was it, we're instrumenting code as it runs? What is the archetypical, somebody wrote a big memory-unsafe blob of stuff and we have to sandbox it: what do you think the default is there?

Speaker 3:

Yeah, so Android's media frameworks are actually the perfect example of this. Before the Stagefright vulnerabilities, in whenever that was, 2015, we had one massive mediaserver process, and it had all the camera stuff, all of the codecs, all of the image processing. It had access to all of the kernel drivers that it needed for hardware. It had all of the HALs running in process, oh boy. And I think that single mediaserver process is now something like 14 separate processes in

Speaker 3:

Android now, where we've not just split out the risky parser code into very deep, unprivileged sandboxes, but we've also logically split things up. The audio processing now all takes place in two processes, something called audioserver and then an audio HAL. Similarly, the camera is the cameraserver and the camera HAL. So on the one hand, everything is kind of split out better, but also, we can't keep doing that forever, right? We can't take every single process and split it out into 14 processes. Yeah, okay. We just don't have the budget for that.

Speaker 2:

So when I'm thinking about sandboxing here, the right way to think about it is roughly the same way that Chrome is divided up at this point, like microservices, and all the attendant overhead. That makes sense.

Speaker 1:

Okay, I'm going to switch over a little bit, because this is sort of: we can take code that happens to be written in a memory-unsafe language and restructure it wisely, and once upon a time it used to be, if it's in a memory-unsafe language, you must sandbox it; if it's in a memory-safe language, you don't have to sandbox it; you have to pick one. And one of the interesting things that came out of your post that seemed counterintuitive was that the old memory-unsafe code importantly matures and gets safer with time, exponentially. Not just that it will stay steady-state: it will get safer with time, with an exponential decay. And then, if you're only writing new code in a memory-safe language, the tradeoff becomes very clear, the eventual pivot point becomes very clear in these graphs. And I think that was one of the things that was unintuitive to someone

Speaker 1:

just sort of thinking about this from kind of first principles. And you write that the returns on investments like rewrites of the unsafe code into a memory-safe language like Rust diminish over time as the code gets older. And we can go back to what we just talked about, rewriting things like parsers or some of these higher-risk things, where it's not just that it's a pile of C or C++, it's the thing you're doing with it; those things may be riskier and you still want to rewrite them in Rust so that you don't have to worry about sandboxing them or something like that. But for just a pile of unsafe code, you basically write that, based on average vulnerability lifetimes, five-year-old code has a 3.4x to 7.4x lower vulnerability density than new code. You sort of gestured in the post at why this is true, not just that you measured it and were able to observe it, but the mechanisms of why. Please tell me why you think this is true.

Speaker 3:

So I think that the way I would look at it is that just the frequency that vulnerabilities are found is directly proportional to their density.

Speaker 1:

Okay.

Speaker 3:

And so if I'm doing an Easter egg hunt, the number of Easter eggs that I'm going to find is proportional to how many are out there, their density, right? And I think what you see in code is that people fix bugs when they find them, whether those bugs are vulnerabilities or not.

Speaker 3:

If I go back to my Easter egg analogy: anyone who's ever hidden Easter eggs is finding them months later, and the same thing happens in code. We did an analysis of kernel vulnerabilities and we found that almost half of kernel vulnerabilities are found and fixed long before they're ever discovered to actually be vulnerabilities. And what's happening there isn't, I'm assuming, people trying to hide the fact that they're vulnerabilities; they just don't know. They found a bug and they went and fixed that bug. And so I think that's the same thing that's happening here: as you touch code, mostly what's happening is you're finding and fixing the bugs over time, and that's directly proportional to the density of bugs in that code. So the density is going down.
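One way to make that step explicit (a sketch of the intuition, not notation taken from the blog post): if bugs are found and fixed at a rate proportional to how many remain, the remaining density decays exponentially, which is exactly the half-life behavior discussed below.

```latex
\frac{dN}{dt} = -\lambda\, N(t)
\;\;\Longrightarrow\;\;
N(t) = N_0\, e^{-\lambda t},
\qquad
t_{1/2} = \frac{\ln 2}{\lambda}
```

Here N(t) is the number of latent defects remaining in a fixed body of code and λ is the rate at which people stumble on and fix them while working in it.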

Speaker 1:

Okay.

Speaker 2:

I have so many thoughts. They're not good thoughts. These will all be bad thoughts, right? I want to say up front: you know your code base, I don't. I'm an iPhone user. I have no idea how your code base works. Right.

Speaker 2:

But so I read the Alexopoulos paper, the open-source defect density, the decay, the half-life of vulnerabilities in code over time thing, right.

Speaker 2:

And it's a profound statement, right, it was a profound observation. Especially, you know, I know that memory safety inside of Android is much more complicated than, can we use Rust.

Speaker 2:

Right, because you also have the decision to write whole components in higher-level languages anyway, right? But that battleground between C++ and Rust is all anybody is thinking about right now, and it's easily the most important, or most talked about, issue in software security right now, that frontier. And that observation about the half-life of vulnerabilities, if it's true, says something pretty profound about what the work looks like to shift over to the memory-safe future. I guess the first question I have is, when you're thinking about that half-life, when you're thinking about how many vulnerabilities you expect, how resilient you expect old memory-unsafe code to be: are you thinking about that mostly in terms of prioritizing what things to rewrite or build in Rust first, or are you thinking about it in terms of what memory-unsafe code will we still have in the Android Open Source Project 20 years from now?

Speaker 3:

Yeah, I think there's a couple of angles here. First of all, I would actually look at existing memory-unsafe code less in those terms and more as: what are we going to apply our existing toolkit to? If we're going to spend time and effort fuzzing, where are we going to spend that effort?

Speaker 2:

Yeah.

Speaker 3:

And I think the other side of this, what's actually exciting about this result, is that it tells people that doing the inexpensive thing is actually very effective.

Speaker 1:

Yeah.

Speaker 3:

And that's what's actually really exciting about it.

Speaker 1:

Yeah.

Speaker 2:

Right, because you have a really good system of incentives right now. My top-level message from that blog post is that the incentives for AOSP are great right now. You don't have a mandate to rewrite everything, you have a sense of how to prioritize, and also you have things like, we can get rid of sandboxes and simplify your design. It's the things you would expect to get from what languages like Rust promise there, right? I am fixated on the computer science part of this. I buy 100% the direction you guys are going and that you're in a good place right now.

Speaker 2:

The early result on the experiment of doing more Rust in AOSP sounds really positive. There's a result that was published at USENIX about the Linux kernel and OpenSSL and the half-life of vulnerabilities, and I have some, you know, kind of poorly informed CS, doubt is the wrong word, but questions. For instance, that is a CVE-driven result, right? So they are looking under the lamppost for these things, right? And obviously all of us have the superficial thought of, well, we've all heard about vulnerabilities that were found only 20 years after the code was committed or whatever, and we know that's not going away and all that, right. But you have insight about your code base, right? You're not relying just on that USENIX result about those open-source projects. You're looking at your own project, and it seems like you're pretty confident in those half-life numbers in your code base, and you're pretty authoritative on that. So what gives you that confidence?

Speaker 3:

So we did our own study of this right on our code base and any data source that we look at seems to give us this result, which is pretty compelling.

Speaker 3:

You know, there was another study that we looked at that looked at fuzzing, and what it says is the cost of fuzzing actually goes up exponentially if you want to scale it with the number of vulnerabilities that it finds, and so it's basically showing the same result in a different way. And so if we continue to see the same result any way that we look at the problem, whether it's defect rate or vulnerabilities or the cost of finding the next bug, then it's probably telling us something about a property of software development. And can we say that for sure? I don't know yet, but part of the reason why we wanted to publish this blog post was because this is really what the data is pointing us towards; we keep seeing it in different ways, and when we try to actually rely on this property, it doesn't let us down, which is another great signal.

Speaker 1:

Yeah. So it seems like no matter which specific code base, it could be OpenSSL, it could be a browser, it could be an OS, and whether you're measuring it with CVEs, which are reported and measured in one way, or with Android's own measurement of vulns and bugs and a different reporting mechanism, because you have a bug bounty and things like that, even via all these different ways, they correlate with the behavior and the signal that you're seeing, in terms of how they decay over time, and, I haven't read the other USENIX study in a long time, how the old unsafe code quote-unquote cures or matures over time, as it were, in terms of vulns and bugs. Yeah.

Speaker 2:

I don't know. It's interesting because it's a result that I simultaneously very much want to be true and kind of don't want to be true, right? I don't want to be comfortable with large amounts of memory-unsafe code that we keep around just because it's kind of proven itself. It's interesting because there are other efforts, obviously, to replace memory-unsafe code, right? The ISRG Prossimo project is one of those, right? They did, like, an NTP rewrite.

Speaker 1:

But that's an interesting kind of different case because that is like a full-on binary and you can just full-on like it's a very well modularized and abstracted out thing. You can just do a full rewrite and the quote you know, interface boundary is literally NTP protocol, not like not swapping out something that's you know, a core system component of Android or Chromium or OpenSSL.

Speaker 2:

I don't think it's applicable to the problem that, Jeff, you're working on, right? But I think it's kind of a rhyming thing, and the thing that sticks out to me is, when Prossimo announced the NTP thing, I have, you know, friends that are more plugged into exploit development and vulnerability research than I am right now, and they were a lot less bullish on it than I was; I was much more bullish.

Speaker 2:

I wrote a couple of Hacker News comments about memory safety, there's still a live debate about whether memory safety is good on Hacker News, right, and I wrote some rah-rah comments about how doing an NTP rewrite in Rust makes sense to me, right? And I got pushback from people, because it's like, this is not where the vulnerabilities are, right? One thing that these people think is happening is that this idea that we're doing blanket rewrites of memory-unsafe code in Rust or whatever is kind of a huge waste of effort to them, because they have a much better sense of where they're actually going to find vulnerabilities, and this isn't where they are. Which lines up with the strategic thing that you're doing, right, where it's relying on the half-life of the code, not freaking out about the older, kind of proven, memory-unsafe code that you still have other countermeasures for, and all that, right.

Speaker 2:

At the same time, it's like you know there have to be vulnerabilities in there somewhere.

Speaker 3:

Yeah, so the way we look at it is, we're not actually trying to get to zero vulnerabilities. I know that's kind of a weird thing to say, but maybe a better way to look at it is that if we, as system architects, are designing systems such that a single vulnerability is the end of the world, then we're really bad at doing design, right? And so, no matter what, we have to design systems that are robust against the existence of vulnerabilities in those designs. The problem that we currently have isn't that we can't do good system architecture and security architecture. It's that the vulnerability density is so high, and memory-safety vulnerabilities in particular are so flexible, that what we see is chaining of vulnerabilities together in order to bypass good, robust system architecture.

Speaker 3:

And so my thought here is: yeah, is the occasional vulnerability going to exist? Yes, and that's where we need to actually be applying defense in depth and good security architecture, and that's the solution there, because we're going to have those vulnerabilities forever, even in Rust code, right? We're going to have the occasional unsafe-code issue in Rust, and we have to be robust. Yeah.

Speaker 2:

One of the really striking things in the blog post to me was, there's a graph here of new memory-unsafe code and memory-safety vulnerabilities, right, and you've got a little red baseline at 70% for an industry norm for memory-safety bugs, right? And so things have changed, it sounds like, pretty radically from 2019 to 2024.

Speaker 2:

But back in 2019, you guys were still playing to win. I think you were one of the best secure-software teams in the world. If you just think about the amount of effort that was going into securing that platform and paying attention to software security, you're one of probably the four most important targets in all of software security. So it's not like you guys were just phoning it in in 2019. And back in 2019, with all of the memory-unsafe code that you had, with all of the countermeasures you had, with whatever your rules were about library safety idioms and how you're allowed to write C++ code on this platform, and sandboxing.

Speaker 2:

You were still at the norm, right? If it had stayed like that, you would get the message that there is simply no way to make memory-unsafe code, you know, secure.

Speaker 3:

Yeah, so I joined in 2014, and shortly after I joined, we had the Stagefright vulnerabilities, which was kind of a huge moment. Ben Hawkes, the former head of Project Zero, wrote a really nice article about this recently, where he refers to the Android team's response to the Stagefright bugs as throwing the kitchen sink at the issue, which I think is exactly what you're describing, right? We took an all-of-the-above approach, and I think what we really saw with our approach was that we were able to make progress by some measures. I'll give you an example, which is that we saw external exploit prices start to go up quite a bit for Android. Maybe that had nothing to do with what we were doing, but more likely it is probably a reasonable validation that our approach was working.

Speaker 3:

At the same time, the number of our memory-safety issues continued to go up, not down, which again could also be a result of maybe we were getting better at looking for them, maybe we were incentivizing people finding them better, who knows the reasons, right? But it kept going up, and so part of what that caused us to do was to take a step back and say, this approach, while it's not useless, do we need to look at this a different way? And so we started looking at memory-safe languages as an option, right, because it's kind of an obvious solution. But then of course everyone, including ourselves, thought, okay, well, this is a solution that, if we start now, will maybe have an impact in decades.

Speaker 1:

Decades.

Speaker 3:

Because we sit on this massive pile of legacy C and C++ and no one thinks we're going to rewrite it all.

Speaker 2:

But it's not. There's a synergistic effect here, right? Your big result here is that all of the work that you did post-Stagefright didn't really seem to be moving the trend line much, right? It was making a difference, but if you look at the chart it didn't seem to change much, and now you have what looks like a pretty powerful synergistic effect.

Speaker 1:

Here you add not that much memory-safe code, and all of the things that you're doing to keep the memory-unsafe code safe work better. And to put some numbers on that from your post: the percentage of vulns caused by memory-safety issues went from 76% of Android vulns in 2019 to 24% in the year of our Lord 2024, well below the industry norm of 70% of vulnerabilities being memory-safety vulnerabilities; Android is down to 24%. And that's from 2019 to 2024. Everything done to mitigate or respond to Stagefright between 2015 and 2019 still had you at 76% of vulns in Android being memory-safety vulns.

Speaker 3:

Right, because this is where the insight really comes in, which is that the problem is that we're introducing them at high rates.

Speaker 1:

And this is one thing I wanted to really drill into: we talked a little bit about why, in the old code, there's the density of memory-safety vulns and also the findability.

Speaker 1:

Like the density implies more findability but also like if you're in there and touching the code and working on it or working around the edges of it, you're more likely to find more vulns.

Speaker 1:

This is one thing that's interesting to me: one of the things that I seem to take away from your post is that with the old code, the old unsafe code, if we, one, stop touching it, then over time we will...

Speaker 1:

If we stop touching it, it kind of shakes out the vulns that we're going to find. And if we stop modifying it, we stop exposing things that were probably once safe enough; because if you change how you access these lines of code that do memory-unsafe things, you all of a sudden expose a new memory-safety vulnerability that you wouldn't have exposed if you hadn't touched that code in the first place. And so that seems to indicate, one, literally stop touching that old code, and then you have to have an interface between the code that you're not touching but are still using and the new code. And if that actually holds true, there seem to be these interacting dynamics between: you are writing more new code and introducing new vulns and, while you're in there, finding more vulns; but then you should stop modifying the old code so that you aren't exposing new vuln paths in code that was okay until you fiddled with it, or something like that. Am I making sense?

Speaker 3:

Yeah, I mean. One thing that I would say is that you know, this trend only happens if, as bugs are found, you fix them. So you can't just not touch the code.

Speaker 1:

Yeah.

Speaker 3:

It's that you are increasingly putting code into essentially maintenance mode.

Speaker 1:

That.

Speaker 3:

And instead, if you need to add a new feature, you don't add it into your old C++ code; it's, okay, now I'm going to add that feature over here in Rust or Java or whatever. And that's kind of how you shift teams away from it. Yes, and yeah, I mean.

Speaker 1:

That's what I mean, but I didn't say it very well.

Speaker 2:

Is there something here, if you're looking at that synergistic effect, where we're keeping a lot of memory-unsafe code around but the count of vulnerabilities is dropping quickly: is there something about the idea of vulnerabilities scaling not just with the amount of memory-unsafe code you have, but with the number of interfaces between memory-unsafe things, the links between modules, library calls, inter-process communication, whatever it is, the number of different individual blobs of code that are linked together somehow? And if you break that graph up, if you put more memory-safe things in between the memory-unsafe code, then the code is more straightforward to reason about, or it's easier to confine or sandbox, or it's easier to test. Is there a graph-theoretic way of looking at this?

Speaker 3:

So I don't know. I think that's a good theory. One thing that we're already noticing when we look at the kernel is that certain C APIs are just not safe to be called from Rust, ever, and so what you see in the kernel is that they're actually needing to make adjustments on both sides of the language boundary, because what they want to do is what you're supposed to do with an abstraction between C and Rust, which is that it is impossible for the Rust code to cause something unsafe to happen due to the C code. So, to your point, they're having to make the C APIs safer in order to do that. And are we seeing some of that elsewhere? I don't have an answer for you, but probably. Okay.
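A minimal sketch of the wrapping pattern being described, with hypothetical names (frob_parse is not a real Android or kernel API): the raw C binding stays unsafe, and a small safe Rust wrapper enforces the invariants the C side assumes, so safe Rust callers can't trigger undefined behavior through it.

```rust
use std::os::raw::{c_int, c_uchar};

// Hypothetical C function, roughly as rust-bindgen would expose it.
// The C side assumes `buf` points to at least `len` valid bytes;
// violating that assumption is undefined behavior.
extern "C" {
    fn frob_parse(buf: *const c_uchar, len: usize) -> c_int;
}

/// Safe wrapper: a `&[u8]` guarantees the pointer/length invariant by
/// construction, so callers in safe Rust cannot hand the C code a bad buffer.
pub fn parse(data: &[u8]) -> Result<(), i32> {
    // SAFETY: `data.as_ptr()` is valid for `data.len()` bytes for the
    // duration of the call, which is exactly what `frob_parse` requires.
    let rc = unsafe { frob_parse(data.as_ptr(), data.len()) };
    if rc == 0 {
        Ok(())
    } else {
        Err(rc)
    }
}
```

When a C API can't be wrapped this way, say it hands out raw pointers with unclear lifetimes, that's when the C side itself has to change, which is the adjustment on both sides of the boundary Jeff mentions.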

Speaker 2:

So the internals, the implementation versus the interface, right: the implementation stays largely the same, and largely memory-unsafe, but the interface has to get better just to make it work with Rust. Exactly. That's super interesting. That's not a thing I would have thought of. Hey, a dumb question about the vulnerability counts that we're working with here: do these include internal findings? What is the level of qualification for a memory-safety vulnerability counted here? I assume it's not all the way to CVE; I assume there are internal findings here.

Speaker 3:

So it probably is to CVE, in that these are through Android's vulnerability rewards program, and what that means is that we disclosed the vulnerability. And, oh God, let me give you a very unscientific estimate, which is that probably about half of our vulnerabilities are internally discovered and about half are externally reported, and of course that's going to shift all over the place, and that figure would count both.

Speaker 2:

Yes. If you refactored something and foreclosed, you had a bunch of vulnerabilities, but they were never discovered as vulnerabilities because you refactored them away, even though that code was vulnerable. You don't retrospectively go back to code that was vulnerable before? I'm sure you do in a sense, right, but there's also...

Speaker 3:

It depends, because we have support windows for Android releases. So you can go look at the Android security bulletin, right, and you might see a vulnerability that applies to Android 14 but doesn't apply to Android 15. And actually, one of the really fun things was, they switched the ultra-wideband stack to a Rust stack, and within days of us having released it, I think four memory-safety vulnerabilities were reported on the old one, and so we still had to report those.

Speaker 2:

You should see Deirdre's face right now.

Speaker 3:

We still had to report those, but they didn't impact the latest version of Android, so they weren't you know, they didn't have to be patched in the latest version.

Speaker 2:

Here's the thing I'm kicking myself for not having asked earlier: what comes to mind for you as the top five big-ticket Rust things that are now in Android?

Speaker 3:

It's kind of starting to show up all over the place, but I think the virtualization framework is really interesting, because if we were going to look at a big chunk of high-privilege code that applications and stuff are increasingly interacting with, that's it, right? The other thing, yeah, the ultra-wideband stack is a nice one to talk about. But what I think is actually really interesting is we're having fewer of these big-ticket, here's-the-thing-we-can-talk-about examples, and a lot more of, oh, this team did their new feature in Rust, so there's a Rust component tacked onto the side of a big chunk of C++, and that's just kind of becoming the norm.

Speaker 1:

The norm of like just any new thing is just sort of like the Rust component tacked on to existing stuff.

Speaker 3:

It's not the tacked-on part that's the norm, right? It's that new things are being done in Rust instead of in C++.

Speaker 1:

Okay, good. Which is the whole takeaway from this sort of work: no, really, you can get a really big bang for your buck by kind of just doing that, which is, if you have something new, just write it in Rust or another memory-safe language and make it interop with the rest of your project, and you will in fact get really good returns on mitigating your memory-safety vulnerabilities, which are the majority of your vulnerabilities, period, and you do not have to have this 10-year, 20-year plan to rewrite the world in Rust to see the dividends start paying off.

Speaker 3:

Can I tell you all, one of the reasons why we did this blog post is because, you know, over the last few years we kept getting these really great like results right and they were great, but also like they were kind of like suspiciously great.

Speaker 1:

You're like this can't be right.

Speaker 3:

Yeah, there was a little bit of that, and what I think we kind of learned from that was, we had done this study, or this kind of internal analysis, and this is before that USENIX paper was even published; we looked at a couple of other code bases, and it was quite interesting to see that, okay, yeah, the vulnerabilities really aren't uniformly distributed across our code base, they really are in the code that we recently touched. So we had this idea that, okay, this is probably going to work better than people are expecting. But then it worked even better than we were expecting, and I think the big part that we were missing was this idea that our older code is also getting safer. Like, if we have a half-life of two years, then in two years half the bugs have been shaken out of the code that was just written in C or C++, and so, six years later, think of the code that was written six years ago, right? So many bugs have been fixed and then not reintroduced, because we're writing lower-vulnerability-density code. And so I think part of what happened was we had a little bit of, I don't want to call it a crisis, but, you know, what's going on here, why is this working so well? And we had to look into this.

Speaker 3:

And so the co-author on this blog post, Alex, isn't on the Android team. He wasn't involved in the work that we're doing on Android, but he was one of the main people I was working with on investigating: is there an explanation for what's going on here? So, finding that paper, and also starting to find other resources that were demonstrating similar things. And then we wrote that simulation, and a big part of that simulation was: clearly there's a mathematical model and we can simulate it; is the simulation going to show us the same thing that we're seeing? And, you know, imagine our relief when the simulation pretty much showed the same thing, and we're like, oh, thank God, we now have an explanation for what we're seeing, and it actually matches exactly what we would expect.
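For a feel of what such a model can look like, here's a toy version (a sketch with made-up numbers, not the simulation from the blog post): each year a cohort of new memory-unsafe code ships with some latent defects, every existing cohort's remaining defects decay with a fixed half-life as bugs get found and fixed, and after a cutoff year all new work happens in memory-safe languages.

```rust
// Toy cohort model of latent memory-safety defects over time. Numbers are
// invented for illustration; this is not the blog post's actual simulation.
const YEARS: usize = 12;
const HALF_LIFE_YEARS: f64 = 2.0; // how fast bugs get shaken out of old code
const DEFECTS_PER_NEW_COHORT: f64 = 100.0; // latent defects in each year's new unsafe code
const SWITCH_YEAR: usize = 6; // from this year on, new code is memory-safe

fn main() {
    let survival = 0.5_f64.powf(1.0 / HALF_LIFE_YEARS); // fraction of defects surviving each year
    let mut cohorts: Vec<f64> = Vec::new(); // latent defects left in each unsafe cohort

    for year in 0..YEARS {
        // Age every existing cohort: a fraction of its defects are found and fixed.
        for defects in cohorts.iter_mut() {
            *defects *= survival;
        }
        // New code: memory-unsafe (with defects) before the switch, memory-safe after.
        if year < SWITCH_YEAR {
            cohorts.push(DEFECTS_PER_NEW_COHORT);
        }
        let total: f64 = cohorts.iter().sum();
        println!("year {year:2}: ~{total:6.1} latent memory-safety defects");
    }
}
```

Even in this crude version, the total keeps climbing while new unsafe cohorts are added, then falls off quickly once they stop, without any of the old code being rewritten, which is the shape of the curves in the post.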

Speaker 1:

Yeah, I want to like re-emphasize what like we kind of already discussed, which is that it's patching and fixing the older unsafe code over time.

Speaker 1:

That's the only thing you're doing to touch it. Anything new is written in a memory-safe language, and you have to figure out your API boundary improvements, or the way that the new code, the memory-safe code, will interact with the existing unsafe code, which might mean improving the C boundaries on the one side and doing whatever you need to do on the Rust side or the Kotlin side as well. And then I want to talk a little bit about all this other stuff, like fuzzing and linters and sanitizers and things like that. But it's that you aren't doing new development in unsafe languages, and you are doing new development in safe languages, that's kind of the crux: you keep fixing issues, but in a maintenance-mode state of the unsafe code, and you do new stuff, including improvements and new features, not just whole new components, in the safe code, and those two things together make this, like, explosive reduction of vulns, or whatever.

Speaker 3:

So, yes, but I want to be clear: this is a journey, not a destination. Sure, yeah. And the idea of, oh, you're no longer writing memory-unsafe code, it's like, no, that's not what we're doing. We're in a transition phase, and that's part of what we're trying to show in the graphs there, right? It is a transition, and for us it's been going on for six years. We are still introducing memory-unsafe code, and that's probably the main reason why we're still having memory-safety vulnerabilities, right?

Speaker 2:

But if you look at the scale of the decrease of memory-safety vulnerabilities you have, right, that can't be coming just from replacing code, right? Because you haven't replaced enough code for that.

Speaker 2:

It seems like the result that you have right now has to be saying that introducing some titration of memory-safe code into a memory-unsafe code base works even as you implement new features in C++. There are large existing C++ code bases that each release need new features, right, and that stuff is not being done in Rust, it's being done in C++ to get the job done. But despite that ongoing development, the result you have is that, with that titration of Rust code, or whatever other switches you did there, you still got a sharp drop in those vulnerabilities.

Speaker 3:

Yeah, and to be clear, the drop matches exactly the number that we would expect based on the amount of memory-unsafe code that's being added or modified. Wow, it's really an interesting result.

Speaker 2:

How much of that effect do you think comes from the introduction of memory-safe code making it easier to focus countermeasures, like sandboxing or sanitizers? How much of it comes from making those things more effective, and how much of it do you think comes from just simply offsetting the amount of memory-unsafe code there is, right?

Speaker 3:

So we think that most of it comes from offsetting, and we think that because we are also looking at other projects that aren't doing this or doing less of it and they aren't seeing the same result and Chrome and a lot of these big projects have invested so much into fuzzing clusters, into sanitizers, into a whole bunch of like big beefy stuff.

Speaker 1:

And you just, quote, start implementing most of the new stuff in memory-safe languages like Java, Kotlin, and Rust, and that makes such a profound difference compared to the investment done for these other techniques. It's kind of mind-blowing, which is part of why we don't even believe what we're seeing in front of our eyes; we have to replicate this. It's amazing. And there's some upfront investment in getting these new languages integrated into the project, into the toolchain, training up developers; there are upfront costs to get that going. But it seems like Android especially has gotten the flywheel going, because once you're rolling, you've fronted the cost. It's not an ongoing cost of the cluster and compute and all this sort of other stuff that these other techniques bring with them. You're not paying hundreds of thousands of dollars every billing period for your fuzzing cluster.

Speaker 3:

Yeah, so the results are really interesting, but I think the other really important part of the blog post is that it actually talks about why this scales. And really, the most important reason why this scales is because it's cheap, and it's cheaper than doing the other thing, right? A lot of these techniques actually scale incredibly poorly. If, for every line of code, or every new function you add in a memory-unsafe language, you then have to add a fuzzer, then now we actually need to dedicate hardware to doing that work, right? These things scale along with the amount of code that you're writing, and that's really, really expensive. And what you find is, when teams are under deadline or shipping pressure, guess what gets dropped.

Speaker 1:

Yeah.

Speaker 3:

And so, yeah, if you can actually build the safety into the code development process, then you've not only reduced the additional cost you're dedicating, you're actually making the cost of development itself cheaper, right? Because your code ends up being safer. And I think this is actually one of the most important parts of what we're talking about here, because when security people talk about costs, they sometimes tend to talk about costs in ways that are unproductive for businesses that actually need to ship things, and one of the ones that we always hear on security teams is this idea of raising attacker costs.

Speaker 1:

Yeah.

Speaker 3:

Right, like, oh, we've got to make the cost of exploitation more expensive. We've got to make it harder for them to find bugs. We've got to do these things, right? And, unfortunately, the way that we tend to raise attacker costs is by raising defender costs.

Speaker 1:

Yeah.

Speaker 1:

You know, that sounds great for job security, but it's not good for actual security, right? And it's also bad for how your teams experience working on this project and maintaining this

Speaker 1:

You know this code base, like if you're, you know, pushing a change and you have like a million processes that are like, nope, roll this back, nope, this doesn't pass.

Speaker 1:

And then it's a lot of work to defend the code from an attacker, but also it's a lot of work to just work on your change and get it in, just trying to get something done. And then one of the awesome things that you mentioned, it's like just a bullet point, but it's not just a bullet point, is that languages like Rust, and other memory-safe languages, shift bug finding further left, much earlier, before you even check in the code, so that Rust changes in the Android project are rolled back at half the rate of C++ changes getting checked into the project.

Speaker 1:

It means that when you have something working, all the tests pass and it's been checked off and merged in, you're not likely to have to be like, nope, this broke; nope, this has a vuln; no, we have to back this out. And that affects your efficiency and your velocity, and also your developer happiness and productivity in general, and that all costs money, because if your developers aren't happy, they're going to leave.

Speaker 2:

If you haven't noticed, Deirdre is trying to sell people Rust.

Speaker 1:

Like, look at all the good stuff that comes from it, there's a whole bunch of good stuff. And we haven't even talked about the efficiency of the implementation, like the QR code parser thingy that's, like, a million times faster, because they got rid of an IPC sandbox. Okay, maybe not exactly a million, but it sounds fast to me; it doesn't matter exactly what the number is.

Speaker 3:

Yeah. So the other thing I was going to say is that we have to get approval for every number we share, and the rollback rate was something we could share. But when we look at other measures, we can't find one that tells us it's taking longer to write Rust code than it is to write C++. Any metric we can find shows that teams are more productive: they're going faster, with lower defect rates. And, you know, Lars, he's the director of platform programming languages on Android.

Speaker 3:

He did a talk about this at Rust Nation, but we can't find a metric anywhere that tells us that using C++ is better for any kind of velocity or quality. And what I think is kind of fascinating is what that means for getting teams to use Rust. When we get them to switch from C++ to Rust, teams do not want to go back to using C++. They don't need to be re-incentivized; they have the incentives they want, because they're able to ship the things they want to, they're able to accomplish their work with fewer barriers.

Speaker 2:

I was going to ask about the general developer experience here. It sounds like, once you get people to the point where they're shipping successfully in Rust, they tend to stick there.

Speaker 3:

Yeah, we don't need to incentivize them to stay there. I would say you do still need catalysts to get people to make the switch, so creating incentives or disincentives is still really quite useful. I mentioned one just because we've published about it, the rule of two: it incentivizes teams to avoid the complexity and overhead of doing additional sandboxing. But there are lots of ways to either incentivize the use of Rust or disincentivize the use of C++.

Speaker 2:

Was there anything surprising about getting Rust rolled out across the project, in terms of adoption and people's experiences with it?

Speaker 3:

I don't think so. I've been on the Android team for 10 years and it's the only team I've been on at Google, so I don't have broad experience, but I feel like Android people are just very practical. If they're told, hey, we have a better tool for you to do your job, people are like, okay, I'm willing to try that. And then, more often than not, they're like, yeah, this is a better tool, let's keep doing this.

Speaker 2:

And it's generally somewhat straightforward to do Rust and C++ in the same process, in the same code base, just calling across?

Speaker 3:

Yes, so Android is built on top of a mountain of C++. There are no pure-Rust processes in Android; everything is a combination. Even things that are mostly Rust still rely on basic system libraries, you know, libc, libbinder. And we did a blog post about interop, and I talked about the Android team being a very practical team.

Speaker 3:

I think our blog post about this, and our approach, reflects that. What we wanted when we looked at interop was something that was mostly convenient, and we also just admitted to ourselves that occasionally teams are going to experience some inconvenience and have to handwrite a binding or something like that, and that that was okay. So what our blog post reflected was that in about 90% of cases, some fairly standard and not-that-complicated tools were going to be sufficient for reasonably convenient interop between C++ and Rust. And we just admitted, well, we're going to keep hacking away at that remaining 10%, but we're not going to block on it, because the alternative is that someone who's paid to write code may be paid to write some more code, and that's an okay thing to happen.
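
To make the "occasionally a team just handwrites a binding" point concrete, here's a small sketch of that pattern; it's hypothetical rather than one of Android's actual bindings, and it uses libc's strlen only because that symbol exists everywhere. The shape is what matters: one hand-declared C function, one reviewed safe wrapper, and the unsafe call confined to a single place.

```rust
use std::ffi::CString;
use std::os::raw::c_char;

// Hand-written declaration of a C symbol (strlen from the platform libc),
// standing in for the ~10% of interfaces the interop tooling doesn't cover.
// (On Rust edition 2024 this block is spelled `unsafe extern "C"`.)
extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

/// Safe wrapper: the rest of the codebase calls this, so the `unsafe` block
/// and its invariants (non-null, NUL-terminated pointer) live in one place.
fn c_string_length(s: &str) -> usize {
    let c = CString::new(s).expect("string must not contain interior NUL bytes");
    // SAFETY: `c.as_ptr()` is non-null and NUL-terminated, and `c` outlives
    // the call, so strlen's contract is satisfied.
    unsafe { strlen(c.as_ptr()) }
}

fn main() {
    println!("{}", c_string_length("hello from the C side")); // prints 21
}
```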

Speaker 1:

And the work on that remaining 10% is things like the interop tooling, Crubit and AutoCXX, is that right? And that's kind of talking to both sides of the binding layer: you need to do some work to make the C-side interfaces a little better, so that the Rust code stays safe, with correctly defined behavior, when it's calling the C code.

Speaker 3:

Yeah, what we want is that convenience, but also safety along with that convenience. Bindgen is a good example of something that will just happily toss up an interface.

Speaker 1:

Lots of unsafe and other things. It may not be safe.

Speaker 3:

Yeah, more likely than not it's not going to have the safety properties you want.

Speaker 1:

Yeah.

Speaker 3:

And we use bindgen all over the place, and then we have people who are available to review those things for teams. That's not the ideal state to be in, but again, our approach has been that we're not going to block on the ideal thing, and our results are essentially that doing the okay thing is actually very effective. So what I would hope is that people aren't blocking on perfect solutions, that they're moving forward with non-perfect solutions, because the non-perfect solutions turn out to work pretty well.
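
For reference, this is roughly what driving bindgen from a Cargo build script looks like, as a hedged sketch: Android generates its bindings through its own build system rather than Cargo, and `wrapper.h` here is just an assumed placeholder header that includes the C APIs you want bindings for.

```rust
// build.rs -- hypothetical sketch of invoking bindgen from a Cargo project.
// Assumes `bindgen` is listed under [build-dependencies] and that a local
// `wrapper.h` includes the C headers of interest.
use std::env;
use std::path::PathBuf;

fn main() {
    let bindings = bindgen::Builder::default()
        .header("wrapper.h")
        .generate()
        .expect("unable to generate bindings");

    // Everything bindgen emits is a raw, `unsafe` transcription of the C API;
    // the safe wrappers around it still have to be written and reviewed by hand.
    let out_path = PathBuf::from(env::var("OUT_DIR").unwrap());
    bindings
        .write_to_file(out_path.join("bindings.rs"))
        .expect("couldn't write bindings");
}
```

The generated file is then pulled in with include!(concat!(env!("OUT_DIR"), "/bindings.rs")), and, as described above, the raw interface it exposes is fronted with reviewed safe wrappers.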

Speaker 1:

Yeah, especially when you have results like these, which literally say: not only did we not rewrite the world before we saw results, not only did we just add some Rust and other memory-safe languages without blocking on perfect binding generators that are fully memory safe with defined behavior, but we kept writing C and C++ in some areas of the project and still saw a major reduction in memory-unsafety vulnerabilities over five years. Having results this material says: don't let the perfect be the enemy of the good, just get started. If you just get started, you get a crazy ROI on your investment, and the ROI keeps scaling: the investment kind of plateaus, but the return just keeps on growing.

Speaker 3:

Yeah, it's funny, because I think it's not just security people but software people in general who really like to be kind of pedantic. And you all work in cryptography, right, and in cryptography that's a necessity, you have to be pedantic.

Speaker 1:

Not all software has to be pedantic.

Speaker 2:

Yeah, the word you're looking for is rigorous.

Speaker 3:

Yes, thank you, much better word. But what I would say is that when we show these results to teams that are wondering what they should do, it really does change things. One example is teams who say, well, it doesn't matter if we use Rust for this thing, because this other thing's in C++ and therefore all bets are off, toss everything out the window, might as well write C++ for eternity. And the other is that you're almost giving teams permission to say: no, your previous work, your team's investments, those are still good, you get to keep using those.

Speaker 3:

We're not telling you to throw out everything you've done. I think there's also a sense of ownership that people want to retain in the work they've done and the code they've written, and it ends up being very convenient that the results say keeping it is probably the right thing to do. And one of the things we had originally put in the blog post was a stronger statement along the lines of: even in memory-unsafe code bases, about a third of your bugs are not memory-safety bugs. So if you have mature code where, say, two thirds or more of the bugs have already been found and fixed, and you go and rewrite it, odds are you're going to reintroduce a bunch of those non-memory-safety bugs.

Speaker 1:

Yeah.

Speaker 3:

You just have a high potential of doing that, right, and so it's like...

Speaker 1:

Maybe we shouldn't be doing that at all. And basically this is evidence that you really don't have to: you get extreme bang for your buck, and it makes all of those things a lot easier, and a lot easier to get started.

Speaker 2:

Yeah, I mean, the blog post undersells it, right? I think it's a pretty huge case study you've got here; like I said, this is one of the more important security-target code bases in the world. And in terms of the ROI you've gotten on what looks like a practicable amount of introduced Rust code, it looks like a pretty big win so far. I'm psyched to see how this plays out moving forward. But this is a lot more interesting than what it looks like on the tin, "yeah, we rewrote some stuff in Rust." That's not the story here. Super interesting stuff.

Speaker 1:

Thank you. Thank you, Jeff. Do you have anything else that we didn't touch on that you want to send us off with?

Speaker 3:

I don't think so, I feel like we actually discussed most of what I wanted to talk about. I'm glad we got to talk about shifting away from an attacker-based mindset. I think that's another fairly novel thing to say to security folks, especially on this topic, because it's so ingrained in everyone that we look at what attackers do and that our primary goal is to frustrate attackers.

Speaker 2:

Yeah, it's interesting, right? If you talk to lawyers, you have a mental model of a lawyer as somebody who goes to court, but most lawyers aren't litigators; there are litigators, and then there are people who do contracts and review and strategy and all that stuff. And here you've got a software engineering answer: you're looking at this as a software engineering problem, not as an adversarial process against attackers. There is that adversarial process happening at the same time, but you also want the nuts and bolts of things to be situated well, so that if you ever do come into contact with attackers, you're in a much better position. And I'm also a sucker for software engineering stories, and this is pure software engineering. It's awesome.

Speaker 3:

Yeah. So I talked about raising the cost for attackers and how we do that with exploit mitigations, but there's even fuzzing, which isn't unambiguously a good thing for defenders. Fuzzing is actually a little bit problematic for defenders, because attackers only need to find a couple of bugs, and defenders need to find more of the bugs than the attackers do. But to your point about this being a software engineering answer: that's what I think is really exciting about it, and why it works in defenders' favor. We're essentially ignoring the attacker; we're saying this actually drives down our costs, and it drives them down just by improving software quality. And then it almost ends up being a side effect that it drives up attacker costs. By trying to make our own lives easier, we've made attackers' lives more difficult. It's an interesting result: instead of focusing on attacker strengths and trying to undermine them, we said no, defenders have their own strengths, and those strengths involve things like owning the development environment, so focus on our strengths.

Speaker 3:

And it's at least an interesting area where that works. I will say it doesn't just work for memory safety; it works in other areas, like cross-site scripting or SQL injection, where we've applied the same technique and seen similar results. What's kind of exciting about memory safety is that this was the one where it was like, sure, that works in those smaller areas, but will it actually work on the industry's Achilles heel? And so far it looks promising.
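
As a sketch of what applying the same technique to injection can look like (invented types for illustration, not any particular Google library): make the safe thing the only expressible thing, so untrusted input can only ever travel as a bound parameter, never as query text.

```rust
// Hypothetical sketch of "safe coding" applied to SQL injection rather than
// memory safety. The query text must be a &'static str, which in practice
// means a developer-written string literal, so attacker-controlled data can
// never be spliced into the SQL syntax; it can only be bound as a parameter.
struct Query {
    sql: &'static str,   // trusted SQL with placeholders
    params: Vec<String>, // untrusted values, always bound
}

impl Query {
    fn new(sql: &'static str) -> Self {
        Query { sql, params: Vec::new() }
    }

    fn bind(mut self, value: impl Into<String>) -> Self {
        self.params.push(value.into());
        self
    }
}

fn main() {
    let user_input = String::from("'; DROP TABLE users; --");

    // The attacker-controlled string can only become a bound parameter.
    let q = Query::new("SELECT * FROM users WHERE name = ?").bind(user_input);

    println!("sql: {}\nparams: {:?}", q.sql, q.params);
}
```

The effect is the same as with Rust and memory safety: the easy path for the developer is the safe path, and the bug class goes away without anyone having to reason about attackers.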

Speaker 1:

Yeah, and this is sort of the memory-safety part of these languages, but the same can be said about some of the other nice features of languages like Rust, like strong, cost-free typing and a whole bunch of other things that make writing correct code a lot easier, where the tooling is there for a developer to do, quote, the right thing, or at least a good thing, very easily.

Speaker 1:

You know, it's not like dragging yourself over glass to do the right thing; the language and the tooling make it very easy to do the right thing, and it's fast and efficient, so you get to ship good code much more easily. It mitigates other classes of vulnerabilities besides memory safety, and it's the same sort of effect: it just makes the software better, which happens to make it more secure, lowers the cost to the business actually doing the project, and raises the cost for the attacker. So you can sell your CISOs, or whoever, on this being a good idea: it'll save the business money and cost the attacker more. Jeff, thank you so much, this is awesome. Thank you for letting us pick your brain with lots of annoying questions.

Speaker 3:

Yeah, thanks. Thanks for having me.

Speaker 1:

Totally. Security Cryptography Whatever is a side project from Deirdre Connolly, Thomas Ptacek, and David Adrian. Our editor is Netty Smith. You can find the podcast online at @scwpod and the hosts online at @durumcrustulum, @tqbf, and @davidcadrian. You can buy merch online at merch.securitycryptographywhatever.com. If you like the pod, you can give us a five-star review wherever you get your podcasts. Thank you for listening.