Security Cryptography Whatever

E2EE Storage Done Right with Matilda Backendal, Jonas Hofmann, and Kien Tuong Truong

Season 4 Episode 9

It seems like everyone that tries to deploy end-to-end encrypted cloud storage messes it up, often in new and creative ways. Our special guests Matilda Backendal, Jonas Hofmann, and Kien Tuong Truong give us a tour through the breakage and discuss a new formal model of how to actually build a secure E2EE storage system.

Watch on YouTube: https://youtu.be/sizLiK_byCw


Transcript: https://securitycryptographywhatever.com/2025/05/19/e2ee-storage/

Links:

- https://brokencloudstorage.info

- https://eprint.iacr.org/2024/1616.pdf

- https://www.sync.com

- https://www.pcloud.com

- https://icedrive.net

- https://seafile.com

- https://tresorit.com

- https://eprint.iacr.org/2024/989.pdf


"Security Cryptography Whatever" is hosted by Deirdre Connolly (@durumcrustulum), Thomas Ptacek (@tqbf), and David Adrian (@davidcadrian)

Speaker 1:

And then they take the IV and then the length of the padding, and then they encrypt it again, this time with, again with CBC encryption, but this time with a very specific IV, which is 1234567887654321. Just a series of peculiar decisions.

Speaker 2:

Hello, welcome to Security Cryptography Whatever. I'm Deirdre.

Speaker 3:

I'm David, I should not be awake right now.

Speaker 2:

We have three special guests today. We have Matilda Backendal. Hi, Matilda. Hi. We have Jonas Hofmann. Hi, Jonas. Hi. And we have returning champion Kien Tuong Truong. How are you, Kien?

Speaker 1:

Doing great. Happy to be back.

Speaker 2:

We have three special guests today because they've put out some very cool attack and construction research, all together, on end-to-end encrypted cloud storage systems, and I think Thomas is in the background salivating over all the fun little attacks that you found in all of these systems. Kien and Matilda are currently at ETH Zurich and Jonas is at TU Darmstadt. How giddy are you about the attacks in this paper?

Speaker 4:

You know, I think my general feeling is they don't make attacks like this anymore. Some of the things that you see in this paper, this might be your last chance to see some of these attacks ever, so I'm really excited to talk through this stuff. So these are attacks on cloud drive systems, cloud storage systems. There's like six different systems that you guys looked at. I guess you would do a better job than I would of kind of introducing what your targets are here and, really importantly, what the threat model is, like who you're concerned about actually conducting these attacks. So why don't you try this instead of me?

Speaker 1:

Maybe I can start this discussion, or I would like to, you know, introduce my fellow PhD students here. So I would like to start maybe historically: where does this all start? Actually, it starts with Matilda, because Matilda wanted to look at Mega, and maybe she will say more about that first, but essentially that is the main inspiration for our work. So, I don't know, Matilda, do you want to start by saying how it all started for you?

Speaker 6:

Sure, yeah, I was just looking it up now actually, because it's been a couple of years. So it started because we wanted to build more advanced security for file sharing systems. Actually, we were looking at things like forward secrecy and much more advanced security properties, really, than just end-to-end encryption, and then at some point it dawned on us that there's not even really good, like, basic security for cloud storage, and so that's how this work started. We were just going to look at a few systems to get inspiration and then try to introduce these more advanced properties. And then we started looking at Mega and quickly realized that it was broken. So there was not much to do except to just keep going down that rabbit hole, and that was primarily Miro Haller, who's not here today, but he did really a lot of the work finding the attacks there, together with Kenny Paterson as well.

Speaker 2:

Yeah, I think they were on previously for attacking Mega and that was a lot of fun. How frustrated were you to be like, I wanted to do something new and better, and I have to go and just do the basics first before we can even get to the next stuff of forward-secure or post-compromise-secure end-to-end encrypted storage?

Speaker 6:

I think it was actually quite fun. You know, I was sort of excited to have stumbled upon this part of cryptography where we were behind, where, you know, I expected us to have much better security, especially coming from, you know, how end-to-end encryption has become the default for messaging systems and so on. I really expected also data at rest to have better guarantees. So it felt kind of like a bit of an aha moment, like, oh, here's a place where we as a community haven't looked enough and there's a lot of, you know, low-hanging fruit and open problems that need to be tackled. So not so much frustrated as excited, actually.

Speaker 2:

Okay, you looked into at least four publicly available, production end-to-end encrypted cloud storage systems, kind of like your Google Drive or, you know, Box or whatever. But they weren't those ones. They were some of these other constructions, because Google Drive is not end-to-end encrypted; it has this other weird thing where you trust Google a lot more. Can one of you run through what your approach was and, generally, what you found at a high level, and then we'll get into each one?

Speaker 1:

Yeah. So essentially, I mean, I guess the idea here is that after Miro published his thesis, I looked at the list of cloud storage products and said, okay, some of these might be interesting. And at the same time there was Jonas coming in, who requested a thesis, and I thought, yeah, maybe that's a good thing to start with. And we looked through the list and we started, at least at first, with Sync, and then the other providers came a little bit later. I think maybe Jonas should talk about what we found on Sync.

Speaker 5:

Yeah, sure. So, uh, Sync was the first provider that we started with, basically, and the idea was first to just look at the provider and then decide where we were going to go from there, depending on whether we find something that we can prove or something that we can break. But the expectation was rather that, since there were already attacks on other cloud storage systems, it was not going to be super secure, probably. So what we did is we looked at Sync. That means we looked at the web application mostly and tried to find out what exactly is going on, and we found a couple of attacks right away, or within a short amount of time.

Speaker 4:

Before we dive into the attacks, you guys should describe, especially for people in the US that might not be familiar with all of these apps, right? Like Sync, in particular. Let me see if I can do this from memory. You guys did Sync, pCloud, IceDrive, Seafile, Tresorit. Did I miss one?

Speaker 3:

That's it. We did five.

Speaker 4:

Yeah.

Speaker 5:

What are these things? Basically, they're providers of end-to-end encrypted cloud storage and, in comparison to regular cloud storage providers like Dropbox, Google Drive and so on, they don't just offer encryption in transit, when data is sent to the server, and encryption at rest, when data is stored on the server, but actually provide end-to-end encryption. That means that the server shouldn't be able to see any of the data, or modify any of the data, that is sent by the client, so the client should be the only party that is in possession of the key material and should be the only party that is able to access their files, basically.

Speaker 4:

And in all of these systems, essentially, the keys are held client side, but really all of these systems are password protected.

Speaker 5:

Yes, in the sense that the user password is used to derive keys and then to allow the client to encrypt the data when sending it to the server. So, because these cloud storage systems should also support multiple devices and so on, all the key material that is used should be accessible via a user password that the user can enter on different devices.
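
A minimal sketch of that general shape, assuming a PBKDF2-derived master key and AES-GCM key wrapping; the labels and parameters here are illustrative, not any specific provider's actual scheme:

```python
# Minimal sketch: derive a master key from the account password, then wrap a
# random per-file key with an AEAD so a second device only needs the password
# (plus the stored salt and wrapped keys) to recover everything.
import os
import hashlib
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

password = b"correct horse battery staple"
salt = os.urandom(16)  # stored alongside the account, not secret

# Slow, salted password hash -> 256-bit master key (iteration count illustrative)
master_key = hashlib.pbkdf2_hmac("sha256", password, salt, 600_000, dklen=32)

# Fresh random key for one file; the master key only ever wraps other keys
file_key = AESGCM.generate_key(bit_length=256)

# Wrap the file key with authenticated encryption, so a server that tampers
# with the stored blob causes a decryption failure instead of a silent swap
nonce = os.urandom(12)
wrapped = AESGCM(master_key).encrypt(nonce, file_key, b"file-key-v1")

# Another device re-derives master_key from the same password and salt:
assert AESGCM(master_key).decrypt(nonce, wrapped, b"file-key-v1") == file_key
```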

Speaker 4:

Gotcha and I wasn't familiar with Sync before I read this paper, but is Sync a big deal?

Speaker 5:

It's not as big of a deal as Mega is. It's not like the biggest provider in the space of end-to-end encrypted cloud storage, but they have around 2 million users and there are some institutions that are probably interesting that are using this software, like the Canadian government, for example, so it's definitely an interesting target.

Speaker 4:

So a pretty big deal. And this is like a SaaS application. This is not software that you run yourself; you run the client, but they run the service.

Speaker 5:

Exactly. So, in the providers that we've analyzed, there's also one provider where you're able to host your own instance, so where you're able to run your own server, which is Seafile, but for all the other applications there's a server that is basically run by the provider and that you can access by running your client locally and then sending information to the server.

Speaker 4:

So in the sync case you're doing some amount of reverse engineering.

Speaker 5:

You could call it that. We're looking at the code of the web application, which is nice because it's already accessible in your browser, but of course there are some measures to prevent understanding what's going on, like some obfuscation, and you need to get into what exactly is going on by looking at the code and also looking at the requests and the responses that you're sending and receiving from the server, which is the largest chunk of understanding how the protocol works.

Speaker 4:

Cool. So with that in mind, like roughly, what did you guys find?

Speaker 5:

For Sync, we found a few different attacks that allow us to attack the confidentiality and the integrity of files and of metadata. So in particular, there's one interesting key replacement attack that we could do for Sync, which basically allows a compromised server to replace key material that the client is storing server side, and then allows a compromised server to look at files that the client is sending to the server. So any file that is uploaded in Sync can be read by a compromised server. But the server is able to do more. They're actually also able to read a lot of metadata and to modify the metadata. They can, for example, attack the binding between a file name and a file content. So if you have two files, you can basically swap the content of the two files and it's not an issue in the protocol. And also, Sync is a provider that offers sharing of files, so you can share an end-to-end encrypted file with someone else, and there's a problem with the authentication of the public keys of users.

Speaker 4:

You guys looked at like five different systems and some of them did sharing and most of them did not do sharing. Did any of the systems that do sharing do so successfully or securely?

Speaker 5:

So Sync and Tresorit are the two providers that offer sharing. In Tresorit, there is actually a protection for key material, so this issue with unauthenticated keys only arises partially. So it's a bit of a matter of definition whether they do it successfully or not, because the problem is that the public key infrastructure is run by Tresorit, it's operated by Tresorit themselves. So since we're in a setting where we assume that an attacker can potentially compromise provider infrastructure, it's quite likely that they would also be able to compromise this public key infrastructure, and then the system is attackable. So it depends whether we assume that this public key infrastructure is compromised or not, and if it's not compromised, you could say that Tresorit is doing it successfully. Otherwise it's also an issue. And Sync, the other provider, is definitely not really doing it successfully.

Speaker 4:

Gotcha. So from my read of the paper and from what you just said, tell me if this is a bad summary of what you guys came up with, but it seems like, apart from the metadata stuff, where every one of these systems, it looks like, doesn't protect metadata, in some cases it's just straight-up plaintext for, you know, file types and where the file came from and stuff like that. That aside, there's like two broad kinds of attacks here, right? The first being usually key manipulation things, where from that attack on, the server is going to be able to read the contents of files that were uploaded. That's the first broad area of attacks: reading files. And then the second, and I think in the paper more represented, attack is servers being able to inject files into people's file systems, like control over what the content is. Those seem like the two big broad cryptographic areas of attacks that you guys found. Did I miss a thing there?

Speaker 5:

What I would add, maybe, is that there are problems with integrity protection of files in general. So there are some providers that just don't have any integrity protection for the encryption schemes, which would allow a compromised server to also change the encrypted files by messing with the ciphertext, which maybe doesn't fall into this category of injection attacks but is also an interesting attack that is possible for multiple different providers. In the same category are attacks using unauthenticated chunking. So in the setting where the client splits the file into chunks before uploading it to the server, there are instances where the server is able to then swap these chunks around and do a mix-and-match to create a ciphertext that they want, which is also a problem.
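
A sketch of the usual countermeasure to that chunking problem: bind each chunk's file identity, position, and the chunk count into the AEAD's associated data, so a reordered or cross-file chunk simply fails to decrypt. This is illustrative only, with made-up identifiers, not how any of the analyzed providers actually do it:

```python
# Sketch: authenticate each chunk's file identity, index, and chunk count as
# AEAD associated data, so reordered, swapped, or truncated chunks fail.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_chunks(file_key: bytes, file_id: bytes, chunks: list[bytes]):
    aead = AESGCM(file_key)
    total = len(chunks)
    out = []
    for index, chunk in enumerate(chunks):
        aad = file_id + index.to_bytes(4, "big") + total.to_bytes(4, "big")
        nonce = os.urandom(12)
        out.append((nonce, aead.encrypt(nonce, chunk, aad)))
    return out

def decrypt_chunk(file_key, file_id, index, total, nonce, ct):
    aad = file_id + index.to_bytes(4, "big") + total.to_bytes(4, "big")
    # Raises InvalidTag if the server moved this chunk to another position,
    # another file, or changed the chunk count.
    return AESGCM(file_key).decrypt(nonce, ct, aad)

key = AESGCM.generate_key(bit_length=256)
blobs = encrypt_chunks(key, b"file-42", [b"chunk one", b"chunk two"])
nonce, ct = blobs[1]
assert decrypt_chunk(key, b"file-42", 1, 2, nonce, ct) == b"chunk two"
```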

Speaker 4:

Arguably, the stuff where you're not doing integrity protection for existing files is scary in a different way than being able to inject files directly, in that if you've got executables or things up there, if you can tamper with files that are in some way implicitly trusted, then you're breaking other security models. But you also have that problem with just uploading a file, right? Just the file being in somebody's encrypted vault online kind of gives it implicit trust. So that's basically the big attack that you're thinking about there, with both the integrity and the injection stuff: the server is going to put things there that people are going to trust are part of those volumes and are not.

Speaker 5:

Yes, so it could be, for example, executables. But we could also think about a setting, like a political setting, where some adversary, some nation-state actor, is trying to put compromising material into the drive of some unwanted party, something like this, to try to incriminate them in a specific way. This would maybe also be covered by this injection attack, which is also a problem, of course, right.

Speaker 4:

And then, simply as cryptography engineers and as connoisseurs of decent designs, right, this seems like table stakes, right? The server shouldn't be able to just randomly make up content and have it appear to be authentic, like I put it there. So, in the same vein, I got you off track earlier, so talk to us about what Sync looked like and how these attacks actually worked, maybe starting with Sync. I think pCloud is the really fun one for me, but Sync seems like a good starting place.

Speaker 5:

Yeah.

Speaker 5:

So for Sync, we looked at the code and it was already quite clear very early in the process that there were going to be rather severe vulnerabilities.

Speaker 5:

There were some red flags that we found, for example, this issue with the key replacement, so missing authentication for the key material that is stored on the server. This was something that came up quite early, and then we spent some time looking at Sync and basically then had to take a decision whether we're going to deep dive into Sync a bit more or whether we're going to spread out and look at different clients. And here the motivation was that, since we already had some interesting attacks on Sync, it probably made more sense to also look at other providers and see if we could break them as well. So we started looking at the other providers, basically at the same time, and then we tried to see if the same issues that we found at Sync arise there as well. And, as it turns out, there are definitely some common failure patterns, so things that people get wrong in products that are developed independently of each other, and that is quite concerning. That is the most concerning thing, at least for me, regarding our paper.

Speaker 2:

So it looks like what's very common is that these, like, per-file or file system keys, or just key material in general, are unauthenticated: unauthenticated encryption, unauthenticated sharing keys, unauthenticated metadata, like you said, file names, size, type, date. All this stuff is unauthenticated, which allows a server that's been compromised, or that you are theoretically trusting when you shouldn't have to, to just manipulate whatever. You just said that there's a common naive construction pattern amongst all of these. What is the general pattern of how these things are being constructed? And your first suggestion of, like, no, just have a real AEAD. It sounds like these things need a real AEAD, authenticated encryption with associated data, at least, and then we can start building from there, and at least a lot of this data wouldn't be unauthenticated and manipulable by the wrong party, aka the untrusted server.

Speaker 1:

Yeah, I think this really is sort of a culture mismatch, let's say, because you look at the websites of all these cloud providers and the thing that they say is something along the lines of, we provide zero-knowledge encryption. Which is, you know, what does it mean for you, zero-knowledge encryption? And for us, as Thomas also said, it's this thing about integrity.

Speaker 1:

You know, it's table stakes, it must be there, whereas I don't think this concern has been, you know, absorbed by the larger public. So when they say our cloud storage is secure, what does it really mean? And apparently it doesn't involve integrity.

Speaker 6:

Can I add to that? Because I think, having looked at some other systems beyond Mega, my takeaway was that most of them understand that the file data itself needs to be integrity protected, but then they fail higher up in the hierarchy. So, for example, it doesn't seem to be self-evident to developers that keys need to be protected with authenticated encryption, and then, of course, you have no protection for the file data lower down. But this fact, that also the key wrapping steps in the key hierarchy need this, how do you say that, just doesn't seem to be obvious.

Speaker 2:

Yeah, especially because, uh, like, a key feature of all these systems is you put your encrypted file in the end-to-end encrypted cloud storage and you have to be able to share it, either with yourself as a new device or with somebody else, to be like, hey, here's a link to my file that's in my encrypted thing. I am wrapping the file key, or a key decryption key, and I'm sending it to you in some way, but that means you have to store encrypted key material, whose plaintext is more key material to decrypt the file, and you have to store that encrypted key material on the server, or you have to store it somewhere to share it.

Speaker 2:

And it seems like that is being treated differently than the actual files that you're uploading, and I have a hard time understanding why. And, like, okay, so how are they encrypting it? They're not encrypting it with, like, AES-GCM; the key wrapping is not, like, an AEAD or something like that. What are they using to actually encrypt these file keys or this key material?

Speaker 5:

It depends on the system, of course, but in a lot of cases they're using, for example, some asymmetric encryption like RSA-OAEP to encrypt symmetric keys, which is not authenticated. It still protects the confidentiality of the keys, but not the authenticity, and then there's a problem with that. Now, there are also instances where people actually do it properly and use an AEAD scheme to encrypt another symmetric key, for example, but oftentimes it's in the steps that use asymmetric encryption where this process fails.
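
A sketch of that failure mode: RSA-OAEP keeps a wrapped key confidential, but nothing authenticates who wrapped it, so a party holding only the public key can substitute key material of its own choosing. This is illustrative, not any provider's actual code:

```python
# Sketch: RSA-OAEP keeps the wrapped key secret, but anyone with the public
# key (including the server) can produce a wrap that decrypts cleanly, so
# nothing tells the client whether the wrapped key is the one it expects.
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

client_priv = rsa.generate_private_key(public_exponent=65537, key_size=2048)
client_pub = client_priv.public_key()

honest_wrap = client_pub.encrypt(b"\x11" * 32, oaep)  # a sharer wraps a file key
evil_wrap = client_pub.encrypt(b"\x22" * 32, oaep)    # the server swaps in its own

# The client cannot tell the difference: both decrypt without any error.
assert client_priv.decrypt(honest_wrap, oaep) == b"\x11" * 32
assert client_priv.decrypt(evil_wrap, oaep) == b"\x22" * 32
```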

Speaker 2:

Is that Tresorit?

Speaker 4:

I mean, I think, from an aesthetic perspective, let's say, probably the coolest attack, or the most stunt-cryptography attack, in the paper is pCloud, right?

Speaker 4:

So there's a situation where you have an RSA-OAEP key, you have an RSA key pair, and then the private key there, it's encrypted, but it's encrypted with CTR. They went out of their way to use a relatively modern RSA and then non-authenticated encryption for the key itself. And pCloud, my understanding from the paper, and you're just going to correct me here if I'm wrong about this, is they do a consistency check, right, so you can't just inject an arbitrary public key. And then, I guess, the encryption of that private key is bound in some way to the user's password, so you can't just swap out the key completely, because the private key wouldn't decrypt properly at that point. And they check to see if the public key matches the private key, but it's unauthenticated, so you can somehow bit flip that private key to match an arbitrary public key. Can somebody first of all fix my explanation of that attack and then kind of walk us through it? Because it read really fun in the paper.

Speaker 1:

Yeah, that is correct in the sense that what you have is this encrypted RSA private key somewhere on the storage and what you retrieve is both your public key and your encrypted private key. Now the private key, as we said, is encrypted using this weird modification of counter mode, whereas the public key is just there, unauthenticated, just given to you. Now you would like to give some arbitrary public key to the client, because then the client is going to use the public key to encrypt their own keys, and so if you choose the public key, then you can decrypt whatever you want. Now the problem there is that you also have to provide a private key which is consistent with whatever public key you provide, and that becomes a little bit harder. But because they use counter mode, you can start bit flipping within this private key and then do some fun things.

Speaker 4:

Yeah, but an RSA private key, it's mostly random, right? So how would you bit flip that? You don't know what bits to flip.

Speaker 1:

Yeah, except for the fact that, thankfully, they DER-encode everything there, and that means that we have a nice header. We have a nice version part of the header, there's the length, there's even the public key inside, so you know exactly what bytes are in between there, and then clearly there's a private part that you don't really know. However, it then turns out that you don't actually need to bit flip that part, because the public part is large enough to encode a private key within it. So now you have to imagine that there's a DER parser: you're going to decrypt this private key, you're going to parse it, and then whatever is at the start of the DER encoding tells you how long the key is going to be. So you just cut it a little bit short, encode another private key inside which is shorter than the one that you actually have, and then the decoder is going to just discard whatever data you have created.
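
A sketch of the malleability that makes this possible, using plain AES-CTR rather than pCloud's modified counter mode, and a made-up stand-in for the DER header: flipping ciphertext bits flips exactly the corresponding plaintext bits, so known header bytes can be rewritten without the key.

```python
# Sketch: CTR decryption is plaintext = ciphertext XOR keystream, so flipping
# a ciphertext bit flips exactly that plaintext bit. An attacker who knows
# some plaintext bytes (here a fake length header) can rewrite them at will.
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key, nonce = os.urandom(32), os.urandom(16)

def aes_ctr(data: bytes) -> bytes:
    enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
    return enc.update(data) + enc.finalize()

plaintext = b"SEQ len=1024 ...rest of the private key, unknown to the attacker..."
ciphertext = bytearray(aes_ctr(plaintext))

known, desired = b"len=1024", b"len=0016"   # attacker-known header region
start = plaintext.index(known)
for i, (old, new) in enumerate(zip(known, desired)):
    ciphertext[start + i] ^= old ^ new       # flip only the differing bits

tampered = aes_ctr(bytes(ciphertext))        # CTR decrypt == CTR encrypt
assert tampered[start:start + len(desired)] == desired
assert tampered[start + len(desired):] == plaintext[start + len(desired):]
```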

Speaker 4:

This is like Deirdre was wondering why I like this paper so much.

Speaker 2:

There, you go yeah, oh boy.

Speaker 1:

This was a fun attack. This was fun, except for the fact that it has a few caveats, because you have to encode a smaller private key. But depending on the libraries that are decoding this private key, some of them will not accept any arbitrary private key. Surely you could put a private key with some arbitrary value for n, and maybe you can put your public exponent, e, to be 1. That would be pretty fun. Except that there are some consistency checks when, say, OpenSSL imports this key and says, oh, clearly the public exponent can't be 1.

Speaker 1:

And so it's just going to reject it, and so this is not consistent across implementation. However, we have seen that there is a CLI client for pCloud that uses a completely different library, and that library does not perform such checks.

Speaker 2:

Oh good. pCloud... they're all web-based, except, was it, which one had a downloadable client where you could set up your own server? Okay, yeah, so they're all web-based, but pCloud has a separate client that doesn't do the validation checks as well.

Speaker 1:

It's a CLI client. I guess it's useful for automating things, I'm not sure.

Speaker 4:

Cool, very cool. We were talking a second ago about how some of these systems do use authenticated encryption for the content of files, or, well, they all do, right?

Speaker 4:

GCM seems to be a common design point across all these things, but they don't do authenticated encryption, or they don't have a coherent encryption security model, for the keys themselves, and then everything falls apart from there. I'm struck a little bit by how old some of these systems are. I think Seafile is the one, really. At one point, Seafile used ECB for their encryption, and I know this both from your paper and also because if you search for Seafile, you'll find, like, 2011 threads where well-meaning crypto nerds are trying to explain to them that their crypto is broken. So I also wonder how much of this is just these systems?

Speaker 4:

I think the same thing about Mega, by the way: these systems kind of date back to a period where the norms and best practices for building these kinds of systems, at least for general practitioners (these don't look like systems that were built by academic cryptographers, let's say), were not there in 2011 the way they are now, right? I think people would be a little bit more careful with these things now. I wonder how much of this is just how old these systems are. And, in answering that question, I also think we should probably give our listeners some sense of what the common design of these systems is, what they actually look like and what the constructions basically are, what those protocols look like.

Speaker 6:

I think you have a really good point there, Thomas, that Mega were also still using AES-ECB to encrypt keys when we looked at them in 2021. And one thing that we found out when we tried to propose mitigations is that it's actually really, really difficult for them to move to something else. So you are probably right that some of these flaws are just legacy problems, because it is very difficult, when you're dealing with persistent data, which is very much the use case of cloud storage, to change your encryption schemes.

Speaker 2:

So it's really difficult, because basically the only time that they can migrate people off the old format is when they change stuff, and if they're not changing stuff, they can't do anything.

Speaker 6:

Exactly. You'd have to force all of your users to come online and download all of their data and re-encrypt it if you wanted to change the mode of encryption that's used. And, at least for Mega, even at their peak bandwidth, we found that that would take over half a year, given the amount of data that they're storing. And that's, you know, assuming that all of their users would be able to come online and do this, because it has to be done locally on the client, since it's end-to-end encrypted.

Speaker 2:

This hurts my heart a little bit, but I also vigorously remember this similar constraint when I was working on Zcash, which is basically end-to-end encrypted Bitcoin, and Zcash has at least three different versions of how it does these shielded transactions. And the modern one is wonderful, and it has new features and new security properties that are so much better, and it's faster than the previous ones. But there are people who still have money in the very first iteration, and you can't force them to come on and migrate their money to the latest versions. And so it's a forever sticking point: we can't maintain that sort of stuff forever.

Speaker 2:

What do we do about the people who never come online and migrate their money, or migrate their files from the old version of encrypted storage to a newer version, if you want to invest in that? There's no easy answer. Like, you might abandon people, but that's a risk you would take. And, you know, some people don't see the value-add of the modern encryption, even if you say, look at how I can just break the shit out of this if I wanted to. They just don't quite understand it and it seems a bit esoteric. They're like, what do you mean? It's encrypted. And it's like, well, it's not very well encrypted, it's badly encrypted.

Speaker 5:

Maybe it's interesting to add that there are also some exceptions to this rule. So, like, IceDrive, for example, is one of the providers we analyzed, and they were founded in 2019, I think, so this is rather recent, I would say.

Speaker 3:

IceDrive is from 2019?

Speaker 4:

IceDrive is the one where they said that Twofish is more secure than AES, right? Yeah, exactly.

Speaker 5:

So I would say that the industry standard was already better than this when they started the company. So I'm not sure if they, I don't know, have the same excuse as the other providers.

Speaker 2:

I wonder if they had a copy of Applied Cryptography and that's all they looked at in 2019 when they were creating IceDrive, which is a shame. Go buy Serious Cryptography by JP Aumasson. That's a good book.

Speaker 4:

Not sponsored. IceDrive uses Twofish, and also they have their own block cipher mode, right? Like, they came up with their own mode for doing bulk content encryption too. Maybe IceDrive is a good place to fix on, just to describe what one of these systems looks like. I feel like these systems all kind of look a little similar. There are differences, obviously, but there are design commonalities to them. So if you were to describe roughly how IceDrive was designed, describe that system to us.

Speaker 5:

So IceDrive is rather simple in design in comparison to the other providers. They only use symmetric cryptography. They have a user password, and the user password, like with other providers, is used to derive a symmetric key. So you have a key derivation function that you use together with the user password, and then what they do basically is that there's a master symmetric key in IceDrive, and this master key is used to encrypt all the files that you have. Since IceDrive doesn't support any sharing, this is rather simple and a sufficient approach, because you never need to share any key material with anyone else.

Speaker 5:

The problem is that there are already some things that go wrong there, like the issue with metadata that I mentioned earlier. So IceDrive leaks metadata and also allows a malicious server to tamper with that metadata. And then there's a problem that they use either their custom encryption mode or CBC mode together with Twofish, which doesn't offer any integrity protection of the files. So you can mess with the ciphertext a bit and then also mess with the content of the files, which is, as we mentioned earlier, especially with executables and stuff, a problem. And they also run into this problem of unauthenticated chunking, where you can basically reorder the chunks in which a file is uploaded and build a file of the server's choosing, with some caveats, of course.

Speaker 5:

How this differs from the protocols of other providers is that other providers usually use a lot more different keys.

Speaker 5:

So, for example, depending on the file hierarchy, one key per folder and then one key for each file. And also a lot of the other providers, especially the ones that offer sharing, use asymmetric cryptography, like some public-private key pair that is specific to a user account. In some cases, pCloud, for example, does this. They use asymmetric cryptography in the same way, so there's a public key and a private key that each client holds, but they don't offer sharing. So in theory, at least in the current state of the protocol, it would also be sufficient to just use symmetric cryptography, because they don't really need the public key for anything. This is maybe for, I don't know, compatibility reasons, or because they want to add sharing in the future, which I don't know, but at the moment this is not needed. So there are, in other providers, some steps in between, where the keys of files or folders are encrypted under the keys of the folders or files that are above them in the file hierarchy.

Speaker 6:

I just wanted to mention that there's one other system that stands out a little bit, which is Proton Drive. This is the cloud storage scheme by ProtonMail. They use only asymmetric cryptography because they're basing also their cloud storage system on OpenPGP. So just to mention there's also some other interesting systems out there.

Speaker 1:

Yeah, and specifically for IceDrive, we were talking about their encryption, the mode of operation, and it looks like a series of peculiar decisions that we've seen. So I mean, I guess first of all using CBC encryption you know, raw CBC encryption is already an interesting decision.

Speaker 1:

And then padding your message with only zeros. Okay, sure. But then you have to encode somewhere the length of this padding. Sure, okay. Then you have to provide an IV, and they choose the IV at random, okay, but then they choose it only from letters and digits, and I don't know where this idea comes from.

Speaker 1:

And then they take the IV and then the length of the padding, and then they encrypt it again, this time with, again with CBC encryption, but this time with a very specific IV which is 1-2-3-4-5-6-7-8-8-7-6-5-4-3-2-1. Just a series of peculiar decisions.
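
A sketch of why raw CBC with no MAC is malleable in the first place, using plain AES-CBC rather than IceDrive's actual Twofish-based construction: flipping bits in ciphertext block i scrambles plaintext block i but flips exactly those bits in plaintext block i+1.

```python
# Sketch: with CBC and no MAC, flipping bits in ciphertext block i scrambles
# plaintext block i but flips exactly those bits in plaintext block i+1.
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key, iv = os.urandom(32), os.urandom(16)
block0 = b"header--16-bytes"        # attacker sacrifices this block
block1 = b"PAY ALICE $ 0100"        # attacker wants to edit the amount

enc = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
ct = bytearray(enc.update(block0 + block1) + enc.finalize())

# XOR the difference between "0100" and "9999" into ciphertext block 0 at the
# byte offsets where "0100" sits inside block 1 (offsets 12..15).
for i, (old, new) in enumerate(zip(b"0100", b"9999")):
    ct[12 + i] ^= old ^ new

dec = Cipher(algorithms.AES(key), modes.CBC(iv)).decryptor()
tampered = dec.update(bytes(ct)) + dec.finalize()

assert tampered[16:] == b"PAY ALICE $ 9999"  # second block edited at will
# tampered[:16] is now 16 bytes of garbage, and nothing detects any of this.
```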

Speaker 4:

I think my favorite thing about IceDrive was the fact that their unpadding function doesn't check to see if the bytes are zero. Whatever it says the padding was in the header, it'll just chop that off the end.

Speaker 2:

Absolutely. We're so far away from, like, you know, the good APIs of cryptographic primitives. But this is yet another reason why "put an IV here, put a nonce here" for the random engineer who's calling an encryption library is like, no, because it results in IVs or nonces with the value one, two, three, four, five. Don't expose that to people at all.

Speaker 1:

That's a very good point.

Speaker 6:

It's a bit of a difficult balancing act, though, I have to say, because in a recent research project, together with Mateusz Karlata and Nicola Daldanis, we were trying to design a secure file sharing system, like actually implement it, and then we ran into the issue that a lot of the APIs are very restricted, for good reason, right, for historical reasons. We've started building more restricted APIs because it's turned out that they're misused, but then when you have some very competent users who want to build complex cryptography, it can become limiting instead.

Speaker 2:

Well, I would make an argument that my first pass at building something like this would involve nice primitives like HPKE for your hybrid encryption, and it has an AAD in there, but you can basically use it for your key wrapping, and then you use a real AEAD for your files and your other stuff, and at the top layer, none of that stuff is exposed, it's all kind of wrapped up. And if you are sufficiently advanced that you're really trying to get efficiency and secure construction and stuff like that, then you might have to get into the guts, because you're trying to break through the nice layers of abstraction in favor of efficiency. But if you're just trying to get something that's nice and secure and maintainable for a general practitioner and not an elite expert, you can get away with not having to dig into the guts. But it is a balancing act: if you're trying to push the metal, or push these constructions, yeah, you're going to have to break through these abstractions that are there to try and keep that sort of complexity and detail away from a general practitioner.

Speaker 2:

It is a tough balancing act, but I would argue that we, the general cryptography community, are trying to make things like HPKE and other things to make it a little bit easier to do this efficiently and securely, without having to get your elbows deep into IVs and nonces and cipher modes and things like that. Anyway. Oh, something about a Merkle tree?

Speaker 4:

Sorry, I might be skipping ahead. We're just, in the background, trying to make sure that we're catching all the details that we wanted to call out in these papers, right?

Speaker 4:

So, um, yeah, I think we want to get into a little bit of, I mean, probably Tresorit in particular, which is, you know, an interesting example of a more sophisticated system. But also, we're just dunking on... one of these systems went out of its way to do a Merkle tree for authentication that didn't really do much to provide authentication, which is just another fun callout.

Speaker 5:

Yeah, so basically this was a system that was used in pCloud for the integrity protection of files. This goes in the direction of this unauthenticated chunking issue, where they split the file into chunks, basically, and then they compute a tag, a MAC tag, over each chunk. So far, so good. And then they try to combine these MAC tags to build a tag for the entire file. And the way that they do it is by using a Merkle tree to aggregate all the individual tags into one. But then the problem is that, instead of including only the root of the Merkle tree in the ciphertext, they include all the intermediate steps of the computation, so all the individual tags of the chunks. And this is an issue because the server can then simply choose the tag that it wants, cut that part of the ciphertext, and obtain a valid ciphertext that is shorter than the original one. So the problem is basically that they didn't just include the root of the Merkle tree but also the other nodes of the tree, and then it just breaks down.

Speaker 1:

Yeah, and also the fact that you know, usually the way I think of Merkle trees is that you take I don't know your chunks and then you hash them together and then maybe you authenticate the root.

Speaker 5:

Yeah.

Speaker 1:

But why? Like, for some reason, they decided to use MACs all the way down, and so this is peculiar. But then it also makes you think: that means that any subtree of this tree is also a valid tree, because it's sort of recursive now.

Speaker 6:

Oh no.

Speaker 1:

So you don't have the tag only for the root, but you have tags everywhere, so you can just take a small part of the tree and then serve it as the entire tree.
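
A sketch of that aggregation mistake, using HMAC-SHA256 rather than pCloud's actual construction: if every node of the MAC tree ships with the ciphertext, the tag of any subtree is already in the attacker's hands, so a truncated file "verifies".

```python
# Sketch: per-chunk MACs aggregated into a tree, but every node is shipped
# with the ciphertext, so the tag of any subtree is already a "valid" tag.
import hmac, hashlib

def mac(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()

def tag_tree(key: bytes, chunks: list[bytes]):
    """Return (root, every_node); shipping every_node is the mistake."""
    level = [mac(key, c) for c in chunks]
    every_node = list(level)
    while len(level) > 1:
        level = [mac(key, level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        every_node += level
    return level[0], every_node

key = b"\x00" * 32
chunks = [b"chunk-0", b"chunk-1", b"chunk-2", b"chunk-3"]
root, shipped_tags = tag_tree(key, chunks)

# Malicious server keeps only the first two chunks: the subtree covering them
# already has a tag in the shipped ciphertext, so the truncation "verifies".
trunc_root, _ = tag_tree(key, chunks[:2])
assert trunc_root in shipped_tags and trunc_root != root
# Fix: authenticate (and ship) only the root, bound to the file id and length.
```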

Speaker 4:

I feel bad for them, because it feels like pCloud went out of their way. So, I feel like IceDrive went out of their way in a bad way, and pCloud, it seemed like they were trying to get things right, and then it just didn't work out for them.

Speaker 1:

Maybe. I mean, it's hard to say. They also made some very peculiar choices, I'm not sure. Also, the encryption scheme, or the encryption method, that they use is quite strange. So, for example, they have some cases in which messages of 16 bytes get encrypted differently from messages which are longer than 16 bytes.

Speaker 2:

What.

Speaker 1:

And the encryption for messages which are 16 bytes includes storing the message with the MAC key as well. Okay, I take it back. I wouldn't necessarily say that, yeah.

Speaker 3:

So what about Seafile? How is Seafile constructed, and how did it go wrong?

Speaker 1:

So, well, Seafile is also a little bit peculiar in the sense that they don't do sharing per se, for example. What you can do is just share the password to one of your drives with somebody else if you need to share it. I mean, that's on them. That aside, the other thing is that this is one of the only self-hostable solutions, as far as we've seen, and it's also been used by some universities.

Speaker 1:

I think that one of the interesting things that we found on Seafile is the fact that you can really see, if you do archaeology, the layers of limestone, sort of: you can see the various legacy versions that have been stacked on top of each other. And one of the attacks that we found, in fact, is the fact that they have somewhere a switch between the various versions of the encryption protocol, and every version utilizes something different. So, for example, some of them use ECB, and some of them use CBC, but AES-128, some of them AES-256, some of them use the OpenSSL bytes-to-key function to derive keys.

Speaker 4:

Which is amazing, by the way.

Speaker 1:

Which is amazing, which is just SHA-1, repeated a few times. Okay, so you can really see you have a nice cross-section of all these things that are stacked on top of each other. And then the problem is that the server sort of decides the version. So the client asks the server hey, which version do you support? And then the server can clearly just say, oh, I support only the weakest version, and please use that. And that's not great.

Speaker 4:

And the weakest version here is really, really weak.

Speaker 1:

Yeah, exactly. It's the one that uses bytes-to-key, and I think just three iterations of that. So that means it's SHA-1 repeated three times, and with no salt. And so that means that the passwords of the users, which are essentially the root of security here, are very easy to brute force.
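
A sketch of why that downgrade matters, assuming the unsalted, three-iteration SHA-1 derivation described here (roughly the shape of OpenSSL's bytes-to-key, not Seafile's exact code): with no salt and a handful of hash calls per guess, an offline dictionary attack is essentially free.

```python
# Sketch: an unsalted, low-iteration SHA-1 password-to-key derivation falls to
# an offline dictionary attack; every guess costs only a few hash calls and,
# with no salt, one precomputed table covers every user.
import hashlib

def legacy_kdf(password: bytes, key_len: int = 32) -> bytes:
    # Roughly the shape described above: repeated unsalted SHA-1.
    out, block = b"", b""
    while len(out) < key_len:
        block += password
        for _ in range(3):
            block = hashlib.sha1(block).digest()
        out += block
    return out[:key_len]

stolen_key = legacy_kdf(b"hunter2")   # what a downgraded client would derive

for guess in [b"password", b"letmein", b"hunter2", b"qwerty"]:
    if legacy_kdf(guess) == stolen_key:
        print("recovered password:", guess.decode())
        break
```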

Speaker 2:

What did Tresorit do? Because I'm scanning and I'm like, all right, GCM, some weird stuff. They have an HMAC in here, which is good. They have modern RSA. They have scrypt for stretching out that password. How is Tresorit broken?

Speaker 5:

So, from what we found, trezorit is, I would say, a lot less broken than the other providers and it's really obvious that they put a lot more thought into their cryptographic design, which is a good thing for the security and the privacy of the users.

Speaker 5:

But it's not a very good thing if you're trying to analyze it, because reading through the code is really, uh, not very easy, because, also, it's obfuscated to a very high degree and they use a lot of different keys. So they split their file storage up into these tresors, and there's a key object for each one of the tresors, and there's also an object for each, and then these are related, and they use a very, maybe, convoluted combination of symmetric and asymmetric keys. And, apart from the issues with metadata that kind of arise in all of the providers, the biggest issue we found there was this thing about key replacement when sharing files with other users: you need to query a public key of someone that you're sharing a file with, and the server is potentially able to replace that public key with a public key that it knows, or a public key that it generated itself, and this is possible only if the public key infrastructure that Tresorit has fails. But since they're running the public key infrastructure themselves, it's likely that someone who is able to compromise the server can also compromise the PKI. Yeah.

Speaker 2:

That seems to be a running theme, or a running issue, with any end-to-end encrypted system: at some point you have to trust the infrastructure for some services. Like, for a lot of the stuff, we seem to trust it to just carry my bytes. And even then you might be like, oh, a compromised server or service might not deliver your bytes from one device to another. That's the ultimate failure mode, but that's availability, not confidentiality or authentication or authenticity or anything like that. But then the next step is, I'm trusting you to provide the public key material that identifies who I am, or at least my devices. And so you're like, okay, I want to share this file with David. And the server is like, cool, here's David's public key and you can start your handshake to share key material or do whatever you're going to do. And then it's secretly sharing, you know, the FBI's ghost public key, and you're just sort of trusting the server.

Speaker 2:

So the sort of immediate answer to that is, don't trust the server or the service for, you know, quote, PKI, and exchange that data out of band. But how do you do that? You have to have another trusted channel that's not controlled by the service. And then it turns into, okay, if you do that once, you have trust on first use, but what about rotation, and then, and so on and so on. These are not easy things to solve for. So it's kind of understandable that, if you're just trying to stand up a service that you have to trust minimally to carry your encrypted bytes, you're just sort of like, yeah, I will just hand you the public key and you just trust me, and you're just like, okay, because what's the alternative? It's sort of like, ah, go find your friend in person, get on a call and show each other a QR code or something like that. It's just, it's a hard problem to solve.

Speaker 4:

I'm really struck by this whole situation, by how similar it looks to me to the situation that Threema and Matrix were looking at, where you have these papers that are just these clown fires of vulnerabilities.

Speaker 4:

I mean, I'm constitutionally mean to people who build crypto, but there's a lot going wrong here. There's a lot of stuff where you're just kind of ping-ponging around the greatest hits of 2010-era crypto vulnerabilities, right? And the solution for something like Matrix, which had kind of the same situation, it was this hodgepodge of different protocols that were put together that would all bounce off of each other in bad ways, and there were a bunch of vulnerabilities there, and the solution there is something like MLS, which is what Matrix has ended up doing, right? You take a coherent design that already exists and has been vetted, and then you build your system around that coherent design. And there are a couple of coherent designs that you could use in a messaging system. Like, Threema could take Signal Protocol, or some open version of Signal Protocol, or they could take MLS, and then you wouldn't have to worry about all or most of the problems in the constructions they had, because there's a reference to go from. But that doesn't really seem to exist for these systems.

Speaker 4:

There isn't a model of what does a good encrypted drive look like? What are the constructions that we should be using there? So I'm wondering if that's a thing that we should be talking more about.

Speaker 2:

Yeah, and then, like, Matilda, this kind of goes into your work on a formal treatment of cloud storage, which is just like, okay, write it up in the formal setting, what we expect from a good design of end-to-end encrypted storage, so maybe your work turns into an actual protocol.

Speaker 6:

That's exactly what we were hoping, right? So we had this realization, that Thomas also made, that there's not really a good reference out there. And that's what started the formal treatment work, where we really wanted to provide a good protocol for end-to-end encrypted cloud storage, and our focus there was first of all to create a security model, because we realized that there weren't even any security definitions. So of course you can't be provably secure if there are no definitions by which you can write such a proof. So that was actually the bulk of that work: figuring out what's the right syntax for this kind of object, you know, at what level of detail should we view it in order to be able to say something about its provable security. And that turned out to be very difficult. But after we created that model, we also wanted to say something about how to securely build it. So we also built, or included, it's not a fully fledged cloud storage protocol by any means, but at least sort of the skeleton structure that we would recommend. So that's also part of that research.

Speaker 2:

Oh yeah, why was it difficult to really nail down the model of secure end-to-end encrypted file storage? Because my first knee-jerk reaction is, well, you need, you know, authenticity on the metadata, and you need, you know, CCA-resistant encryption on the key storage and on the file storage. It seemed... why was it difficult?

Speaker 6:

So I think there are multiple reasons that it's difficult.

Speaker 6:

One is that these systems are very different from each other. They try to provide different functionality, and so, you know, syntactically it's a difficult object to nail down. It's not just an encryption scheme where you know which inputs you're getting and what output to expect. This is a system, and it's running interactive protocols between a client and a server. So one of the first things that we decided was that we really wanted to capture what was happening on the level of messages between client and server, so more fine-grained than if you just look at, you know, here's an encryption algorithm. And so one thing that was difficult there is that the complexity becomes kind of like a supercharged version of key exchange. So, you know, key exchange models are notoriously complicated, and they're just running one protocol, the key exchange protocol, whereas for cloud storage you're running a bunch of protocols. There's registration when you first create an account, then there's authentication, and then, once you're signed into your account, you can do all of these things to your files, like upload a new file, maybe change a file, download a file, share a file with another user, receive such a share, and so on. So we had to first of all think about, you know, what's the core functionality, because, you know, when we did this formal treatment work, I had looked at Mega, Nextcloud and Proton Drive together with the collaborators, and these three systems are vastly different. Already, Mega, for example, has this chat functionality, for which they use some of the same keys. Proton Drive, of course, also has their end-to-end encrypted emailing service. Nextcloud is meant to be self-hosted, so that's really targeting organizations who want to set up their own server, and they do sharing in a very funky way. So, very different kinds of systems.

Speaker 6:

So we had to first think about what's the core functionality, how do we capture this in syntax? And once we've nailed down which interactive protocols to look at, we then had to find a syntax that could actually handle all of these messages coming from client to server. And one super difficult thing there is that when all of these things are just messages between a client and a server, you don't even have identified ciphertexts right. So the client might not be saying, oh, here I'm encrypting a file and now I'm going to send you a file, ciphertext, right, these are just sort of hidden inside messages. It could be that the client encrypts the file chunk by chunk and sends it over multiple messages.

Speaker 6:

It could be, I don't know, that they combine these ciphertexts that are file ciphertexts with other things, and, abstractly, when you're creating a security model, you don't know this, right? We don't have access to any of this information. When you're analyzing a system, it might be possible to identify file ciphertexts, for example, but on the abstract level, where we wanted to define a security notion like, I don't know, integrity of ciphertexts, this turned out to be impossible, because there is no ciphertext which you can, you know, replace by random, for example, in a security game.

Speaker 2:

You describing how diverse and complicated these systems are, in trying to synthesize some sort of game, or, you know, system of games, for these different notions of this multi-protocol system, reminds me of all these other end-to-end encrypted messaging apps, where you might start from a good foundation of, say, you use Signal in your new end-to-end encrypted chat app, but then you need to do file sharing, and then you need to do secure backup of your end-to-end encrypted chats, and you need to do this, and then you need to have groups, and then, and then, and then. And you start from a very nicely modeled, designed cryptographic protocol, but the necessities of a software product and service mean it kind of just starts growing in all these directions as you start adding features and integrating stuff. And this seems to be you coming in at the end point, being like, all right, how do I even think about all these things and how do they fit together? Do you have any sense of how difficult your current model for these end-to-end encrypted systems will be to maintain or update, as things kind of start growing organically or get added to a system like this?

Speaker 6:

I think there is a risk that this will be difficult.

Speaker 6:

We tried hard to stick to the core in this model, to not make it overly complicated and also to make it modular in some sense.

Speaker 6:

So, for example, the way we treat file sharing in our model is that this is handled through some sort of abstract out-of-band channel, and this is really because we want to be able to plug in whatever way real-world systems are doing this, which could be using a PKI hosted by the cloud service provider, as we were talking about before, but it could also be some sort of out-of-band verification over a secure messenger or meet in person, or key transparency, for example, and I think we realized after the fact that we should have done this even more.

Speaker 6:

So we're currently thinking about revising the model to make registration and authentication a separate part from all of the file operations, whereas, you know, currently all of these protocols are handled within the same model. It might be nice to do identity management separately, so that we can also treat providers that already have an identity management system in place. You know, if you think about the bigger providers that are now finally starting to add end-to-end encryption to their cloud storage, like Apple and Google and so on, they are already identity providers, right? So they might want to handle this part very differently.

Speaker 3:

Do you think, just based on the user experience that you get out of, let's call it, a centralized cloud storage provider, it's possible to nail a set of properties in a construction that gives you more or less the same experience you would get from a centralized, non-encrypted one, in a, just hand-waving, secure end-to-end encrypted one? Is that something we think is actually possible to build and we just haven't done it yet? Or is the nature of the problem such that you have to sacrifice somewhere?

Speaker 6:

Is the question whether there is a core set of features that would work for all of these systems? Or is the question whether, when you move trust from the server to the client, you can keep the user experience the same for end users while removing trust in the server, or whether you have to change the experience somewhere?

Speaker 6:

I think there's reason to be hopeful that we will manage this as cryptography research goes forward, but there are certainly some things that are very difficult at the moment. So, for example, real-time collaborative editing is something that most server-side encrypted systems provide today, like Google Docs, where you can edit the document together with your collaborators live. This seems difficult, even just from an engineering point of view, to do on encrypted data: where do you host the updates, and who does the merging of them in real time when you can't rely on the server anymore? But there are systems that are trying to do this, and it's also an open area of research in cryptography at the moment. So I think there's definitely reason to be hopeful, but whether or not the user experience will be identical in the future, I think that's a hard question. Maybe users will have to get used to the experience being slightly different in exchange for better privacy. We just have to educate users about what it means for their data to be confidential and integrity protected.

Speaker 2:

Interesting. Although, if we have very fast primitives, we may be able to get away with it and not notice very much.

Speaker 6:

Yeah, absolutely. Hopefully, in terms of latency, it won't be a problem, and we'll still be able to support features like editing. But, for example, as soon as you have end-to-end encryption, if a user loses all of their devices or their password, there's just nothing the service provider can do to help them get back their data, and this, I think, is not something we will solve in terms of user experience. We can educate users to make sure that they have backups of their keys, and maybe second-factor devices and things that can help them get back into their accounts, but it's never going to be the case that you can email customer service and say: hi, please let me back into my account.

Speaker 2:

Oh boy, ok, there's so much great stuff in here and I totally agree that we love the thought of end-to-end encryption for a lot of these things, but then there's a whole world of people that are like, what do you mean? I can't just log into my account? What do you mean I can't just log into my account? What do you mean I can't just email support at, and like they'll just get beat back in. It's like, well, that's the point, we can't get in ever, only you can get in, and some people are comfortable with that. Like you lose your device, you lose, but some people are not comfortable with that. And so, but anyway, matilda, jonas and and returning champion, ken, thank you so much for this work. This is really cool. Um, I am really looking forward to any kind of future, like nice formalized constructions based on your model and updates to your model, matilda, because I like to build the good stuff.

Speaker 2:

And all these attacks, all these attacks are just sort of catnip. Like, it's obvious that whenever these were built, which is apparently 2019 for at least one of them, there's just a lot of stuff that people don't understand is important about building end-to-end encrypted things, and it's not just about "can I read the bytes". There's authentication and integrity that really, really matter, especially when you're storing encrypted key material on an untrusted server, and this is just a laundry list of why that's true. Thank you so much.

Speaker 4:

Yeah, we've raised the bar on the messaging side of things.

Speaker 4:

I think most practitioners today have a sense that if you're not running some derivative of the Signal protocol or MLS, you're kind of outside the state of the art right now, right? And I love the idea that you're driving towards that same model for storage at rest, because if you were going to come up with the second most important problem in end-user cryptography after secure messaging, it's this, right? It's where you're storing your files. And I love that, you know, in messaging, for me, there's kind of an original sin of losing track of the connection between group management and key management, right?

Speaker 4:

And when you forget that the group management is the equivalent of the key distribution scheme, the whole system falls apart. And here there's a similar original sin situation: if you forget that the entire key hierarchy needs to be authenticated, needs to have a coherent security model that chains back to some root of trust, then no matter what else you do, whether it's a Merkle tree or your own block cipher mode, the whole thing is just going to fall apart, because somewhere in that system there's a key that the server controls, like, I can just switch the key up, and the whole thing falls apart. So I love that there are some core notions here that every single one of these systems gets wrong in some way, because there isn't yet a popularized, coherent security model for these systems to build on. You guys are driving that forward. That seems awesome. Thank you so much for describing this stuff to us.
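
As a minimal sketch of the "authenticated key hierarchy chaining back to a root of trust" idea, here is some illustrative Python of my own, not any provider's actual scheme: every wrapped key is encrypted with AEAD under its parent key, with associated data binding it to its place in the hierarchy, so a server that swaps or splices wrapped keys is caught at unwrap time, and the chain terminates in a root key derived on the client from the password. The path labels and parameters are hypothetical; the cryptography package is assumed.

import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.scrypt import Scrypt


def derive_root_key(password: bytes, salt: bytes) -> bytes:
    """Client-side root of trust derived from the user's password."""
    return Scrypt(salt=salt, length=32, n=2**15, r=8, p=1).derive(password)


def wrap_key(parent_key: bytes, child_key: bytes, path: str) -> bytes:
    """Encrypt child_key under parent_key, binding it to its path in the tree."""
    nonce = os.urandom(12)
    return nonce + AESGCM(parent_key).encrypt(nonce, child_key, path.encode())


def unwrap_key(parent_key: bytes, blob: bytes, path: str) -> bytes:
    """Raises InvalidTag if the blob was tampered with or moved to another path."""
    nonce, ct = blob[:12], blob[12:]
    return AESGCM(parent_key).decrypt(nonce, ct, path.encode())


if __name__ == "__main__":
    salt = os.urandom(16)
    root = derive_root_key(b"correct horse battery staple", salt)
    folder_key = AESGCM.generate_key(bit_length=256)
    file_key = AESGCM.generate_key(bit_length=256)

    wrapped_folder = wrap_key(root, folder_key, "root/folders/photos")
    wrapped_file = wrap_key(folder_key, file_key, "root/folders/photos/cat.jpg")

    # Honest unwrap succeeds; a server presenting wrapped_file under a different
    # path or parent key fails authentication instead of silently decrypting.
    assert unwrap_key(folder_key, wrapped_file,
                      "root/folders/photos/cat.jpg") == file_key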

Speaker 2:

Yeah, thank you. Security Cryptography Whatever is a side project from Deirdre Connolly, Thomas Ptacek, and David Adrian. Our editor is Nettie Smith. You can find the podcast online at @scwpod and the hosts online at @durumcrustulum, @tqbf, and @davidcadrian. You can buy merch online at merch.securitycryptographywhatever.com. If you like the pod, give us a five-star review wherever you rate your favorite podcasts. Also, we're now on YouTube with our faces, our actual human faces. A human face is not guaranteed on YouTube. Please subscribe to us on YouTube if you'd like to see our human faces. Thank you for listening.