Chapel Hill Computational Linguists Crack Skype Calls
mikejuk writes "You might think of linguistics as being interesting but not really useful. Now computational linguistics [PDF of original paper] has been used to crack Skype encryption and reconstruct what is being said in a VoIP call. What is surprising is that though they are encrypted, the frames that make up a Skype call contain clues about what phonemes are being spoken."
Speach recognition (Score:4, Insightful)
Re: (Score:2, Funny)
Do you speak as well as you spell?
Re: (Score:3, Funny)
I hope so; city's spelling was flawless.
You'd best learn what grammar is before you try to be a grammar nazi.
Re: (Score:2)
Speach?
Re:Speach recognition (Score:4, Funny)
Scottish Gaelic. Noun speach f (genitive speacha, plural speachan)
1. wasp
Like newcastlejon said, his Scottish Gaelic spelling was flawless. I always hate it when Google doesn't recognize my wasps.
Re: (Score:2)
"Speach"?
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:1)
Vonage gets about 75%. Not bad. I think, secretly, that they hire people in India to do it.
I should have guessed when the robotic voice sounded like Apoo! Vonaaage!
Re: (Score:2)
Linguistics not really useful. The ignorance (Score:4, Insightful)
Re: (Score:2)
Re: (Score:2)
Linguistics provides the foundation and formal frameworks for...
Agreed. mikejuk was obviously feeling dyslexic. In point of fact, nobody likes to discuss linguistics. It is boring as hell. :p
Re: (Score:2)
The ignorance of the statement "You might think of linguistics as being interesting but not really useful" is simply astounding.
Right. Would they have a linguist on basically every interplanetary mission, if they were just a bunch of useless bookish nerds ?!!
Re: (Score:2)
The ignorance of the statement "You might think of linguistics as being interesting but not really useful" is simply astounding.
What ignorance would that be? I read that statement as a hypothesis that the common man might consider linguistics not really useful. The statement makes no claim whatsoever that linguistics is in fact not useful. In fact it makes the exact opposite claim.
Do you believe that the commonly held opinion is that linguistics is useful? or simply some academic pursuit for bearded people with leather patches on their elbows. I think you can easily get a feeling by looking at research grants to linguistics a
Re: (Score:2)
The common man might think this, but this is slashdot, where I hope the level of computer science and software engineering clue is still a bit higher than the background levels. Such people should already be aware of the close linkages between linguistics and computer programs and systems.
But maybe I'm just engaging in wishful thinking now.
Re: (Score:2)
Right now that would probably entail buying Microsoft. What could possibly go wrong?
Re: (Score:2)
Side channel attack (Score:5, Informative)
Re: (Score:2, Interesting)
The simple description is: By looking at the size of the encrypted data packets you can guess what phonemes were spoken. Yes, that's all there is to it. They are just looking at how much data is sent and guessing what might be said that reasonably fits in that size.
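To make the size-only idea concrete, here is a deliberately crude sketch: it compares an observed sequence of encrypted packet lengths against per-word length templates and picks the closest one. The templates and sizes are invented for illustration, and nearest-template matching is a stand-in chosen for brevity, not the researchers' actual method.

```python
from itertools import zip_longest

# Hypothetical per-word templates of packet lengths (bytes); real ones would be
# learned from training recordings run through the same VBR codec.
TEMPLATES = {
    "hello": [41, 55, 33],
    "yes": [38, 20],
    "attack": [60, 62, 25, 25],
}


def distance(observed: list[int], template: list[int]) -> int:
    """Sum of absolute length differences; missing positions count as 0 bytes."""
    return sum(abs(x - y) for x, y in zip_longest(observed, template, fillvalue=0))


def guess(observed: list[int], templates: dict[str, list[int]]) -> str:
    """Return the template word whose length pattern is closest to what was seen."""
    return min(templates, key=lambda word: distance(observed, templates[word]))


# Only packet sizes are used; the ciphertext bytes themselves are never inspected.
print(guess([40, 54, 35], TEMPLATES))  # hello
```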
An obvious simple fix would be to vary the length of the packets with random padding (using a cryptographically secure random algorithm to determine the length). It would add overhead but probably not that much considering how small these pac
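A minimal sketch of that random-padding idea, assuming a made-up frame layout (2-byte payload length, payload, then filler) that would be encrypted as a whole afterwards. The padding length comes from Python's `secrets` module, which is backed by a cryptographically secure generator.

```python
import secrets
import struct


def pad_random(payload: bytes, max_pad: int = 32) -> bytes:
    """Hypothetical framing: 2-byte big-endian payload length, payload, random filler."""
    pad_len = secrets.randbelow(max_pad + 1)  # CSPRNG picks 0..max_pad extra bytes
    return struct.pack(">H", len(payload)) + payload + secrets.token_bytes(pad_len)


def unpad_random(frame: bytes) -> bytes:
    """Receiver side (after decryption): read the length header and drop the filler."""
    (n,) = struct.unpack(">H", frame[:2])
    return frame[2:2 + n]


frame = pad_random(b"codec output")
assert unpad_random(frame) == b"codec output"
```

Random padding only adds noise to the length signal; averaged over many packets, the underlying sizes can still show through.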
Re: (Score:3)
Re:Side channel attack (Score:4, Informative)
Re: (Score:2)
If the padding is random you'll decrease the amount of information leaked, but there may still be enough information leaked to reconstruct some conversations. What you really need for total security from this attack is to eliminate the side-channel completely, such as by sending packets of the same size and with the same frequency no matter how much data you've actually got that needs sending. That is a form of padding too, but it is better than random.
^^ This. I'm actually surprised to hear that with Skype the packets are of variable length and (somewhat) a function of the contents. I would have imagined that, after encryption, the communication protocol would split the content into packets of either random or same size.
But OTOH, there might have been performance implications that forced Skype to not do just that. After all, there are legit reasons to not do super encryptions (as with the Predator's unencrypted download links [schneier.com]).
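The "same size, same frequency" approach quoted above can be sketched as splitting whatever the codec produced into frames of one fixed size. The 64-byte size and the 1-byte count header are assumptions for illustration; to hide timing as well, a sender would emit one such frame every tick, using an empty (count 0) frame when there is nothing to send.

```python
FRAME_SIZE = 64  # hypothetical fixed frame size in bytes, chosen for illustration


def to_fixed_frames(payload: bytes, frame_size: int = FRAME_SIZE) -> list[bytes]:
    """Split payload into frames of exactly frame_size bytes.

    Layout (an assumption for this sketch): 1 byte holding the count of real
    bytes, then the data, then zero padding up to frame_size.
    """
    body = frame_size - 1
    if not 1 <= body <= 255:
        raise ValueError("frame_size must be between 2 and 256")
    frames = []
    # max(..., 1) makes an empty payload still produce one (all-padding) frame.
    for start in range(0, max(len(payload), 1), body):
        chunk = payload[start:start + body]
        frames.append(bytes([len(chunk)]) + chunk + bytes(body - len(chunk)))
    return frames


def from_fixed_frames(frames: list[bytes]) -> bytes:
    """Reverse to_fixed_frames by keeping only the real bytes of each frame."""
    return b"".join(frame[1:1 + frame[0]] for frame in frames)
```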
Re: (Score:2)
Re: (Score:2)
Yeah. Furthermore, this is a *really* old and *really* well-known side-channel. Everyone knows, and has known for many decades, that crypto by itself is no defence against traffic-analysis, that is, you still know what size the packets are, and who the sender and recipient is, and the frequency they're sent with.
The only way to thwart that completely would be to send a constant stream of constant-size packets regardless of whether anything is being said or not. This is an easy fix, but it conflicts with the go
Re: (Score:2)
You could still save a lot of bandwidth, and protect against phoneme exposure by having fixed packet size transmission happen for a short interval after speaking occurred (perhaps randomly between 0.5 and 2 seconds). You'd be able to tell when people were speaking, but lose visibility into the cadence of the words. That shouldn't be that large an impact on the overall bandwidth consumption but should pretty much shut down this side channel.
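A sketch of that hangover scheme, assuming a per-tick voice-activity flag and a 20 ms packet interval (both assumptions, not anything Skype is known to use). While speech is detected, and for a random 0.5 to 2 seconds afterwards, every tick sends a fixed-size packet; otherwise nothing is sent.

```python
import secrets

TICK_MS = 20  # assumed packet interval in milliseconds


def transmit_schedule(voice_active: list[bool], min_ms: int = 500, max_ms: int = 2000) -> list[bool]:
    """Return, per tick, whether a fixed-size packet is sent."""
    schedule = []
    remaining = 0  # hangover ticks left after the most recent speech
    for active in voice_active:
        if active:
            # Re-arm a random hangover every time speech is seen, so the tail
            # after the last word has unpredictable length.
            hang_ms = min_ms + secrets.randbelow(max_ms - min_ms + 1)
            remaining = hang_ms // TICK_MS
            schedule.append(True)
        elif remaining > 0:
            remaining -= 1
            schedule.append(True)
        else:
            schedule.append(False)
    return schedule
```

An observer still sees when someone is talking, but inside a talk spurt every packet looks identical.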
Re: (Score:3)
if your encryption leaves the message where it can be read without decrypting it, then it was never actually encrypted
skype is using a lot more bandwidth than they need to. like single-sideband radio, they can drop at least half the channels they're sending and the information will still be perfectly intelligible on the other end. they've effectively done that by sending superfluous encrypted gibberish on their "main" channel.
the bonus is, their method of sending the message in the side channel is probabl
Re: (Score:2)
if your encryption leaves the message where it can be read without decrypting it, then it was never actually encrypted
While you are technically correct, you are not really contradicting what I said.
The encryption algorithm itself does not allow you to obtain the plaintext without decrypting it (as far as we know); the problem is that the protocol requires many encrypted messages to be sent in a particular sequence, and the size and sequence of those messages leaks information about the plaintext. This is a side channel, not a break of the encryption algorithm itself, and the problem is solved without any change to the
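The point that a sound cipher can still leak through message sizes is easy to demonstrate with any length-preserving mode. This sketch uses a toy SHA-256 counter-mode keystream as a stand-in (illustration only; it is not what Skype uses): the ciphertext bytes look random, yet each packet is exactly as long as the codec frame inside it.

```python
import hashlib
import secrets


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy counter-mode keystream built from SHA-256 (illustration only)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_encrypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt (XOR is its own inverse); output length == input length."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, nonce, len(data))))


key = secrets.token_bytes(32)
# Hypothetical VBR codec frames of different sizes.
frames = [b"\x01" * 12, b"\x02" * 40, b"\x03" * 7]
# A unique nonce per packet; here simply the packet index.
packets = [xor_encrypt(key, i.to_bytes(12, "big"), f) for i, f in enumerate(frames)]
print([len(p) for p in packets])  # [12, 40, 7]
```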
Re:Side channel attack (Score:5, Interesting)
Re: (Score:2)
Re: (Score:2)
The Cone of Silence was never really all that soundproof, either. Nor was it at all cone-like.
Re: (Score:2)
Re: (Score:2)
Arguing about whether they broke "the encryption" or "the secure channel" or "the encryption machine" is a worthless rhetorical exercise.
Except that it is not just rhetoric. Suppose I use PGP to encrypt all of my email, but then save copies of the plaintext on a "cloud system" and someone comes along and reads the plaintext. What was broken? It was not PGP; PGP, when used correctly is secure.
Yes, if you use a cryptographic algorithm incorrectly, your security may be compromised. That does not mean the cryptographic algorithm was broken, it means your specific way of using it was bad. Just because someone managed to compute Sony's P
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Then I guess by your definition, all encryption everywhere is "cracked,"
No. It requires actual judgment of whether the flaw is in the standard as expected to be applied. You are apparently purposefully ignoring judgment in order to defend your pet idea. There exists no implementation of Skype's encryption (not talking about the cypher it's based on) which you couldn't "break" in this method. Thus, there is no Skype conversation free from this attack, regardless of platform, implementation, or anything else. I'd call that "cracked." You call that "secure." That's where ou
Re: (Score:2)
Thus, there is no Skype conversation free from this attack, regardless of platform, implementation, or anything else. I'd call that "cracked." You call that "secure." That's where our opinion differs.
Except that is not what I said. I said that the encryption algorithm has not been cracked, because it has not. The attack is a side channel attack. This does not mean that Skype calls are secure, it means that an otherwise secure algorithm was applied in a way that undermined the security of the system.
There is a difference between the encryption algorithm, and the system that uses that algorithm. This same attack would have worked if a different encryption algorithm had been used, even one as wide
Re: (Score:2)
There is a difference between the encryption algorithm, and the system that uses that algorithm. This same attack would have worked if a different encryption algorithm had been used, even one as widely evaluated as AES. The encryption algorithm is not what was cracked here.
"They broke Skype's encryption" is a true statement. The encryption package Skype uses is broken. Whether they did that from breaking the algorithm (I see after I spend a post proving "cypher" to be a pointless red herring, you've swapped to a new red herring in substituting "encryption algorithm" for cypher without changing your statements at all), or by some other attack that compromised the security of the encrypted calls is irrelevant to the truth. "Skype's call encryption has been broken." Again, t
Re: (Score:2)
There is a difference between the encryption algorithm, and the system that uses that algorithm. The encryption algorithm is not what was cracked here.
"They broke Skype's encryption" is a true statement. The encryption package Skype uses is broken. Whether they did that from breaking the algorithm (I see after I spend a post proving "cypher" to be a pointless red herring, you've swapped to a new red herring in substituting "encryption algorithm" for cypher without changing your statements at all), or by some other attack that compromised the security of the encrypted calls is irrelevant to the truth. "Skype's call encryption has been broken." Again, that'
Re: (Score:2)
"They broke Skype's encryption" is a true statement.
My point from the beginning is that that statement is ambiguous. It is not clear from that statement that the researchers did not crack the actual encryption algorithm. It does not make it clear that the problem has more to do with the compression than with cryptography.
Whether they did that from breaking the algorithm (I see after I spend a post proving "cypher" to be a pointless red herring, you've swapped to a new red herring in substituting "encryption algorithm" for cypher without changing your statements at all), or by some other attack that compromised the security of the encrypted calls is irrelevant to the truth.
It is relevant to whether or not the statement is clear about what the attack actually constitutes. Again, my point from the beginning was that TFS is ambiguous.
Re: (Score:2)
Again, my point from the beginning was that TFS is ambiguous.
No, your point from the beginning was that it was "misleading." Misleading is an opinion that, based on other comments and what's actually broken in the wild, is false. If you had asserted facts (ambiguous is a fact, misleading is an opinion), then there would have been no room for discussion.
When you hear that an application is broken, is it mostly because the underlying cypher was broken? I've never heard it that way. Because when someone broke the cypher, the statements were naming the cypher, not comm
Re: (Score:2)
Is that really side channel - by that I mean it seems to me like block cipher mode crypto on a per packet basis is being employed... which would make it akin to a watermarking attack.
Re: (Score:2)
um, this counts as cracking their encryption. Just because you can't efficiently perform a "cleartext" digital translation (it is analog sound...) doesn't mean you can't read the message.
And now that Microsoft has bought them for 8.5 billion: LMAO.
Fuck you Ballmer.
eww it's not Skype's day... (Score:1)
Ouch.. (Score:1)
Re: (Score:2)
Microsoft hasn't even bought it yet. Secondly, Skype has already had 2 major outages in the last 4 years.
Encrypting a wave (Score:2, Informative)
Of course, since the data basically represents sound waves, there is a certain level of predictability and pattern on the data unlike normal data which is much more random.
It would have to be a special encryption to get rid of this pattern using a more dynamic algorithm that changes as it progress (which can make it annoying to decrypt or simpler to detect) or disjoint the data over a greater amount of data (making it somewhat harder to find the patterns though still might be possible) of the encryption tho
Re: (Score:3)
normal data...is much more random.
Actually, most data used in practice is not uniformly random. Text, images, and even computer programs tend to have significant biases.
It would have to be a special encryption to get rid of this pattern using a more dynamic algorithm that changes as it progress
http://en.wikipedia.org/wiki/Stream_cipher [wikipedia.org]
We know how to get these things right, and the problem with Skype was not the type of data, but rather the way in which that data was compressed.
Re: (Score:2)
Re: (Score:2)
Of course, since the data basically represents sound waves, there is a certain level of predictability and pattern on the data unlike normal data which is much more random.
It would have to be a special encryption to get rid of this pattern using a more dynamic algorithm that changes as it progress (which can make it annoying to decrypt or simpler to detect) or disjoint the data over a greater amount of data (making it somewhat harder to find the patterns though still might be possible) of the encryption though that is difficult in a time sensitive app like Skype which encrypts and sends as it receives the data.
It does not follow that encrypting sound waveforms leaks information just because they are predictable. If that were the case, encryption wouldn't be very useful in general. There is no such thing as "normal data" and most data people need to encrypt does have strong patterns. The entire purpose of encryption is to make non-random data look random.
The method for guessing what people are saying described in TFA exploits specific properties of the most efficient voice compression algorithms coupled with timin
plague of any compressed voip conversation (Score:3)
I remember reading something similar with sip over encrypted channel... I guess it is the plague of all compressed communication even if encrypted... the only way to bypass that is use an uncompressed protocol and not blank out the silence. I guess what's new is they've done it with skype.
Re: (Score:3)
Re: (Score:2)
Or make it a constant bitrate.
Re: (Score:2)
Re: (Score:2)
I remember reading something similar with sip over encrypted channel... I guess it is the plague of all compressed communication even if encrypted... the only way to bypass that is use an uncompressed protocol and not blank out the silence. I guess what's new is they've done it with skype.
It's only a problem for variable bitrate compression algorithms, not less efficient fixed bitrate ones like the venerable G.722 [wikipedia.org]. It may only be a problem for voice-specific variable bitrate codecs, not general ones like MP3 or Vorbis. The risk from this type of attack may also be greatly mitigated by decoupling datagram size and timing from the output of the encoder, which would probably increase latency but still allow use of efficient codecs.
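One way to picture "decoupling datagram size and timing from the output of the encoder" is a sender-side queue drained by a fixed timer. In this sketch (the class name, the 80-byte datagram and the 2-byte header are all hypothetical), variable-size encoder output goes in, and exactly one full-size datagram comes out per tick whether or not anything was queued.

```python
class ConstantRateShaper:
    """Hypothetical sender-side shaper that hides encoder frame sizes.

    Encoder output is queued; every tick exactly one datagram of datagram_size
    bytes is emitted: a 2-byte length header, queued bytes, then zero padding.
    """

    def __init__(self, datagram_size: int = 80) -> None:
        self.datagram_size = datagram_size
        self.capacity = datagram_size - 2
        self.queue = bytearray()

    def push(self, encoded: bytes) -> None:
        """Queue whatever the (variable-bitrate) encoder produced."""
        self.queue += encoded

    def tick(self) -> bytes:
        """Called on a fixed timer; always returns a full-size datagram."""
        take = bytes(self.queue[:self.capacity])
        del self.queue[:len(take)]
        padding = bytes(self.capacity - len(take))
        return len(take).to_bytes(2, "big") + take + padding


def unshape(datagrams: list[bytes]) -> bytes:
    """Receiver side: strip headers and padding, returning the encoder bytes in order."""
    return b"".join(d[2:2 + int.from_bytes(d[:2], "big")] for d in datagrams)
```

If a burst exceeds one datagram's capacity, the excess waits for later ticks; that queueing is where the extra latency comes from.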
Comment removed (Score:3)
Re: (Score:1)
The reason why is that any serious encryption attempt of IP traffic would make all packets a constant size
From TFA: A solution might be to break the data up into fixed sized frames but this would make it more difficult to reconstruct the data if there was packet loss.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
It depends on how well the VoIP system is designed; you could build a VoIP system to handle high rates of random packet loss. You would just need to
1: keep the buffers relatively big relative to the size of the packets (this means either smaller packets or bigger buffers)
2: use forward error correction techniques and/or information spreading techniques so that a lost packet can be reconstructed from others in the buffer
Probably you would want to go into this mode in an adaptive manner if the channel was detected
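The simplest form of the forward error correction in point 2 is a single XOR parity packet per group, which pairs naturally with fixed-size packets since every packet in the group has the same length. This is a minimal sketch of that textbook technique; it repairs at most one loss per group, and stronger codes (for example Reed-Solomon) would be needed for more.

```python
from functools import reduce
from typing import Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets: list[bytes]) -> bytes:
    """XOR of a group of equal-length packets, sent as one extra packet."""
    return reduce(xor_bytes, packets)


def recover(received: list[Optional[bytes]], parity: bytes) -> list[bytes]:
    """Rebuild a single lost packet (marked None) from the survivors and the parity."""
    if received.count(None) != 1:
        raise ValueError("simple XOR parity can repair exactly one loss per group")
    rebuilt = reduce(xor_bytes, (p for p in received if p is not None), parity)
    return [rebuilt if p is None else p for p in received]


group = [b"abcd", b"efgh", b"ijkl"]
parity = make_parity(group)
print(recover([b"abcd", None, b"ijkl"], parity))  # [b'abcd', b'efgh', b'ijkl']
```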
Re: (Score:3)
From TFA: A solution might be to break the data up into fixed sized frames but this would make it more difficult to reconstruct the data if there was packet loss.
And even then, the data rate would leak some information about the content.
The only trivial solution for zero leakage is to either use constant rate encoding, or use some kind of padding to make the data rate constant. Non-trivial solutions would include some random data rate variations to obfuscate the data rate of actual payload content. Unfortunately, all these methods will waste bandwidth.
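How much bandwidth constant-rate padding wastes depends on how bursty the codec output is. A small helper makes the arithmetic explicit, under the simplifying assumption that every interval is padded up to the largest frame seen; the frame sizes in the example are made up.

```python
def padding_overhead(frame_sizes: list[int]) -> float:
    """Extra bandwidth, as a fraction, when every frame is padded to the largest.

    frame_sizes are hypothetical per-interval VBR outputs in bytes; a silent
    interval that would normally send nothing can be listed as 0.
    """
    actual = sum(frame_sizes)
    if actual == 0:
        raise ValueError("no payload to compare against")
    padded = max(frame_sizes) * len(frame_sizes)
    return padded / actual - 1


print(padding_overhead([20, 60, 35, 5]))  # 1.0, i.e. 100% more bandwidth
```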
Re:Skype's encryption sucks (Score:5, Informative)
The reason why is that any serious encryption attempt of IP traffic would make all packets a constant size, significantly below expected MTU size (taking into account tunnels). This attack would not exist in that scenario.
It's actually harder than that. You also have to generate the packets at an even rate as well, or you'll still have some leakage.
Even after you do that, the presence or absence of a stream of packets will at the very least indicate if a call is in progress; to defend against that, you have to *always* transmit the stream.
Even then you're leaking information about the maximum amount of data you could be communicating.
The goalposts keep moving right on down the field when you're talking about side channels. You just have to pick the point where you're comfortable.
Re: (Score:3)
Codec as the weak point (Score:1)
TFA states that this is possible due to the codec that is used:
the best...compression for voice data makes use of the structure of speech
So using a not-optimized-for-speech codec (e.g. mp3 or wav) would defeat this.
Re: (Score:2)
it could have been defeated by encrypting the entire data stream instead of just part of it.
Re: (Score:2)
No, it really can't. Essentially this same paper, but as an analysis of SIP-IPSec/SIP-TLS, was published not long ago. Any real-time, size-efficient voice codec leaks a ton of information about the underlying speech just in the rate and size of its packets, so any encryption system that is real-time and length-preserving (i.e. any system that would be considered suitable to be paired with the underlying codec) leaks the same information. You can add padding to hide this, but A) that defeats the purpose of y
Re: (Score:3)
Okay, so, then, what are the teachers in the Charlie Brown specials saying?
Huh? Mr. Smarty-pants?
Language? (Score:2)
TFA was TLDR, but a quick question to those of you with knowledge to understand this... Did a particular language help? Does this work on all languages? Are some languages more secure than others?
E.g., is Esperanto easy to break, while languages with Click Consonants [wikipedia.org] are harder?
Huh? (Score:5, Insightful)
No, I find linguistics pretty useful. Especially since it has some pretty 1:1 relationships with computer programming. And Larry Wall was a linguist. And what kind of lead-in is that?
Re: (Score:2)
i find linguistics pretty useful, too, since it's how translation of all kinds works (including code compilation). in fact, it's pretty silly to say anyone doesn't find it useful. maybe they meant studying linguistics is pretty useless, if you're not going to work in the construction of translators. but that could be said of any subject, and the continued propagation of that attitude across all subjects and throughout the population, in a nation operating democratically under the principle of majority ru
Re:Huh? "You might" (Score:2)
"You might", and apparently you're someone who "might not". It's the lead-in for its intended audience, which is non-linguists. And among non-linguists, it is possible that people might find it interesting but not useful. Perfectly accurate, audience specific.
You'd think the linguists complaining about this would be able to parse out the "...and you might not" which is implied.
Similar work in a December 2010 paper (Score:2)
The article was published in ACM Transactions on Information and System Security, PDF version [unc.edu].
The paper details a gap in the security of VBR compressed encrypted VoIP streams. The authors had earlier found that it is possible to determine the language that is spoken on such a VoIP call, based on packet lengths. Now they have expanded their research and show that it's possible
Encryption... (Score:1)
Re: (Score:2)
with text, if you have a part of the message, it's a lot easier to break the encryption method
This is called a known plaintext attack, and any decent modern cipher should be secure against it (that is, you should learn very little even if I give you plaintext/ciphertext pairs). Modern ciphers are generally designed to be secure against this type of attack, as well as stronger attacks:
Original Slashdot Story (Score:2)
Re: (Score:3)
Yes; this is follow-up work to the paper [acm.org] in that earlier article.
Also important to note, neither paper is specific to Skype; their work is on encrypted VoIP in general. But apparently /. prefers things having to do with Skype for some reason.
side channel exploits latency constraint (Score:2)
If you can compress the data stream from the packet contents to just the lengths of the packets and still recover the word stream, that suggests two things: A) vocal inflection is worth 100 words per syllable, and B) you're not compressing enough in the first place. Yet there's a reason why compression sucks: the low latency requirement. Compression over 5 minute speech blocks would blow this side channel away.
Were it not for the human tension of a conversation amounting to a group of people mutually waiti
And here (Score:2)
Fsck You, Slashdot (Score:3, Interesting)
Re: (Score:2)
Re: (Score:2)
I have confidence that eventually computational linguistics will crack speech/language in general and lead to computers that can learn languages as readily as human infants.
Why do you think this? Really, I want to know, because it is quite a claim.
Actually, two claims:
A) That computational linguistics will crack speech/language and
B) That this will lead to computers that can learn languages as readily as human infants.
Re: (Score:2)
Now how do you solve the hard problem of AI, which is, how do you give it will? How will it want to do something? How
What?! (Score:2)
You mean Skype wasn't smart enough to mix in other sounds while encrypting the original sound?! That is just retarded. Note that I am not a mathematician or any sort of "really smart guy." But I can definitely picture in my mind why this would be somewhat trivial. Vocal sound is primarily frequency modulated which means that the flow of signal will vary in density on a constant carrier. If you mix up the numbers, you will still see a great deal of fidelity in the variations of the frequencies of data r
Re: (Score:2)
if the signal were combined with another sound pattern which the receiving end would know how to properly remove after decryption ..... I have to wonder why this isn't being done. It is simply too obvious to patent.
You obviously haven't been paying attention to the absolute nonsense that has been successfully receiving patents these days.
Re: (Score:2)
I imagine it went down something like this..
Boss: Is the encryption module finished? We need to get 1.0 out in four days!
Programmer guy: Almost, I just need to find a way to pad the voice data.
Boss: What's that for?
Programmer guy: Well, the data stream's encrypted, but someone could find a way to guess which words are spoken depending on the size of the data stream over time.
Boss: But it's encrypted already, isn't it?
Programmer guy: Yes.
Boss: Just finish the module up and help the guys working on the networ
It takes a computational linguist... (Score:2)
To demonstrate the obvious. What do you expect when using high complexity VBR codecs with no blinding of any kind. I sincerely hope this was not news to anyone.
School or Town? (Score:2)
Re: (Score:2)
That all depends on the name of the School, doesn't it?
Private schools tend not to be named after the town. Public universities very often are named after the town they're in. I live in Ohio; I can name quite a few schools named after towns. . .
U of Toledo, U of Akron, Youngstown State University, Cincinnati State, The U of Cincinnati, Kent State University. Out of state I can think of University of Chicago, UC San Diego, UC Berkeley, (pretty much all the UC schools are named for the towns they are in), Uni
Visual example (Score:2)
This is not the exact same thing, but it's a great example of how encryption alone is not enough and it must be done right.
Block cipher modes of operation [wikipedia.org]
Scroll down until you see the penguins.
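The penguin picture comes from ECB mode, where identical plaintext blocks encrypt to identical ciphertext blocks. As the comment says, that is a different leak from Skype's, but the lesson is the same: a good cipher used the wrong way still reveals structure. Python's standard library has no block cipher, so this sketch builds a toy 4-round Feistel cipher from HMAC-SHA256 purely for illustration (not for real use) and compares ECB with CBC on a repetitive message.

```python
import hashlib
import hmac
import secrets

BLOCK = 16
HALF = BLOCK // 2


def _f(key: bytes, round_no: int, half: bytes) -> bytes:
    """Feistel round function: HMAC-SHA256 truncated to half a block."""
    return hmac.new(key, bytes([round_no]) + half, hashlib.sha256).digest()[:HALF]


def encrypt_block(key: bytes, block: bytes) -> bytes:
    """Toy 4-round Feistel block cipher (invertible; decryption omitted here)."""
    left, right = block[:HALF], block[HALF:]
    for r in range(4):
        left, right = right, bytes(a ^ b for a, b in zip(left, _f(key, r, right)))
    return left + right


def ecb_encrypt(key: bytes, data: bytes) -> list[bytes]:
    """ECB: every block encrypted independently, so equal blocks stay equal."""
    if len(data) % BLOCK:
        raise ValueError("data length must be a multiple of BLOCK")
    return [encrypt_block(key, data[i:i + BLOCK]) for i in range(0, len(data), BLOCK)]


def cbc_encrypt(key: bytes, iv: bytes, data: bytes) -> list[bytes]:
    """CBC: each block is XORed with the previous ciphertext before encryption."""
    if len(data) % BLOCK:
        raise ValueError("data length must be a multiple of BLOCK")
    out, prev = [], iv
    for i in range(0, len(data), BLOCK):
        block = bytes(a ^ b for a, b in zip(data[i:i + BLOCK], prev))
        prev = encrypt_block(key, block)
        out.append(prev)
    return out


key = secrets.token_bytes(32)
plaintext = b"YELLOW SUBMARINE" * 4  # four identical 16-byte blocks
print(len(set(ecb_encrypt(key, plaintext))))  # 1: the repetition shows through
print(len(set(cbc_encrypt(key, secrets.token_bytes(BLOCK), plaintext))))  # 4, barring an astronomically unlikely collision
```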
Have I understood it right? (Score:2)
So, as I understand, it may not be the obvious weakest potential link that has been compromised - the cipher itself for example - but rather a detail of implementation that paved way for their successful attack, right? If Skype fragment the encrypted data stream in variable sized frames that have also rather umm unpredictable (bear with me here) sizes, the attack, as stated by researchers themselves I believe, could not be instantiated in its current form? The entire weakness is based around the fact that i
Re: (Score:2)
Re: (Score:2)
And you sir are a master debater.
Re: (Score:2)
Indeed. In a few years it will probably buy you a frozen pizza... or Kansas.
Re: (Score:2)
it's a large multiple of what you need to buy a presidential election
Re: (Score:1)
It's how you spend it, not how much you can buy with it. Skype was probably only worth $3B. Microsoft is acting like a newly rich basketball player by overpaying for needless stuff. Just because you can pay for something doesn't mean you should.
Re: (Score:3)
It's hardly "newly rich" - it's been rich for quite some time. I'd call this more a "desperate grab for relevance".
Re: (Score:2)
Re: (Score:2)
Touche.
Re: (Score:2)