Researchers Map Locations of 4,669 Servers In Netflix's Content Delivery Network (ieee.org)
Wave723 writes from a report via IEEE Spectrum: For the first time, a team of researchers has mapped the entire content delivery network that brings Netflix to the world, including the number and location of every server that the company uses to distribute its films. They also independently analyzed traffic volumes handled by each of those servers. Their work allows experts to compare Netflix's distribution approach to those of other content-rich companies such as Google, Akamai and Limelight. To do this, IEEE Spectrum reports that the group reverse-engineered Netflix's domain name system for the company's servers, and then created a crawler that used publicly available information to find every possible server name within its network through the common address nflxvideo.net. In doing so, they were able to determine the total number of servers the company uses, where those servers are located, and whether the servers were housed within internet exchange points or with internet service providers, revealing stark differences in Netflix's strategy between countries. One of their most interesting findings was that two Netflix servers appear to be deployed within Verizon's U.S. network, which one researcher speculates could indicate that the companies are pursuing an early pilot or trial.
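The crawling step described in the summary can be sketched roughly as follows. This is only an illustration, not the researchers' actual tool: the hostname scheme (`<site>-<kind>-c<NNN>`), the site codes, and the helper names `candidate_names`, `dns_resolve` and `crawl` are all hypothetical, and only the `nflxvideo.net` domain comes from the article. The demo runs against a stub resolver so it touches no network.

```python
import socket
from itertools import product
from typing import Callable, Dict, Iterable, Iterator, Optional

DOMAIN = "nflxvideo.net"  # the common address named in the summary


def candidate_names(sites: Iterable[str], kinds: Iterable[str],
                    max_index: int) -> Iterator[str]:
    """Yield every name in a HYPOTHETICAL scheme: <site>-<kind>-c<NNN>.<domain>."""
    for site, kind, idx in product(sites, kinds, range(1, max_index + 1)):
        yield f"{site}-{kind}-c{idx:03d}.{DOMAIN}"


def dns_resolve(name: str) -> Optional[str]:
    """Look a name up in public DNS; return None if it does not resolve."""
    try:
        return socket.gethostbyname(name)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def crawl(names: Iterable[str],
          resolve: Callable[[str], Optional[str]]) -> Dict[str, str]:
    """Keep only the candidate names that resolve to an address."""
    found: Dict[str, str] = {}
    for name in names:
        address = resolve(name)
        if address is not None:
            found[name] = address
    return found


# Offline demo: a made-up DNS table stands in for the real resolver.
FAKE_DNS = {
    "lhr001-ix-c001.nflxvideo.net": "192.0.2.10",
    "lhr001-isp-c002.nflxvideo.net": "192.0.2.11",
}
servers = crawl(candidate_names(["lhr001", "ams001"], ["ix", "isp"], 2),
                FAKE_DNS.get)
print(len(servers), "candidate names resolved")
```

Passing `dns_resolve` instead of `FAKE_DNS.get` would query real DNS. In this made-up scheme the `ix`/`isp` label is what would separate exchange-point servers from ISP-hosted ones.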
And? (Score:4, Insightful)
Why is Netflix's server architecture so interesting? Or is science just miles behind the industry in this subject and now desperately trying to catch up?
Re:And? (Score:5, Insightful)
Wait, are you saying that information like the server load distribution of a real world system like Netflix isn't useful for studying such systems and designing future ones? There aren't that many systems like this in the wild, and most of them don't release their information publicly, so getting an extra example could be quite useful. If they didn't have this, many of the people complaining here would instead be complaining about academics who ignore real world systems in their studies. It is not unlike a lot of research work that is done to characterize products the manufacturer doesn't provide enough info for, whether that's measuring the latest CPU's performance in a real world test, benchmarking networking gear, or detailing the performance of some new high speed camera.
Re: (Score:2)
Reverse engineering trade secrets is fun?
Interesting ratio (Score:2)
Netflix has ~4,000 movies and ~1,000 TV shows, so each server seems to handle just 1 title and about 1,000 connections (if 5% of their user base is actively streaming at any one point)? This just seems like an awful lot of servers for what I find to be a relatively low load for simply streaming data.
I'm sure Netflix could save a lot of money and network headaches by using a BitTorrent-type approach. It would also alleviate the "problems" with the providers, since most traffic would stay within their network.
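The back-of-the-envelope numbers in this comment can be checked with a few lines. This is just a sketch using the figures quoted above (4,669 servers, roughly 5,000 titles, about 1,000 connections per server, 5% of users active); the variable names are made up, and no real subscriber count is assumed.

```python
servers = 4669                   # from the story headline
titles = 4000 + 1000             # rough catalog size quoted in the comment
connections_per_server = 1000    # the commenter's estimate
active_percent = 5               # share of users streaming at once (commenter's guess)

titles_per_server = titles / servers
concurrent_streams = servers * connections_per_server
# User base that the commenter's own figures would imply.
implied_user_base = concurrent_streams * 100 // active_percent

print(f"titles per server:  {titles_per_server:.2f}")
print(f"concurrent streams: {concurrent_streams:,}")
print(f"implied user base:  {implied_user_base:,}")
```

So the "1 title, 1,000 connections" figure only holds if the user base is on the order of 93 million; a different subscriber count or active share shifts the per-server load proportionally.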
Re:Interesting ratio (Score:5, Insightful)
The BitTorrent approach is the wrong one for two reasons:
1. A lot of people have asymmetrical connections with a very slow upload speed
2. A lot of people have monthly data caps with hefty fees for going over
Re: (Score:1)
Perhaps people would notice and demand at least 3rd world country level Internet from their shitty providers then.
Re: (Score:2)
You may be paying with your bandwidth for a cheaper Netflix for you.
Re: (Score:2)
Here's a bigger reason.
How do you play BitTorrent content in a browser or app?
How do you control access to the content based on accounts?
How do you prevent Joe Blow Wireshark Pro from noticing that Senator Blowhard McJesus has been binge watching R rated slasher flicks?
I'm sure they can all be overcome with some coding and customization, but they've already done that work for their existing solution.
Re: (Score:2)
And yet it works for all uses of BitTorrent. BitTorrent was invented to distribute data without overloading a weak server, and it worked so well that the filesharing community instantly adopted it.
Of course, your upload won't be enough for someone's download. But the upload of a few people may sponsor your download.
And the tech is already there, see Popcorn Time (never used it myself, but a nice idea).
Re: (Score:2)
I'm sure Netflix could save a lot of money and network headaches by using a BitTorrent type approach,
But you're wrong. They use caching servers instead, on the premise that shows are watched in trends. They know more about watching habits than you do, and they are relatively technically competent, so if there were benefit to that they'd probably be doing it already.
Re:Interesting ratio (Score:4, Interesting)
Regardless of how many people are actually watching, 20Gb/s average is pretty cool. Another interesting note is that Netflix servers barely benefit from caching data in memory. Each server is handling so many requests per second from so many different customers, almost no customers are at the same point in the same show, and requests from a single customer are so far apart in time, that almost all requests are effectively random access. It's also interesting to know that Netflix is beyond the 80/20 rule. They're in 90/10 territory, in that 10% of their data represents 90% of their requests. Predicting which 10% is important, and they can't use normal least-used eviction algorithms because that would cause cache thrashing. They algorithmically predict what will be watched every night, upload the data to be cached and logically "pin it" so it doesn't get evicted.
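To make the "pin it" idea concrete, here is a minimal sketch of a cache where predicted-popular entries are exempt from eviction. It is not Netflix's code: the class name `PinnedLRUCache` is made up, and a plain least-recently-used policy stands in for whatever eviction they actually use for the unpinned remainder.

```python
from collections import OrderedDict


class PinnedLRUCache:
    """LRU cache in which pinned keys are never chosen as eviction victims."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items: OrderedDict = OrderedDict()  # oldest entry first
        self._pinned: set = set()

    def pin(self, key, value) -> None:
        """Preload an entry predicted to be popular and protect it."""
        self._pinned.add(key)
        self.put(key, value)

    def put(self, key, value) -> None:
        self._items[key] = value
        self._items.move_to_end(key)  # mark as most recently used
        self._evict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)
        return self._items[key]

    def _evict(self) -> None:
        while len(self._items) > self.capacity:
            # Oldest unpinned entry; this can be the item just inserted
            # when everything else in the cache is pinned.
            victim = next((k for k in self._items if k not in self._pinned), None)
            if victim is None:
                break  # every entry is pinned, nothing can be dropped
            del self._items[victim]


cache = PinnedLRUCache(capacity=2)
cache.pin("tonights-hit", "segment data")   # predicted popular, pinned
cache.put("obscure-1", "segment data")
cache.put("obscure-2", "segment data")      # evicts obscure-1, not the pinned hit
```

Without the pin, a burst of one-off requests for unpopular titles would push the popular entry out, which is the thrashing described above.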
Other interesting stuff that they support for syncing the servers: each server can be configured to use a different route to pull down its data, and even the amount of bandwidth can be configured. The servers within a location can then sync with each other in a kind of P2P setup. This helps load balance routes. Their SSD servers hold quite a bit less storage than the mechanical-drive servers, so the SSDs are typically hit first but hold only the most requested data. Last I knew, their SSD servers did not support acting as a cache while loading, because mixed sustained heavy reads and writes produce IO patterns that don't play well with SSDs. They may have changed this or may be changing it in the near future. I know the biggest reason for this was that the way most SSD firmware handled garbage collection could cause long pauses of no activity under sustained heavy writes. One of the changes was for FreeBSD to have a target latency for reads/writes and throttle the writes until latency came down.
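The "throttle the writes until latency came down" part might look something like the sketch below. This is an assumption-heavy illustration, not the FreeBSD implementation: the function name `next_write_limit`, the halve-on-overshoot / add-on-recovery rule, and all the numbers are made up.

```python
def next_write_limit(current_limit: int, observed_latency_ms: float,
                     target_latency_ms: float, min_limit: int = 1,
                     max_limit: int = 1000, step: int = 10) -> int:
    """Return the write budget (e.g. writes per interval) for the next interval."""
    if observed_latency_ms > target_latency_ms:
        # Latency is over target: cut the write budget sharply.
        return max(min_limit, current_limit // 2)
    # Latency is at or under target: let writes creep back up.
    return min(max_limit, current_limit + step)


# Simulated latency samples around a hypothetical 10 ms target.
limit = 400
history = []
for latency_ms in [5, 30, 40, 8, 6]:
    limit = next_write_limit(limit, latency_ms, target_latency_ms=10)
    history.append(limit)
print(history)
```

The budget drops fast while a garbage-collection stall is inflating latency and recovers slowly afterwards, so reads are protected at the cost of slower cache fills.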
Re: (Score:2)
I've seen the same problem on cheap SSDs like the Samsung "Pro" lines. I've had much better results with the Intel DC line, though. They seem to be able to sustain their "average" read and write speeds.
Re: (Score:2)
Insult to
Re: (Score:2)
I think you're missing the point of these appliances. Netflix has plenty of capacity to host all its content in central locations.
These appliances are installed at the ISP offices so that the content is as close to the subscriber as possible. That way the quality of the video is not dependent on the quality of the long-haul network from the ISP back to Chicago, Dallas, Ashburn, London, Frankfurt or wherever.
It also reduces the IP transit costs of the ISP, which they are typically paying for based on utiliza
Hopefully... (Score:2)
On the Faroe islands? (Score:2)
Re: (Score:2)
Probably a strategic location sitting on a trans-Atlantic cable. Though, you could also get that in Portugal.
Re:On the Faroe islands? (Score:4)
Netflix will send a CDN server more or less to any ISP which requests one, and is willing to pay the power bill. Do you not remember when many ISPs were loudly refusing to install these free machines even though they would save them money because they objected to "free" colocation on principle?
Re: (Score:2)
The ones complaining loudly mostly objected to competition with their own video product. ISPs that aren't also cable companies or have some other ulterior motive love these caching devices because they dramatically reduce their transport/transit costs and increase customer satisfaction.
The colocation objection was smoke and mirrors crap. These things take up less space than some T1 muxes.
Re: (Score:2)
If you could find out how many subscribers it has in each country, it might not be odd at all.
Also, you have to factor in things like the potential for natural disaster (Japan) and the gub'mint horking your servers in a political/ransom/whoknows move (Russia). Sweden's a good, stable location from which to serve content across the top of the world with little worry.
You mean to tell me (Score:2)
Isn't it common knowledge? (Score:2)
Isn't it common knowledge that Netflix will provide servers/appliances to ISPs who request them in order to cut down on video traffic during peak hours? Why is the fact that Verizon has a few such a big deal?
This program is well known: Open Connect [netflix.com]
Re: (Score:2)
Even at the height of those games, my 75/75Mbit FiOS never had issues streaming Netflix. I think it was an entirely fabricated issue. I have not however ever streamed 4k content...one of these days I will have to flip open the laptop screen and watch a 4k show to see what the big deal is.
Heatmap (Score:2, Insightful)
The original paper on Arxiv (Score:2)
https://arxiv.org/pdf/1606.05519v1.pdf
Related slides:
http://eecs.qmul.ac.uk/~boettget/mapping-netflix-coseners16.pdf
Research (Score:2)
Every real scientist is embarrassed when somebody claims that reverse-engineering secret data from companies is research. There is NOTHING researched there. Everything was known, just not to you. Research is about discovering something new, not about trying to get somebody else's secrets.
Use your time and money for something useful. Stop tracing netflix servers, start inventing something on your own.
Re: (Score:2)
Indeed, it's not only about inventions, but also analysis of things and discovering how stuff works.
But discovering how stuff works means things like finding out what the parts of an atom are. It does not mean dissecting things that were built before, where you could just ask the builder. Otherwise there is endless science: one team builds stuff and keeps it secret, the other team dissects it.
Keep this to the businesses. Let amazon prime video dissect netflix and keep it to science to find the best way to do the stuff, instead of just finding out the n
Re: (Score:2)
Scientists AND the people who keep stuff hidden while they know it's being researched at the same time.
Missing some details (Score:2)
They're missing a few details in their analysis. What about DNS load-balancing returning multiple potential IP addresses per name? What about anycasting IP addresses, or multiple end-points for an IP address depending on the entry path into a location? I think what they really found was the total number of potential current DNS names, but I somehow doubt that is the entirety of the CDN deployed right now. Also, because Netflix is very well known to be static content with controlled client applications, there is