Running The Numbers: Why Gnutella Can't Scale

jordan (one of the founding developers of Napster) writes: "As the rumour mill churns over Napster's future, many folks see Gnutella as the next best hope for the music loving file sharing community. Problem is, Gnutella can't scale. [Note: if that URL doesn't work, try this mirror.] Almost all research on Gnutella up till now has been based on observations of the system in the wild, but this paper discusses the technical merits of that statement through a detailed mathematical analysis of the Gnutella architecture." The kind of numbers that you may not like to read if you figure networks expand to accommodate traffic at a never-ending pace. Update: 02/15 12:24 AM by T: Jordan also points to this mirror for your reading pleasure.
  • The more relevant question is whether you can have a peer-to-peer network without central servers that *can* scale. And the answer is "no".

    I did my Ph.D. research on it. It works. Gnutella is broken, but don't draw the conclusion that a server-less environment can't scale. Read before you post this crap.
  • then the RIAA could easily make the case that you were storing illegal content on your machine

    No, reread the parent. You cannot know where the information is actually stored.

    A freenet node is basically a caching router, and AFAIK even the RIAA hasn't yet been able to repeal the common carrier status, so you should be ok.

  • First off Napster is to be praised for its ability to find some rare or bootleg tunes. BIGTIME props to Napster for that.

    Bottom line though is you people seem to forget what it was like in the good ole days for us to pioneer this CRAZE that swept the net. I feel like I should be talking to (Grandkids here saying this) "When I was your age we had to search the web for FTP servers and download them the old fashioned way."
    "I recall having access to a T1 at work when only the elite few had that and was running an MP3 site boasting 1 gig of tunes on a SCSI HD that was in a STATE OF THE ART P150 Dell server( I now have close to 20 gigs of MP3's)"

    Sure, Napster is/was great; Gnutella, though, will continue to be trouble... We will all make it.

    BTW, if anyone wants to contact me, I will happily work with you to upload my collection if you wanna open a site somewhere.

    The argument about college bandwidth, although many will hate me for saying it, is legit. I work for a company that installs network management software, especially at universities, and the ones that have blocked Napster have seen a substantial drop in traffic. I do not know what the answer is, but I can say I know several gamerz that HATE Napster etc. for the amount of bandwidth they lose on campus. Poor guys probably have a ping of 27 instead of 21.


    Razzious Domini
  • by PureFiction ( 10256 ) on Wednesday February 14, 2001 @11:56AM (#431980)
    I am nearing completion of a network that satisfies a, b, c, e.

    I haven't started on d and f, but they could be added.

    This project is called The ALPINE Network [cubicmetercrystal.com]

    It scales linearly, and provides a query mechanism that rivals the performance of a centralized directory. (The bandwidth is more than for a centralized query, but at least you have direct control over how much bandwidth you use, and how.)

    At any rate, I could use a great deal of development assistance. Let me know if anyone is interested.

    Regards...
  • 1. How do you identify all the peers?

    That's discussed on the site I mentioned, but essentially you pick an ID to associate with a given peer. It's that simple.

    2. Let's say 10% of those 10K people are doing searches. That saturates a 56K modem, assuming you can really get your packets down to 56 bytes

    It would only saturate your link if all 10,000 searched at once. If they all searched within a 3 minute time period, or no more than 70 in one second, your link will not saturate. And the packet is 56 bytes for an 8 character query. For a 16 character query, it would be 64 bytes, etc.

    What happens when you try to have 100K people? One million? How about the 10 million+ of Napster? Your scheme would not scale.

    That depends on how good of a peer you are. If you don't respond much, and have a very slow link, then you will be at the bottom of those 100,000 hosts' query lists, and will get queried infrequently. I cover this on the site, but this is not a problem. The only thing limiting your use of the network is how much memory you have (you would need a hundred meg or so for a million connections) and your bandwidth.
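
    To make the arithmetic concrete, here is a quick sketch you can fiddle with (Python; the 56 kbps link and the 48-byte-header-plus-query-string packet size implied by the figures above are assumptions from this thread, not measurements):

      # Receive-side saturation check for direct per-peer UDP queries.
      LINK_BPS = 56_000                  # 56k modem

      def packet_bytes(query_chars):
          return 48 + query_chars        # 8 chars -> 56 bytes, 16 -> 64, etc.

      def max_queries_per_sec(link_bps=LINK_BPS, query_chars=8):
          return link_bps / (packet_bytes(query_chars) * 8)

      print(max_queries_per_sec())       # ~125/sec before the link saturates
      # 10,000 peers searching over 3 minutes averages ~56 queries/sec,
      # and even 70/sec is only ~31 kbps of the 56k link.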
  • Ok, but now OpenNap basically just utilizes the Napster paradigm and therefore puts into place Index servers.

    If the RIAA succeeds in suing Napster and blocking their service, which seems very likely at this point, it is not at all far-fetched that they will easily be able to obtain court orders against anyone else running the same type of service.

    So your OpenNap is not a replacement service because every index server is liable for a court ordered shutdown.

    That, and an index server requires bandwidth; bandwidth costs money, and how many people are going to donate full T3 lines to this? Thus the service is capped in terms of the number of connected users based on bandwidth available.

    Once Napster is dead, there will be nothing else to replace it at the same scale unless it is operated with the blessing of the RIAA.
  • Well I never said I wasn't biased :)

  • No. The implication is that it's a series. The goal is to figure out what the progression is, and then come up with the next in the series.
  • by BeBoxer ( 14448 ) on Wednesday February 14, 2001 @12:45PM (#431986)
    Yes there is. He looks at aggregate traffic numbers, rather than per-client or per-search numbers. Saying that a search creates 6 GBytes of traffic sounds scary and un-scalable (the table labeled "Bandwidth Generated in Bytes (S=83, R=94)"). Holy cow, that's a lot of data. Now, the table "Reachable Users" reveals that that 6GB of data is searching 7.6 million clients. If we do the math, we find that our traffic level is a little over 800 bytes per client searched (including responses). Is 800 bytes of traffic for a search unreasonable? I don't think so.

    All the author really does is take an example of a mathematical formula which grows exponentially and show how quickly he gets "scary" numbers. No effort is made to show whether or not the efficiency of Gnutella breaks down as the network increases in size. No effort is made to show how much work is done per search or per result. He just makes assumptions about the gnutella network which result in exponential growth in the number of users, and then shows how the aggregate traffic also grows exponentially. Duh. What did you expect? By this logic, nothing scales.

    Don't get me wrong, I don't think Gnutella scales either. But you don't need to wave around all the FUDdy math that this guy does to prove it. The argument why it doesn't scale is simple:

    The reason it doesn't scale is that every search request (optimally) gets delivered to every client. We don't even have to look at how those searches get delivered. We'll completely ignore the amount of traffic in the backbone, and only count the traffic that has to exist on the last hop to each client. Let's assume that the requests are 100 bytes apiece, or about 1000 bits once we have all the overhead of UDP/IP/ethernet/PPP/ATM/whatever on top. If each search is 1000 bits, and the average client has a 56K modem, the whole thing falls apart when the search rate is 56 searches / second. If we assume 1 million users, each one can only perform a search about once every 5 hours on average before the modem links are 100% full.

    The problem here is the broadcast of every search to every client. Any distributed search network needs to either assume very high bandwidth connections for all the clients (because they are all servers to the whole network) or have some hierarchy of caches / servers. The amount of bandwidth being used at each client increases as more clients connect. If the number of users goes up by 1000%, the traffic on my local link goes up 1000%. This is why it doesn't scale. It has nothing to do with how many GB of traffic the network as a whole has to handle. It's the simple fact that the traffic at every client increases as more clients connect. This is the problem that has to be corrected, and Jordan's paper never even mentions this fact, relying instead on big scary numbers. His claim at the end that gnutella generates 2.4GBps of traffic for 1 million users is the ultimate FUD. How much traffic does Napster generate when it has 1 million people connected? He probably doesn't know, because their servers go down first.
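
    For anyone who wants to twiddle the assumptions, the whole last-hop argument fits in a few lines (a sketch; the 1000-bit search and the 56K link are just the figures above):

      # Broadcast search: every search (optimally) reaches every client, so
      # each client's last-hop link must carry the whole network's search rate.
      SEARCH_BITS = 1000        # ~100-byte request plus protocol overhead
      LINK_BPS = 56_000         # the average client's modem

      def max_searches_per_sec(link_bps=LINK_BPS, search_bits=SEARCH_BITS):
          return link_bps / search_bits

      def hours_between_searches(users, link_bps=LINK_BPS, search_bits=SEARCH_BITS):
          return users / max_searches_per_sec(link_bps, search_bits) / 3600

      print(max_searches_per_sec())             # 56.0 searches/sec, network-wide
      print(hours_between_searches(1_000_000))  # ~5 hours between searches per user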
  • by Greyfox ( 87712 ) on Wednesday February 14, 2001 @12:46PM (#431987) Homepage Journal
    Using indexes? If servers could coordinate and distribute indexes of information available, that would solve the whole searching problem, however you want to distribute them. I was thinking the other day that it would be fairly easy to implement file sharing using dynamic servers that could advertise themselves on the mbone. The servers could collect and distribute indexes of available files with the operator being completely unaware of the data being made available.

    Of course, you'd have to work out how to prevent hostile clients and servers from corrupting your indexes, but I'm sure that's a much more easily solved problem than working out how to prevent some skript kiddie from flooding Napster's servers off the net with a DDoS.

  • by cje ( 33931 ) on Wednesday February 14, 2001 @12:02PM (#431991) Homepage
    Get a bunch of investors to run a big fat network pipe into a country with a name like "Ljubilvaniastanistan" where the rutabaga is the national currency and the yak is the national delicacy. Then watch Hillary Whats-Her-Name from the RIAA swallow her own tongue when she learns that her vast legion of lawyers are powerless to do anything about it. Of course, the Bush administration would probably order immediate airstrikes on the grounds of "protecting the wealth-creation security of national corporate interests", but that would be a public-relations nightmare, particularly if we put the new Napster right in the middle of a bustling village full of non-coms.

    In all seriousness, I don't condone mass piracy, but the RIAA has been screwing people for decades and I have to admit that I enjoy watching them squirm. What could the RIAA conceivably do if Napster were located offshore, preferably in a country not bound by the terms of the Berne convention?
  • If you have a server on the internet which you want people to connect to, it's got to be advertised somehow.

    Won't be hard to locate them.

    If the RIAA can get Congress to pass a law which places a substantial fine on those convicted of running internet services for the purpose of piracy... Which isn't that unlikely. The DMCA places something like a $1 million fine on creating tools to subvert copy protection.

    Who will be able to afford to risk running said service? Bill Gates, maybe Larry Ellison. Doubt that'll happen.

  • by StoryMan ( 130421 ) on Wednesday February 14, 2001 @12:03PM (#431996)
    Isn't this the sort of thinking that went into the creation of the internet in the first place? The idea of a decentralized network, etc. etc.

    Maybe Gnutella needs to take the meta-internet approach. A "new" internet on top of the current internet?

    (I dunno. I ask because I'm curious. How is Gnutella in general different than the internet in particular?)
  • Nor does the average user, and therein lies the rub. As long as the interests behind the RIAA are smarter than the vast majority of users (they are), you can be quite sure that the RIAA will stop rampant piracy. You can say "but there's always something else" (normally it's FTP, Usenet, etc.). The simple fact of the matter is that most users don't have the time, the energy, or the intelligence to figure them out. The only reason that piracy has been as popular as it has been is because Napster lowered the bar sufficiently: it brought fast and easy piracy within the reach of a few keystrokes and mouse clicks.
  • You know, I have to take some exception to being called "sheep" because I buy CDs. Do you honestly, deep down, feel that because the current RIAA-supported distribution model doesn't compensate artists fairly, you are striking an idealistic blow for artists by using a model which, by and large, provides no compensation at all?

    Reality check: if you download music copied from a CD sold by an RIAA-affiliated label, you are not "boycotting RIAA-sanctioned music". Boycotting means you are willing to go without a product on principle. That's not what you're doing. What you're doing is, at the least, legally considered stealing the music (presuming you don't own the CD already or buy it later)--and I'd have to say it's philosophically pretty dubious. If you didn't "just want the music," you wouldn't be getting it for free.

    If you want to boycott the RIAA, you have to support artists who make their work available through "non-RIAA-sanctioned methods." But trading their music for free through Napster is not support.

    It's easy to defend Napster for what it might become. I think digital music distribution is coming, soon, and I suspect it will live without the RIAA. But it will require a viable business model for the artists, not the record companies, that allows an average, "second-tier" artist to get equal or better compensation than they would from a record company and provides a reasonable level of promotional support for concerts, merchandising, radio airplay, and the like. Napster does not provide this model. A future model might be free as in speech, but currently Napster is unequivocally free as in beer, and we're not doing ourselves or anyone else any good by pretending otherwise.

  • by Anonymous Coward

    The problem with Napster is that it has a single point of failure. The problem with Gnutella is it doesn't have an index. What you want is an index of all files with no single point of failure.

    An index is a root node, which points at branches, which points at leaves. So make 10 copies of the root, 10 of each branch, 10 of each leaf, and put each on a different transient machine. (If you think 10 roots is too few, have every user keep their own copy of the root. It's not big.)

    Then here's your protocol: ping the roots one at a time, and choose the first that responds. The chosen root pings the duplicates of the correct branch one at a time and chooses one. The chosen branch pings the duplicates of the correct leaf one at a time and chooses one. The leaf sends the results back to the user.

    Updating the structure is the same, with the addition that nodes occasionally try to sync with their duplicates. You end up with duplicates never quite in sync, but so what.
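
    A rough sketch of that lookup path, for concreteness (ping() and query() are hypothetical stand-ins for whatever RPC you'd actually use; in the scheme above each level does its own pinging, but it's collapsed here so the whole path is visible in one place):

      def first_alive(replicas, ping):
          """Ping duplicates one at a time; the first responder wins."""
          for node in replicas:
              if ping(node):
                  return node
          return None                                      # all ~10 copies down; retry later

      def lookup(term, root_replicas, ping, query):
          root = first_alive(root_replicas, ping)
          branch = first_alive(query(root, term), ping)    # root names the branch copies
          leaf = first_alive(query(branch, term), ping)    # branch names the leaf copies
          return query(leaf, term)                         # leaf returns the results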

  • I should warn you that it is not targeted toward file sharing, but massive-multiplayer VR and games. Some of the concepts are still the same, but you may need to extrapolate to get to others. Here is the link, and I'd be interested in any questions or comments you have.
    http://www.npsnet.org/~howard/dis.pdf [npsnet.org]
  • No, FidoNet requires a 'delivery' or 'bulk transfer' protocol.

    The protocol used by ALPINE is for messaging. The broadcasts are very small packets, usually 50-60 bytes. This makes a huge difference.
  • Just FYI, the RIAA did provide evidence that CD sales were impacted by Napster. They used the report that focused on college campuses. Now, you and I both think that report was bunk, but the appellate court "agreed" in their recent decision that the report was valid. For more info, read the recent decision. It's all in there.

    MyopicProwls

  • And if the server is located in some small country (or not so small country) that doesn't respect American laws, what is Congress going to do?

    -jon

  • You've apparently not used a news client since 1992 or so. These days all of the collating and uudecoding is done behind the scenes. Just select a file in Pan and press "D". In fact the Usenet is a great way to distribute Fansubbed Anime without overloading any particular server.

    Now for the problems (that haven't been mentioned yet): data on the Usenet has a short lifetime, frequently less than 24 hours. If you don't keep on top of it, it is easy to miss things (like the fourth episode of a series). Second, you can't search out a particular song on the Usenet; you have to more or less take what is available. If you are looking for a particular song, the Usenet may not be for you (although you can certainly request it).

    Down that path lies madness. On the other hand, the road to hell is paved with melting snowballs.
  • provides a reasonable level of promotional support for concerts, merchandising, radio airplay, and the like. Napster does not provide this model.

    I really disagree. First of all, Napster unquestionably provides a distribution model that provides a reasonable level of promotional support for artists. It's really great how many new artists I've discovered because of MP3s. Not necessarily just because of Napster, but if a friend says (as often happens) "hey have you heard this new stuff from Boo Williams?" and I say "Boo who?" (no pun intended -- go download his music it's great) then all of a sudden Boo has an opportunity to have his music heard by someone who never would have heard it otherwise. Super!

    Thankfully, MOST of the artists that I listen to have come down on the PRO NAPSTER side. This includes Ben Folds Five, Green Day, Limp Bizkit, The Offspring, Chuck D, and others. Unfortunately, some of my favorite artists have come down on the other side. These include the most vocal three: Metallica, Dr. Dre, and Eminem. That sucks.

    MORALLY I get over the problem. Is it morally wrong for me to want free music? I don't think so. Is it morally wrong for an artist to produce work that I listen to for free, never buying his CD, never going to his concert, never buying his T-shirts? Perhaps. Perhaps not. But certainly it is no worse than middlemen becoming so ridiculously rich by screwing me with $18 CDs. CDs should be between four and six dollars; about half should go to the artist (about what they get now, or a bit more).

    Trust me, I sleep just fine at night having spent the whole day listening to MP3s. And I do own CDs -- oh God do I own CDs. I counted once, and I probably gave the RIAA $4,000+ in my (short) lifetime. That's a lot. A whole lot. I figure they are still $3,500 ahead after all the free downloading I've done.

    I also, by the way, have 'discovered' artists via MP3s and Napster, and subsequently bought their CDs and gone to their concerts (e.g. Cypress Hill and Lavay Smith -- don't laugh), so those artists are definitely ahead because otherwise they wouldn't have seen any of my money at all.

    MyopicProwls

  • Yes, of course. Gnutella makes only the most limited attempt to ensure privacy. Case in point: a while back (in fact, it still may be running), a server returned fake matches to requests for kiddie porn, and published on a web page the IP addresses that had been caught trying to download the "files". I don't have to tell you how indignant some people got, and for the funniest reasons.
  • by jordan ( 17131 ) on Wednesday February 14, 2001 @12:07PM (#432021) Homepage
    Did you check the math? Seriously, the numbers are not "big and scary" because I was spinning them a certain way; they are big and scary because they are.

    I even share the equations and methodologies I used, and try to poke holes in my own conclusions.

    Further, I'm not a competitor. I haven't worked for Napster in 3 months. Before Napster my background was in poking holes in things anyway. All I did was finish a personal project I started a long time ago.

    You actually sound more like FUD than anything. :-)

    --jordan

  • The more relevant question is whether you can have a peer-to-peer network without central servers that *can* scale. And the answer is "no".

    Not so fast. Right now, the biggest problem with decentralized networks is that they all have some form of routing/forwarding. If you got rid of routing/forwarding, then they could scale.

    For instance, let's say you have a Napster-style peer group, 10,000 peers. What if, to query these peers, you sent a small UDP packet to each of them directly? No routing, no forwarding. How long would this take?

    Modem: 2.5 minutes
    DSL: 13 seconds

    I would say that this is an acceptable period of time. And the bandwidth used was all your own, nobody else's, except for the 56 bytes each peer received for that single packet they got from you.

    I am working on such a network; it's called The ALPINE Network [cubicmetercrystal.com] and has all the features mentioned.

    So, if you get rid of the forwarding/routing you can have a decentralized network that scales linearly.
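
    Those two timings follow straight from the packet size (a sketch; the 33.6 kbps and ~350 kbps upstream rates are my assumptions for "modem" and "DSL"):

      # Time to send one 56-byte query directly to each of N peers.
      def fanout_seconds(peers, upstream_bps, packet_bytes=56):
          return peers * packet_bytes * 8 / upstream_bps

      print(fanout_seconds(10_000, 33_600) / 60)   # ~2.2 minutes on a modem
      print(fanout_seconds(10_000, 350_000))       # ~12.8 seconds on DSL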
  • You've still got to query each peer, and a linear search like that isn't acceptable when you've got a lot of people hitting a big database

    There is no big database. There are lots and lots of little databases.

    And the network adapts to load. I go into this somewhat on the site.

    The important thing I want to point out is that this network is used mainly for locating content. Once you have found it, it may reside within Freenet, it may reside within OpenCOLA, or MojoNation, etc. And then you will benefit from their architecture for the actual delivery of the data.

    The broadcasts are used solely for discovery of resources, with the delivery being a whole other scenario. UDP is a horrible bulk transfer protocol.
  • No, because you control exactly how often or how much response you provide.

    If you are getting swamped, you will respond to fewer and fewer queries, and then your quality in the eyes of those peers will drop; thus, you will receive fewer and fewer queries.

    This is actually a balanced type of configuration, which handles load in an efficient manner.

    Also note that over a DSL line, you could receive in excess of 10,000 queries a second.
  • by roman_mir ( 125474 ) on Wednesday February 14, 2001 @12:14PM (#432041) Homepage Journal
    These big scary numbers actually look very, very close to what a network analyst would normally predict for Gnutella. The Gnutella network will display slowdowns with an increasing number of active nodes; that is true simply due to the fact that networks have limited resources. The physical networks will stay the same, while the software running on them can conceivably bring the physical networks down. Caching data is a good solution for Gnutella, but note that it is only good if you use a client that does caching, and note that Internet users generally don't like sharing their own resources (I mean their bandwidth) with the neighbours.
  • What they could prove is that you transmitted copyrighted material, regardless of whether or not you actually stored it on your machine. And that's the problem. You can't get in trouble for downloading or storing files; it's the uploading that gets ya.

    Amber Yuan 2k A.D
  • A simple query with an 8 character query string would be 56 bytes. The string above might be as much as 160 bytes.

    I did some monitoring of the gnutella network for a few months, and the average query string is about 8-16 bytes at most. Many, many queries were even less.
  • If the users are chained together through IDs one hop at a time, then you would have to route and re-route a query for their IDs before you even do anything!

    No, you missed a major point; there is no routing and no forwarding.

    This is what makes it so simple, and linear. You directly communicate with all the peers you want to query. Everyone directly communicates with each other. The only thing this implies is a transport service which can support a large number of concurrent connections efficiently. This is what DTCP does.
  • The big problem with Usenet is the availability of files. If I wanted to download Metallica's Master of Puppets, I would first have to see if it is there. If not, I would have to request it, and wait some time for it to appear on the newsgroups (and for all the parts to be there). If someone uploads a 128 kbps MP3, and I wanted 192 kbps, then I have to request & wait again. Napster/Gnutella avoid this problem by allowing me to search all of the music that is available, and get the files I want right now!
  • How would you protect such a network from a cuckoo's-egg attack [slashdot.org]? If any machine on the network can declare itself to be a server, then the RIAA can set up servers that poison the network with bogus data. If the client has a list of servers that can be trusted, then once the RIAA's lawyers get their hands on the same list, they know who to sue.

    (If the people running the RIAA and MPAA had been clueful, they would have been pursuing this strategy against anonymous file sharing from the very beginning. If 99 out of 100 requests for insert-top-forty-song-here on Napster return William Shatner singing "Lucy in the Sky with Diamonds", then most people would rather pay for the CD than sift through all the false results. But I digress.)
    --

  • Why not use IRC? I mean, it's there, it's reasonably reliable, and it allows both centralized and P2P communication.

    The "client" would be a bot. It would join a channel (say, "#bjork" or "#trancegoa") and to make a request, it would simply utter something on that channel in some protocolish language (eg "SEARCH 'Bachelorette'"), and other bots would respond in a P2P fashion (ie /notice the_asking_bot IVEGOT Bachelorette.mp3).

    This would deliver us from the scaling curse as it is described by Jordan's paper. It would also lead to a Usenet-like classification of available files among channels (if you like David Bowie, you would /join #davidbowie), thereby bringing people with common interests together. Technically, IRC networks are the best example of a semi-centralized-yet-free network I can think of.

    Think of this: Napster was made as a sharing system where people could chat. We have a chatting system. Why not allow people to share files on top of it?
  • > I'm speaking practically here. I'm going to visit 10,000 cities. Please give me the absolute guaranteed best route (in my lifetime, if you please).

    You are wrong. Either you are speaking theoretically, in which case the salesman problem is trivially solvable, or you are talking practically, and you don't give a shit about the *best* route. A good one will be sufficient, and there are very good heuristics for that.

    Cheers,

    --fred
  • And the crap part, is that it's JAVA BASED.

    *vomit*

  • I already had an intuitive grasp of what he was talking about, and his numbers seemed ballpark correct to me. I too thought the result set bandwidth numbers looked a little fishy, but the others seemed fine.

    I've been thinking about this for months.

  • by Omnifarious ( 11933 ) <.eric-slash. .at. .omnifarious.org.> on Wednesday February 14, 2001 @04:12PM (#432064) Homepage Journal

    I really want to build this with my StreamModule system [omnifarious.org], but nobody is helping me with it, and I don't have the time to hack it out, especially since I'm so ridiculously methodical when it comes to code.

    You build something that uses a distributed algorithm to build a spanning tree. The nodes near the top of the spanning tree become the servers. You build the algorithm so that parents in your spanning tree will naturally have more bandwidth than you do.

    I've been thinking about this for a long while.

    Building the spanning tree isn't hard. Every node just selects one and only one parent node. They tell the parent that they're a child of that parent. You prevent cycles by having a parent refuse to be a parent unless it also has a parent. If it loses its connection to its parent, it tells all the children that it no longer is a parent. One node 'seeds' the network as a root by saying it can be a parent without having a parent and not looking for a parent. Eventually it can delegate roothood to a child that has proven high bandwidth. It cannot cease being a root without doing this delegation.

    You can have connections to nodes that are neither parents nor children, but search requests should not be propagated to those nodes unless you have no parent. Eventually a search request will make it onto the spanning tree and be efficiently distributed.

    You can eventually elect servers who are near the top of the spanning tree. Nodes should, in general, elect parents that have more bandwidth than they do. This means that nodes near the top of the spanning tree should have the most bandwidth.
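
    A minimal sketch of the join rule (the Node class and pick_parent are mine, purely illustrative, not from any existing client):

      # Each node keeps exactly one parent; a node refuses children until it
      # has a parent itself (the seeded root is the one exception), which is
      # what prevents cycles.
      class Node:
          def __init__(self, bandwidth, is_root=False):
              self.bandwidth = bandwidth
              self.is_root = is_root
              self.parent = None
              self.children = []

          def can_parent(self):
              return self.is_root or self.parent is not None

          def adopt(self, child):
              if not self.can_parent():
                  return False               # refuse; would risk a cycle
              self.children.append(child)
              child.parent = self
              return True

          def lose_parent(self):
              """Tell the children we can no longer be a parent."""
              self.parent = None
              orphans, self.children = self.children, []
              for c in orphans:
                  c.lose_parent()            # they must go find new parents

      def pick_parent(node, candidates):
          """Prefer eligible parents with more bandwidth than our own."""
          eligible = [c for c in candidates
                      if c.can_parent() and c.bandwidth > node.bandwidth]
          for c in sorted(eligible, key=lambda c: -c.bandwidth):
              if c.adopt(node):
                  return c
          return None                        # keep looking (or relax the rule)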

  • by Merk ( 25521 ) on Wednesday February 14, 2001 @01:24PM (#432072) Homepage

    Ok, this time I did a bit more thorough check of the numbers. I agree with the first half, the traffic generated by the request half of the message. What I'm not as convinced of is the response side of the equation.

    I don't know what the typical percentage of Gnutella users sharing files is, so I'll accept your figure of 30%. But 40% of those sharing files having a match? Even with your reduced number here I think it's high. If 40% of people sharing files had a match, then with default settings (N=4, T=5) you'd get 484*(0.3*0.4) = 58 people finding a match. And with the number you use later of 10 matches a person, you'd get 580 matching entries. I've never received anything near that high. But if I did, I certainly would have no motivation to increase T or N.

    What happens if it's only 10% of those sharing that have a match? With the default settings you'd still get 14 people matching, or about 140 matching entries. That's still a *lot* of responses, more than I've ever received.

    If all your default numbers are used, your nightmare scenario would yield 0.3*0.4*7,686,400*10 "found" responses to your query. That's 9,223,680 "grateful dead live" songs (though not unique) shared among some 920 thousand deadheads who are all simultaneously online. Whoa.

    I'm not an expert in human psychology by any means, but let me suggest this. With most tools, people don't feel any need to "tweak" them unless they're not working right. With 580 songs returned, I don't think many people would feel a need to tweak their settings. If someone was having a hard time finding something, they might then change their settings -- but if they were having a hard time finding it, they wouldn't get so many responses returned.

    The only way I can imagine those monstrous amounts of data resulting from queries is if it happens by maliciousness or mistake.

    Am I missing something?

  • The core issue here is the distributed nature of Gnutella (and other pure peer-to-peer models without centralized indexes), and its effect on bandwidth.

    With Napster, the bandwidth usage from the query is negligible. A single packet (your query) goes out to a single destination (the Napster index server). A small handful of packets (your listing of places your desired song is located) comes back. A few K total, then you get your 4 MB transfer.

    With Gnutella, the bandwidth usage from the query is significant. Your query goes to several peers, which then forward it to other peers, etc., and each server with the song requested sends you back a packet. Looking at the numbers in the analysis shows that your query will quickly generate more bandwidth usage than the actual transfer (which you'll still have to do to get your song). The bandwidth hit is distributed, true, but it still adds up, and it grows geometrically with the user base rather than linearly.

    Gnutella's success depends upon a significant portion of its users also being servers (i.e. making files available for download) -- being a provider as well as a consumer. There's a server-side hit, too... with Napster, a provider of files sends a few packets to the Napster index server advertising its wares. Aside from the bandwidth usage of the actual transfers the provider is serving, very little impact. With Gnutella, every query within your range will hit your server. Bandwidth usage from queries will quickly outstrip bandwidth usage from transfers, and this will tend to discourage people from being providers.

    Please, don't get me wrong here. I think that peer-to-peer will be the future, but there are problems to be solved. Gnutella, as it stands now, will not scale well... the math in the paper in question is good, and matches real-world observations. The challenge is managing the queries, routing the queries intelligently, and keeping the bandwidth usage down "below the radar" of backbone providers and system administrators.

    I don't know what can be done about the bandwidth usage of the transfer itself, but keeping the query traffic down will help in keeping administrators and providers no more filesharing-hostile than they already are. Now is the time to be treating these people well, instead of antagonizing them further. You don't want to bite the hand that feeds you your bandwidth :)

    This problem has been solved before, by the way. Think "routing tables".

    • Disclaimer: I work for a company that does bandwidth management. What bias that gives me, I don't know... only fair to let you know.
  • by tswinzig ( 210999 ) on Wednesday February 14, 2001 @12:20PM (#432079) Journal
    But if the price of gasoline goes up, you can bet your last dollar that teleportation will be made practical. Or that cars that use fusion will be developed.

    Not everything is practical just because there is a need for it.


    Great straw-man rebuttal! How about if you try a more rational analogy? Going from gas combustion engines to teleportation or fusion power is a tad bigger leap than going from Napster to a similar service! And Napster ceasing to exist versus gas prices climbing higher is not analogous either...

    A better analogy would be:

    "If we run out of petroleum-based fuel, a similar or better form of energy will come to the forefront."

    And that's ABSOLUTELY TRUE, reasonably proven through a huge mound of empirical evidence.
  • by selectspec ( 74651 ) on Wednesday February 14, 2001 @11:05AM (#432080)
    Of course we've discussed this twice already here [slashdot.org] and here [slashdot.org].
  • No one said that distributed P2P needs to be a Napster clone. Free software authors frequently make the mistake of just copying some retarded existing commercial software (witness the influence of Windows on Gnome and KDE). We need to try a lot of totally different ideas too. Here is one example:

    Your system plays the role of file server by offering a list of available files, and plays the role of search server for you by collecting the lists of available files from different people. The key here is that only you search your own system's database, so only you get tagged for the cost of collecting the databases of too many different systems. Clearly, your system needs to figure out automatically which nodes it should track by remembering where you actually find stuff, but this should not present any real problem. You would also introduce a little randomization by tracking random nodes for a limited period of time.

    This might work just as well as Napster for people who always DL the same type of music (like Tech for me). Clearly, you would not be able to show off to your friends by DLing any song they request, but that is not really that important.
  • by dasunt ( 249686 ) on Wednesday February 14, 2001 @12:25PM (#432084)
    Reality Master 101 writes: Not everything is practical just because there is a need for it.

    Warning: Rant Ahead!

    Partially true. In your example, you said that if the price of gasoline went up, teleportation or fusion-powered cars wouldn't be developed. I agree. However, if the price of gasoline went to $20/gallon tomorrow (an outrageous rate, but it's just an example), then we'd either see a changeover to natural gas/electric or some other alternative energy source vehicle, or cars would be developed that got 400 miles/gallon.

    So why would gas/electric cars be implemented and not fusion or teleportation? Well, first we have a demand for transportation. The demand for transportation is rather high, at least in the developed world, and especially in the US, since all of us seem to want to live in the woods and commute to the city. Therefore, if the demand is high, we *will* find something to fulfill the need, as long as the cost of fulfilling the demand is not so great that we have to sacrifice other, equally important demands.

    We don't commute to work via helicopters because the time, money, and energy we would have to exert to be able to use them isn't worth the extra few minutes we'd shave from our commute time. We don't commute to work on buses because we prefer living in areas with lower population densities (i.e. suburbs), which make buses impractical, and we don't like the inconvenience of having to conform to the bus's schedule and having to interact with other members of our community. We are looking for something that fulfills our need to get from point A to point B with the lowest opportunity cost to us. This is the economics/social side of the scale. On the other side of the scale are the harsh laws of science and technology, which dictate what has been done, what is possible, what is impossible, and what the costs for doing each are.

    Say we have a possible solution set such as this: { car (gasoline), car (electric), walking, teleportation, car (fusion) }. Science tells us that teleportation looks impossible, so we eliminate it. Technology tells us that fusion-powered cars haven't been done yet, and considering everything that we know about "hot" fusion, it's doubtful we could ever fit a fusion reactor in a vehicle the size of a car. We are now left with gasoline-powered cars, electric cars, and walking, in this simplified example. Walking is too much of an inconvenience: science doesn't have a problem with it, but human nature, the time it would take, and the distance that would have to be traveled make it impossible. On the economic/psychology/social side, walking isn't happening.

    So what will it be, electric or gasoline? The technology that's in place makes gasoline-powered vehicles cheaper than electric, and gasoline, even at the high prices it has reached lately, is still an economical means of transport. Plus, we have human nature: gasoline is tried and true, electric isn't. Electric also has some problems with traveling long distances, and the infrastructure doesn't support electric right now. Therefore gas is the best solution to our problem. In the future, if electric becomes more ideal than gasoline (enough to override our habit of sticking with what we know), we will switch.

    So, we learn this: each problem/solution pair depends on economics, human nature (psychological/social), science, and technology.

    Let's apply this to Napster, OpenNap, Gnutella, and the rest of the field. Napster was nice and easy, a lot of us became accustomed to using it, and the technology (on our end) was cheap. However, Napster is either dead or moving towards a fee-based service. All of a sudden, from the economics viewpoint, Napster is less ideal. OpenNap is similar to Napster; there is the additional hassle of finding a server, but since Napster is having trouble, OpenNap seems a lot more attractive. However, OpenNap, from the social viewpoint, is insecure: it has a central server, and it can be attacked.

    Therefore, what do we have left? Gnutella is free of cost, and cannot be shut down through elimination of a central server. It is harder to use, and technology says it won't scale in the current format. Plus, it eats up bandwidth like a hog. :) However, science says it's possible to build a gnutella-like network that will scale. Therefore, we have NeoGnutella, which will be built if there is a big enough demand, or OpenNap. OpenNap, as we mentioned before, is easier to use, and similar to the Napster that most of us know and love, while NeoGnutella will have the benefit of never being able to be shut down.

    What will win? I personally think that both will survive, due to the fact that there is a large enough market to be divided up by 2 players (again, a simplified example), but that OpenNap will probably grab most of the Napster fallout due to its similarity to its commercial cousin. However, if OpenNap servers become legally attacked and thus often shut down, we will switch to NeoGnutella, because finding one "node" that we can persistently connect to is a lot easier than refinding OpenNap servers, even if OpenNap seems to scale better than any distributed net solution, and even if OpenNap is more familiar. Therefore, the long-term outlook for Gnutella depends on whether it will be adapted to scale, and whether OpenNap will be attacked, as well as other issues not addressed in this rant.

    We all have different wants. OpenNap, Gnutella, Freenet, FTP/HTTP "warez" sites, IRC "warez" channels, Napster, (formerly) Scour, and other services have evolved to meet this need. Since Napster was the most appealing to most users (and because of media hype), it became one of the biggest file sharing programs out there. Now that Napster has a rocky future, another method will become the biggest.

    The above was a rant, and presented simplified examples. I didn't mention gyro-driven cars, monorails, carts hauled by penguins, or bicycles, among other things, because I was trying to keep the examples simple (and carts hauled by penguins aren't really practical). I didn't mention stuff like how critical user mass applies to file sharing systems because it didn't pertain to the topic of the comment. So please, don't flame me with a comment about how widget-driven cars are the ideal solution, or that file sharing also depends on bandwidth. Nitpicking just wastes both of our time. On the other hand, valid comments are appreciated. :)
  • More frequent queries can also be transformed into a couple of 1 byte special codes:
    • 127 = 'sex'
    • 128 = 'mp3'
    • 129 = 'Metallica'
    • 130 = 'Natalie Portman'
    • ...
  • Most FUD is not an outright lie. It is quite possible to spread FUD by stating the truth in an unfavorable light. When Microsoft criticizes Linux for having dozens of distros which are not 100% compatible, that's FUD even though they are technically correct. In your case, your argument is basically that a system with millions of users will generate billions of bytes of traffic. This fact, while true, does not mean that the system is not scalable. In fact, it only means that the million users are generating modem-levels of traffic. 6GBps spread over a million users is ~48Kbps. Modem speeds. If your definition of un-scalable is any protocol which allows the users to run their modems at 85% capacity, then I think you need to include lots of things, including FTP and Napster.

    That's why I think your paper is FUD. It throws around big scary numbers which are technically correct, but which are in fact very modest levels of bandwidth when averaged over the userbase.
  • "The more relevent question is whether you can have a peer-to-peer network without central servers that *can* scale. And the answer is "no"."

    That question bugged me so much, I would like to answer it for you: the answer is YES! I figured out a solution. After reading the paper yesterday, I spent my time in class scrawling and pondering over it, and I have a very simple, elegant solution; I can't believe it! So, I am going to perform some experiments first before I make a fool of myself, but I certainly think it can be done. If I told you how, you'd smack yourself in the forehead and say, "of course!"

  • ..it can't scale. Anyone taking a computer science class should've realized that...

    On my campus, we've been using Limewire to make a private Gnutella network. We use it to trade files with each other. That way we're not all trying to get the same files from the internet. It's much faster. People at other colleges should try it.
  • I saw no analysis of the fact that multiple computers I am connected to will be connected to each other. I believe the paper essentially looks at gnutella as a tree with the searcher at the root. This is clearly wrong. I believe that the result of this is: less bandwidth per search due to repeated queries being dropped, and fewer hosts reachable due to fewer new nodes at each level out. However, it actually makes things worse, because a node may receive a query multiple times; it just won't pass it on more than once. So, bandwidth per search per node hit by search goes up. It scales WORSE than predicted.

    Also, this means that the population P DOES have an effect on the number of reachable users, because as P increases the number of redundant connections will decrease. Don't have the math to prove it, but I think that's the way it works.

    Also, is there analysis of why gnutella can't scale in terms of P? I can see why it won't scale in terms of the number of users I can reach, but why not in total users, IF users are content to let themselves be limited to a small fraction of the network? (This should be enforced by the clients. I know people can write their own, but they shouldn't write them to allow huge TTLs.)

    Also, what of the reflectors? [clip2.com]

  • by Leon Trotski ( 259231 ) on Wednesday February 14, 2001 @11:08AM (#432107) Homepage
    The cool part about Freenet is that it's SECURE. It's anonymous, and it's crypto-tastic. This means that Freenet is not subject to the kind of witch hunts that we've seen with systems like Napster, where individuals with certain IP addresses were booted for distributing Metallica MP3s. They physically cannot tell where you are, if you're running Freenet. It's also decentralized, meaning there's NO SINGLE POINT of FAILURE that can be brought down by technical or legal means. No Bertelsmann deals, in other words.

    Freenet is also very well architected, unlike bogus Gnutella. It's designed to scale up, so that popular stuff gets cached all over the place. Like, more people downloading means that your connections go FASTER. This is cool.

  • by Splork ( 13498 ) on Wednesday February 14, 2001 @01:52PM (#432108) Homepage
    Is it possible?

    Yes!

    By using an internal microcredit/payment system (called mojo) and localized reputations Mojo Nation [mojonation.net] aims to do exactly that. Better connected brokers (peers) will naturally become more "server like" due to having a better uptime, lower latency and a lower mojo cost overall for other brokers (peers) to use.

    The resources in the system are allocated dynamically. No strict hierarchy needs to be defined; it will establish itself appropriately for each individual peer as it is needed.

    PS a new version (0.950) was released today.
  • by SquadBoy ( 167263 ) on Wednesday February 14, 2001 @11:08AM (#432114) Homepage Journal
    The OpenNap servers are *very* good. I don't think I've used a Napster server for several months now. Grab gnapster and get this [freshmeat.net] and you are good to go.
  • It seems to me that the principal bad assumption of gnutella was that *forwarding search requests* costs less than *forwarding file lists*. The second problem is the network topology, though that can be fixed relatively easily, and some of the newer client/servers seem to be tackling that problem.

    If you switch to a more Napster-like model where each user submits a file list, then freeloaders don't consume as much bandwidth. You develop a database over time as you stock up on file lists. The downside is that you can't just join and search (though maybe asking nearest neighbors to search could be part of the protocol). Since users might update only a few times per day or less, the overall bandwidth use isn't that high.
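
    A sketch of how little machinery that takes (the names here are mine and purely illustrative):

      import random

      local_index = {}     # filename -> set of peers offering it
      tracked = set()      # nodes we bother to collect lists from

      def ingest_file_list(peer, filenames):
          """Called when a tracked peer submits/updates its list (a few times a day)."""
          for name in filenames:
              local_index.setdefault(name.lower(), set()).add(peer)

      def search(term):
          """Only I search my own database, so only I pay the collection cost."""
          term = term.lower()
          return {name: peers for name, peers in local_index.items() if term in name}

      def retrack(known_peers, productive_peers, n_random=5):
          """Keep nodes where we actually found stuff, plus a few random ones."""
          tracked.clear()
          tracked.update(productive_peers)
          tracked.update(random.sample(list(known_peers), min(n_random, len(known_peers))))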

    For the topology problem, I would suggest more of a ring-chain topology, with some redundancy (backup connections in case a link breaks, and multiple rings that are sparsely interconnected).

    This is fun stuff to think about. Similar problems are present (self-organized networks) in "bottom-up" nanotechnology. Maybe I should ask for a DARPA or NSF grant for nanotech research and spend my time and money working on a p2p network...
  • by karld ( 141289 ) on Wednesday February 14, 2001 @11:08AM (#432117)
    Was that not the line in Jurassic Park about an enzyme prohibiting male offspring? Well, Gnutella may not scale well today, but a legion of MP3-loving programmers WILL find a way to share music. The proverbial cat is out of the bag and millions of consumers have tasted blood. The RIAA cannot put this genie back in the bottle, and only significantly lower prices for music with added goodies will bring buyers back. And it had better be online, reliable and good.
  • They reported that CD stores around college campuses had GROWING sales.. But the sales weren't growing quite as fast as they were elsewhere.

    This could be from any number of causes.

    1. People at a college might have more straightjacketed finances and can't afford to increase their CD spending as fast as the general public.

    2. People at a college might tend to order online more often, thus satisfying their music consumption through non-local stores.

    3. People at a college may be joining CD clubs or may be purchasing CD's at home where they have convenient access to a large collection and bringing them to college instead of purchasing them near college.

    4. A statistical anomaly. A decline in sales isn't actually happening.

    5. A million other possible reasons.. Colleges are drugging their students so they purchase textbooks instead of CD's.

    The conclusion: while such a correlation may exist (college CD purchases aren't increasing as fast as the national average), it could have been generated by any NUMBER of possible causes.

    If you want statistics I'll believe: take universities whose student populations have similar demographics, where some do and some don't have (say) Napster, and ask them how many CDs they purchased in the last year. Or use some other technique that isn't susceptible to flaws #1-5 above, and give me numbers that don't have obvious artifacts.
  • by Reality Master 101 ( 179095 ) <RealityMaster101NO@SPAMgmail.com> on Wednesday February 14, 2001 @11:09AM (#432123) Homepage Journal

    Anyone who understands how Gnutella works (unfortunately, too few people) knows that Gnutella is horribly broken, will never work, and is basically unfixable.

    The more relevant question is whether you can have a peer-to-peer network without central servers that *can* scale. And the answer is "no".

    However, the REAL question is whether you can have a peer-to-peer network with decentralized servers, i.e., with clients that automatically establish a hierarchy among all the clients, so that certain clients become more "server like". The only way to make Gnutella work is by making it hierarchical, but the hierarchy needs to be automatic for it to have the same general "virtual network" aspect of Gnutella.

    Is it possible? I don't know. You would probably have to have automatic bandwidth measurements, depth probes, all kinds of things to make it work. I simply don't know if it would be possible to automate something like that.


    --

  • Er, you did see who the author of the article was, right? Not exactly one of the record companies' favorite people... Napster co-founder Jordan Ritter.

    You're saying they paid him off, or did you just not bother to read the header?

  • IIRC, and I am not sure that I do, but isn't there some bug in the windows TCP/IP stack that you can't have too many "open" udp "connections" at once?

    All of the communication is done through a single UDP socket. DTCP is a multiplexing transport protocol which operates over a single UDP connection.

    You are correct about the number of open UDP sockets, though. On any UNIX or NT variant the limit is usually 1024 to 2048 per process, and 64K per IP address (the PORT value in UDP or TCP is only 2 bytes).

    This is why native UDP or TCP cannot support the required number of connections to perform direct queries to each peer in a large network.
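
    The single-socket trick is easy to picture (a sketch of the idea only; DTCP itself layers reliability and flow control on top, which is omitted here):

      # Many logical peer "connections" multiplexed over ONE UDP socket: the
      # kernel sees a single descriptor, and we demultiplex by the (ip, port)
      # each datagram arrives from.
      import socket

      sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
      sock.bind(("0.0.0.0", 4000))

      peers = {}    # (ip, port) -> per-peer state; a million peers is a memory
                    # question, not a file-descriptor question

      def send_query(addr, payload):
          sock.sendto(payload, addr)       # no connect(), no new socket

      def pump_once():
          data, addr = sock.recvfrom(2048)
          state = peers.setdefault(addr, {"seen": 0})
          state["seen"] += 1
          return addr, data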
  • by reaper20 ( 23396 ) on Wednesday February 14, 2001 @11:11AM (#432132) Homepage
    Gnutella is neat, but for a reliable MP3-only service, check out Audiogalaxy. [audiogalaxy.com]

    At first I was put off by the web interface, but:

    1) It remembers everything you request in a queue and will get it when available. (A must for dial-up users)

    2) Auto-resume using temp files.

    3) A small app in your system tray/console only sends/receive when you have it running.

    The greatest advantage is that ZDNet/CNet/MSNBC and others DON'T mention Audiogalaxy in their "quest for the Napster clone" articles, so the quality of users, and therefore the music, is excellent.

    Unfortunately, it is a centralized system, but so far, it seems the mainstream media/RIAA have ignored it.

  • I had thought of IDA as a secret sharing scheme like Shamir's. Thanks for bringing this to my attention!

    I found the original paper:

    MICHAEL O RABIN : Efficient Dispersal of Information for Security, Load Balancing, and Fault Tolerance [google.com]

    Basically, it means you can break a file of length L into N chunks each of length L/M, such that only M chunks are needed to reconstruct the file. It's exactly the right thing for these circumstances.
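
    Rabin's construction needs finite-field arithmetic, but the simplest instance of the M-of-N idea, a single XOR parity chunk (N = M + 1, any one lost chunk recoverable), fits in a few lines. This is my sketch, not the paper's algorithm; real code would also record the original length so the padding can be stripped:

      def split_with_parity(data, m):
          """Split data into m chunks plus one XOR parity chunk."""
          size = -(-len(data) // m)                      # ceil(len/m)
          chunks = [data[i*size:(i+1)*size].ljust(size, b"\0") for i in range(m)]
          parity = bytearray(size)
          for c in chunks:
              for i, b in enumerate(c):
                  parity[i] ^= b
          return chunks + [bytes(parity)]

      def recover(chunks, lost):
          """XOR the m surviving chunks to rebuild the one at index `lost`."""
          size = len(next(c for c in chunks if c is not None))
          out = bytearray(size)
          for j, c in enumerate(chunks):
              if j != lost and c is not None:
                  for i, b in enumerate(c):
                      out[i] ^= b
          return bytes(out)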
    --
  • Go ahead. Strike me down all you wish. I have more karma than you could possibly ever imagine.
  • Heh. Grandpa indeed.

    MP3 first came out in 1996.

    But it almost seems like forever, doesn't it? To me it's encouraging that this stuff is so new, because it means that in 4 years I'll be a "grandfather of the internet" too. :)

    This is a truly fun time to be alive.

  • Yes, and Napigator [napigator.com] works, too. The only thing is, when Napster's official servers close down, the OpenNap servers may experience their own "Napster flood" effect. I've already been unable to connect to some of the more popular opennap servers from time to time because of user limits.
    --
  • What if a subtree has a parent that isn't connected to anything with greater bandwidth? Would the subtree never join the spanning tree? Oh wait, it's the internet, everything can connect to everything.

    I'm not sure what you mean by this. The bandwidth constraint would only be a guideline on deciding on a parent, not a straightjacket. That guideline, consistently applied, will tend to push high bandwidth nodes towards the top of the tree. I'm considering connections here to be pretty fluid. Almost as fluid as current Gnutella connections.

    The expected depth of a random spanning tree is around sqrt(n) for an n node graph. It would be good to balance the tree.

    The fact that connections are made and broken reasonably frequently will tend to cause the tree to become bushy.

    What if the root goes down without performing any protocol? Are all children inaccessible when a branch node goes down until the spanning tree has been renegotiated?

    I've thought about this. One solution would be to have the root's children hold an election as to the new root. Another would be to have the root designate an emergency root.

    I consider this problem not too hard to solve. The much more interesting problem is when you have two nodes who think they're the root.

    Are you still sending all requests to all leaves? If you are, you've cut the bandwidth down from n! to n. Napster does log n, beating this handily.

    That was my intention initially. A later version of the protocol could use information about the spanning tree to designate caches that had all the information for either a subtree or the whole tree.

    You also have to admit that going from n! to n is a huge improvement. :-)
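
    To make the "guideline, not a straightjacket" point concrete, here is a toy version of the parent-selection rule (names and numbers invented). Applied consistently, it drifts fat pipes toward the root:

        # Prefer attaching beneath a candidate with at least our bandwidth,
        # falling back to the best available. Numbers are illustrative only.
        candidates = [
            {"addr": "10.0.0.1", "kbps": 56},    # modem
            {"addr": "10.0.0.2", "kbps": 1544},  # T1
            {"addr": "10.0.0.3", "kbps": 768},   # DSL
        ]

        def choose_parent(candidates, my_kbps):
            fatter = [c for c in candidates if c["kbps"] >= my_kbps]
            pool = fatter or candidates  # guideline, not a hard constraint
            return max(pool, key=lambda c: c["kbps"])

        print(choose_parent(candidates, my_kbps=768))  # -> the T1 node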

  • by ajs ( 35943 ) <{moc.sja} {ta} {sja}> on Wednesday February 14, 2001 @11:13AM (#432151) Homepage Journal
    Isn't this failing one of the main reasons [sourceforge.net] for the creation of Freenet [sourceforge.net]?

    I understand that there are basically three reasons for Freenet:
    • Abolition of censorship
    • Archival of documents based on their perceived "usefulness"
    • Elimination of standard bottlenecks in most peer-only networks (I hate the term peer-to-peer, but won't digress into the rant behind that statement)
    So, do we really care that Gnutella lasts any longer than the time that it takes to get Freenet everywhere?
  • I love AudioGalaxy. It's a lot easier to use (in terms of searching), downloads auto-resume, and downloads automatically come from the fastest available connection nearby. You also tend to get far fewer truncated files because it will, by default, download the most popular version of the MP3 (in terms of size and bitrate), but you can also custom-select which version to download yourself.

    It's definitely worth a try (and blocked by far fewer firewalls and ISPs than Napster!).
  • The trouble with news is that one lost article screws your download. But that's what error correction is for! A simple Hamming code allows you to, say, break the file into 26 data shares and add 5 error-correcting shares such that the file can be reconstructed after one share is lost; you can do better with more sophisticated error-correction schemes.

    I haven't seen any P2P proposals which make use of error correction technology, and it does seem like it might be useful.
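
    As a minimal illustration of the idea (a single XOR parity share per group, which recovers any one lost share; a real scheme would use a stronger code):

        # k data shares + 1 XOR parity share: any one lost share is rebuildable.
        def xor_together(shares):
            out = bytearray(len(shares[0]))
            for s in shares:
                for i, byte in enumerate(s):
                    out[i] ^= byte
            return bytes(out)

        shares = [b"AAAA", b"BBBB", b"CCCC"]  # equal-sized article chunks
        parity = xor_together(shares)

        # Suppose share 1 never arrives: XOR everything else to recover it.
        assert xor_together([shares[0], shares[2], parity]) == shares[1]
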
    --
  • by jordan ( 17131 ) on Wednesday February 14, 2001 @11:14AM (#432157) Homepage
    FUD? Just read the math, man. Make your own decision, sure, but read the paper first. There's nothing FUD-like about the mathematics in the paper.

    --jordan
  • I still don't see why it's FUD. FUD is about deception, about spreading information that isn't true. My math, at least AFAICT, is not blatantly off, and while my assumptions are numerous, at every turn I question them and raise the reader's awareness that they could be off, and why. I think it's ridiculous to characterize my research as FUD. I'm not attempting to deceive anyone.

    As for why Gnutella can't scale, the point of my paper was not to duplicate other work or research. I don't mention a lot of the reasons because I think they are either irrelevant or different methodologies arriving at the same point. The premise of my paper wasn't to cover the practical limitations of Gnutella, since those have pretty much been beaten like a dead horse. The premise was to take an alternative angle at addressing the question "Can Gnutella Scale?", simply by calculating network impact with some math, and provoking some thought.

    In other words, you look at 6GBps and say "FUD!!! That number is wrong!!" I look at it and say, "Hmm, well, 6GBps or 4GBps, makes no difference why or how, it ain't gonna work."

    --jordan

  • by tcc ( 140386 ) on Wednesday February 14, 2001 @11:15AM (#432162) Homepage Journal
    Simple: with all that media frenzy going on (the Napster trial even got front-page coverage in my local newspaper), it's a big-scale advertisement for MP3. Yes, Napster has a userbase of 60 million, so the argument that it's only a few specific individuals doing it is wrong; but if that story made it into my local newspaper (with a mention of Gnutella too), guess how many people who didn't know about it or Napster will be curious to try different services out.

    Now there will be media coverage (outside the Internet) mentioning other alternatives like IRC, Gnutella, search engines, etc. This is really a stupid move... not counting the many people who are going to be pissed off at the RIAA and stop buying CDs.

    The RIAA should have worked closely with Napster to build a decent business model instead of bashing them; they might have actually profited from that. They've shown how much "copyrighted material" was leeched every second (around 10,000), but did they show EVIDENCE that their sales decreased BECAUSE of Napster? No, they didn't have to; but if they had, things wouldn't be this way. You can bet that after Napster shuts down, their sales will decrease. I, for a start, will not buy any more CDs.

    I hope a company picks up big artists for digital distribution and does something like Stephen King did: a buck a download, with the money going STRAIGHT to the artists, and the record labels would have to stop their own piracy (i.e., ripping many artists off and taking the public for complete morons).

    For now Gnutella will do for most people, and if people SHARE, math or not, it will work (not as nicely as Napster did), and there will be a boatload of alternatives if Gnutella isn't doing the job.
  • I did my Ph.D. research on it. It works.

    Got a url to it? This sounds like an interesting read.

  • But I can't help wondering if the total generated traffic is really the bottleneck, or if it's the sum of the latencies that really slows things down...

    I mean, each client is only passing a small amount of data to each of its neighbors, so I don't know if the aggregation of the total bandwidth usage is a ... useful ... measurement...


    tagline

  • by jordan ( 17131 ) on Wednesday February 14, 2001 @11:18AM (#432172) Homepage
    The paper is also available at http://www.tch.org/gnutella.html [tch.org] .

    --jordan

  • by PureFiction ( 10256 ) on Wednesday February 14, 2001 @11:18AM (#432176)
    I am currently working on a fully decentralized searching network. You can read more about it here. [cubicmetercrystal.com]

    The key aspects of this network will be:

    - No forwarding. This is currently eating Gnutella alive. A UDP-based multiplexed transport protocol is used to maintain hundreds of thousands of direct connections to all the peers you want to communicate with. You can also tailor your peering groups precisely to what you desire, as far as quality, reliability, etc.

    - Low Communication Overhead. All broadcast queries are performed with minimal overhead within UDP packets. A typical Napster-breadth query (10,000 peers) would take a few minutes on a modem, and seconds on a DSL line.

    - Adaptive Configuration. Peers that have better or more responsive content will gravitate towards the top of your query list; thus, over time, you will have a large collection of high-quality peers, which will greatly increase the chance of finding what you need.

    There are a number of other features, however, too many to detail here.

    Also, this is under heavy development, and not operational. I am going solo on this at the moment, and so progress is slow. However, once completed, it *should* be a scalable alternative to completely decentralized searching / location.
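
    The modem-versus-DSL claim is easy to sanity-check. Assuming roughly 50 bytes of UDP query traffic per peer (my assumption for illustration, not a measured ALPINE figure):

        # Back-of-envelope cost of broadcasting one query to 10,000 peers.
        peers = 10000
        bytes_per_query = 50  # assumed per-peer cost, headers included
        total_bits = peers * bytes_per_query * 8

        for name, bps in [("56k modem", 56000), ("768k DSL", 768000)]:
            print(f"{name}: {total_bits / bps:.0f} seconds")
        # 56k modem: ~71 seconds; 768k DSL: ~5 seconds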

  • But if Napster gets squeezed, you can bet your last dollar that it will be made to. Or something like freenet or audiogalaxy will take over.

    But if the price of gasoline goes up, you can bet your last dollar that teleportation will be made practical. Or that cars that use fusion will be developed.

    Not everything is practical just because there is a need for it.


    --

  • What about the *per-user* bandwidth? Even if you have gigs' worth of data to move, if you have millions of users and things are split up evenly, that's only kilos per user. The clincher is looking at peak bandwidth at any given node and comparing that to capacity. Did I skim the paper too fast, or did it not address this rather thorny mathematical question? Not that I believe Gnutella scales smoothly at all.
  • On the contrary. Napigator [napigator.com] is a nifty little freeware tool that lets the Napster client program use other Napster servers. The OpenNap network is huge and not going anywhere anytime soon...
  • ...and will continue to improve if only folks would move to newer, more robust, and more compliant clients. If you're still running gnutella 0.53, or even Gn0tella, check out BearShare at http://www.bearshare.com/ [bearshare.com]. You'll be surprised at how far Gnutella has come - that only hints at how far it may go in the future.

    Critics said man would never set foot on the moon. Now critics are saying Gnutella is doomed. Funny, they've been saying that since March of last year and I'm still happily downloading MP3s. Ignore the critics and keep the faith.

    Shaun
  • Freenet is also very well architected, unlike bogus Gnutella.

    The problem with Gnutella is not the transferring of files, it's the searching. You'll note that Freenet conspicuously avoids the subject of searching, except for "yeah, we're thinking about it... real soon now!"


    --

  • First, 10,000 peers isn't exactly world domination. If your network is successful and swells to 1,000,000 peers (still less than 1% of the Internet), suddenly you're tying up your modem for 250 minutes per query.

    No, you connect to as many as you want. You can stop at 10,000, half a million, etc. Each peer is in direct control of how much bandwidth they use, how many peers to connect to, and how many queries they perform.

    Second, presumably other people are making queries, too. If there are even 20 queries per second, your modem link will be saturated even if you're not making any queries of your own.

    That's where ALPINE comes into play. It allows the ordering of peers based on the quality and value of their responses. If you start getting busy, you simply quit replying to queries, your perceived value to those peers drops, and you then get queried less.

    The details are more complex, but you should never encounter saturation unless you specifically configure your client to do so, and even then it's unlikely.

    Third, discovering and storing a list of 10,000 peers -- not to mention 1 million peers -- is prohibitively expensive. Remember, there's no centralized server dishing out lists of addresses

    You build them up gradually, continually refining your list over time, so that you eventually have a list of similar peers with quality content and service. You don't get one million peers all at once. There is a discovery protocol in place, where you can ask for a number of peers from one you are already connected to. No need to do it all at once.

    Third, discovering and storing a list of 10,000 peers -- not to mention 1 million peers -- is prohibitively expensive

    You can store the connection information for 10,000 peers in 2 megabytes of RAM. The DTCP protocol is specifically designed to be very compact with almost no overhead per connection.
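
    For scale: 2 megabytes across 10,000 peers works out to 200 bytes of state per peer, which is plausible when each "connection" is just a packed record rather than a kernel socket. A hypothetical layout (not the real DTCP state):

        import struct

        # 4-byte IP, 2-byte port, 4-byte session ID, 8-byte last-seen time,
        # 4-byte quality score, 178 bytes of protocol state -- invented fields.
        PEER_FMT = "!4sHIdI178s"

        print(struct.calcsize(PEER_FMT))           # 200 bytes per peer
        print(struct.calcsize(PEER_FMT) * 10000)   # 2,000,000 bytes ~= 2 MB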

    Fourth, the amount of churn in a group of 10,000 peers is quite high -- nodes are arriving, leaving, and crashing all the time. Even if you could find out about all 10,000 peers, your link would be saturated keeping up with changes in group membership

    There are protocols for resuming connections if your IP address and port change. Also, these peers would have a perceived low quality in comparison to more stable nodes, and thus would move down your list; you may not even need to maintain a connection to them at all.

    The server is also designed like a daemon process. Your GUI or client would interface with the server through a CORBA interface. You can shut down the client and the server is still running. You can reduce bandwidth usage if you wish, or shut down the server entirely. However, it is designed for a more persistent presence than most peering services.

    And last, your network inherits all of the d-o-s, spam, and privacy problems inherent in any broadcast-search network. Gnutella has demonstrated these problems (if not solutions to them) handily. Learn from the idiocy of others.

    Actually, this should be less of a problem than you would suspect. DoS is still only as bad as TCP. There is a connection protocol similar to TCP with handshakes, etc.

    Spam is even less of a problem, as you can ban peers which spam or attack you. Peers can share this information in a growing pool so that spammers and rogue clients are effectively ostracized from the network. Each peer can decide who and when they communicate with. It puts the power back in your hands.
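
    A minimal sketch of that ordering-and-ostracizing idea (all scores and thresholds invented): reward useful replies, punish spam, and stop querying anyone who falls below the floor.

        scores = {}  # peer address -> quality score

        def record_reply(peer, useful):
            scores[peer] = scores.get(peer, 1.0) + (0.5 if useful else -2.0)

        def query_order():
            # Best peers first; peers at or below zero are dropped entirely.
            return sorted((p for p, s in scores.items() if s > 0),
                          key=lambda p: scores[p], reverse=True)

        record_reply("peer-a", useful=True)
        record_reply("peer-b", useful=False)  # spammy reply: score goes negative
        print(query_order())                  # ['peer-a']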

    By the way, these were some very insightful questions. Thanks for the reply.
  • by karot ( 26201 ) on Wednesday February 14, 2001 @11:23AM (#432189)
    So, Jordan, you provide a nice demonstration of a flaw. It is considered polite in many circles that, when destroying someone's hard work, you make a peace offering in the form of some assistance.

    Can we expect therefore to see an equally interesting and thorough discussion of how Napster/Gnutella can grow, evolve and perhaps merge, to provide the "ideal compromise" where we will not need 100Gb networks, but where:

    a) The destruction of any significant percentage of the network is transparently ignored or healed.
    b) The network will not segment as GnutellaNet can.
    c) Bandwidth requirements are low[er]
    d) Anonymity of participants is maintained where required.
    e) The law can't shut it down so easily.
    f) Data can be secured, encrypted and/or signed (etc.) for specific users

    And MY personal wish:
    g) The end result is so globally accepted for file exchange and storage that FTP dies a death, and we all live without buffer-overflow exploits for the rest of our lives :-)

    Note that Napster and Gnutella were very one-sided in their freedom with files. There was no facility available to ensure that the law was honoured where desired.

    --
  • This was a plea for development assistance ;)

    I could very much use some additional C++ development talent to help with this project. Anyone who is interested please let me know.

    Thanks...
  • When it comes solely to downloading MP3s, I've tried both Gnutella and Napigator. I've always found Napigator to be more stable, easier to use, and more likely to provide good downloads than Gnutella. Better yet, Napigator works with existing Napster clients to bring da music to da masses.

    If it's the trading of MP3s that's at stake, I believe that Napigator and nap servers like OpenNAP will save the movement, not Gnutella.
  • I posted a list of what seemed to me to be very difficult issues in the first few weeks of the gnutella-dev mailing list (there don't seem to be archives online, not a good sign).

    This included the fact that load on each server grows proportionally to the total number of servers, so the total CPU usage for the whole system grows quadratically. There are also serious issues with naming, searching, tagging, and other things that could have been dealt with.
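
    The arithmetic is straightforward: if each of n servers handles traffic on behalf of the other n-1, per-node load is O(n) and system-wide work is O(n^2).

        # Per-node load grows linearly with network size, so total work
        # across the whole system grows quadratically.
        for n in [1000, 10000, 100000]:
            per_node = n - 1        # queries each server may relay or answer
            total = n * per_node    # system-wide work
            print(f"n={n}: per-node load {per_node}, total work {total}")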

    There didn't seem to be much interest in this, so I moved to lurking on the Freenet mailing list, which seems to be a much more grown-up way of doing the same thing.

  • Return Rant:

    I'd say there is no way to put the genie back in the bottle, either by products dying out or by legal action. Now that people have had a taste, there will eventually be one or more working models. None is likely to have the instant dominant position Napster had (except possibly Microsoft's offering, if they bind it into Windows), but that doesn't mean the concept will die. File-sharing is a simple and very addictive concept, so it's something with a low barrier to entry and lots of possible market share. That will drive companies to invest. Us geeks will invest our time just to keep the companies from sealing us in, and because we like to hack code. I myself was working with file-sharing concepts long before Napster existed and am sure I will be long after. The concept has no doubt been growing ever since the invention of email. As a species we like being able to communicate freely. That includes text messages, voice messages, movies, photos, music, games, etc. Therefore there is no way the idea of sharing these things will die out. It'll just get thought about some more, and new, better concepts will be tried over and over until we find the perfect one. Email, ftp, gopher, the web, instant messaging, Napster, etc. are all steps we've taken.
  • certain IP addresses were booted for distributing Metallica MP3s. They physically cannot tell where you are, if you're running Freenet.

    A few months ago, I tried to find a simple, lucid discussion of exactly how FreeNet works with IP anonymity. On a technical level, but without having to plow through the code. Anybody want to try? I'm not disputing that it works... I just don't see how you can prevent someone from sitting on your router and watching packets fly, and correlate the IP on the other end to a single system.

    --
    Evan

  • Why wouldn't it be?

    Why isn't the travelling salesman problem efficiently solvable? Why is pattern recognition such a difficult problem for computers when humans do it so easily?

    Don't underestimate the difficulty of the problem of a self-organizing network. It is definitely a non-trivial problem.


    --

  • can I get prosecuted if they have my ip?

    Yes. They can track your ISP, obtain a court order to search the ISP's logs, obtain your information, and arrest you.

    is this likely to happen?

    Short answer: no.

    Long answer: Do you know how many people do this stuff? If the FBI went after every copyright violator in the nation, they would need an incredible amount of manpower. If you aren't reproducing and (important) selling bootlegs, nobody cares. You've been taking the "FBI Warning" at the beginning of videos way too seriously. ;-)
  • With most tools, people don't feel any need to "tweak" them unless they're not working right.

    Uh, right. Hands up everyone who actually needs to compile the latest, greatest kernel? Hands up everyone who did anyway?

  • by omega_rob ( 246153 ) on Wednesday February 14, 2001 @11:39AM (#432213)
    Hey, that wasn't a troll! That was freaking FUNNY! I want my karma back!

    *sigh*

    omega_rob -- friend of the dread pirate Napster

  • Had an idea for a solution to file sharing over the internet without the vulnerability of a centralized static site serving as the database:

    The client software would have preference settings allowing a user connecting to the fileshare system to indicate their "eligibility" to become a temporary Database Host. Options of Always, Ask, and No.

    Client software would work like this:
    Access specific IRC Channel and Query established hosts.
    Hosts (the temp. Database Hosts) would respond stating who they are, and requesting the client's share list.
    Database Hosts would negotiate which host would accept the new client's list.
    Client would then be told which host to transmit its list to, and when its next update would be expected.

    Search requests are then transmitted to the Hosts through IRC, and results are returned directly to clients by Hosts. 1-to-1 transfers are then initiated using the client's choice of protocol.

    When clients contact hosts indicating they are still online, the Hosts will ask the client program about server eligibility. Database Hosts will change to those who indicate a preferable host environment.

    Of course there are specific things to work out, but what do you guys think? Use IRC as a central communications channel for everything, and use a randomized central group of systems as centralized databases - faster search returns than Gnutella can produce, but at the same time, no easily-shutdown central server.
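
    As a rough skeleton (every message name and handler here is hypothetical, since no such client exists), the negotiation might look like:

        eligibility = "ask"  # user preference: "always" | "ask" | "no"
        hosts = []           # known temporary Database Hosts

        def on_channel_message(msg):
            kind, payload = msg["kind"], msg["payload"]
            if kind == "HOST_ANNOUNCE":    # a host identifies itself
                hosts.append(payload["addr"])
            elif kind == "ASSIGN":         # hosts decided who takes our list
                send_share_list(payload["host"], payload["next_update"])
            elif kind == "PROMOTE" and eligibility != "no":
                become_database_host()     # accept temporary host duty

        def send_share_list(host, next_update):
            print(f"uploading share list to {host}, next update {next_update}")

        def become_database_host():
            print("promoted: now accepting share lists and search queries")

        on_channel_message({"kind": "HOST_ANNOUNCE", "payload": {"addr": "host-1"}})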

    Just a thought. Don't have the skills or time to write up a trial client.

  • by ShaunC ( 203807 ) on Wednesday February 14, 2001 @11:40AM (#432221)
    Forgot this the first time around. Here are some tips to improve Gnutella's performance for yourself and for everyone.

    1. Never connect to more than 5 hosts at a time. There's no need for it and you'll only hurt yourself by doing so. I used to spend a lot of time in the gnutella.wego.com discussion area, and then the GnutellaNews boards, helping out new users. Time after time someone would come in and say, "Gnutella is shit! I type in a search and I don't get results for 10 minutes!" Me: "How many connections do you have open?" Them: "50, and if I try with 100, it goes even slower!!"

    The more active connections you have, the slower your Gnutella experience will be... And by being a congested node, you're adding latency to the network for everyone else. Set your max connections to 5. That gives me, on average, an overhead of 6-10K/sec in background chatter, not counting uploads/downloads.

    If you're on dialup, max your connections out at 2 and (it hurts to say this) don't share files or you won't be able to do anything else online. If you really want to share - and that's a good thing - cap your uploads at 1. Leave routing up to the people with the fatter pipes.

    2. Go for diversity in your connections. If you load up your client and see that you're connected to 5 RoadRunner nodes, dump a few of them and try to connect to other networks. Peer-to-peer file sharing relies a lot on peering, after all. Connecting across ISPs, networks, and even across countries is a good thing.

    3. Don't share junk files. Please. Every time I search for Pink Floyd and get a ton of under-1MB MP3s in the results, I want to kill someone. Know which directories, if any, you're sharing... And clean them out from time to time. All those incomplete downloads you made are being sent out as search results, but nobody is going to download them from you. Those are a lot of wasted bytes coming through your query hits.

    4. Perhaps most importantly, use a good client. See the parent for details.

    Shaun
  • Well, first of all, I didn't destroy any hard work; in fact, all I did was prove mathematically what the smart folks at Xerox PARC and Clip2 DSS have been saying all along.

    But you're right, what's the next step? Well, I think there are a lot of great ideas out there already, but the technology in general is all still quite juvenile. I don't believe we'll see wide-scale adoption without a better Internet infrastructure to carry the traffic. Coming up with smart ways to ferry around data will always help, but in the end 14.4k is 14.4k, and 56k is 56k. Following the logic of the power-law phenomenon, fully distributed networks will probably never scale without the lowest-common-denominator bandwidth being raised significantly.

    --jordan

  • by PatJensen ( 170806 ) on Wednesday February 14, 2001 @11:34AM (#432233) Homepage
    I was surprised to see that no one mentioned Usenet! Usenet is meant to be a distributed news system but easily accommodates many binary, multimedia and music groups. Usenet is easily scalable with the right infrastructure in place and can handle many clients (readers) across the Internet.

    Tools like NewsShark and NewsGrabber make it easy to post or obtain binary formatted files such as multimedia and there is plenty of it available. No waiting for downloads, no acne-faced punk kids aborting them, and you can batch and resume at your convenience.

    Usenet isn't that hard to use and there is a lot of music that can be found from your ISP's news server. Grab a client and check it out!

    -Pat

  • by Minupla ( 62455 ) <minupla@@@gmail...com> on Wednesday February 14, 2001 @11:36AM (#432237) Homepage Journal
    Naturally the router-sniffer could see that you were transmitting Freenet traffic, but since it uses peer-to-peer encryption, the best you might be able to get out of it is traffic-analysis information.

    The other thing you could do would be to take over a node on the network and request the material you're interested in, but since Freenet uses relay nodes, you can never be sure whether the information you're receiving came directly from the node you are talking to or through N relay nodes. Also, the data is encrypted on the hard drive of the node operator, so you provably cannot know whether you are storing illegal data or a copy of Johnny's essay for school.
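
    A crude way to see why a wire-tapper can't distinguish origin from relay: every node handles a request identically whether it originated it or is merely forwarding it. (An illustration of the principle only, not Freenet's actual routing code.)

        import random

        def handle_request(node, key):
            if key in node["store"]:
                return node["store"][key]       # serve from local cache
            nxt = random.choice(node["peers"])  # otherwise forward to a neighbor
            data = handle_request(nxt, key)
            node["store"][key] = data           # cache on the way back
            return data

        # A simple relay chain: a asks b, b asks c, c has the data.
        c = {"store": {"essay.txt": b"Johnny's essay"}, "peers": []}
        b = {"store": {}, "peers": [c]}
        a = {"store": {}, "peers": [b]}
        print(handle_request(a, "essay.txt"))  # a can't tell if b had it or relayed it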

    Hope that helps,

    --
    Remove the rocks to send email
  • by trcooper ( 18794 ) <coop@re[ ]t.org ['dou' in gap]> on Wednesday February 14, 2001 @11:50AM (#432240) Homepage
    IMHO, it's a long way from being decent.

    Downloads are generally horribly slow. Most of my downloads on Napster/OpenNap servers come in around 25-100K/sec. Audiogalaxy claims you get the fastest source for your location, but I can't see how I'm getting 1-2.5K/sec downloads if that's really true.

    Selection's not too bad, but you can't find the obscure stuff that you'll find on a network the size of Napster's. Its organization is a step in the right direction, better than Napster's, but could be better.

    What I really don't like is the fact that you have to choose which version each time. Sure, you're supposed to get the most popular version, but I don't like 128K MP3s; I prefer 192K files. So each time I download a song I have to choose which one I want. It would be nice if I could tell it I prefer 192K songs and have it default to those.

    With Napster I can find an entire album in a search and queue it up quickly. With AudioGalaxy it takes several clicks. You also don't have the ability to browse a user's files, which is one of my favorite things about Napster. Sure, AudioGalaxy gives you logical choices of other music, but I frequently find things I like that don't logically go with the song I initially searched for.

    Basically I think AudioGalaxy is a good idea. I'd like to see a better client, maybe standalone or a Java client so it would have a little more flexibility, and I'd like to see more potential interaction between users.
  • by Minupla ( 62455 ) <minupla@@@gmail...com> on Wednesday February 14, 2001 @11:51AM (#432243) Homepage Journal
    But neither I, nor anyone else can prove that I store illegal content on my machine.

    Freenet is a totally peer-to-peer system. It is not possible to tell whether I'm sending the file directly to you or am just transmitting it at the request of a node behind me. And if it's possible to prosecute for that, then ISPs everywhere are in BIG trouble :)
    --
    Remove the rocks to send email
  • by Mikiso ( 178087 ) on Wednesday February 14, 2001 @12:33PM (#432255) Homepage
    Perhaps a system which implements Server On Demand(tm)(r) technology would fit here. You need to make a distinction between client and server. In this case, a client is a program which retrieves specific data and stores that data for personal use. A server would also retrieve data, but then offers that data to clients. Servers also store data based on what people want. We've seen that in FreeNet. Nothing new here.

    This is how server on demand would work. A client starts and searches for a server. It could be a local broadcast, searching a list of last known servers, searching a list of servers from a config file, asking the user for a server address, etc. If no server is found, the client automatically becomes a server. If a server is located, the client can begin to request data. If the server is under heavy load, it will ask a client (meeting certain requirements) to become a server. Of course, clients have a choice, perhaps a little config switch to either allow server status or never allow it. Anyway, the client needs to be worthy of server status. Some criteria might be uptime, bandwidth, available storage space, number of hops, etc. The burdened server would give the invitation. One thing that might be implemented is an automatic server invitation for clients which are located along the same route as more distant clients (say, somewhere in the middle of a 50-hop route, based on response times). The middle server would then handle the requests for the more distant clients.

    Obviously, we need to maintain a list of the quasi-centralized servers. Clients can maintain their own lists of servers. Servers can hold lists of other servers. Search requests are never handed to clients; only servers are searchable. Therefore, clients may publish available files to server databases. Servers may ignore these if clients do not meet certain guidelines.

    We could incorporate a trust level into the server list. Once a client is deemed worthy to be a server, it is trusted. Certain bad events (dropping from the network too often, losing storage space, etc.) could reduce the trust rating. If that rating goes too low, server status is revoked and a new server can be appointed. Good events might raise that trust.

    Anyway, that's a bit of my rambling. I'm not a network engineer so I couldn't describe the scalability of this method. It might work or might not. So comments are welcome! ^_^
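
    One way to make the "worthy of server status" test concrete (the criteria and thresholds below are invented for illustration):

        clients = [
            {"addr": "c1", "uptime_h": 40, "kbps": 1500, "free_gb": 10, "trust": 0.9},
            {"addr": "c2", "uptime_h": 2, "kbps": 56, "free_gb": 1, "trust": 0.5},
        ]

        def eligible(c):
            # Invented thresholds: enough uptime, bandwidth, and storage.
            return c["uptime_h"] >= 12 and c["kbps"] >= 256 and c["free_gb"] >= 5

        def pick_promotee(clients):
            worthy = [c for c in clients if eligible(c)]
            # The most-trusted worthy client gets the server invitation.
            return max(worthy, key=lambda c: c["trust"]) if worthy else None

        print(pick_promotee(clients))  # -> c1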
