Robo-chattel? New Legal Challenge to 'Bots

milomilo writes "Extending the eBay vs. Bidder's Edge case, the NY Times reports (free registration required) that a Manhattan judge has granted a preliminary injunction barring Verio from using 'bots to harvest up-for-renewal prospects from Register.com's WHOIS. The theory's that bots use up a piece of the target system's resources, denying its use to the owner. (Question: would search engines be different, presumably because they also confer a benefit on the target by making it findable?)"
This discussion has been archived. No new comments can be posted.

  • "The theory's that bots use up a piece of the target system's resources, denying its use to the owner"

    How about not placing a machine on a free, publicly accessible network if you're not up for letting people use it? It's not like it's being DoSed or anything, is it?
  • So if an article linking to my site gets posted to Slashdot and my website/ISP is flooded and pukes, can I sue VA??? This is a very, very dangerous precedent.
  • This whole thing is an interesting question, really... whether requesting something can be an attack.

    I.e., you nonmaliciously (meaning it isn't a DoS; you're actually using the information) ask for large gobs of information from some site, the way these bots did, or the way a spambot might. They call this "denying services," but still, it's simple: the questioner requests, the answerer replies. If it's "unauthorized use"... well, how can you talk about unauthorized use on a public server? How can these things, authorization and by whom, be implied on a public internet? Should it be the job of the requester not to go where they clearly shouldn't be, or the job of the requestee to keep them out?

    Or look at it in terms of a port scan. I request things from each of these ports, thus figuring out which are open (and thus vulnerable to attack). I've seen people try to prosecute this based on "unauthorized usage of a machine." Well, hold up, who said you had to authorize anything? This person is just sending pings to ports on a machine that, by its presence on the internet, you have implied responds to traffic. Why on earth would you need "permission" prior to using a system? If so, how would that permission be obtained? But of course none of this changes the fact that a port scan is almost always part of a malicious cracking attack.

    Or, let's say-- hypothetically-- there was a single-line javascript that, if accessed from a windows NT machine, would cause the kernel to be overwritten by 0s. If you put that up on a web page, would that be "hacking"? You didn't break the machine yourself; you politely ask the machine to break itself, and it complies. Is that your fault?

    But then, when you get down to it, all forms of "cracking" could be seen as requests. I request that you process this block of information that just happens to cause a buffer overflow... you didn't have to process it, now did you? That last bit doesn't really sound reasonable. You have to draw a line somewhere; you have to note somewhere that it's no longer a request but an attack. Somewhere, for the sake of sanity, you have to draw the line, and how do you do that? Intent? How do you prove intent in court? What's the difference between the Slashdot effect and a DDoS, at an abstract level?

    But still, how the hell can you say it's illegal to ask for something because the party questioned might give you an answer even though they don't want to? That's where the law is heading, where it's been heading for a while, and that's completely absurd.

    There is no right answer here, is there?

  • by Anonymous Coward
    Seems like these rulings are being made in spite of a basic fact of life about publicly accessible network communications. "Using resources"? Of course; they made the resources publicly accessible, so we could use them. Now they want to sue in order to restrict use to only the people they want.
  • by Masem ( 1171 ) on Friday January 12, 2001 @06:36AM (#511809)
    Ignoring the bad link...

    Question: would search engines be different, presumably because they also confer a benefit on the target by making it findable?

    The standard search engines, such as Google, AltaVista, etc., know and obey robots.txt, which serves the same purpose as Register.com's policy of not allowing bots to search through its site. If a robots.txt file is in place and a search engine continues to index the site anyway, I would say there's a good legal case there.

    More interesting are tools that "mirror" web sites: they are still using a resource that you've made publicly available, except over a timeframe much shorter than a human could manage, which usually means more resources used up at the server end. These bots tend not to follow robots.txt rules and are only defeatable by User-Agent blocking. If the above ruling stands, does it apply here?

    Take it a step further: eBay has taken action to stop meta-eBay sites that index its site and make it easier than eBay's own search engine to find things, or that search multiple auction websites. Even though the information posted is publicly available from eBay, IIRC they still won, mostly because the information is still eBay's property and they didn't like seeing it on other sites.

    Which all leads to an interesting question: when you click on a link, does that start a clock during which you have a temporary right to download the information to your local computer, a right that "expires" after some time? If so, sites that index or mirror without further authorization could find themselves in trouble...
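The User-Agent blocking mentioned above can be sketched in a few lines of Python. This is a toy WSGI middleware; the blocked agent substrings are illustrative guesses, not an authoritative list of mirroring tools:

```python
# Minimal sketch of User-Agent blocking, written as a WSGI middleware.
# The agent substrings below are examples only, not a real blocklist.
BLOCKED_AGENTS = ("wget", "httrack", "webzip")

def is_blocked(user_agent):
    """Return True if the User-Agent header matches a known mirroring tool."""
    ua = (user_agent or "").lower()
    return any(bot in ua for bot in BLOCKED_AGENTS)

def block_mirrors(app):
    """Wrap a WSGI app so requests from blocked agents get a 403."""
    def wrapped(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Mirroring tools are not welcome here.\n"]
        return app(environ, start_response)
    return wrapped
```

Of course, a mirroring tool can forge its User-Agent header, which is why this is a weak last line of defense at best.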

  • Supermarkets will be more than happy to kick you out if all you do is go in and write down prices of things, so just because the info is there, and you're open to the public, doesn't necessarily mean it can be used any which way. (Not that I agree with that.)

    Moreover, the issue isn't you walking into a store just to look around with no intention of buying; it's as if thirty thousand of you went into the store, clogging every square foot, just looking with no intention of buying.
  • I don't know about your "javascript writing 0's" example, but I have seen a site set up so that clicking on a link will crash (100% guaranteed) any Windoze machine. Linux and similar systems are not affected, you just get a banner showing Bill Gates getting hit in the face with a pie.

    But I seem to remember that the link you have to click to do this says something like "Do not click this link" so maybe it's OK.

    I've been fighting the temptation to put one like that on my website (jsoftco.8m.com [8m.com] - shameless plug!).
  • by Anonymous Coward
    From the Fish. [digital.com] My favorite: Freely after the slogan: Man, those are thick, man!

    AMSTERDAM - TEASERS
    TEASERS
    (EX-CHooters)
    Damrak 36
    1012LK Amsterdam
    T. 0031-20-4287508

    FIRE WARM UP PARTY in the Teaser's to 22.04.00 of 13.00 - 16,00 o'clock

    The Teasers of sport bar, typical American bar, was always the meeting place of the NFL Europe fan (particularly with the Scots) for their Party's.

    Special flag are the waitresses, who are inferior to our Pyro's hardly. And if one has then times birthday, then the waitresses let themselves also which "nice" be broken in (not truely, Living putting)! Who would not loving have exchanged with you gladly!

    Freely after the slogan: Man, those are thick, man!

    By the way: The Teasers was called in former times Hooters , like the American branches. The name did not change, but in the concept to anything. It continues as in the Hooters. It was probably probable more a license problem. The Disco is inferior also during the day no Discothek. A D.J. with a violent sound system and Lightshow brought still each Partymuffel in tendency. The meal is very good in addition, not cheap. The Damrak connects the Dam Square with the main station. The bar can be attained by both workstations within 5 minutes. Beside the Teasers an excellent typically Dutch Frittenbude with megaportions of Pommes and thousands saucen is direct.

  • I always see people posting comments in which they try to justify their port scanning, and whatever else they may do, as "white hat cracking." A common example is the unsecured house. Well, let me say that if you were to walk down the sidewalk in front of my house and look in the window, so be it; that is fair use. If you choose to stand on the sidewalk and stare for the long term, then I will likely turn on the sprinklers. If you bring a chair, I will both turn on the sprinklers and call the police. Now let's assume you leave the sidewalk and come up for a closer look through my window. If I catch you doing so, I will beat the crap out of you and call an ambulance to come get your remains. If you decide to check my doors to see if they are unlocked, I will, without hesitation, put a bullet through your skull. My machine is online for my convenience, not yours. You have NO right to access any portion of my machine to satisfy your own curiosity. If I catch you doing so, and can determine who you are, you will get a visit from either me or the police. You should hope it is the police.
  • by Jerf ( 17166 ) on Friday January 12, 2001 @06:44AM (#511815) Journal
    I hate to jump on this bandwagon, but this is just way over the line, Slashdot! You have a staff of people, a whole freakin' staff, and you seem to spend less time on the homepage of your site than I do, all alone, on my weblog! In sheer people-hours spent on the site, Katz appears to be kicking the ass of the entire rest of the Slashdot crew combined!

    What really ticks me off is that "The Old Media", through which many people still get their news, has latched on to Slashdot as "The New Media", meaning that Slashdot will be reflecting on my own efforts, and the efforts of anybody else trying to run a 'new media' style website. This is why I post this; Slashdot's flub-ups are personal and affect us all. The flub-ups affect people running new media sites (by tarnishing the reputation in the eyes of the Old Media press who doesn't care to dig past their original generalizations), they tarnish the reputation of Open Source (as they have been labelled the spokesperson of the Open Source movement by the same collection of media entities), and they tarnish the reputation of VA Linux. (Hey, anybody at VA listening? This is not good return on your investment!)

    Slashdot editors, wake up! You are not invincible. You can be replaced, and in Internet time, too. Please get some ethics, before you convince thousands or millions that the New Media doesn't have any!

  • I have a column [webtechniques.com] on this very issue (linking as "trespass to chattels") in the current issue of Web Techniques [webtechniques.com]. One of things I discuss there, in the context of streamlinking, is that a better solution to these issues generally is for developers to use links generated on the fly or session-specific permissions if they want to block linking.

    This may not be a solution for Register.com, as it has to follow standard protocols for the implementation of Whois, but we shouldn't let this Register.com precedent carry over to other scenarios. For standard linking problems, the better solution is to use code to prevent linking, not court orders.

    -- Bret
  • You thought that all Slashdot readers were "grown up enough"? Blimey. You can't be a regular.

    I'd also suggest that, proof-reading links aside, it's your own fault for reading Slashdot at a client's site. No wonder your boss isn't giving you any billable hours if you spend them reading and posting to a news site.

    I loaf around at work by reading Slashdot though, so I'm hardly holier-than-thou. :-)
  • Even better analogy: you give your client list to your secretary, your competitor asks your secretary for it, and s/he hands it over willingly. There's no legal issue here; you just need to set up your web server to not return certain results if you don't want them to be public, or at least don't return them to certain people.

  • That's what I don't understand. I see this kind of incomplete story, or even poor articles, hit the homepage, but when I try to post something a bit useful my stories are rejected.
    For example, my last one was yesterday, about the first genetically modified monkey born in the US. Two links were given, one to the BBC and one to CNN.
    If this is how Slashdot works today, it's sad.
  • It certainly seems to me that putting a computer online as an internet server is to make its space a public place, much as a walk-in retail store has an area where people can look at the products the store offers. An argument such as "bots use up a piece of the target system's resources, denying its use to the owner" is analogous to saying that a customer can't walk into the store because it denies the owner walking space. Why are they online? Maybe they should get back off.
  • ...and Verio suddenly caves in. But it seems that thousands of their users use the "whois" gateway constantly. Mostly dialups, if you do a reverse DNS lookup.

    There are a million ways around the injunction (see example above). I think from a moral standpoint the judge is correct. Unfortunately, being morally correct doesn't mean a damn thing.

    Spam uses extra CPU cycles; in some cases spam causes users to go over quota (is that a DoS attack against my users?). Is spam outlawed? Is anyone REALLY doing anything about it?
    NO.
    Do I think that this ruling will change the unscrupulous? No. Hopefully Verio will show us that it is a "good guy" and go about its business the right way.

  • by Richy_T ( 111409 ) on Friday January 12, 2001 @06:21AM (#511822) Homepage
    No, it's more like saying customers can't drive their cars up and down the aisles of the store, or perhaps that people who are not employees of the store can't go through the door marked "Private." Ever see a sign that says "Shoes and shirt required"? The store has the right to control the manner in which people access the space.

    Just because the machine is connected to the public internet does not mean that the machine is open for anyone to use however they please. This is enshrined in UK law these days. (Note the gradual disappearance of "Welcome to hostname" login prompts; it can be argued that it's an explicit invitation for hackers to enter your machine.)

    I mean, your telephone is connected to the public network but would it be OK for me to set up a bot to constantly dial your home to see if you'd dropped the price on the car you were selling?

    Rich

  • I can agree with that, but you can still place data in public with certain restrictions (on the web, at least). For example, I can place an article or a picture online that belongs to me and require that it not be redistributed or otherwise used beyond viewing on my site. Similarly, some WHOIS services restrict any commercial use of their service; Register.com may well have done that here, which would presumably disallow this use.
    ---
    seumas.com
  • 'from the what'll-they-think-of-next dept'? I already wonder what he was thinking when he posted the article...

    Btw, think about the 'Hooters' admin who has his site slashdotted right now...
  • I bet Hooters' ISP is freaking out trying to figure out what all this traffic is, and why it's going to Amsterdam's Hooters... wonder if a Hooters has ever been /.ed before. And of course we all know what Hemos browses at the office when he is supposed to be working :>
  • Yes, sure. But think about this.
    If there were more serious stories, the ratio of serious to dumb replies would be higher. I've often seen that some stories get only 10-20 posts, no more. And they don't even appear on my Slashdot homepage; I have to use "older stuff" to see them.
  • But what application of the net does not consume resources on a machine other than your own? Someone needs to tell this judge to shut down the net. It's EVIL.
  • Respecting robots.txt is entirely optional; there is nothing at all stopping search engines/spiders/whatever from completely ignoring it.
  • I miss the days when CT and Hemos regularly patrolled their site, and would fix problems rather quickly. One can only assume that now they spend their days surfing for Hooters sites :-)

    ObOnTopic post:

    Question: would search engines be different, presumably because they also confer a benefit on the target by making it findable?

    The difference between what Verio is doing and search engines is one of implied permission.

    Web sites either grant or deny permission to search engines via the /robots.txt file. If a web site wants to be indexed, it puts permissive rules in robots.txt. Verio is spidering for its own commercial gain, ignoring a number of posted policies against it. That is apparently what the judge ruled on: violating an explicit request not to harvest. What is funny is that register.com doesn't have a robots.txt file, so does that give people permission to spider the site?

    the AC
    Maybe this was just a simple hack of /., but I'd also believe that Hemos just pasted the wrong link into the story from one of dozens of open browser windows and didn't really double-check before posting. Haven't we all done that at some point? :-)
  • SPAM uses extra CPU cycles, in some cases spam causes users to go over quota (is that a DOS attack against my users?) Is SPAM outlawed, is anyone REALLY doing anything about it?

    When Canter and Siegel first pulled their little stunt, it was a bunch of geeks spread around the world that got upset. Who cares about that? Since then, spam moved to e-mail and more and more people are starting to use the internet: Judges, politicians etc. As the volume of spam increases and these people get more affected, we're starting to see rumblings of legislation against it. I am sure that one day, spam will finally be outlawed in one country then other countries will begin to follow suit.

    Rich

  • "Secondly, robot.txt is often a server level setup file. If you get some
    free space with the likes of AOL/Freeserve/Geocities you have no control
    over the indexing of your site. Additionally, some (albeit poor) ISPs
    don't offer configuration of this file."

    If you can't control your ISP's robots.txt, in the header of each page
    put:

    <META NAME="ROBOTS" CONTENT="NONE">

    I have 70MB of pages at my ISP and robots were costing me a bundle till I put this in. Now all the major robots ignore the site, and only a few oddball wiseguys occasionally download the whole thing (even though I offer them a compressed version of the entire site at my FTP site).
  • This is good in the sense that another company should not peruse a site to gather contact information for marketing purposes. I've always thought these kinds of practices were dishonest to say the least. I think this kind of behavior should be curtailed.

    On the other side of the coin. This is bad, because of this:

    "If I don't like your linking to my site, or searching my site, even though it is open to the public, and I say, 'Stop,' you have to stop . . . whether you are actually hurting me or not."

    Crawlers and Links shouldn't be penalized. It's a way of finding useful information (as opposed to finding new business... ...A way of getting contact information of people who probably don't want to be contacted, anyway.)

    I also understand the reasoning behind the robots.txt file. If the information being gathered by a crawler will be outdated (in the case of eBay auctions) then it's a good way to selectively remove portions of your site for searching, because it's not appropriate.

    ---

  • ...moderate this story down as -1, Troll?
  • It seems to me that it all boils down to: if you're visiting the store and think you're going to buy, go in the door. Otherwise just walk to the next store.

    And if you're someone who writes down the prices, that's OK, as long as you do it politely and do not get in the way of my customers.

    Just think about it. If I had a site that got /.'ed every hour, what would I do? I'd ask the offending parties to please reduce the number of hits. If that doesn't work, then I'd block them. If they go around my countermeasures, then I'd have to assume they're out to do me and my system some sort of harm (hey, bandwidth is expensive), and I'd be forced to take legal action. I could take the law into my own hands (a DoS attack), but that would put me legally in the wrong.

    Best case, I think, is that we set up a robot ToS page. This page would charge the bot-indexing company a certain rate for reading the web site. That robot ToS page would be listed in robots.txt as a non-indexable page. Therefore, if a rogue robot gets into the system, I have an accountable party to bill for abuse of my system.

    Any further ideas?

    I also think I could catch a lot of spammers this way.

    michael
  • by segmond ( 34052 ) on Friday January 12, 2001 @09:46AM (#511836)
    ... if they can't check the link, will they even read the article?

  • I was in a meeting, then at lunch, so I missed the Hooters link, but the fact that Slashdot changed it like nothing happened, without an update, is wrong and scary! I mean, this reminds me so much of "1984," where history is rewritten as time goes by.

  • The link is now correct, but no acknowledgement of their monumental error.

    At least admit you fucked up, instead of pulling the old switcheroo and pretending you actually read this stuff first...
  • Seems to me they're deliberately avoiding the point of the whole issue, they being the courts as well as the paper.

    The point of the issue as I see it (contradictions and argumentation more than welcome!) is that the information about which domain names are up for grabs should be public knowledge; otherwise it becomes restricted data that can only be used by Register.com for marketing purposes, creating a form of "lock-in" effect which enables them to aggressively market to their own customers whilst preventing others from doing so in as targeted a way. Is this anti-competitive?

    The argument about the cost to the company of the use of their servers, with those numbers hovering around 2%, is irrelevant: if they want to remove the load, all they have to do is provide a table with the expiry dates of the registrations, which is the information people like Verio want.

    Should this information be private? Are these registration companies entitled to use their status to hoard this information and thereby erode the competitiveness of the market they inhabit?

    I don't know the answer to these questions, but it pisses me off that the judge in question has dodged the issue.

  • But the Standard needs to be expanded, I think, to cover categories of 'bots or categories of files.

    For instance, I don't let image-surfing software suck up my bandwidth looking for copyright violations or whatever. The images I have are either original to me, or part of a snapshot archive of newsgroup regulars and events, and there's positively no benefit to me in return for bandwidth hogging.

    Now, the bulk of the graphics are in particular directories that are blocked in robots.txt, but you ought also be able to say "no *.jpg" or "no *.cgi" or whatever. Or "no image indexing bots," or whatever.

  • by RussRoss ( 74155 ) on Friday January 12, 2001 @05:31AM (#511841) Homepage
    It's worth noting that search engines honor the robots.txt protocol, so any web site can easily opt out of being indexed. There isn't anything like that in WHOIS. If I remember right, ebay lists its auction items as off-limits for bots in robots.txt. I see that as the strongest distinction between search engines and the cases mentioned here.

    - Russ
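As an aside, the robots.txt check a well-behaved crawler performs looks roughly like this with Python's standard library; the site and the rules are made up for illustration, loosely modeled on eBay listing its auctions as off-limits:

```python
from urllib import robotparser

# Hypothetical robots.txt that opts auction listings out of indexing,
# roughly the way eBay's is described above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /auctions/
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A crawler that honors the protocol simply skips disallowed URLs.
print(rp.can_fetch("MyBot/1.0", "http://example.com/auctions/item42"))  # False
print(rp.can_fetch("MyBot/1.0", "http://example.com/about.html"))       # True
```

The catch, as other comments note, is that nothing but good manners makes a crawler run this check at all.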
  • You might want to fix that "http://www.rheinfire.de/pamster.htm" link in the article. "HOOTERS"
  • Requests from particular hosts can be blocked via the Apache configuration files. So if one host takes more of another host's resources than the owner wants to give, it is up to that owner to restrict, or keep delivering, those resources. Restricting can also easily be done with firewall rules, or even simple scripts that block an IP address that behaves too suspiciously. It is up to the webmaster/system operator to make those decisions, and no complaining should be done.
  • by Ralph Wiggam ( 22354 ) on Friday January 12, 2001 @05:32AM (#511844) Homepage
    Who would go to Hooters in Amsterdam? "Well, I can go smoke the best weed on Earth, go see a live lesbian sex show, boink two prostitutes at once....or I can go see chicks in small shorts and eat chicken wings." And who is the dorky guy in the corner of the bottom picture? Hemos?

    -B
  • Not meaning to address the competition issue, I do think that the use of bots is a logical evolution for internet users/consumers generally. The argument that I quoted is directed towards the legitimacy of bots, not whether they were used by a competitor. And besides, I think that a consumer should have the right to compare prices. What is a consumer if s/he can't do that fundamental consumer thing? Using modern tools doesn't change the principles. If the bot uses so many cycles as to degrade performance, then maybe it's not a bot but something else.
  • Does this mean that we're going to need to sign a license agreement before initiating a TCP/IP connection with a machine we don't own?
  • by TheKodiak ( 79167 ) on Friday January 12, 2001 @05:32AM (#511847) Homepage
    For those of you not afraid of goatse.cx, http://www.nytimes.com/2001/01/12/technology/12CYBERLAW.html [nytimes.com]
  • The technical alternative is to put fair queueing in web servers. Requests from recently seen IP addresses should go behind requests from new ones. This will also provide resistance against denial-of-service attacks where attackers are pounding on a site making dummy requests. With this in place, robots won't be a problem; their requests will be processed after less intensive queries.

    Load-sharing boxes for server farms ought to have this feature. And it should go into Apache.
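The fair-queueing idea above could be sketched like this: a toy priority queue in which requests from recently served IPs sort behind requests from IPs the server hasn't seen (all names here are illustrative, not any real server's API):

```python
import heapq
import itertools

class FairQueue:
    """Requests from recently served IPs sort behind requests from new IPs."""
    def __init__(self):
        self.last_served = {}              # ip -> serve counter (0 = never seen)
        self.serve_counter = itertools.count(1)
        self.seq = itertools.count()       # tie-breaker, preserves FIFO order
        self.heap = []

    def enqueue(self, ip, request):
        # Lower "last served" value means less recently seen: higher priority.
        prio = self.last_served.get(ip, 0)
        heapq.heappush(self.heap, (prio, next(self.seq), ip, request))

    def dequeue(self):
        _, _, ip, request = heapq.heappop(self.heap)
        self.last_served[ip] = next(self.serve_counter)
        return ip, request

q = FairQueue()
q.enqueue("10.0.0.1", "req1")
q.dequeue()                    # the robot's first request is served normally
q.enqueue("10.0.0.1", "req2")  # the robot comes right back...
q.enqueue("10.0.0.2", "req3")  # ...but a new visitor arrives
print(q.dequeue()[0])  # "10.0.0.2" - the new IP jumps ahead of the robot
```

A production version would also age out the `last_served` table, but the ordering property is the point: the robot's follow-up requests wait behind everyone the server hasn't seen lately.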

  • by zorg77 ( 215228 ) on Friday January 12, 2001 @05:31AM (#511849) Homepage
    http://www.nytimes.com/2001/01/12/technology/12CYBERLAW.html
  • by American AC in Paris ( 230456 ) on Friday January 12, 2001 @05:31AM (#511850) Homepage
    Wow. Slashdot got mega-trolled this time.

    I did notice, however, that the required registration at the "New York Times" [rheinfire.de] was not free...

    information wants to be expensive...nothing is so valuable as the right information at the right time.

  • The link in the story has obviously been corrected (it pointed to http://www.rheinfire.de/pamster.htm), so you might find this an interesting read:

    From an interview by german computer and technology magazine c't with Rob Malda (aka CmdrTaco) (translation of translation follows):

    c't: Haben Sie ein Problem damit, etwas zu korrigieren?

    Malda: Das hängt von der Geschichte ab. Wenn ich einen Absatz geschrieben habe, der einen Fehler enthält und es Hunderte von Kommentaren über diesen Fehler gibt, würde eine Fehlerbereinigung diese Kommentare sinnlos machen. Die Nutzer-Kommentare sind wirklich wichtig. Wenn man sie nicht liest, ist es so, als würde man sich nur den Anfang eines New-York-Times-Artikels anschauen, aber nicht auf Seite 8 umblättern, um ihn weiterzulesen.

    Translation:

    c't: Do you think that correcting things is problematic?

    Malda: That depends on the story. When I have written a paragraph with an error and there are hundreds of comments about this error, correcting it would make all these comments inane. User comments are really important. Not reading them is like glancing at the beginning of a New York Times article and then not turning the pages over to page 8 to read the rest of it.

  • ...that we will get a 'terms of usage' front page for every web server you connect to? That is the only way that you could say that someone can legally restrict how you use their systems' resources.

    'Cause a Bot doesn't use up any more bandwidth or system time than I do when I keep hitting 'refresh' to see something update, or bids change.

    They can't accuse the bot of doing anything wrong without accusing every user of the page. Unless they have a terms of use page first.

    -Steve
  • Then they should be BLOCKED by search engines. They can either accept that the internet is designed to be searched, or exist off in their own dark little hole.
  • Or at least you can tell them you don't want them to index certain directories...
  • It could potentially give more business to Register.com, especially if they prominently link near the results of a search. Just about the same as a search engine (except more resources taken up.)
  • I mean, it's a lot like walking into a competitor's office, pulling open his file drawers, compiling a list of names and numbers of customers, and then going back to your office to call them up and try to take them away.

    If this was not Register.com's WHOIS service that was being used, then I would consider it a little more like a company that makes photocopiers looking in a public phonebook for big businesses, calling them up and saying "Hey, we'd like to do business with you and we'll beat whatever your current photocopier service is charging you".
    ---
    seumas.com

  • by UncleOzzy ( 158525 ) on Friday January 12, 2001 @05:59AM (#511857)

    Oh my god! The Hooters girls were bots all along? I feel so dirty!

    (And yes, the Hooters girls do use up resources on the target system, if you get my drift)

  • It is just plain scary. And to think my story about this [rubberhose.org] site got rejected. Wow. This is bad, kids, very bad.
  • by macdaddy ( 38372 ) on Friday January 12, 2001 @06:02AM (#511859) Homepage Journal
    ...can anyone translate the text on that site for me? Oh hell who cares. The pictures are color!

    --

  • So the argument is: if I'm getting information off your non-password-protected, completely open web server and using that information, that is fine. However, if I write a program for a business to find the information, it is wrong, but it may not be illegal, because the program uses too many resources. Now, if I write a program that intentionally tries to max out a site's performance (e.g. Yahoo's), it is wrong and considered hacking, and the FBI will try to put me in jail.

    What is the difference? Intent; having enough lawyers.

    The (costly) answer might be to write software that measures the load a user places on a server and blocks that IP for X time if the load is deemed excessive. Eventually you could get to the point where a bot's load looks just like a user's, and then what do you do if they replicate that bot over 1000 IP addresses? The (real) answer: if you offer anything for free, you have to depend on the user not to take advantage of that free service. After all, if people acted like businesses there would never be any change in the leave-a-penny/take-a-penny cups.

    Corporations are inherently evil since their only goal is to make money. The only salvation is to hope that good people run them.
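The "(costly) answer" above, measuring per-IP request load and temporarily blocking an address that exceeds it, might look like this minimal sketch; every threshold here is an arbitrary placeholder:

```python
import time
from collections import defaultdict, deque

class LoadLimiter:
    """Block an IP for block_secs once it exceeds max_requests per window seconds.

    Thresholds are illustrative; the injectable clock just makes testing easy.
    """
    def __init__(self, max_requests=100, window=60.0, block_secs=600.0,
                 clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window
        self.block_secs = block_secs
        self.clock = clock
        self.hits = defaultdict(deque)   # ip -> timestamps of recent requests
        self.blocked_until = {}          # ip -> time the block expires

    def allow(self, ip):
        now = self.clock()
        if self.blocked_until.get(ip, 0) > now:
            return False                 # still serving out a block
        q = self.hits[ip]
        q.append(now)
        while q and q[0] <= now - self.window:
            q.popleft()                  # drop hits older than the window
        if len(q) > self.max_requests:
            self.blocked_until[ip] = now + self.block_secs
            return False
        return True
```

As the comment itself points out, a bot spread across a thousand addresses defeats any per-IP scheme like this, which is why trusting users not to abuse a free service ends up being the real answer.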
  • Well, there's one difference between the situation described in the summary and your analogy: Verio is not a customer. Phrase the question this way: just because a retail store is publicly open to customers, does that give a competing store the right to come in and copy all the name/address/telephone info from the sales slips so they can advertise to those customers? I don't think so. That's what Verio's doing, though, and IMHO Register.com is entirely within its rights to kick Verio out.

    Although the article linked to in the summary doesn't seem to have anything to do with the situation described. Someone trolling, maybe?

  • by onion2k ( 203094 ) on Friday January 12, 2001 @06:28AM (#511862) Homepage
    First point, search engine crawlers only honour the robots.txt protocol if they're told to do so. A 'dishonourable' search engine could simply ignore the file and index everything in its path. Already the main engines are boasting 500 million+ indexes; it's only a matter of time before they start resorting to underhand tactics to boost their numbers.

    Secondly, robots.txt is often a server-level setup file. If you get some free space with the likes of AOL/Freeserve/Geocities, you have no control over the indexing of your site. Additionally, some (albeit poor) ISPs don't offer configuration of this file. Whether the crawling of your site is the fault of the ISP, the search engine, or you would be a matter for further debate.

    Onion
  • by American AC in Paris ( 230456 ) on Friday January 12, 2001 @06:30AM (#511863) Homepage
    Who would go to Hooters in Amsterdam?

    ...well, it suddenly occurred to me that I would go to Hooters in Amsterdam. At least, that's what my employer would think if they ever decided to check the proxy server logs. While they're fairly cool about web browsing in general, they are decidedly less cool about employees looking at "objectionable material" at work. I guess I'll need to institute a policy of proofreading Slashdot's front-page content for them, to check for things like goatse.cx links...

    I'm really, really glad that the submitter didn't slip a really objectionable link in there. I'm also really, really pissed off at Slashdot for this kind of crap. This is total incompetence. (I'm not even taking into account the duplicate stories [slashdot.org] on the Chinese rocket launch [slashdot.org] in the Science section...)

    This kind of fsck-up at virtually any other major online content provider would be grounds for immediate dismissal for the employee in question, for crying out loud. READ YOUR DAMNED FRONT PAGE SUBMISSIONS!

    information wants to be expensive...nothing is so valuable as the right information at the right time.

  • by Jerf ( 17166 ) on Friday January 12, 2001 @06:31AM (#511864) Journal
    I do not 100% agree with these rulings, but don't fall into the trap of 100% disagreeing with them either.

    "Everybody must be allowed to access web resources" is a statement from the POV of the accessors. Consider that statement from the point of view of the server managers: "We must allow everybody to access our resources in any way they choose."

    Do you really want to make that statement? If you put up a public resource, must you allow people to abuse it if they wish? Or can you take actions to stop such abuse, esp. as it nearly always does real, if not always a lot of, damage. In the case of Bidder's Edge vs. eBay, eBay was suffering real slow-down of service, which affects its bottom line. Must eBay allow it?

    Perhaps the real danger is not so much the rulings per se, but the legal doctrines being used to make them: "Under the reasoning in the Register.com case, "you don't have to prove harm or show any evidence of harm," he said. "Harm will be presumed." He said that he fears the Register.com case will "spread like Kudzu" through the court system."

    At any rate, just recognize that things are somewhat more complicated than they may seem at first. It's tempting to oversimplify in either direction, but the truth is probably complicated.

  • Oh no, baby... are you trying to tell me that all along you've been fembots [austinpowers.com]? But that's just not groovy, baby!

    (Translation: Click the link, Hemos. Or even just hold your mouse over it to see where it goes.)

    --
  • As my subject says, I was called this week regarding the recent registration of a domain through dotster (whose records can also be seen through the register.com WHOIS search).

    So Mr. Verio calls me up, asking me about my hosting and design needs, and is actually not so bad for a salesman. He sends me some info from their website and agrees to follow up on Monday with me if I'm interested about using their services.

    Now that I've read this article, I don't know what to think/do yet. It looks like Verio provides a level of service which would be pretty affordable for my domain needs, but I also don't want to perpetuate bad business practices by giving them my business. I'll have to prepare some thoughts on how he got my name and information and grill the poor guy next time he calls.
  • and yet is making decisions based on it. From the article:
    How much harm? That's the key question. In the eBay case, which Judge Jones relied upon, eBay offered evidence that the burden on its computer servers from Bidder's Edge's web crawler represented between 1.11 percent and 1.53 percent of the total load. However slight, that degree of interference was harm enough, the judge said, because eBay in effect was prevented from using that portion of its personal property for its own use.

    This is 1.11% of the "total load", meaning 1.11% of the actual CPU usage, not 1.11% of the machine's capacity. Nothing here argues that the machines were fully loaded. Had they been, then yes, eBay was prevented from using that capacity. But it's doubtful that eBay was at 100% load all of the time, so it's doubtful it was using those resources.

  • I don't suppose the judge took into account that this creates a level of after-the-fact information control?

    The simple fact is that every single interaction between any two computer systems requires resources in the way of memory, processor time, network bandwidth and sometimes disk access. Now the judge seems to indicate that if you go over my service as it was designed, and retrieve information, and then use that information in a way I don't like, I can forbid you from using my service.

    That's completely unrealistic. Unless the searching routine is basically stomping the server by requesting as fast as possible, there is no real damage being done that isn't done by a regular cyber-squatter wannabe trolling the database in this case. If there is an issue with the DoS sort of effect, ask the other party to back off, or alter the server software to restrict the rates at which requests are accepted from certain IP addresses or blocks. Better yet, negotiate a new service where the database query is run locally by the whois provider, bundled up, and distributed for fee to anyone interested.

    The thing is, it is public information. This sort of legal adjustment of the reality is foolhardy in the extreme. If I take a quote from an article on a news site, citing the reference properly, and use it as a portion of a work that results in something the originating news site doesn't like, can they forbid me from using the site now?

    The ramifications are a lot further reaching than just bots. It's all a matter of degree.
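    The rate-limiting alternative suggested above (throttle requests per IP rather than sue) could be sketched as a per-address token bucket. This is a hypothetical illustration; the class, names, and numbers are invented, not anything Register.com actually runs:

```python
import time

class TokenBucket:
    """Per-IP token bucket: each address may make `rate` requests per
    second on average, with short bursts of up to `burst` requests."""

    def __init__(self, rate, burst):
        self.rate = rate      # tokens replenished per second
        self.burst = burst    # maximum bucket size
        self.buckets = {}     # ip -> (tokens remaining, last refill time)

    def allow(self, ip, now=None):
        now = time.monotonic() if now is None else now
        tokens, last = self.buckets.get(ip, (self.burst, now))
        # Refill proportionally to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[ip] = (tokens - 1.0, now)
            return True    # serve the request
        self.buckets[ip] = (tokens, now)
        return False       # throttle: too many recent requests
```

    A server front end would call allow() on each incoming request and reject or queue the request when it returns False, which slows an aggressive bot without any lawyers involved.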

  • True, robots.txt is optional, but all major search engines _do_ honor it. I attended a talk by some of Google's engineers a while back, and they got a lot of legal threats early on just because they crawled a site too fast. Most commercial search engines are pretty responsible about crawling because they can't afford the legal threats and bad press otherwise.

    Someone else mentioned a per-page META tag alternative to robots.txt, and I would also point out that the server-level configuration problem is between you and your ISP, as are many other access and content issues. The outside user only really cares (legally at least) about how their browsing/searching behavior interacts with the ISP, and then the ISP has a relationship with the site that is completely separate.

    - Russ
  • We need a Hemos Quality Seal. After all, there is already the Taco Quality Seal [slashdot.org]

  • Namely, if this case wins and is upheld, then look next for a lawsuit seeking to shut down the SEC 'FraudCrawler' [usatoday.com] and any other government-owned or contracted law enforcement crawlers that come along after it.

    Here's why.

    If Register.com wins, then the courts will have recognized bandwidth as an asset on an equal legal footing to money or physical property. Once bandwidth has gained this legal stature, then suddenly the Fourth Amendment of the Constitution kicks in:

    "The right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures, shall not be violated, and no Warrants shall issue, but upon probable cause, supported by Oath or affirmation, and particularly describing the place to be searched, and the persons or things to be seized."

    Because unauthorized consumption of bandwidth would then be legally equivalent to a seizure, the government would have to obtain and serve a search warrant on the owner of any server whose contents they wish to examine!


    "A microprocessor... is a terrible thing to waste." --

  • Requests from recently seen IP addresses should go behind requests from new ones.
    I like it, I'll get as much of a share of the resource as ALL of the AOL users behind one of the 12 AOL proxies :-)
  • Web sites either grant or deny permission to search engines based on the /robots.txt file. If a web site wants to be indexed, they put permissive rules in robots.txt. Verio is spidering for their own commercial gain, and ignoring a number of posted policies against it. That is apparently what the judge has ruled on, violating an explicit request not to harvest. What is funny is that register.com doesn't have a robots.txt file, so does that give people permission to spider the site?

    robots.txt...it's not just a good idea, it's the law.

    -jon
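    For what it's worth, the grant-or-deny check a well-behaved bot performs can be sketched with Python's standard urllib.robotparser module. The rules and user-agent names below are invented for illustration; register.com, as noted above, publishes no robots.txt at all:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt of the sort the parent wishes existed:
rules = """\
User-agent: EvilHarvester
Disallow: /

User-agent: *
Disallow: /whois/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# A well-behaved bot asks before every fetch:
rp.can_fetch("SomeCrawler", "/domains/example.html")    # allowed
rp.can_fetch("SomeCrawler", "/whois/example.com")       # disallowed
rp.can_fetch("EvilHarvester", "/domains/example.html")  # everything off-limits
```

    Of course, as the thread points out, nothing but good manners (and now, perhaps, case law) forces a spider to make that call at all.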

  • I like it, I'll get as much of a share of the resource as ALL of the AOL users behind one of the 12 AOL proxies.

    One might view that as a feature.

    Maintaining a list of proxies you like might be necessary. (Or not, depending on whether you want visits from AOLers). This is why network routers have weighted fair queuing.

  • Darn cut and paste. One more try: http://www.nytimes.com/2001/01/12/technology/12CYBERLAW.html [nytimes.com]
  • A search engine should only glance over a site once, index it, and move on to the next site. Did it take up some resources? Yes, but just a little for a short time. But from the sounds of this (since we're lacking a non-Hooters related article), it looks like long term repeated use of system resources. There's a big difference IMO.

    -B
  • I wonder. The last time Slashdot got trolled like this, someone talked about setting up a redirector page and then pulling it in favor of a different page. Did this happen here?
  • You do know that Verio *is* Network Solutions and Register.com is their competitor, right?
  • by tag ( 22464 )
    And the random line at the bottom of this page (for me, anyway) applies:

    We only acknowledge small faults in order to make it appear that we are free from great ones. -- La Rouchefoucauld
  • The link to the bot story is here [nytimes.com]. No free registration is required :)

    Although, the other page was a Hoot!
    ---
    Interested in the Colorado Lottery?
  • To be honest, I'm actually glad this happened. NSI has either been raped so many times or sold my personal information so many times (the latter, of course) that 80% of the mail I get, including bills, is for that bogus company name I made up. 80%! Now I hope this doesn't get construed to cover all bots, like legit web crawlers, but harvesting personal information from websites and databases is the first step in sending unsolicited commercial e-mail and shouldn't be allowed.

    My $.02

    --

  • You could just go to http://www.slashdot.org/search.pl [slashdot.org] to see all of the newest posts...
  • "owing to Verio's robot. In her opinion, Judge Jones said that the harm estimate was 'thoroughly undercut' by Vario in pretrial discovery,"

    Did this bug the living shit out of anyone else? Is there one company, or two? None of my friends who work for Verio have mentioned a sister company called Vario. And it would have been funnier if they had spelled it Varyo.

    Actually, I think the Rhein Fire had a better editor.
  • The ruling itself is good, but it seems like there needs to be better logic behind future rulings. What happens when a company is affected as Register was but cannot show an appreciable system burden, or at least one that a judge will accept? Using trespassing as part of the argument makes some sense, as a website or database could be construed as property, though I wonder whether trespass laws are written sufficiently to cover virtual property. There comes a point when trying to use analogies for the Internet becomes futile. The Internet isn't necessarily like anything else. The Internet is its own thing. It's time we had policymakers that understand that and deal with it appropriately. http://unholyrouter.com
  • The WHOIS database states that running bots through it violates its user agreement. Doing so with webpages, however, is encouraged by most sites, and can be blocked with a robots.txt file at any time.
  • Not that I'm suggesting anything, but if this restriction is a bad thing (TM) for freedom, how about us non-US citizens start taking an interest in WHOIS spidering.

  • I think what's worse is that the nerds/(l)user ratio is shifting towards the (l)user.
    It's getting longer between the good posts, you have to surf at at least +2.

    --------
  • Yes. So the ruling should have been about the use of this information, not the process used to get it. It's a dangerous precedent for other services that use similar practices to retrieve information (the example of search engines).
  • well actually I've gotten about 3 pieces of junk mail from the site since I registered at Register.com about 1 year ago.

    I know it's from that source because my name is spelled strangely, and they send it to me at the semi-fictitious entity that is the "company" my web site is registered to.

  • I know a lot of people here are very anti-regulation, but I think it would be great if case law established that web robots must obey the Robots Exclusion Standard [webcrawler.com]. Since it's a widely-known standard, I think it can be fairly argued that robots that choose to disregard /robots.txt are in danger of trespass to chattels [lectlaw.com]. Using the standard also would allow bots to fulfill their helpful role, while providing a clear distinction between what is and what is not acceptable.

    Sure, one might argue that people might be unaware of the standard, but that is seldom an excuse. I may be unaware of fire/electrical codes, but I'll still get in trouble if I don't adhere to them, because I'm putting others at risk and thereby imposing a cost upon society (fire trucks and insurance don't come free). Web crawlers that index data in violation of the Robots Exclusion Standard impose a cost on companies and society just as well, in the end requiring people to buy bigger pipes, faster servers, and so on (thereby using more power, dumping more old computer components into landfills and more chipmaking chemicals into the environment).

    My point is that web crawler operators live in a society, just like everyone else, and they too must be held accountable for the consequences of their actions, particularly when they willfully disregard the requests of web site operators as expressed in /robots.txt.
  • by signe ( 64498 ) on Friday January 12, 2001 @06:12AM (#511892) Homepage
    This is amusing. Now we get to see who actually clicked the link, and who posted blindly without bothering to read the article.

    -Todd

    ---
  • So, I open this shop, and I sell combs. I figure, who wants a comb unless they got a nice hairdo? So I decide to give out free haircuts to anyone who comes in, figuring they'll want a comb afterwards, and boom - that's when I'll hit 'em. Alright, so then this principal comes in with his entire elementary school and says, "These young ladies and gentlemen would like their complimentary haircut, please." So I give them their haircut, and the guy buys ONE comb. Then he says, "Hey, I just realised - they all need another haircut RIGHT NOW." So I says to the guy, "HEY! YOU'RE TRESPASSING! GET THE HELL OUT OF MY STORE!"

    And that's how we ended up in court.

    Actually, I live in Texas, so I just shot the fucker.

    (In Texas, if someone is harming your property, or someone else's property, and you feel that the only way to stop them is to respond with deadly force, it is not felony homicide.)
  • "How about not placing a machine on a free, publicly accessible network if you`re not up for letting people use it?"

    Uh... You have just described the planet earth.
  • by Sartian ( 248427 ) on Friday January 12, 2001 @07:56AM (#511895)
    I'm afraid I cannot agree with you there. Bots can and do take up much more bandwidth AT ONCE than users do. I work for a search engine and this is something I know. The spiders (bots) we use at Lycos are very, very, very fast and if we are not careful we can bring an unsuspecting web-server to its KNEES as the spider recursively tries to fetch every document it is allowed as quickly as possible. One way we get around it is that we randomize our website documents list so that a spider isn't devoting all of its cycles to one website. We have been doing this a long time and know how to be "polite" when gathering data.

    Outside of the issue of bandwidth, there is the issue of profitability. Part of many websites' income is derived from banner ads. If a bot scours a website to harvest the content, it prevents the end users from seeing some of the advertisements they would normally have seen by exploring it on their own. One example of where this hurts is a Meta Search engine that trolls several search engines and produces a compiled list of search results. Search engines have millions of dollars invested in hardware, software, bandwidth and staff to make it all work. Every single query has a real monetary cost associated with it. Every free service has its cost. A Meta Search engine bot like that only does about 5% of the total work involved in producing the results or content that it displays. Now, a site like that makes its OWN profit from users with minimal money out of its own pocket (of which none goes to the companies doing most of the real hard work).

    I completely understand why companies would get cranky about someone repeatedly grabbing computationally intense data from their site and profiting from it as they suck money and resources away from the provider of said data.

    One way websites are able to track when people visit a site is "tracking gifs": usually very small 1x1 pixel images that give them a general idea of how many visitors a site is receiving. Reports are generated and they get PAID by advertisers based upon this info. Bots RARELY grab anything but content. If you go to a webpage with images embedded in it, your web browser individually requests (most of the time, barring cached data) each image. Since bots don't tend to request this graphical "fluff" intended for hyoomans, owners of the site notice an increase in site traffic and resource drains and a decrease in "ad impressions" when a bot "isn't polite". Yes, I do realize that some make revenue in other ways, but bots can use up resources faster than humans can regardless. With people, your traffic usually scales slowly up or down. You can add or remove hardware to deal with the demand. Sometimes when a bot hits your webserver it is such a huge spike in requests for data that it kills the server. Then all the legitimate users get cranky. Or, a more minor form is that it makes the s e r v e r sllluuuugggggiiish.

    Anywhoo... That's my $1.25 on the matter. Cheers. - Sartian
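    The host-randomizing trick Sartian describes (spreading fetches across sites so no single server takes the full brunt of the spider) can be sketched as a round-robin interleave by hostname. This is a toy Python illustration, not Lycos's actual scheduler; real crawlers also add per-host delays:

```python
from collections import defaultdict
from itertools import zip_longest
from urllib.parse import urlparse

def interleave_by_host(urls):
    """Reorder a crawl list so consecutive fetches hit different hosts,
    taking one URL per host per round."""
    by_host = defaultdict(list)
    for url in urls:
        by_host[urlparse(url).hostname].append(url)
    ordered = []
    # Each round of zip_longest yields at most one URL per host.
    for round_ in zip_longest(*by_host.values()):
        ordered.extend(u for u in round_ if u is not None)
    return ordered
```

    Feeding the spider the reordered list means a site with thousands of pages gets its requests spread out between visits to everyone else, instead of a burst that brings the server to its knees.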

  • If I had some modpoints now, this would go down the way of all off-topic posts... ;)
  • My bad - not Hemos'. Flame him if you want for not checking every link in every story - with the volume of submissions what they are I can't say's I blame any of the good folks at /. (Why the bad link? A friend had just ICQd me that he was headed to Amsterdam for a P2P conf. and wanted the name of the place all the 'football' fans go to. Cut and pasted crosswise. I suppose he'll be wondering why he should go to a 'bot lawsuit in the City of Sin... ;-) And for those of you who got their panties in a bunch about the 'unacceptable' or declining quality of /. (ACs, anyone?) - so quit reading it already and run your own. MHO - pretty damn fine job of turning a homebrew blog into a major news source - whynchYOU try it!?
  • by skoda ( 211470 ) on Friday January 12, 2001 @08:04AM (#511898) Homepage
    Windows in a public building are obviously meant to be looked through. However, if you stood long enough, gazing through the windows of a local store, they could have you removed if there are "no loitering" laws. Even though you are using the sidewalk for standing and the windows for looking, as they were meant to be.

    Similarly, if you worked at one store and went to your competitor's, pen and paper in hand, and strolled the aisles noting their prices (so your store can meet/beat them), you might be asked to leave. Despite the fact that you are just writing down prices that are clearly there to be read.

    Finally, various retailers, esp. car dealers, place "No wholesaler or retailer" restrictions on their best sales, even though their products are meant to be bought and other retailers may want to do just that.

    It seems to me that analogous laws already exist. Just because something is available in the public realm it doesn't follow that anyone can avail themselves of it to any extent; at least not under current U.S. laws.
    -----
    D. Fischer
  • Best read in full here:

    http://channel.nytimes.com/2001/01/12/technology/12CYBERLAW.html [nytimes.com]

    Although here is this interesting bit from the middle of the article:

    The harm in the New York case was arguably less than that demonstrated in the eBay case. Register.com offered evidence that its computer system's resources were diminished by about 2.3 percent owing to Verio's robot. In her opinion, Judge Jones said that the harm estimate was "thoroughly undercut" by Vario in pretrial discovery, however. She added that Register.com's evidence of harm was "imprecise." Nevertheless, Judge Jones concluded that Verio's search robot occupied "some" of Register.com's system capacity. And because some unmeasured portion of Register.com's computer property was not available to the company, that was harm enough, she said -- especially when combined with the eBay notion that the failure to issue an injunction risks a pile-on from other robots in the wings.

    Michael A. Jacobs, a lawyer for Vario, said in an interview that Judge Jones, in effect, said that a showing of present harm was no longer a necessary requirement for trespassing on a computer web site. "In eBay, they showed a 1 or 2 percent" crunch on eBay's system capacity, he said. In his case, even though Register.com's allegation of 2 percent "blew up," he added, "the judge found a sufficient risk of harm. It was literally unprecedented."

    William F. Patry, a lawyer for Register.com, said that it was reasonable for Judge Jones to rule that a sufficient showing of harm was made. "If your property is the computer system and somebody comes in and uses it in a way that violates the terms of use," he asserted, the owner can say, " 'Wait, you're using my system under conditions that I said you couldn't.' "

    It may be true that the particular use "won't crash my system," Patry said, but any use of the personal property diminishes the owner's rights to a degree. He added that it was not "so crazy" for the court to rule that Register.com's computer response time might be significantly slowed if other companies began to use robots to hammer on Register.com's database.

  • "Question: would search engines be different, presumably because they also confer a benefit on the target by making it findable?)"

    This ruling is obviously ridiculous; who hosts a website that can't handle 1 extra connection?

    But just to play devil's advocate, search engines are slightly different, since you can always specify in robots.txt which robots (none, even) can access your site, and what they can access (effectively controlling their time on your site).

    I would like to see a law against robots which do not adhere to robots.txt though...

    Oh wait, this is the internet and it shouldn't be online if it's not meant to be accessed... sorry, I forget sometimes...

  • It is somewhat similar to how a single person can dub a CD to tape, but I can't copy a ton of CDs and resell them. I can browse for prices as a shopper, but I can't suck all their prices off their web site to always make my prices 5% lower. I am really worried about this direction of cyber law. As technology gives the average joe the ability to do everything on a large scale, the "mass" part of the legality test is broken. Hence, the Metallica/Napster battle: although the people are not profiting from it, the number of copies distributed pushes the courts to Metallica's side.

    I worry about the quite nontechnical courts ruling on matters of technology that will affect the US for years to come. I don't mind corporations squabbling in court, as is the case with Register and Verio. I do believe that access to a site by a specific entity should be throttled so as not to starve out the regular consumer, and I do believe that breaking those limits should be considered illegal. What I don't want to see is the corporate world coming after the regular consumer who is not making money off the situation.

    The fact is, if music was sold for a modest profit and music companies were pickier about who they try to turn into superstars, they could make a lot of money while charging much less. Technology is forcing the music industry to improve its product and provide real value for the consumer. I see the likes of Napster and the free music sharing phenomenon as just another example of how competition provides better value to consumers, as it is supposed to. This form of competition should not be stifled just because it may cut into the profits of current companies. The law should not be meant to protect the status quo but to protect regular citizens.
  • Requests from recently seen IP addresses should go behind requests from new ones.

    In what way is that fair? In fact, it would be fairer to give regular/frequent/recent users higher priority because they're the ones doing whatever it is you want them to do (like, viewing banner ads, if you're /., and completing transactions if you're a brokerage).

    Load-sharing boxes for server farms ought to have this feature. And it should go into Apache.

    There are better solutions [cisco.com] already in the marketplace for allocating resources for network services.
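    The quoted policy ("requests from recently seen IP addresses should go behind requests from new ones") could be sketched as a least-recently-served picker; whether that or the frequent-users-first alternative argued above is fairer is exactly the debate here. A toy Python sketch with invented names:

```python
class RecencyScheduler:
    """Pick the pending client served least recently, so heavy repeat
    callers drift to the back of the line (illustrative sketch only)."""

    def __init__(self):
        self.last_served = {}   # ip -> logical time we last answered it
        self.clock = 0

    def next_request(self, pending_ips):
        # Never-seen clients sort first (default -1 beats any real time).
        ip = min(pending_ips, key=lambda p: self.last_served.get(p, -1))
        self.clock += 1
        self.last_served[ip] = self.clock
        return ip
```

    The AOL-proxy objection applies unchanged: this keys on addresses, not people, so thousands of users behind one proxy share a single slot.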

  • by PollMastah ( 174649 ) on Friday January 12, 2001 @05:45AM (#511910) Homepage

    This only proves that Slashdot really doesn't care to check a story before posting it. I don't see what's so hard about clicking on a link to see if it works, or to see if it goes somewhere sensible. I mean, we're not even talking about checking facts here or anything. Why is it that something so basic as checking URLs seems no longer relevant to the Slashdot editors?! What are they doing now???

    Sorry for this rant. I hate the downward trend of Slashdot recently. This wrong link almost made me give up Slashdot forever... there are better, less crowded, less trolled places around that I think I'll move to.

Beware of Programmers who carry screwdrivers. -- Leonard Brandwein

Working...