Slashdot is powered by your submissions, so send in your scoop

Web Copyright Crackdown On the Way 224

Posted by kdawson on Friday March 05, 2010 @10:38AM from the eighty-percent-rule dept.

Hugh Pickens writes "Journalist Alan D. Mutter reports on his blog 'Reflections of a Newsosaur' that a coalition of traditional and digital publishers is launching the first-ever concerted crackdown on copyright pirates on the Web. Initially targeting violators who use large numbers of intact articles, the first offending sites to be targeted will be those using 80% or more of copyrighted stories more than 10 times per month. In the first stage of a multi-step process, online publishers identified by Silicon Valley startup Attributor will be sent a letter informing them of the violations and urging them to enter into license agreements with the publishers whose content appears on their sites. In the second stage Attributor will ask hosting services to take down pirate sites. 'We are not going after past damages' from sites running unauthorized content says Jim Pitkow, the chief executive of Attributor. The emphasis, Pitkow says is 'to engage with publishers to bring them into compliance' by getting them to agree to pay license fees to copyright holders in the future. Offshore sites will not be immune from the crackdown: almost all of them depend on banner ads served by US-based services, and the DMCA requires the ad service to act against any violator. Attributor says it can interdict the revenue lifeline at any offending site in the world." One possible weakness in Attributor's business plan, unless they intend to violate the robots.txt convention: they find violators by crawling the Web.

This discussion has been archived. No new comments can be posted.

Web Copyright Crackdown On the Way

Load All Comments

Search 224 Comments Log In/Create an Account

Comments Filter:

Robots.txt (Score:3, Insightful)

by Jaysyn ( 203771 ) writes: on Friday March 05, 2010 @10:39AM (#31370740) Homepage Journal

I'm sure these guys have no compunction against ignoring robots.txt if it makes them money by doing so.

Share
twitter facebook
- Re:Robots.txt (Score:5, Insightful)
  
  by yincrash ( 854885 ) writes: on Friday March 05, 2010 @10:43AM (#31370790)
  
  Seriously. Following robots.txt is not law, only convention. I'm sure it doesn't take much to convince themselves to ignore it. Money, "doing the right thing", etc. If you view the copyright infringers as pirates, then why should Attributor follow their wishes?
  
  Parent Share
  twitter facebook
  - Re: (Score:2)
    
    by HungryHobo ( 1314109 ) writes:
    
    First on the chopping block:
    Slashdot for it's copy-pasted copies of linked blogs with copy-pasted copies of magazine articles copy-pasted directly from press releases.
    - The 80 percent mark (Score:3, Insightful)
      
      by tepples ( 727027 ) writes:
      
      Slashdot for it's copy-pasted copies
      News publishers using Attributor probably won't attack Slashdot for excerpting one paragraph from a ten-paragraph story any time soon. From the summary:
      the first offending sites to be targeted will be those using 80% or more of copyrighted stories
      - Re: (Score:2)
        
        by HungryHobo ( 1314109 ) writes:
        
        I'm fairly sure they quote the entirety of very small articles every now and then.
        more than a few times a month? absolutely!
        I'm curious if they're going to start hitting forums when people do they "hey look at this guys" quote of a news article.
        It could really hurt a lot of free forums.
        
        Re: (Score:2)
        
        by Jaysyn ( 203771 ) writes:
        
        A lot of forums require credentials to view & have systems in place to keep automated accounts from being generated.
    - Re: (Score:2)
      
      by Jaysyn ( 203771 ) writes:
      
      That would be interesting. While Google has a lot more money, Geeknet would be a much softer target.
    - Re: (Score:3, Funny)
      
      by clone53421 ( 1310749 ) writes:
      
      More work for the /. editors! Horror!~
  - Re:Robots.txt (Score:4, Interesting)
    
    by Registered Coward v2 ( 447531 ) writes: on Friday March 05, 2010 @11:49AM (#31371642)
    
    Seriously. Following robots.txt is not law, only convention. I'm sure it doesn't take much to convince themselves to ignore it. Money, "doing the right thing", etc. If you view the copyright infringers as pirates, then why should Attributor follow their wishes?
    I'd go even farther to say that sites that use robot.txt to eliminate crawling are probably not major targets - if they don't show up in search engine sthen tehy probably don't generate enough traffic to be worth the effort. Sites that are high traffic are much better targets - their revenue stream form ads is prbabaly significant enough that they don't want to risk losing it. Once enough fall into line they can worry about the ones that are not indexed - in fact they may just want to kill them off to preserve traffic to licensed sites.
    
    Parent Share
    twitter facebook
  - Re: (Score:3, Informative)
    
    by Crudely_Indecent ( 739699 ) writes:
    
    Anyone interested in finding out what's really going on with a website would look at robots.txt first and ask themselves 'now why do they want the robots to avoid these pages?'
    Of course, some of those entries will be dead-ends (dynamic pages that make no sense to crawl, password protected pages that would detract from a sites rankings, etc...).
    What's going to be interesting is what happens when their method is identified and/or the IP addresses they're using to make those identifications. There is no way t
  - - Re: (Score:2)
      
      by account_deleted ( 4530225 ) writes:
      
      Comment removed based on user account deletion
      - Re: (Score:2)
        
        by omnichad ( 1198475 ) writes:
        
        And what's the real difference in lost performance between a hit for robots.txt returning a 404 and a hit for robots.txt returning 200 and a few hundred bytes of text?
        
        Either your site is public or its not. And robots.txt is such a simple standard, why is it so hard to do? You don't even have to write your own.
    - Re: (Score:2, Insightful)
      
      by DaTroof ( 678806 ) writes:
      
      I can see ways that their service could be effective while respecting robots.txt settings. They'd simply need to crawl the indexes of other search engines. After all, if a violator is not accessible through Google or Bing, it's probably a low priority.
      - Re: (Score:2)
        
        by gbjbaanb ( 229885 ) writes:
        
        They'd simply need to crawl the indexes of other search engines
        after purchasing a licence to use the search engine's data, naturally :)
        
        Re: (Score:2, Informative)
        
        by DaTroof ( 678806 ) writes:
        
        after purchasing a licence to use the search engine's data, naturally :)
        Depending on the search engine and its terms of service, they might not even need to purchase a license. Google, Bing, and Yahoo all provide search APIs for third-party software.
  - - Re: (Score:3, Informative)
      
      by mea37 ( 1201159 ) writes:
      
      Really?
      Do you also believe that ToS violations constitute unauthorized access to a computer? That approach was tried recently by the U.S. prosecutors [cnet.com]. Ultimately the court didn't buy that position.
      So... why would robots.txt, which advises me of your wishes but to which I never actually agree, carry any more legal authority than a ToS document to which I do supposedly agree as a condition of using your system?
  - - Re: (Score:2)
      
      by clone53421 ( 1310749 ) writes:
      
      robots.txt is an opt-out. If it isn’t present, you can crawl.
      Furthermore, you are in no way legally obligated to even check robots.txt before crawling a site. It’s merely a standard of politeness to do so.
  - - Re: (Score:2)
      
      by clone53421 ( 1310749 ) writes:
      
      Wrong. robots.txt asks you to not index certain pages. It does not give permission to index the rest of the pages.
      Permission to read the pages is implicit in the fact that you’re serving them freely to whoever or whatever makes an HTTP request for them.
- Re:Robots.txt (Score:5, Insightful)
  
  by notgm ( 1069012 ) writes: on Friday March 05, 2010 @10:43AM (#31370798)
  
  is there some written law that holds people to following robots.txt? if not, how is it even possible to call it a weakness?
  
  Parent Share
  twitter facebook
  - Re: (Score:2)
    
    by HungryHobo ( 1314109 ) writes:
    
    nah, it's just considered bad manners.
    - Re: (Score:2)
      
      by poetmatt ( 793785 ) writes:
      
      I'm quite sure that people who own websites that their robots.txt is being ignored by a crawler are going to express that in quite hostile ways, in comparison.
  - Re: (Score:2, Interesting)
    
    by Joe U ( 443617 ) writes:
    
    If they are going to extend the DMCA to other countries, then let's extend computer trespassing laws to cover robots.txt violations.
    I'm being somewhat serious (but not super-serious). If courts want to hold that a website TOS is binding, then isn't the robots.txt binding as well?
    - Re: (Score:2)
      
      by LordAndrewSama ( 1216602 ) writes:
      
      if a websites TOS is binding, why not just put what's in the robots.txt file in the TOS in legalese, or just state in the TOS that the robots.txt file must be obeyed or whatever?
      - Re: (Score:3, Interesting)
        
        by Joe U ( 443617 ) writes:
        
        That's the point I was trying to make. I posted this somewhere else:
        http://blog.internetcases.com/2010/01/05/browsewrap-website-terms-and-conditions-enforceable/ [internetcases.com]
        So now you can turn around and sue them for crawling your site if you specifically disallow it in the terms and robots.txt.
        The results should be interesting to watch.
        
        Re: (Score:3, Insightful)
        
        by nacturation ( 646836 ) * writes:
        
        Right... because a judge will find that offer, consideration, and acceptance of a contract took place between a webserver and a bot? The court case you cite is irrelevant to an automated program that has no understanding and cannot accept conditions presented online.
        
        Re:Robots.txt (Score:4, Insightful)
        
        by Joe U ( 443617 ) writes: on Friday March 05, 2010 @01:59PM (#31373260) Homepage Journal
        
        Right... because a judge will find that offer, consideration, and acceptance of a contract took place between a webserver and a bot? The court case you cite is irrelevant to an automated program that has no understanding and cannot accept conditions presented online.
        Awesome, so anyone can DoS a server, send mass spam or distribute a virus as long as a bot does it, because a judge will rule that the bot acted on its own and wasn't developed or set loose by anyone at all.
        If the software wrote itself you might have a point, otherwise the people who wrote it are the ones responsible for how it acts.
        
        Parent Share
        twitter facebook
    - Re: (Score:2)
      
      by clone53421 ( 1310749 ) writes:
      
      If a robot passes the Turing test, does it have to check robots.txt before it crawls the website?
      If I manually crawl through all the pages on their site and bookmark all the links, am I a robot?
      Such difficult questions... how on earth would we legislate something?
  - Re: (Score:2)
    
    by Xest ( 935314 ) writes:
    
    Because if they use a robot, you can just identify it and feed it shit.
    It wont be long before people know the details of their crawlers and can just serve them something random.
- Re: (Score:2)
  
  by MtHuurne ( 602934 ) writes:
  
  If the infringing sites have a robots.txt that tells all crawlers to skip them, they will not show up in search engines. If they single out Attributor's crawler's user agent string, they would look very suspicious.
  - Re: (Score:2)
    
    by petermgreen ( 876956 ) writes:
    
    What if they only allow known crawlers from major search engines?
  - Re: (Score:2)
    
    by rnturn ( 11092 ) writes:
    
    That's only if their web crawler even looks at robots.txt. It's not required, only a courtesy. I'm sure they'll not be so courteous and claim that they need to do this because the violators they're looking for would block them anyway.
    The sure fire way to keep them out would be to find out what is IP address Attributor is using and block that at your firewall. The trouble with that is they could easily change their IP address or even employ something akin to a botnet to do their web crawling so that thei
- Re: (Score:2)
  
  by poetmatt ( 793785 ) writes:
  
  don't worry. They're going to break a lot of laws, break a lot of legs, and basically commit suicide. At least when it's through we'll have less dinosaur industries to deal with.
  They're literally planning to go to domain providers and threaten DMCA to get content taken down. Instead of, you know, DMCA'ing the website appropriately this is an end run around the legal process. Expect a quick smackdown. Why they would host such a company in California of all places to do this, where cali is the most clear abou
- Business Plan (Score:2)
  
  by DaveAtFraud ( 460127 ) writes:
  
  1) Put up a file sharing site with lots of music and movie files.
  2) Craft a robots.txt to keep out the RIAA and MPAA.
  ...
  Profit!!!
  
  Robots.txt is a convention that was never intended to restrict checking for illegal content. The idea behind robots.txt is only to keep site indexers such as Google, Yahoo, etc. out of certain directories.
  
  Cheers,
  Dave
DMCA.. (Score:3, Interesting)

by ltning ( 143862 ) writes: <ltning@@@anduin...net> on Friday March 05, 2010 @10:42AM (#31370774) Homepage

What on earth is the DMCA supposed to achieve, in the context of Ad-providers?
Sounds pretty scary to me.

Share
twitter facebook
- Re: (Score:2)
  
  by Yvan256 ( 722131 ) writes:
  
  In this case they're referring to the Downloadable Media Computer Advertising.
- Re: (Score:2)
  
  by julesh ( 229690 ) writes:
  
  What on earth is the DMCA supposed to achieve, in the context of Ad-providers?
  Sounds pretty scary to me.
  Agreed. I've never heard of this, and a quick scan of the legislation doesn't turn up anything that appears to relate to this; the categories of service it regulates appear to be (a) telecoms providers transmitting data at user request, (b) those hosting temporary copies of content (e.g. caches), (c) those hosting content at the request of third parties, and (d) search engines, directories and other link
Lessoned learned from RIAA (Score:5, Insightful)

by KnownIssues ( 1612961 ) writes: on Friday March 05, 2010 @10:54AM (#31370934)

Sounds like they've learned their lesson from the RIAA. I'm not saying I agree with them and think they are right to do this. But, if you're going to try to enforce your interpretation of the law, this is at least a sane philosophy of doing so. Not going after damages is a smart move.

Share
twitter facebook
Will that ultimately include slashdot? (Score:5, Interesting)

by elrous0 ( 869638 ) * writes: on Friday March 05, 2010 @10:54AM (#31370936)

A lot of aggregator sites like this one base a lot of their topical content on articles printed elsewhere. While most (incl. /.) don't print whole articles intact, a lot of them do quote heavily (what used to be called "fair use," back when that phrase actually meant anything). So their first step is to go after the sites that reprint the articles whole-cloth. But will they stop there?

Share
twitter facebook
- Re: (Score:2)
  
  by c-reus ( 852386 ) writes:
  
  Initially targeting violators who use large numbers of intact articles
  (emphasis mine)
  No, they will not stop there.
- Re: (Score:3, Funny)
  
  by Yvan256 ( 722131 ) writes:
  
  And will Slashdot be targeted again and again? (you know... all the dupes)
- Re:Will that ultimately include slashdot? (Score:4, Insightful)
  
  by MtHuurne ( 602934 ) writes: on Friday March 05, 2010 @11:30AM (#31371392) Homepage
  
  Unless an article is very short, quoting 80% of it is not fair use. So for now, I think they have every right to take steps against sites making money from their content without compensation.
  Yes, I am cynical enough to expect the reasonable 80% limit to be lowered over time until it reaches unreasonable levels. But let's hold the flames until they have actually crossed that line.
  
  Parent Share
  twitter facebook
  - Re: (Score:2)
    
    by elrous0 ( 869638 ) * writes:
    
    By the time they actually cross that last line, I suspect it will be too late.
    - Re: (Score:2)
      
      by MtHuurne ( 602934 ) writes:
      
      In my opinion preemptive protests against valid copyright enforcement only weaken the argument against copyright abuse.
  - Re: (Score:2)
    
    by cpt kangarooski ( 3773 ) writes:
    
    Unless an article is very short, quoting 80% of it is not fair use.
    Well, that depends on the circumstances. The amount of the work used, and the substantiality of the portion of the work used is a factor in determing if the use is fair, but there isn't a hard number.
- Re: (Score:3, Insightful)
  
  by Abcd1234 ( 188840 ) writes:
  
  Since when did Slashdot ever use 80% of an article verbatim?
  Sorry, no, any website doing *that* should be shut down. I hate those assholes. They're the reason why a search for a given term in Google pops up thousands of sites with the *exact same content*, just ripped from one another.
  - Re: (Score:2)
    
    by elrous0 ( 869638 ) * writes:
    
    Yes, but 80% is where they're *starting*. I'm asking if that's where they're going to *end* it.
    - Re: (Score:2)
      
      by Abcd1234 ( 188840 ) writes:
      
      Well, given the fair use doctrine still exists, there will always be a lower bound at which their legal actions will no longer have any basis.
    - Re:Will that ultimately include slashdot? (Score:4, Insightful)
      
      by natehoy ( 1608657 ) writes: on Friday March 05, 2010 @12:56PM (#31372466) Journal
      
      80% is a reasonable starting point. If they start lowering it, we'll have to express our righteous indignation then. Fair use, when interpreted, is generally considered a LOT lower than routinely cutting-and-pasting 80% of articles, so they have a long way to lower it before we can honestly call our indignation righteous.
      Seriously, this really isn't a "slippery slope" situation. It seems to be a well-thought-out and sane set of guidelines. If anything, they are being a bit generous for now, and they can still tighten this quite a bit without coming close to busting "fair use" or even "reasonable use".
      Basically they are saying, "if you routinely use 80%+ of our articles as your own content, we're asking you to stop. We won't sue you for any past uses, we just want to make it clear that this isn't cool any more."
      A fair usage (not the lack of quotes, I am not talking about a legal doctrine) would be to use about 20% of the source article (properly attributed) with a link back to the original article. Give credit where it's due (and cite your sources). Then add your own thoughts, or don't. But don't take whole-cloth articles and post them on your own site with your own ads.
      Every discussion board I've ever participated in has pretty much recommended some really close variant to this anyway. It usually reads something like "cite a paragraph or two at most and have a link to the source article plainly visible nearby".
      
      Parent Share
      twitter facebook
Offshore sites WILL be immune (Score:2, Interesting)

by unity100 ( 970058 ) writes:

all this harrassment is going to do will be to push the global small internet publishers to services in other countries. Datacenters, Ad services in u.s. will lose customers. There are already strong companies servicing in those areas in Eu. Eu will be happy to receive that amount of business.
the stupor of american corporatism is overwhelming. they can even go to the extent of shooting themselves in the foot.
- Re: (Score:2)
  
  by Jaysyn ( 203771 ) writes:
  
  All we can do is sit back & watch the fireworks.
- Re: Offshore sites WILL be immune (Score:4, Insightful)
  
  by Sockatume ( 732728 ) writes: on Friday March 05, 2010 @11:03AM (#31371066)
  
  Are you kidding? ACTA's going to harmonise everything so closely to the US that they'll be able to prosecute anyone.
  
  Parent Share
  twitter facebook
  - Re: (Score:3, Insightful)
    
    by cpghost ( 719344 ) writes:
    
    Yes, but ACTA is not the whole world.
    - Re: (Score:2)
      
      by daveime ( 1253762 ) writes:
      
      When did that fact ever stop the US doing whatever the hell it wants ?
    - Re: (Score:2)
      
      by bickerdyke ( 670000 ) writes:
      
      No. Just the part of the world that's intrested in doing any buissness with any other part of the world.
      Yes, It may not be North Korea.
      - Re: (Score:3, Informative)
        
        by cpghost ( 719344 ) writes:
        
        According to this [wikipedia.org], only Australia, Canada, USA, EU, Japan, South Korea, Mexico, Morocco, New Zealand, Singapore and Switzerland are currently part of that treaty. This (currently) leaves more than enough room for a whole lot of other countries (some of them as big as Russia and China) that are not part of it.
  - well (Score:2)
    
    by unity100 ( 970058 ) writes:
    
    i dont think that france, germany, spain, scandinavian countries and rest of the eu will just sit and accept u.s. as dominator of the world information.
  - Re: (Score:2)
    
    by julesh ( 229690 ) writes:
    
    Are you kidding? ACTA's going to harmonise everything so closely to the US that they'll be able to prosecute anyone.
    If you think Vanuatu et al are going to be signing up to ACTA, then I want some of what you're smoking.
    Sure, most of the large economies will probably be signing, but there's no reason not to base an Internet business on a little island somewhere nice with friendly laws (and, as a nice side benefit, zero taxation).
- Ad networks geotarget their ads (Score:2)
  
  by tepples ( 727027 ) writes:
  
  As I understand it, advertisers targeting readers in the United States tend to choose ad networks that operate or at least have some sort of assets in the United States, not ad networks that operate in the European Union. Advertisers who target readers in the European Union probably will not want to pay to reach readers in the United States, especially for a product not available in the United States.
  - Re: (Score:2)
    
    by arthurpaliden ( 939626 ) writes:
    
    So .. when the ad is placed the customer selects the target country / region. Using IP addresses takes care of the rest. Yes there will be some missdirection but on the whole it works.
    - Re: (Score:2)
      
      by tepples ( 727027 ) writes:
      
      So .. when the ad is placed the customer selects the target country / region.
      So I take it you're imagining an EU based ad network that deals with advertisers in foreign markets. But how would such an ad network efficiently deal with US advertisers while having zero assets in the US or in any other country with a takedown system remotely like that of the US?
      - Re: (Score:2)
        
        by arthurpaliden ( 939626 ) writes:
        
        You log on to the advertisers site and submit your ad along with your credit card number. And you do it from a country that is not subject to US or US like control. Remember with the internet you do not need a brick and mortar or even a flesh and blood presence anyway to do business.
Please do so (Score:5, Insightful)

by OzPeter ( 195038 ) writes: on Friday March 05, 2010 @10:59AM (#31370994)

And in the process take down all those inane blogs whose sole purpose is to scrape and repost articles so they get an advertising hit.

Share
twitter facebook
- Re: (Score:3, Insightful)
  
  by Anonymous Coward writes:
  
  While they're at it, can they take down forum/mailinglist mirrors too?
  It is extremely annoying when searching to find that the top 30 results all contain the exact same forum or blog post.
- Re:Please do so (Score:5, Insightful)
  
  by garcia ( 6573 ) writes: on Friday March 05, 2010 @11:22AM (#31371292)
  
  And in the process find all the commercial sites using my copyrighted Flickr photos for their own purposes without my permission or payment. I'm tired of sending invoices and dealing with companies who tell you that your photo wasn't worth the $300 you charge and instead send you $50 thinking that it will clear up the matter.
  I love the hypocrisy of all of this. They are just as much at fault as any of those aggregation blogs. They just have more money to be a pain in the ass.
  
  Parent Share
  twitter facebook
  - Re: (Score:3, Interesting)
    
    by clone53421 ( 1310749 ) writes:
    
    I'm tired of sending invoices and dealing with companies who tell you that your photo wasn't worth the $300 you charge and instead send you $50 thinking that it will clear up the matter.
    They’re basically giving you the finger. Don’t fuck around playing their little games... show them you mean business. Slap on a surcharge to cover your additional expense and send their name and remaining balance to a debt collector. It’s probably cheaper and less of a hassle than suing them in small claims court.
    IANAL... you may want to ask a real lawyer what your options are, but seems to me you have a few.
    - Re: (Score:2)
      
      by garcia ( 6573 ) writes:
      
      It's easier to make an ass out of them on the Internet. Twitter is an effective tool (especially with Google indexing it in real time) in the fight against these assholes.
      I eventually did get paid by a newspaper (include late charges) after three months. I have not been so successful with other businesses using my images in their marketing materials w/o my permission.
      Oh and debt collection (when it's $300) isn't worth my time--neither is small claims.
      - Re: (Score:2)
        
        by clone53421 ( 1310749 ) writes:
        
        Eh, I’ve been sent to debt collection for $50 doctor visits that the bill for got lost...
        I mean, after the initial reaction (seriously?), I called them up and paid it. But yeah... seriously?
        In any case I’d think $300 is significant enough to justify going after them with some heavier ammo than just bad rep on Twitter.
Not So Good for the Economy (Score:2)

by lobiusmoop ( 305328 ) writes:

"Offshore sites will not be immune from the crackdown: almost all of them depend on banner ads served by US-based services, and the DMCA requires the ad service to act against any violator. "
Not sure this is such a great idea - when you're broke you don't starve off the little income you're still getting... I'm inclined to think that in the near future, things will more likely go in the opposite direction, grey-legal stuff will be fully legalized to provide some as much extra economic stimulus as possible.
the article, for your convenience (Score:5, Funny)

by mdemonic ( 988470 ) writes: on Friday March 05, 2010 @11:05AM (#31371088)

A coalition of traditional and digital publishers this month will launch the first-ever concerted crackdown on copyright pirates on the web, initially targeting violators who use large numbers of intact articles.
Details of the crackdown were provided by Jim Pitkow, the chief executive of Attributor, a Silicon Valley start-up that has been selected as the agent for several publishers who want to be compensated by websites that are using their content without paying licensing fees.
In a telephone interview yesterday, Pitkow declined to identify the individual publishers in his coalition, but said they include “about a dozen” organizations representing wire services, traditional print publishers and “top-tier blog networks.”
The first offending sites to be targeted will be those using 80% or more of copyrighted stories more than 10 times per month.
In the first stage of a multi-step process aimed at encouraging copyright compliance instead of punishing scofflaws, Pitkow said online publishers identified by his company will be sent a letter informing them of the violations and urging them to enter into license agreements with the publishers whose content appears on their sites.
If copyright pirates refuse to pay, Attributor will request the major search engines to remove offending pages from search results and will ask banner services to stop serving ads to pages containing unauthorized content. The search engines and ad services are required to immediately honor such requests by the federal Digital Millennium Copyright Act (DMCA).
If the above efforts fail, Attributor will ask hosting services to take down pirate sites. Because hosting services face legal liability under the DCMA if they do not comply, they will act quickly, said Pitkow.
“We are not going after past damages” from sites running unauthorized content said Pitkow. The emphasis, he said is “to engage with publishers to bring them into compliance” by getting them to agree to pay license fees to copyright holders in the future.
License fees, which are set by each of the individual organizations producing content, may range from token sums for a small publisher to several hundred dollars for yearlong rights to a piece from a major publisher, said Pitkow.
Attributor identifies copyright violators by scraping the web to find copyrighted content on unauthorized sites. A team of investigators will contact violators in an effort to bring them into compliance or, alternatively, begin taking action under DMCA.
click the link to read the last 21%

Share
twitter facebook
- Re: (Score:3, Insightful)
  
  by noidentity ( 188756 ) writes:
  
  Click the link to read the first 21%
  The first offending sites to be targeted will be those using 80% or more of copyrighted stories more than 10 times per month.
  In the first stage of a multi-step process aimed at encouraging copyright compliance instead of punishing scofflaws, Pitkow said online publishers identified by his company will be sent a letter informing them of the violations and urging them to enter into license agreements with the publishers whose content appears on their sites.
  If copyright pira
- - Re: (Score:2)
    
    by clone53421 ( 1310749 ) writes:
    
    Depends on who cooperates and to what extent.
    First, Slashdot (dear Slashdot, please delete the comment); second, him personally (dear Slashdot, please give us his IP); and finally, Slashdot’s ISP (dear ISP of Slashdot, Slashdot isn’t cooperating with us, please shut down their domain).
  - Re:the article, for your convenience (Score:4, Interesting)
    
    by natehoy ( 1608657 ) writes: on Friday March 05, 2010 @01:13PM (#31372656) Journal
    
    No one.
    He posted the article, cited it as the original article (knowing there was a proper citation link above), and posted less than 80% of it. This is a completely legitimate use of the article as per Attributor's new rules. Two or three more words from the article would have made it an "80% rule" bust, but would still have been OK as long as he didn't make a habit of it. It's repeated use of more than 80% of source article text that Attributor wants to go after.
    Most discussion boards already limit direct citation to a paragraph or two, or approximately 20% of the article.
    So Attributor's 80% limit is making a clear statement that they are really only interested in pursuing people who make a routine habit of copying entire articles. And if the bulk of your content is coming from copying 100% of someone else's original news articles, you aren't exactly someone I want to waste my righteous indignation defending.
    
    Parent Share
    twitter facebook
so instead- google or other cache (Score:2)

by way2trivial ( 601132 ) writes:

easy enough to search google cache and bypass the robots.txt problem....
heck.. they SHOULD proclaim the spider name-- drum up a lot of informaiton
and focus on sites that mention it in robots.txt to check from other sources
- Re: (Score:2)
  
  by clone53421 ( 1310749 ) writes:
  
  Google cache doesn’t index pages that the robots.txt told it not to crawl...
Robots.txt is irrrelevant (Score:2)

by NevarMore ( 248971 ) writes:

If a site posts articles yet has them excluded by robots.txt doesn't that defeat the purpose of posting the article where it can be indexed and found?
In other words if an article is posted, but robots.txt says to not index it, that article isn't going to show up in a search. Its a bit like rebroadcasting an NFL game in a movie theatre with no one in the theatre to watch it.
my experience with Attributor (Score:5, Informative)

by bcrowell ( 177657 ) writes: on Friday March 05, 2010 @11:47AM (#31371610) Homepage

I've had an experience with Attributor myself, and it's given me a pretty low opinion of them. I'm the author of a CC-BY-SA-licensed calculus textbook, titled "Calculus." Someone posted a copy of the pdf on Scribd, as allowed by the license. So one day I got an email from one of the people who runs Scribd, saying that Attributor had sent them a takedown notice, which they were skeptical about. Attributor hadn't supplied any useful information about what they thought was a violation. I called Scribd, and they checked and said it was a mistake -- they were working for Macmillan, which publishes another book titled "Calculus." So here they were, serving a DMCA notice under penality of perjury, and they hadn't even checked whether the name of the author was the same, or whether any of the text was the same. Their bot just found that the title, "Calculus," was the same as the title of one of their client's books. Pretty scummy.

Share
twitter facebook
- Re:my experience with Attributor (Score:4, Informative)
  
  by bcrowell ( 177657 ) writes: on Friday March 05, 2010 @11:59AM (#31371784) Homepage
  
  Oops, important correction to the parent post: "I called Attributor, and they checked and said it was a mistake -- they were working for Macmillan..."
  
  Parent Share
  twitter facebook
- Re: (Score:2)
  
  by jfengel ( 409917 ) writes:
  
  It sounds like this sort of thing wouldn't happen under their new tactic, which actually does compare the text rather than just the title.
  Pretty stupid of them to have sent a takedown notice based on nothing more than the title.
  - Re: (Score:2)
    
    by clone53421 ( 1310749 ) writes:
    
    Does it OCR the pages itself, if the PDF hasn’t already been OCRd?
- Re: (Score:3, Interesting)
  
  by Hatta ( 162192 ) writes:
  
  So, did you press charges?
- - Re: (Score:3, Insightful)
    
    by coofercat ( 719737 ) writes:
    
    It's one thing to make a mistake, and entirely another to invoke the law to enforce a mistake. You're right, it's entirely possible the takedown was poorly written, but therein lies the problem with the takedown mechanism - there's no standard that it must reach before it can be served. Thus mistakes, honest or otherwise threaten people with very real, very wide-ranging and scary/expensive actions - completely in error. As such, as reasonable people, we expect anyone taking action as serious as a takedown w
Web Crackdown Full Stop (Score:3, Insightful)

by ObsessiveMathsFreak ( 773371 ) writes: <obsessivemathsfreak@noSpAM.eircom.net> on Friday March 05, 2010 @12:18PM (#31372014) Homepage Journal

It's not just copyright. The slow but steady alignment of copyright holders, oppressive governments, legal changes, media pressure and surveillance technology has wound itself around the internet worldwide, and now the real pressure is being applied. This is a secular change, largely unobservable over smaller intervals, but the end result is that the web in 10 and 20 years time will be a noticeably less free place than it is today. Everything you do online will be monitored, everything will be logged, everything will be legally defined and controlled, and every infringement will be subject to criminal penalties.
The parties responsible have the support of the politicians, the censors, the press, the money men and most of the public. We used to have the support of the geeks and their creativity in bypassing censorship. But let's face it; geeks have not created a truly disruptive technology since BitTorrent almost ten years ago. While Geekdom slept, the likes of Cisco and the major Telcos have constructed a frightening array of technologies for surveillance and control of the internet, and the fruit of their efforts can be seen in China, Iran and now even countries like Australia. Soon it will be seen all over the world.
The Web has changed. Governments are no longer going to tolerate the freedom and anarchy that it grants to the population at large. They now have the means, method and opportunity to put this genie back in the bottle. This crackdown is the first offensive on what is going to be a wide front. Expect the free net to lose.

Share
twitter facebook
- Re: (Score:2)
  
  by night_flyer ( 453866 ) writes:
  
  10 to 20 YEARS? I think you are being optimistic...
I hope their algorithm can keep up (Score:3, Interesting)

by aarenz ( 1009365 ) writes: on Friday March 05, 2010 @12:28PM (#31372158)

I suspect that many sites that are using this type of content will find ways of hiding that fact by using non-display characters, breaking the article into multiple pages and the like to cover the fact that they are using the content. Would love to see their system in action on some test sites to figure out how much you need to do to cover the content and make it not match the original.

Share
twitter facebook
- Re:i'm a little clueless here (Score:5, Insightful)
  
  by Tim C ( 15259 ) writes: on Friday March 05, 2010 @10:53AM (#31370920)
  
  This one [robotstxt.org].
  On the other hand, that's an utterly asinine comment to have made (the one you quote, not yours). Of course they'll ignore it, why on Earth wouldn't they? It is in no way binding, and robots are free to ignore it, just as site owners are free to block connections from specific incoming IP addresses, the owners of those IPs are free to switch to new ones, and so on, ad infinitum.
  
  Parent Share
  twitter facebook
  - Re: (Score:2, Interesting)
    
    by Joe U ( 443617 ) writes:
    
    Ok, here's an argument.
    http://blog.internetcases.com/2010/01/05/browsewrap-website-terms-and-conditions-enforceable/ [internetcases.com]
    So, the terms of use of a website are binding, at least according to this court. If the terms spell out mandatory following of robots.txt, is robots.txt now binding?
    - Re: (Score:2)
      
      by Tim C ( 15259 ) writes:
      
      I think the key there is the visibility of the terms:
      But in this case the court found the terms and conditions (including the forum selection clause) to be enforceable. In contrast to Specht, the ServiceMagic site did give immediately visible notice of the existence of the terms of the agreement.
      If I write a robot to crawl a site looking for certain keywords (e.g. Metallica), I will not necessarily ever have had visibility of those terms.
      - Re: (Score:2)
        
        by twidarkling ( 1537077 ) writes:
        
        Except that the recent RIAA case ruling that you don't need to have actually seen a copyright notice in order to be bound by it, due to the ubiquity of the notice, ToS are similarly ubiquitous, so you should be bound by that as well, seeing it or not.
    - - Re: (Score:2)
        
        by clone53421 ( 1310749 ) writes:
        
        No, but the person who deployed the robot can implicitly do so by its actions.
        Same argument used by the guy who had his cat click the EULA confirmations, and same flaw. He’s still liable.
- Re:i'm a little clueless here (Score:5, Interesting)
  
  by KingSkippus ( 799657 ) writes: on Friday March 05, 2010 @11:01AM (#31371030) Homepage Journal
  
  The Robots exclusion standard [wikipedia.org]. Not that it will stop them; as others have pointed out, if they think they're "doing the right thing," I'm sure they will not be concerned about such a standard.
  The worry here really isn't so much for the people who are hosting sites with infringing content. I'm sure a moral argument could be made that Attributor is well within the right to disregard the wishes of those who are breaking copyright law. However, I run several sites that have no infringing content whatsoever, sites with things that have content that, while not private, I don't particularly want spiders crawling. I'm not so naive to think that they don't do it anyway; I have server logs proving that they do. However, in this case, we have a company that is claiming to be legitimate completely ignoring my--someone who is not infringing--wishes and doing it.
  Put another way, by convention, my neighbors don't use binoculars to peer into my house windows to see what I'm doing although there's currently not really anything stopping them from doing so. Even though I don't particularly have anything to hide, if I find that they are violating our polite social contract, then I'll put up shades just because it's none of their damn business.
  I don't think that the robots.txt convention will be the thing that stops Attributor. I think that it will be that it won't take long for web site authors to figure out what user agents, IP address, etc. that Attributor is using and will block access from Attributor to their sites. Like I said, I have no infringing content on my sites, but if Attributor is going to ignore me politely asking their robots not to scan my sites, then I'm fully in the right to take further steps to forcibly prevent them from doing so.
  
  Parent Share
  twitter facebook
  - in other words (Score:2)
    
    by circletimessquare ( 444983 ) writes:
    
    this is the beginning of an arms race
  - Re: (Score:2)
    
    by wjousts ( 1529427 ) writes:
    
    Put another way, by convention, my neighbors don't use binoculars to peer into my house windows to see what I'm doing although there's currently not really anything stopping them from doing so.
    Curtains?
  - Re:i'm a little clueless here (Score:4, Informative)
    
    by fuzzyfuzzyfungus ( 1223518 ) writes: on Friday March 05, 2010 @11:17AM (#31371232) Journal
    
    Since, as you say, robots.txt will likely do nothing against them, the bigger question becomes "how do they plan to do their crawling?". Crawling from a well defined IP block, using software with user agent Attributor_copy_cop, will be laughably simple to block or present false noninfringing content to.
    
    Spoofing the UA strings and(if necessary) some of the behavior of common web browsers is a simple software problem, so I assume that they'll do that(unless they are terminally incompetent). Out of curiosity, though, does anybody know how easy and cheap it would be (using legitimate methods not botnet style stuff) for such a commercial entity to obtain a reasonably large number of, ideally "residential looking", IPs that change fairly often? Do you just call verizon and say "I want 500 residential DSL lines brought out to so-and-so location"? Would you obtain the services of one of the sleazy datacenter operators who caters to spammers and the like and knows how to switch IP blocks frequently? Do you pay to have second lines installed at your employee's houses, with company scanner boxes attached?
    
    Parent Share
    twitter facebook
    - Re:i'm a little clueless here (Score:4, Interesting)
      
      by ASBands ( 1087159 ) writes: on Friday March 05, 2010 @01:21PM (#31372774) Homepage
      
      One idea would be to use the many available cloud services like EC2, Google App Engine and Azure. The IP blocks those services come in are going to remain fairly regular, but they are so common that it might not be acceptable for a site to block everything from ghs.l.google.com (and whatever EC2 and Azure live on). It is still blockable, though, so it probably would have been better for them (from a technical standpoint) if they hadn't announced their existence and these sites had been slowly indexed by their service before anybody knew what was happening.
      Another (better) idea would be to use a service like Tor. Sure, their latency is going to skyrocket, but that's not a big deal since interactivity isn't a primary concern of an indexing service. It's still blockable, if infringing site admins block Tor nodes. This may or may not be doable, as I would imagine many users of said infringing sites use anonymizing networks for their normal traffic.
      Sure, either of the solutions I've come up with in five minutes can be circumvented, but the idea isn't to totally eliminate piracy, its to make it inconvenient enough to make getting the legitimate version easier.
      
      Parent Share
      twitter facebook
  - Binoculars (Score:2)
    
    by Mateo_LeFou ( 859634 ) writes:
    
    Prosser, in both his article and in the Restatement (Second) of Torts at 652A-652I, classifies four basic kinds of privacy rights:
    1. unreasonable intrusion upon the seclusion of another, for example, physical invasion of a person's home (e.g., unwanted entry, looking into windows with binoculars or camera, tapping telephone), searching wallet or purse, repeated and persistent telephone calls, obtaining financial data (e.g., bank balance) without person's consent, etc.
    http://www.rbs2.c [rbs2.com]
    - oh wait.. woops (Score:2)
      
      by Mateo_LeFou ( 859634 ) writes:
      
      pasted too soon
      "Only the second of these four rights is widely accepted in the USA. In addition to these four pure privacy torts, a victim might recover under other torts, such as intentional infliction of emotional distress, assault, or trespass.
      Unreasonable intrusion upon seclusion only applies to secret or surreptitious invasions of privacy. An open and notorious invasion of privacy would be public, not private, and the victim could then chose not to reveal private or confidential information. For exampl
  - Re: (Score:3, Informative)
    
    by yourlord ( 473099 ) writes:
    
    I welcome them to crawl my sites and ignore my robots.txt files. They won't get very far though. When my server detects that behavior it passes the IP to my firewall which adds it to the "drop these packets into a black hole" list.
    I have quite a large table of IP addresses of idiots that violated robots.txt.
- Go back to dial-up? (Score:2)
  
  by tepples ( 727027 ) writes:
  
  Sometimes I really wish we could just go back to the early 90's when big media thought the internet was a joke, we didnt need them then and frankly I usually think we would be better off without them now.
  Home Internet access in the early to mid 1990s was dial-up. Do you want to go back to that?
  - Re: (Score:2)
    
    by grapeape ( 137008 ) writes:
    
    If that was the tradeoff needed to prevent the internet from becoming one big corporately guided tour pay as you go presentation of what they want us to see or do...sure. Actually by the mid 90's I was on ISDN.
  - - Re: (Score:2)
      
      by grapeape ( 137008 ) writes:
      
      That is exactly my point. I wasnt trying to troll, simply pointing out that the internet was supposed to be a great equalizer, most media outlets have no desire to be part of the community they want to be the community and have gone out of their way to shut out anything that even resembles equality online. Linking has traditionally been the way that sites agregate news, many simply use rss summaries provided by the original content, what 80% are they going after? 80% of the rss summary would often mean i

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

Robots.txt (Score:3, Insightful)

Re:Robots.txt (Score:5, Insightful)

Re: (Score:2)

The 80 percent mark (Score:3, Insightful)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:3, Funny)

Re:Robots.txt (Score:4, Interesting)

Re: (Score:3, Informative)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2, Insightful)

Re: (Score:2)

Re: (Score:2, Informative)

Re: (Score:3, Informative)

Re: (Score:2)

Re: (Score:2)

Re:Robots.txt (Score:5, Insightful)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2, Interesting)

Re: (Score:2)

Re: (Score:3, Interesting)

Re: (Score:3, Insightful)

Re:Robots.txt (Score:4, Insightful)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Business Plan (Score:2)

DMCA.. (Score:3, Interesting)

Re: (Score:2)

Re: (Score:2)

Lessoned learned from RIAA (Score:5, Insightful)

Will that ultimately include slashdot? (Score:5, Interesting)

Re: (Score:2)

Re: (Score:3, Funny)

Re:Will that ultimately include slashdot? (Score:4, Insightful)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:3, Insightful)

Re: (Score:2)

Re: (Score:2)

Re:Will that ultimately include slashdot? (Score:4, Insightful)

Offshore sites WILL be immune (Score:2, Interesting)

Re: (Score:2)

Re: Offshore sites WILL be immune (Score:4, Insightful)

Re: (Score:3, Insightful)

Re: (Score:2)

Re: (Score:2)

Re: (Score:3, Informative)

well (Score:2)

Re: (Score:2)

Ad networks geotarget their ads (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Please do so (Score:5, Insightful)

Re: (Score:3, Insightful)

Re:Please do so (Score:5, Insightful)

Re: (Score:3, Interesting)

Re: (Score:2)

Re: (Score:2)

Not So Good for the Economy (Score:2)

the article, for your convenience (Score:5, Funny)

Re: (Score:3, Insightful)

Re: (Score:2)

Re:the article, for your convenience (Score:4, Interesting)

so instead- google or other cache (Score:2)

Re: (Score:2)

Robots.txt is irrrelevant (Score:2)

my experience with Attributor (Score:5, Informative)

Re:my experience with Attributor (Score:4, Informative)

Re: (Score:2)

Re: (Score:2)

Re: (Score:3, Interesting)