
NY Times To Data-Mine Its Visitors

Posted by kdawson
from the et-tu-Grey-Lady? dept.
pilsner.urquell points out a story in the Village Voice from a stockholders' meeting at the New York Times. It seems that the media giant is now eager to data-mine visitors to its Web properties. Of course anybody with a site who profits from advertising is likely to be doing something of the sort. It's just a bit surprising that the Times would use the words "data mining" out loud in public. From the article: "Barely a year after their reporters won a Pulitzer prize for exposing data mining of ordinary citizens by a government spy agency, New York Times officials had some exciting news for stockholders last week: The Times company plans to do its own data mining of ordinary citizens, in the name of online profits... [T]he problem with reading papers electronically is that they can also read you."
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward on Thursday May 10, 2007 @09:57AM (#19066675)
    [T]he problem with reading papers electronically is that they can also read you.

    So, how are we supposed to make Soviet Union jokes after this??
    • Re:Obligatory?? (Score:5, Interesting)

      by goombah99 (560566) on Thursday May 10, 2007 @10:01AM (#19066745)
      Wow, an insightful, pithy first post. Since I assume all commercial sites, especially free ones, are data mining me and selling me out in any way they can, I'm not worried by this. In fact I think it shows a lot of integrity by the NY Times to announce their intentions ahead of time, as it can only be bad PR.
      • by Intron (870560) on Thursday May 10, 2007 @11:11AM (#19067917)
        "shows a lot of integrity"

        Except they didn't put it in an announcement to the website visitors, they announced to their stockholders that they planned to make more money.

        Anyway, I think everyone visiting the NYT site from now on should do a search for "elephant porn" and we'll see how that affects their advertising budget.
        • Perhaps they didn't find the fact that they'll mine their user data that newsworthy. Personally I was surprised they weren't doing so already. Some people seem to think they have some sort of inherent duty to make a big fuss about it, instead of just posting about it in their financial reports. Whether it goes into the financial reports, or on the website, they're still making it public knowledge, which does show a lot of integrity. Not to mention that they've stated that they might do this for years in the
    • Re: (Score:1, Offtopic)

      by Himring (646324)
      No one ever uses the wow meme anymore. Let's give it a go:

      "The NY Times will melt your face if you read their paper electronically...."

      "Your face, that melt thing, yea, it'll happen at the Times....."

      "0h n0z, sed t3h peeplz, our faces r t3h m3lt!..."

    • I was actually going to shoot for a Soviet America joke, but seeing as the editor stole our thunder, I guess not. It's a sad state of affairs when the editors start making stupid jokes in the articles; that's the job of the commenters. Now how am I ever going to earn that +5 Funny that in no way contributes to my karma?

      Please slashdot, won't you think of the smart asses?

    • In Soviet Russia, cookies profit financially from the newspaper!
  • Hello Bug Me Not (Score:5, Insightful)

    by fishdan (569872) * on Thursday May 10, 2007 @09:58AM (#19066681) Homepage Journal
    OR some other similar service. When are sites going to learn that we CAN protect our privacy if they force us to? You catch more flies with honey...
    • by iminplaya (723125)
      I'm thinking along the same lines. Two can play at this. Just stuff it full of phony info. Fill 'em up with chaff and let them sort it out. I think it's pretty silly to give them any real info that won't benefit us in any way. Same goes for the MySpace crowd and similar sites. I can't sympathize much if you give out your real name, etc. So, Mr. New York Times, I can be honest, too. The name I give isn't mine. I will log in as the "Ex Presidents". Now, if they want to offer money for logging in, I'll be more
      • by NoTheory (580275) on Thursday May 10, 2007 @10:25AM (#19067093)
        The problem is that this is a functional analysis. Even if they don't have your legitimate contact details, they know what you've been browsing, and if at some point they can attach it to your legitimate contact details, then boom, they've got the whole shebang. This is a privacy-unfriendly move. It makes it more difficult for you to maintain your anonymity. Services like BugMeNot are insufficient because they require you to try out multiple contact details and maintain a list of valid ones (which can be made all the more difficult if the organization is active in closing these accounts).

        Even if you think people should be more privacy conscious, this is a bad move, that makes everyone less private. The irony of the situation is really the only thing that makes it notable. Stupid NYT.
        • by iminplaya (723125)
          Well, there's always the other solution. Just stay away from them. Unfortunately, gen pop isn't going to care too much. They will just roll over, and the Times will get what they want, I suppose. But it still might be fun to flood them into oblivion. These are the kind of things that legitimize spoofing and other things that drive the IT guys nuts. I think we can have our way with them, and show them who's their daddy.
          • by plover (150551) *
            Not much point. I doubt you'd be able to flood them personally without a lot of outside help (or a zombie bot network), and all I think you'd get from other people is an apathetic "why should I bother?"

            People tried something like this with ad-refreshing pages a few years back, with the thoughts of driving up the ad costs of "evil" companies. I once tried letting one of those go in a background window, and all I did was clog the hell out of my network connection. I lost interest after about three minute

            • by iminplaya (723125)
              Yeah, you're right, but I didn't mean flood in that fashion. That would be uncouth. I do have a throw away email account for these things, so I use it to register. I never have used that "bug me not" thing. My one junk account works for all of them, since it is functional. It just doesn't have any real data except for the email address itself.
              • by plover (150551) *
                Oh, I see.

                Regarding BugMeNot, it's brilliant. As for ease of use, I have the Firefox extension. [roachfiend.com] It reduces using BugMeNot to a simple right click on the username field (or whatever) and selecting "Login with BugMeNot." It automatically fills in the username and password, and submits the sign-on request. It saves me the time of going through the signup process with a null or fake email address, and it saves even more time and trouble when the signup process requires email confirmation.

                If I'm wanting

        • Well I seriously doubt they will be data mining *across* valid logins. bug me not is a blessed thing. And their own zeal to kill each new bug me not login, only causes more and more people to switch to a new login as it's created.

          So my history and those of other technical people are probably pretty safe from mining anything useful.

          The downside is the vast majority of people don't know about this stuff and will be mined relentlessly.


        • I use the NYTimes.com login for Ken, one of my best friends ever. Not only were his details accurate (Ken was a marketing guy), but his username and password were the same for like... everything.

          Ken also passed away over six years ago. His login still works. Someday I'd like to see what happens when they finally put marketing data together for him.
          • by plover (150551) *

            Ken also passed away over six years ago. His login still works. Someday I'd like to see what happens when they finally put marketing data together for him.

            This is what I overheard at that meeting: "Who knew that dead guys clicked on more Toyota SUV ads than anything else? Let's try posting some of those above-the-urinal ads in the casket lids, see if we can get the count up for the Subaru market, too."

    • "You catch more flies with honey..." Actually, you catch more flies (customers) when you know what they want to buy, and how they shop or navigate your site. And you can only know that by watching what they do. This isn't new. When you go to someone else's property -- whether that be a grocery store, mall, bank, etc, there are cameras on you. Some for security. But some are to see how people shop, select, and buy. The same applies when you go to someone's server. This data isn't only useful for maki
    • You catch more flies with honey...

      That's also true here for a different reason:

      I already know that everybody with something to advertise wants to know absolutely everything about my potential consumption habits. That's just life. The fact that the NYT is willing to speak plainly about their data mining goals and methods is something I admire in a company...especially in a newspaper.
    • by jambarama (784670)
      NYTimes specifically allows non-logged-in access to its articles through Google News. So in this instance you don't even have to use BugMeNot. Just search for the article on Google News. I always appreciate it when people post Google News links to NYT articles.

      Because of this, with the NYT, you can even disable/block cookies from their site entirely. The problem, other than this minor inconvenience, is that many other sites don't allow this. Since mirroring content like this is a no-go, sometimes the o
    • OR some other similar service. When are sites going to learn that we CAN protect our privacy if they force us to? You catch more flies with honey...
      I did use bugmenot until very recently. I came to the realization that it was just silly, at least for me. Revealing the articles I like is a more-than-reasonable price to pay for the news and information on the Times site, which does cost them considerable time and money to maintain.
  • by maxwell demon (590494) on Thursday May 10, 2007 @10:00AM (#19066725) Journal
    "[T]he problem with reading papers electronically is that they can also read you."
    Wow, a Soviet Russia joke directly in the summary!
  • by rodney dill (631059) on Thursday May 10, 2007 @10:01AM (#19066731) Journal
    I'm not sure why there is such a concern over data mining. As long as the mining is done from public sources, I see no problem. If the mining is from medical records, government records that are sealed or presumed to be private, or some other protected database, then it becomes an issue.
    • by superbus1929 (1069292) on Thursday May 10, 2007 @10:22AM (#19067045) Homepage
      Where does it stop? Once you get comfortable with data mining, will you also have to get comfortable with more than just your IP attached? Will you be comfortable with someone having a full consumer database of John Doe, instead of just 10.10.10.220? Will you be comfortable with your profile being viewable to everyone that wants it? Will you be comfortable being positively unable to get away from Capitalism even for a second?

      I'm not trying to put on a tin foil hat by any means; if it was just "hey, so many people like Coke over Pepsi!", I'd be cool. But anything further than that, and I view it as a slippery slope.
      • by noidentity (188756) on Thursday May 10, 2007 @10:46AM (#19067457)
        There's a fundamental difference between a company doing demographics, and the government spying on citizens. The company doesn't care about any person in particular, just common trends, and simply changes how they design/market their products. At worst, it means they can more effectively sell you junk you don't need. The government's use of data is pretty much the opposite.
        • First, the companies are interested not only in common trends. They are probably not interested in the fact that your name is John Doe, but they are surely interested that the current viewer has bought lots of computer hardware (thus you might be able to sell some new stuff to him), and shows mainly interest to Linux (don't waste advertising space for Windows products). They are probably interested if the current visitor tends to spend lots of money on single items (show more ads for single, expensive thing
          • See, my whole view is this: I don't WANT people knowing what I bought. So therefore, I should have the choice not to participate. Companies making it mandatory as a condition for viewing their site? Nope, I have a problem with that, and will not participate, either by blocking those particular cookies, or just not going to the website.

            If I want to buy something? I'll Google it on my time. I don't need help from Doubleclick or Falkag.
          • "[...] but they are surely interested that the current viewer has bought lots of computer hardware (thus you might be able to sell some new stuff to him), and shows mainly interest to Linux (don't waste advertising space for Windows products)."

            "They" in this case refers to the web session on their server. This isn't a person examining your stats, just a machine. The things the marketing people generally look at are aggregates, not individuals. Though I guess you do have a point that they must be storing you
      • I think you are confusing data collection with data mining. Mining is just perusing available resources and making correlations, associations, and possibly conclusions that others do not necessarily come up with. Data collection would be where you actively seek out the data. If you are collecting information that is supposedly private, such as financial, medical, etc. (without the consent of the individual), then the concerns arise. Data mining is just use of an existing set(s) of data. The marke
    • Re: (Score:2, Informative)

      by Anonymous Coward

      Stealing cookies?

      Web bugs?

      Script injection/invisible framing?

      And the easiest - seeing which site you came from before you hit their server.

      If you think "data mining" is going to stop at "this IP read these pages at this date," you're a sucker.

    • I agree. You access their information for free and they collect data about you. Big deal.
    • Re: (Score:3, Interesting)

      by krbvroc1 (725200)
      To quote the RIAA, think of it as stealing. Basically, in addition to already viewing advertisements, websites want to steal my Intellectual Property. See, there is a value placed on my data by the market, and that data is being collected and securitized without compensating me and in many cases without my permission.

      Each of us owns the Intellectual Property in our heads. Like the RIAA, we need to stick together and demand either payment or permission for this information.
    • by h2g2bob (948006)
      I agree with this - there's a BIG difference between the government spying on you and a newspaper spying on you.

      Government: spies based on your whole life. Consequences include arrest, etc.

      Newspaper: spies based on what stories you read. Consequences include tailored adverts, etc.
  • by LMacG (118321) on Thursday May 10, 2007 @10:04AM (#19066773) Journal
    I have a login for the NYT. According to the information I provided, I'm a female born in 1901, living in ZIP code 90210.

    (For the record, at least one of those data points is incorrect).
  • That's fine, I pretend not to be the Googlebot... Thus getting in without having to register. When I have to register, I of course fake my information.

    Doesn't everyone do that?
  • by harks (534599) on Thursday May 10, 2007 @10:08AM (#19066845)

    Data mining, she told the crowd, would be used "to determine hidden patterns of uses to our website."
    So they used a scary phrase, but there isn't anything nefarious about noticing that people who read articles on subject X might want to see a link to article Y.
  • No news (Score:4, Insightful)

    by VincenzoRomano (881055) on Thursday May 10, 2007 @10:10AM (#19066873) Homepage Journal
    Almost all websites do it!
    This is one reason why cookies [wikipedia.org] are used, and why almost all browsers provide mechanisms to filter them out!
    • Even if you disable cookies, it's trivial to pass a session ID through the URL to maintain a user's authenticated session. They'd still be able to determine which article you were reading and provide 'similar' links etc. Not to mention that most cookies are used to track and maintain user logins and server sessions, not to data mine... NYT is saying that they're explicitly going "to determine hidden patterns of uses to our website" using data mining. This isn't about cookies, it's about the tracking and
      • Almost everyone uses cookies. And on the NYT website I don't see such IDs in the URLs!
        • by haibijon (893019)
          Really? It'd be nice if you read the article and checked your 'facts'. The article states that NYT is GOING to be data mining, not that they've begun already. And just to check on the session handling, I disabled cookies and tried to view an article on the site, and sure enough the query string contains OQ=_rQ3D5Q26hpQ26orefQ3DsloginQ26orefQ3DsloginQ26orefQ3DsloginQ26orefQ3Dslogin&REFUSE_COOKIE_ERROR=SHOW_ERROR

          Indicative of both a requirement of cookies (REFUSE_COOKIE_ERROR=SHOW_ERROR) for the u
          • Re: (Score:2, Informative)

            by RetroGeek (206522)
            If you do this, then you only need to change some part of the string to a random value, then hit enter.

            When the page refreshes, then click on the link you want to read.

            Wash, rinse, repeat.

            Sure they are tracking something, but it will not be you.

            There are lots of ways to monkey with this sort of thing.
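A rough sketch of the query-string trick described above, not a tested recipe against the real site. The parameter name OQ comes from the query string quoted earlier in this thread; the example.com URL and the function name randomize_param are made up for illustration, and whether the server actually keys any tracking on that parameter is an assumption.

```python
import secrets
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def randomize_param(url: str, param: str, nbytes: int = 8) -> str:
    """Return url with the value of one query parameter replaced by random hex.

    All other query parameters are kept in their original order.
    """
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    # Swap only the named parameter; leave everything else untouched.
    new_pairs = [
        (key, secrets.token_hex(nbytes) if key == param else value)
        for key, value in pairs
    ]
    return urlunsplit(parts._replace(query=urlencode(new_pairs)))


if __name__ == "__main__":
    # Hypothetical URL; only the parameter names are taken from the thread.
    url = "https://example.com/article?OQ=abc123&REFUSE_COOKIE_ERROR=SHOW_ERROR"
    print(randomize_param(url, "OQ"))
```

Each call yields a different value for the chosen parameter, which is the "wash, rinse, repeat" step. It does nothing about IP addresses or referrers, so it only blurs whatever is tied to that one string.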
            • by haibijon (893019)
              Definitely true. Just trying to make sure that people realize that simply disabling cookies makes you untrackable, and in many cases cookies are required.
              • by haibijon (893019)
                Edit: The above comment should read: "simply disabling cookies *DOES NOT* make you untrackable"
    • by Ben Chu (24542)
      Exactly. I don't see this as being any different from the methods Google uses to display ads in Gmail or other types of market research. More sensationalism and paranoia.
  • This is one of the interesting catches of life online. In order to purchase full access (as opposed to open registration) to content on NYT online, you must supply financial data in the form of a credit card. (PayPal not accepted.) This means that NYT Online is able to match your browsing habits to all of the financial data on file for you.

    Although PayPal does provide some anonymity, it only officially guarantees goods sent to a real world address, thus losing full anonymity for purchases. Purchased credi
  • oblig. (Score:2, Insightful)

    by CrowbarKing (1100015)
    1. Reveal data mining
    2. Win Pulitzer prize
    3. Start data mining
    4. ???
    5. Profit!
  • by paladinwannabe2 (889776) on Thursday May 10, 2007 @10:23AM (#19067071)
    Pretty much every site does data mining- I'm sure /. keeps track of how many people click on ads, read the article (only 2 so far), etc. /. probably even ties all this information to your account, so they have a better idea of what ads to display. I don't even have a problem with any of that. Once they start selling my information to other people is where I have a problem. I don't mind /. targeting me with ads, but I do mind my email address being targeted with spam.
  • I can't believe the NYT still requires people to make up random personal data to read their newspaper. Seriously, has anyone here ever given out real information when registering an account with the NYT? Even without services like bugmenot, the information they'll get from datamining their visitors will be too full of noise to be of any use.

    I just read an article in The Economist, which was mostly about Murdoch trying to buy Dow Jones, which owns the Wall Street Journal. But The Economist implied tha
    • Re: (Score:3, Informative)

      by Jonathan (5011)
      I can assure you that "average" people *do* give out accurate information; when I tell my relatives that I generally just give random info, they tend to be shocked and say "But, but, that would be LYING".
  • Between the government, with its vast powers, and a commercial endeavor making a buck from readers reading at no cost. Particularly since opting out of the NYT's data mining is as easy as not visiting the site, while staying out of the government's data warehouse would probably take something like being unborn.
    • by MrNaz (730548)
      That's true in theory. Now, welcome to the world where the line between government and the corporate sector has been erased under the boots of a million marching soldiers.
  • FROM: IT Data Mining Project
    TO: Marketing
    RE: VIP!

    Just a quick preliminary result that is too important to wait for the official report.

    One of our readers, 'Anonymous7' is virtually a demographic by him/herself.

    Uses the internet from all over the world, on thousands of machines, reading our paper hundreds of times a day!

    Surely this person must have a major impact on data processing purchases worldwide.

    Surprising though, the person seems naive, computer security wise, because their password is the same as
  • [T]he problem with reading papers electronically is that they can also read you.
    In Soviet Russia, you read papers? Na
    In Soviet Russia, papers read you? Na
  • by Grashnak (1003791) on Thursday May 10, 2007 @10:50AM (#19067535)
    If they said, "We'll be tailoring our site to the visitors' interests, thereby enhancing their experience", no one would care, but once they say "data-mining", suddenly everyone is screaming "OMFG, the NYT is like the NSA! WTF? Remember the constitution dude!"
  • Dear Former Subscriber:

    Hey -- Mister "I Don't Need The Newspaper Any More, I've Got A Computer" -- remember us? Yep, it's your old pal, the New York Times. The one that you used to welcome to your house every morning before you bought a goddamn modem from goddamn Best Buy. The one you threw in the trash after you got your goddamn flashy high speed connection. Does that refresh your memory?

    Well guess what. We still have something that you can't get at your precious internet: investigative reporting. Did you
  • by anoopjohn (992771) on Thursday May 10, 2007 @11:14AM (#19067957) Homepage
    Recently there was a big debate on Slashdot about Google's purchase of DoubleClick. Why would you care if a company tracks your usage patterns, without attaching them to your personal identity, and delivers targeted advertisements? There is no free lunch. You are paying for the free content by selling your usage patterns. They don't want to do it any other way. You can take it or leave it. Perhaps at some point in the future there will be ad-free subscription-based content. I doubt it, though.

    I run a company and I face the same problem: how to reach the set of people who are most likely to be my customers. The more successfully I can do that, the lower my marketing cost, and the cheaper the product will be in the long run. Ultimately, if we have a system where each person sees only those ads that he needs to see, we would have a highly efficient marketing system with the lowest marketing costs. A reasonably big percentage of the cost of most products you buy is marketing cost. So if you would like them to be cheaper, stop complaining and start selling your usage data.

    There is only one issue here: privacy advocates have to ensure that there is no real breach of privacy in the process. If Googlebot sees the mails I see, there is no problem, but if Googlebot reads my mail, checks it against some preset filter, and requests Mr. X to take a look at my mail, then it is a breach of privacy. As long as the identity is kept separate from the patterns, there shouldn't be any problem.
    • by anubi (640541)
      You have an insightful comment on what it takes to finance "free" content.

      I like most of it except tracking *me*. It unnerves me. It's too much like "stalking".

      It would unnerve me to go into a store, only to have a clerk stalk me all over the mall.

      Given this latest Sony Rootkit Fiasco, and seeing how our Government handles it, in comparison with how they handle other breaches of "rights" aka DMCA, I cannot trust my Government to stick up for what is righteous, as much as I can trust them to stick up for w

  • Doesn't seem too bad, but if it gets ugly I guess we will just start using browser-side scripting to create some random behavior in the background while we are reading. Some random noise would easily drown out our real browsing behavior (oh, he opened all these 10 articles at the same time, wonder which one he actually read?).
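To make the "random noise" idea a bit more concrete, here is a minimal sketch of just the selection step: mixing the one article you actually want in with randomly chosen decoys. It is written in Python rather than as a browser script, it fetches nothing, and the function name decoy_plan and the example.com URLs are hypothetical.

```python
import random
from typing import List, Optional


def decoy_plan(
    real_url: str,
    candidates: List[str],
    n_decoys: int = 9,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Return the real URL shuffled in among up to n_decoys random decoys."""
    rng = rng or random.Random()
    # Never pick the real article as its own decoy.
    pool = [url for url in candidates if url != real_url]
    decoys = rng.sample(pool, min(n_decoys, len(pool)))
    plan = decoys + [real_url]
    rng.shuffle(plan)  # hide the real article's position in the batch
    return plan


if __name__ == "__main__":
    # Hypothetical article list standing in for links scraped from a front page.
    articles = [f"https://example.com/article/{i}" for i in range(20)]
    for url in decoy_plan("https://example.com/article/3", articles):
        print(url)
```

Opening all ten at once gives the "which one did he actually read?" effect, though time spent on each page or scroll events could still give the real one away.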
  • For sites like the NY Times, I use BugMeNot.com [bugmenot.com] and use someone else's login. After all, isn't recycling a perfectly good login better than getting a new one?
  • I told them everything they need to know about myself:

    My name is Willie Horton.
    I was born 12 August 1951.
    My address is:
    Maryland House of Corrections Annex
    PO Box 534
    Jessup, MD 20794-0534
    E-mail: willie.horton@mail.com

    They may even realize that I'm the same Willie Horton whose image was used to defeat Mike Dukakis in the 1988 Presidential election... but I guess that's the price of fame.
  • Just another reason to use Bug Me Not.
  • I guess there is no need to point out the hypocrisy on this one. I'm not going to hold my breath to wait for the liberal left's excuses as to why it's okay for "US", but not "THEM". This is why I'm so sick of politics.
  • Just don't read the New York Times, or use a secure web browser that doesn't leave any cookies or electronic paper trails. Simple as that.
