Google The Internet News

CNET Deletes Thousands of Old Articles To Game Google Search (gizmodo.com) 48

According to Gizmodo, CNET has deleted thousands of old articles over the past few months in a bid to improve its performance in Google Search results. From the report: Archived copies of CNET's author pages show the company deleted small batches of articles prior to the second half of July, but then the pace increased. Thousands of articles disappeared in recent weeks. A CNET representative confirmed that the company was culling stories but declined to share exactly how many it has taken down. The move adds to recent controversies over CNET's editorial strategy, which has included layoffs and experiments with error-riddled articles written by AI chatbots.

"Removing content from our site is not a decision we take lightly. Our teams analyze many data points to determine whether there are pages on CNET that are not currently serving a meaningful audience. This is an industry-wide best practice for large sites like ours that are primarily driven by SEO traffic," said Taylor Canada, CNET's senior director of marketing and communications. "In an ideal world, we would leave all of our content on our site in perpetuity. Unfortunately, we are penalized by the modern internet for leaving all previously published content live on our site."

CNET shared an internal memo about the practice. Removing, redirecting, or refreshing irrelevant or unhelpful URLs "sends a signal to Google that says CNET is fresh, relevant and worthy of being placed higher than our competitors in search results," the document reads. According to the memo about the "content pruning" the company considers a number of factors before it "deprecates" an article, including SEO, the age and length of the story, traffic to the article, and how frequently Google crawls the page. The company says it weighs historical significance and other editorial factors before an article is taken down. When an article is slated for deletion, CNET says it maintains its own copy, and sends the story to the Internet Archive's Wayback Machine. The company also says current staffers whose articles are deprecated will be alerted at least 10 days ahead of time.
What does Google have to say about this? According to the company's Public Liaison for Google Search, Danny Sullivan, Google recommends against the practice. "Are you deleting content from your site because you somehow believe Google doesn't like 'old' content? That's not a thing! Our guidance doesn't encourage this," Sullivan said in a series of tweets.

If a website has an individual page with outdated content, that page "isn't likely to rank well. Removing it might mean, if you have a massive site, that we're better able to crawl other content on the site. But it doesn't mean we go, 'Oh, now the whole site is so much better' because of what happens with an individual page," Sullivan wrote. "Just don't assume that deleting something only because it's old will improve your site's SEO magically."

  • Snake oil (Score:5, Insightful)

    by Dan East ( 318230 ) on Wednesday August 09, 2023 @07:59PM (#63754880) Journal

    This sounds exactly like the kind of snake oil some marketing or "search engine optimizer" charlatan would push. Sounds good in a boardroom or other places where market-speak makes a bigger impact than sane facts.

    • Reading this: "Removing it might mean, if you have a massive site, that we're better able to crawl other content on the site. But it doesn't mean we go, 'Oh, now the whole site is so much better' "

      Translated into English, that means they can't even get their own story straight. Which is that their algorithms are so old and crufty that they can't even give you a straight answer. And that's kind of the point. Because if they could, you could game it. So the entire algo has turned into this messed up Nash equ

      • Re: Snake oil (Score:5, Insightful)

        by ArmoredDragon ( 3450605 ) on Wednesday August 09, 2023 @08:43PM (#63754938)

        In other words, Google says one thing, but their algorithm does something else. And they wonder why their search engine has become such shit in the last few years. Nevermind that really stupid shit happens often, like Google silently doing things like dropping terms from your search query, or transposing numbers without telling you.

        No joke, earlier today I was doing a search for a specific CVE number on Google, and it literally decided to drop the first digit and then transpose the last two digits despite that I had entered the correct one that I intended to find. Why exactly is anybody's guess, maybe because people search for the other number more? But how does that help me find what I'm looking for? It only does the opposite. And I know it's not because Google doesn't have the number I entered indexed, because surrounding it in quotes did get what I was looking for. And worse, they didn't even do the "did you mean...?" or "showing results for..." crap. Literally no notice, just quietly mangle the query.

        Google is utter shit now. Still slightly better than the alternatives, but still utter shit compared to what it used to be.

        • Hey, but the Levenshtein distance was close! Too bad you were searching for an exact number, instead of searching for one of the 200 misspellings of Britney Spears. That would have worked great. Recently their struggle is how to juggle the different search use cases while trying to inject AI wizardry into the mix, in a desperate effort to remain relevant, where they are bursting apart at the seams. Nobody knows how it works anymore. Not even them.

          • Too bad you were searching for an exact number, instead of searching for one of the 200 misspellings of Britney Spears.

            Generalizing or modding queries into "popular", produces more results. But with huge amounts of content on virtually any subject, that isn't helpful.

            Few users scroll through more than a few pages of results. Simply having more of those, only serves to bury more specific or relevant results.

            Search engines should strive for those specific, less popular, unique / uncommon search terms as provided by users. In your example: maybe user was looking for some other Britney Spears than the singer. Or

        • by AmiMoJo ( 196126 )

          You need to learn how to use a search engine. We had to do it back in the day too, because before Google you had to craft a regular expression of sorts to get any useful results.

          If you surround the number of your CVE with quote marks, it will find that exact number.

          As for their algorithm, it works as described. They scan the front pages of websites more often than they scan deeper links, because the front pages are where all the updates happen. Deleting the old stuff does nothing.

          • You need to learn how to use a search engine. We had to do it back in the day too, because before Google you had to craft a regular expression of sorts to get any useful results.

            If you surround the number of your CVE with quote marks, it will find that exact number.

            You need to learn how to read. If you would have read my post, you would have seen how I specifically mentioned that I had to do that just to get what I needed.

            If you go back to school, you might learn how to read. If you learn how to read, you can avoid making stupid replies like this.

            Besides, the problem is that you even have to do that to begin with. It's stupidly tedious to surround every word with quotes. A decade ago, you only needed quotes if you wanted a specific phrase as AND was ALWAYS implicit in

        • by Calydor ( 739835 )

          Not just Google. Computers in general, and the internet in particular, are heading way too far into "Here, let me think for you, because you're too stupid to do it yourself" territory.

        • > In other words, Google says one thing, but their algorithm does something else.

          That's my feeling as well.

          CNET is not dumb, they have remained at the top of the rankings for years. Maybe they are just making a mistake in believing deleting old links is good, that's certainly possible. But I would tend to give them the benefit of the doubt when it comes to analyzing their own traffic.

        • I work for Google Search (I'm the one quoted in the article). If you use quotes, we won't drop terms. Happy to look into your example if you can recall the query. If you didn't use quotes, we might have transposed because the digits transposed are very similar to some other digit-related query. That's how spelling correction similarly works. When it works as you want, it's great. When we get it wrong, it's understandably annoying. We try not to get it wrong, so would love to pass the example along.
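
For readers unfamiliar with the Levenshtein distance mentioned in this sub-thread, here is a minimal sketch of the classic edit-distance calculation. It only illustrates why two nearly identical digit strings look "close" to a spelling corrector; it is not a description of how Google's query rewriting actually works, and the identifier strings below are made-up placeholders rather than real CVE numbers.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # previous[j] = distance between the first i-1 chars of a
    # and the first j chars of b (row-by-row dynamic programming).
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning i chars of a into an empty prefix of b
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    # Hypothetical identifiers that differ only by two swapped digits.
    print(levenshtein("2023-12345", "2023-12354"))
```

Under this metric, swapping two adjacent digits costs only two edits, so a corrector tuned for typos can treat the two identifiers as near-duplicates even though they name entirely different things.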

      • I work for Google; I'm the person who shared this information. The translation isn't correct, because it's confusing indexing (the process of being included for possible ranking) and ranking (whether something actually shows up in response to a query). IE: the story is straight.

        The primary question in all this has been about ranking. Does getting rid of "old" content somehow cause our systems to see a site overall as "fresh" and rank it better. No, it will not. That's what my initial tweet was about: https:

    • by anonymouscoward52236 ( 6163996 ) on Wednesday August 09, 2023 @08:50PM (#63754952)

      No, it makes perfect sense. Google likes to kill off old projects. CNET is sacrificing old articles to the garbage can in an attempt to appease Google's blood lust for killing off stuff. LOL. (Let's hope archive.org keeps them?)

    • The end result would be a site with 'the light is on but nobody home' situation.

    • There is something very amusing about the image of a man standing next to a search engine, pointing at search engine optimization hacks, and calling THOSE snake oil.

      SEO shenanigans _work_ because ranking algorithms run on reptile excretions. I'd draw a little snake tail dangling out of the search engine machinery.

    • CNET SEO conversation: "How do we get google to rank all of our old shit higher?" Spitballer: "Let's hide a bunch of it, announce that it helped our rankings, and then see what Google does."
  • by CmdrPorno ( 115048 ) on Wednesday August 09, 2023 @08:07PM (#63754886)

    That's the real question of the day.

    • How does Slashdot still exist?

      • by ac22 ( 7754550 )

        Self-moderating. Cheap to run, a lot of free high-quality content from internet strangers (relatively speaking).

        • by Ol Olsoc ( 1175323 ) on Thursday August 10, 2023 @06:26AM (#63755530)

          Self-moderating. Cheap to run, a lot of free high-quality content from internet strangers (relatively speaking).

          Yes, this.

          While Slashdot isn't perfect, there are intelligent people here who can post intelligent conversations. And yes, asshats will get mod points, and do weird things with them, like when continuing a conversation will get modded offtopic, or making a simple observation flamebait or troll.

          And some of the users can be a bit of a pill. but you can easily ignore them.

          But go to Facebook for a while, and see how the general level of poster smarts is really low.

    • by hawk ( 1151 )

      never mind "still around"--what's this "content" it talks about finding on cnet???

  • by istartedi ( 132515 ) on Wednesday August 09, 2023 @08:21PM (#63754916) Journal

    The Enshitification will continue until morale improves.

  • Mental Note (Score:4, Insightful)

    by sound+vision ( 884283 ) on Wednesday August 09, 2023 @08:47PM (#63754948) Journal

    Making a mental note now: CNET dead. No intention of providing useful content, or even to archive what useful content they may have produced in the past.

    Not that I ever liked it enough to bookmark it, but I do remember reading at least some relevant reporting there. Now, machine-generated articles and SEO? I doubt I'll click on a link to them again.

    • Re:Mental Note (Score:5, Insightful)

      by Voyager529 ( 1363959 ) <voyager529 AT yahoo DOT com> on Wednesday August 09, 2023 @08:56PM (#63754962)

      Making a mental note now: CNET dead. No intention of providing useful content, or even to archive what useful content they may have produced in the past.

      There were two golden ages of CNET...first around the early 2000s when they had a weekly TV broadcast of tech news highlights, and a second one around the time the iPhone arrived and they had a good blend of product reviews and general video content.

      It was downhill when they lost Molly Wood, Veronica Belmont, and Tom Merritt; once they started reviewing cars, it seemed like they were going for a 'different direction'.

      In their prime, CNET had some respectable reviews. Bias was present (they certainly slanted towards Apple and Apple-like products), but in general their reviews were far more technical than Consumer Reports and frequently included useful benchmarks that competitors didn't.

      Farewell, CNET; you'll still be useful for the same thing I've used CNET for since about 2013: the domain responds to ping and it's a site I know nobody uses so I know I'm doing a fresh DNS query when I ping it.

      • "...and it's a site I know nobody uses so I know I'm doing a fresh DNS query when I ping it."

        That's one of the best geek burns I've heard since TBBT went off the air

      • For a brief time in the early 2000s they had CNET radio on 910 AM in San Jose. I actually liked it for awhile. I can't remember the morning guy but he was alright. Night ended with David Lawrence syndicated show that was both good and bad. It was clear the shelf life was limited though.

        Like all radio though there is only so much to talk about - tech, news, politics, sports.

  • You would think that by adding a robots.txt and excluding archival content, CNET could still "sends a signal to Google that says CNET is fresh, relevant and worthy of being placed higher than our competitors in search results," ( I know that's already a stretch) while retaining the content, not breaking old links and so on.

    It's not like "old stuff" [drdobbs.com] doesn't have value and isn't worth keeping around, right ?

    • by gmack ( 197796 )
      Yeah, great plan. So now you get nothing when you look up some 5 year old model number instead of the CNET article describing the specs.
      • But, at least if you went to CNET and searched IT directly, you'd find the results. Their current approach is it's gone everywhere, forever (unless archive.org saves you).
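
The robots.txt idea discussed in this thread can be sketched with Python's standard `urllib.robotparser`. The `/archive/` path and the example.com URLs are assumptions for illustration, not CNET's real URL layout. A `Disallow` rule only asks well-behaved crawlers not to fetch those pages; as the reply above points out, the trade-off is that the excluded articles would also stop being useful search results.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep crawlers out of an assumed /archive/ section
# while leaving the rest of the site crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
"""


def is_crawlable(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/archive/2005/old-review.html",
        "https://example.com/news/today.html",
    ):
        print(url, is_crawlable(ROBOTS_TXT, "Googlebot", url))
```

With these rules the archived URL is reported as not crawlable and the news URL as crawlable, while both pages stay online for anyone following an old link.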

  • by rsmith-mac ( 639075 ) on Wednesday August 09, 2023 @11:47PM (#63755160)

    Unfortunately, I'm not surprised to see CNET do this. While they're a site that's going down the tubes to begin with (bring back the old logo!), they're also, frustratingly, not wrong to be culling old articles.

    I've been in a position to see the results when several smaller sites orchestrated similar cullings for SEO reasons. And despite Google's claims, all of the sites benefited, with search-driven traffic improving at least a few percent in a week, and rankings on critical terms improving as well. In the case of most of these sites, the improvement is not night-and-day, but it is measurable, and it is significant to the bottom line.

    With that said, I don't believe Google is actively being malicious - that is, I'm sure they're not trying to punish sites with old content. However, what I've noticed is that long-lived sites with lots of historical content started sinking as Google took on some of their more recent efforts to weed out low quality content. Low quality content has always been a problem, but in the last couple of years in particular, unscrupulous operators have been taking advantage of the pandemic employment environment (and more recently the rise of generative AI systems) to double-down on rapidly generating new content. Meanwhile, Blackhat SEO services have been able to help content mills get their content ranked shockingly high; not enough to take the top spots, but high enough to get some real, meaningful traffic out of it.

    The net impact, I feel, is that Google is currently losing the war on blackhat SEO operations and their associated content mills. That is, Google's index and ranking systems can no longer reliably tell the difference between good content and bad content, as many of the quality signals they have previously relied on have been copied by mills. Even more recent efforts by Google to prod sites into supplying E-E-A-T info within articles (Experience, Expertise, Authoritativeness, and Trustworthiness) have quickly been undermined by mills doing the same. The mills are all liars, of course, but Google can't determine that.

    What we're seeing is an explosion of content at the same time as the effectiveness of search engines is crashing. Google has to weed through more crap than ever before, and the crap is winning; the signal is harder to find than ever before. The worst part is that I'm not sure what Google can do about it. Search is one of the products they make an honest effort to improve upon (since it's what brings all the boys to the yard), but they've been backed into a corner with no obvious escapes. There simply isn't a good signal of quality left with websites for Google to rely on.

    Which, to bring things full circle back to CNET, is why we're seeing them purge old content. Since Google can't tell the difference between that content and content from mills (much of which is derived from that old, legitimate content), the old content has become a liability. As best as I can tell, the best way right now to show Google that you're not a mill is not to have too many articles, especially old articles, as those are some of the hardest articles for Google to digest (are they quality articles? Or a mill faking it to look better?). Which means that for legitimate sites to survive, they have to start playing increasingly byzantine SEO games to stay ahead of the mills. And often, adopt blackhat-lite SEO strategies to give them the edge over people who aren't playing fairly to begin with.

    All of which Sucks (with a capital S). It sucks for the readers, it sucks for the content creators, and it sucks for Google. But Google decides if commercial sites live or die. They are the de facto gateway to the world wide web. So commercial sites will do whatever they need to in order to survive - which is often the same thing content mills are doing to make a buck.

    And you wonder why Google is going so hard on generative AI (Bard) now. Their best hope for stopping this madness is to stop directing people to external sites to begin with, that way content mills have no reason to exist. It also means commercial sites will have no reason to exist - and thus won't be there to provide the inputs GenAI needs - but that's tomorrow's problem. Google can always scrape Reddit to get some (usually) human input...

  • by 93 Escort Wagon ( 326346 ) on Thursday August 10, 2023 @12:22AM (#63755196)

    Why not hide it via robots.txt so the stories are hidden from Google but still available if someone actually cares? Or is CNET's site search just Google?

  • Remember when tot gave you the answer you needed?
    It is a pity that OpenAI removed the browser for ChatGPT 4. It was really useful and I wasted much less time searching

  • This is what we call re-writing history by deleting ideas, thoughts, and events that we disagree with. Don't be fooled by the cuddly little title that appears so innocuous.
  • Google's job is(was) to index content, not control it. They only have this power because you give it to them. Publish your content for your audience, not for Google.
  • 410 Gone (Score:5, Insightful)

    by Misagon ( 1135 ) on Thursday August 10, 2023 @05:18AM (#63755446)

    I help maintain a wiki with lots of references to articles on various tech news-sites, including Cnet. While sites deleting their old articles isn't that uncommon, you can often find them on archive.org's WaybackMachine.
    But not always. And it is really annoying when they don't.

  • A long time ago, Google notified a company that users were complaining about their insufficient quality articles and that they needed to take action or get buried in the search. They took drastic actions and threw piles of money at it and still went effectively bankrupt. I should know. It was Demand Media's ehow.com branch and I worked for them.
    This is not the only example. CNET knows what happens if they don't clean it up.
    • Google notified a company that users were complaining about their insufficient quality articles...It was Demand Media's ehow.com branch and I worked for them.

      Oh, that chestnut of a website. You're making me agree with Google and I don't like it.

      Maybe I just got profusely unlucky, but one hundred percent - literally EVERY SINGLE eHow article I ever read - was a useless "have-you-AD-tried-AD-turning-it-AD-on-AD-AD-AD-and-off-again" piece written at the English level of a middle school student. Never once was there a 'next step' that wasn't something pointlessly simplistic.

      Now, that may have served eHow well for other things; there's certainly room for entry level

  • CNet's older articles and reviews were great for finding old hardware reviews so you could find similar-hardware replacements for really old, but production required, hardware. Case in point: I need to find Windows XP-era SCSI cards for a very old industrial label system. The drivers for the label system require Windows XP SP2. SP3 breaks things, and to replace the label system will cost several million USD. I was searching the old reviews of hardware from that time period, and CNet has the most robust on

  • by ElizabethGreene ( 1185405 ) on Thursday August 10, 2023 @08:54AM (#63755796)

    I hate link rot as much as the next person, but at least they're doing it the right way by seeding it in the Internet Archive. My employer changes blog and documentation platforms every few years and a huge amount of still useful content has been lost. If you're lucky you can find a copypaste of it on another site with some googling, but I've had to crack out my old TechNet Subscription CDs to get some stuff.

    It's a problem.

  • I would not be surprised if CNET were taking it offline in order to sell the dataset for AI training purposes. Or for their own AI training program...

  • It sounded like BS, I asked Google, they responded right away, it's fake news.

    https://twitter.com/searchliai... [twitter.com]

  • I've wasted a bloody fortune paying SEO to game google. The only worse thing to happen to the internet is the amazon affiliate link, which totally destroyed finding real information about a product. Now you get 1000 returns which are AA links with the same SEO optimized verbiage. Often, an old magazine article is the best source for actual information....
