
New York Times Source Code Stolen Using Exposed GitHub Token (bleepingcomputer.com)

The New York Times has confirmed that its internal source code was leaked on 4chan after being stolen from the company's GitHub repositories in January 2024. BleepingComputer reports: As first seen by VX-Underground, the internal data was leaked on Thursday by an anonymous user who posted a torrent to a 273GB archive containing the stolen data. "Basically all source code belonging to The New York Times Company, 270GB," reads the 4chan forum post. "There are around 5 thousand repos (out of them less than 30 are additionally encrypted I think), 3.6 million files total, uncompressed tar."

While BleepingComputer did not download the archive, the threat actor shared a text file containing a complete list of the 6,223 folders stolen from the company's GitHub repository. The folder names indicate that a wide variety of information was stolen, including IT documentation, infrastructure tools, and source code, allegedly including the viral Wordle game. A 'readme' file in the archive states that the threat actor used an exposed GitHub token to access the company's repositories and steal the data. The company said that the breach of its GitHub account did not affect its internal corporate systems and had no impact on its operations.
The Times said in a statement to BleepingComputer: "The underlying event related to yesterday's posting occurred in January 2024 when a credential to a cloud-based third-party code platform was inadvertently made available. The issue was quickly identified and we took appropriate measures in response at the time. There is no indication of unauthorized access to Times-owned systems nor impact to our operations related to this event. Our security measures include continuous monitoring for anomalous activity."

  • 275 GB (Score:5, Interesting)

    by christoban ( 3028573 ) on Monday June 10, 2024 @09:05PM (#64539675)

    Why so large? Do they store every image for every story?

    • Yeah it does seem like R.O.U.S. (Repositories Of Unusual Size) -- with apologies to the Princess Bride -- what are they doing with 5,000 repositories??

      • by Anonymous Coward

        LLMs are huge and they need many of those in order to censor things and push the correct narrative in order to influence their readers to vote for who they want to. Back in the days, they were just reporting news but now things have changed!

        • They have every single "reporter" and "editor" in the entire organization working on that around the clock, thank you. Don't need no stinkin' AI!

        • I see. The world is filled with fools being blindly led by their hidden masters, but you are intelligent enough to see through their deception. Thank you for filling us in.

    • Re:275 GB (Score:5, Insightful)

      by Tony Isaac ( 1301187 ) on Monday June 10, 2024 @09:36PM (#64539725) Homepage

      Not so large. Google's source code repo is 85 TB. https://cacm.acm.org/research/... [acm.org]

    • by tokul ( 682258 )

      > Why so large?
      if software binaries are not in main software repos, they will be in that github.

    • by dohzer ( 867770 )

      You sound like my employer, purchasing computers with 512GB hard drives like it's 2015 and then wondering why their engineers have run out of room after installing a few 120GB IDEs.

      • by HiThere ( 15173 )

        That's an extremely large IDE. How does it justify its size?

        (OTOH, I use a souped-up text editor (geany), so take my opinions with a grain of salt, but 120GB???)

        • VS is that big if you install every feature, framework, and language with all their libraries. There are also a lot of testing VMs.

        • by dohzer ( 867770 )

          Lots of vendor specific tools, packages and IP files (I presume) for embedded firmware development.
          Xilinx/AMD and Altera/Intel tools seem to continuously increase in size. Combine either Vivado or Quartus with offline packages/cache for a custom Linux build using Yocto, and you're basically out of space on a small hard disk these days.
          Probably far larger than the typical high-level software IDEs though.

    • Re:275 GB (Score:5, Insightful)

      by znrt ( 2424692 ) on Monday June 10, 2024 @10:22PM (#64539787)

      that can be every version of the web edition, every inhouse tool, every inhouse business app, every inhouse library for those ... 5000 repos does sound like a lot but that can also include data or crowdsourced projects. nytimes has been around for a while, given today's ubiquitous bloat i'm surprised that's *only* 270gb.

    • I could create a human-level intelligence with far less data than that.

      https://academic.oup.com/gigas... [oup.com]

      an entire human genome of 3 GB can be compressed to 4 MB by referential compression

    • And just as importantly, why the hell did a single token have complete access to 5000 repos? Sounds like complete incompetence at enforcing good security hygiene.
      • by sodul ( 833177 )

        You would be surprised how far some developers will go to work around security because they just need things to work. While browsing GH a while back to look for examples of how to integrate GH actions to a cloud based service, I found hardcoded API tokens on a public repo. I informed the owners, they responded, and did nothing to correct the issue.

        A few years ago I found out that an engineer sent the root password for self driving cars over the internet to remote control said vehicles. His response was that

        • Not surprised, hell we caught one dev team that had all their passwords, accounts, and tokens on a wiki, and to top it off, no auth on the site. It was amusing watching them explain the outage when security disabled all accounts published that way. Still, it shows lax security to get to a state where tokens have access to basically everything.
    • https://files.catbox.moe/jx7ks... [catbox.moe] (link from the original 4chan thread) has the full repository list that's in the leak. Again, each folder there is a separate Git repository - the leak has repos with full Git history.
    • From what I've heard it includes the source to all the articles that had highly customized pages.

  • Pay Wall (Score:5, Funny)

    by jamesjw ( 213986 ) on Monday June 10, 2024 @09:32PM (#64539719) Homepage

    All fun and games until you build the git repo against your own site and you need to subscribe to the New York times to access it...

  • Uh... that's written in HTML and Javascript. Anyone could grab that.

  • by Anonymous Coward

    I'm not surprised this happened. Many devs are usually under the gun, so having a GitHub token used everywhere and stuffed in an app, perhaps hard-coded, is likely done just because it saves time. Most likely, the dev whose token got exposed is going to face zero consequences, while IT likely will have people fired because it happened "on their watch".

  • Would like to know the timeline:
    - When did the NYTimes detect the source code theft?
    - When did the NY Times report it to the SEC since they are a publicly traded company, and other government agencies?
    - When did the NY Times report it to the general public, before or after the quarterly earnings were released?
    - Did the NY Times cover it up and if so, for how long?
    - Which executives and managers knew of this breach and when?

    What are the effects on stock investors since millions of NY Times stock shares were

    • by znrt ( 2424692 )

      nytimes stock is doing fine.

      it's embarrassing, but your worries are exaggerated. apparently no private info from customers or sensitive sources (if they even have those) was leaked, and i don't think the nytimes' source code has any value for anyone except the nytimes. the paywall maybe? for sure that wasn't rocket science and leaking the source won't contribute much to how it is probably being already bypassed, paywalls tend to only block the low hanging fruit, if anything it can be changed in no time.

      it's

      • by gaws ( 10083464 )

        > apparently no private info from customers or sensitive sources (if they even have those) was leaked

        False. A list of freelancers and all their contact information, including phone numbers and home addresses, was included in the leak.

  • “Our security measures include continuous monitoring for anomalous activity”

    perhaps they do now. was that system in place when ALL OF THE SOURCE CODE WAS DOWNLOADED???? is that a normal thing to do at NYT?

    • by HiThere ( 15173 )

      Do they really care? Why should they? Do you think that source code holds any important secrets? (I'd be surprised if it held anything more significant than a bunch of passwords or access tokens. Which they'd then need to change to prevent access to the working version.)

  • Anyone have a copy? Anything cool in it? How much of it is curl commands to chatgpt or random sentence construction based on a target name?

    • by StreetNaija ( 6370318 ) on Tuesday June 11, 2024 @03:35AM (#64540109) Homepage
      magnet:?xt=urn:btih:e334bd248955a7b215cd069214d3a278a5d2d229&xt=urn:btmh:1220e52032287dbfe568e17e10882af2502d22e34725a03b2c0e2053d2fca1ff731a&dn=repos_archives&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=udp%3A%2F%2Fopen.tracker.cl%3A1337%2Fannounce&tr=udp%3A%2F%2Fopen.demonii.com%3A1337%2Fannounce&tr=udp%3A%2F%2Fopen.stealth.si%3A80%2Fannounce&tr=udp%3A%2F%2Ftracker.torrent.eu.org%3A451%2Fannounce&tr=udp%3A%2F%2Fexodus.desync.com%3A6969%2Fannounce&tr=udp%3A%2F%2Fopentracker.io%3A6969%2Fannounce&tr=udp%3A%2F%2Fexplodie.org%3A6969%2Fannounce&tr=https%3A%2F%2Ftracker.tamersunion.org%3A443%2Fannounce&tr=udp%3A%2F%2Ftracker1.myporn.club%3A9337%2Fannounce&tr=udp%3A%2F%2Ftracker.tiny-vps.com%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.dump.cl%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.bittor.pw%3A1337%2Fannounce&tr=udp%3A%2F%2Fryjer.com%3A6969%2Fannounce&tr=udp%3A%2F%2Fp4p.arenabg.com%3A1337%2Fannounce&tr=udp%3A%2F%2Fopen.free-tracker.ga%3A6969%2Fannounce&tr=udp%3A%2F%2Fnew-line.net%3A6969%2Fannounce&tr=udp%3A%2F%2Fbt.ktrackers.com%3A6666%2Fannounce&tr=https%3A%2F%2Ftracker.renfei.net%3A443%2Fannounce&tr=https%3A%2F%2Ftracker.loligirl.cn%3A443%2Fannounce
    • Magnet on 4chan (second post with magnet there, first one has no seeds): https://boards.4chan.org/t/thr... [4chan.org]
  • The issue was quickly identified and we took appropriate measures in response at the time.

    Yes sir, we've surely secured that stable door a little better than before. Let's not talk about our source code that's out there now.

    • People looking at the code isn't so much of a problem. Somebody pushing malicious code at a time of their choosing, on the other hand, could be a big problem.
  • Perhaps this dump contains the answer to why it takes 25 minutes on the phone to cancel an NYT crossword subscription.
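
Several comments above mention hardcoded tokens ending up in repositories and wikis. As a rough sketch only (nothing here is known to be what the NYT or GitHub actually runs), a scan for token-like strings might look like the following in Python. The regular expression is an assumption based on the `ghp_`-style prefixes GitHub uses for its tokens, and the function name `find_token_candidates` is hypothetical.

```python
import re

# ASSUMPTION: GitHub tokens of this family start with ghp_, gho_, ghu_,
# ghs_ or ghr_ followed by 36 alphanumeric characters. Treat this as an
# approximation; other token formats will not be matched.
TOKEN_PATTERN = re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36}\b")


def find_token_candidates(text):
    """Return (line_number, redacted_match) pairs for token-like strings."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in TOKEN_PATTERN.finditer(line):
            token = match.group(0)
            # Keep only a short prefix so the report does not leak the secret.
            hits.append((line_number, token[:8] + "..."))
    return hits


if __name__ == "__main__":
    # Fake value built at runtime so no real-looking secret sits in the file.
    fake_token = "ghp_" + "A" * 36
    sample = 'name: deploy\nenv:\n  API_TOKEN: "' + fake_token + '"\n'
    for line_number, redacted in find_token_candidates(sample):
        print(f"line {line_number}: possible token {redacted}")
```

A check like this only catches one known shape of secret; it says nothing about scoping, which is the other complaint in the thread (one token with access to thousands of repos).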
