New York Times Source Code Stolen Using Exposed GitHub Token (bleepingcomputer.com) 52
The New York Times has confirmed that its internal source code was leaked on 4chan after being stolen from the company's GitHub repositories in January 2024. BleepingComputer reports: As first seen by VX-Underground, the internal data was leaked on Thursday by an anonymous user who posted a torrent to a 273GB archive containing the stolen data. "Basically all source code belonging to The New York Times Company, 270GB," reads the 4chan forum post. "There are around 5 thousand repos (out of them less than 30 are additionally encrypted I think), 3.6 million files total, uncompressed tar."
While BleepingComputer did not download the archive, the threat actor shared a text file containing a complete list of the 6,223 folders stolen from the company's GitHub repository. The folder names indicate that a wide variety of information was stolen, including IT documentation, infrastructure tools, and source code, allegedly including the viral Wordle game. A 'readme' file in the archive states that the threat actor used an exposed GitHub token to access the company's repositories and steal the data. The company said that the breach of its GitHub account did not affect its internal corporate systems and had no impact on its operations. The Times said in a statement to BleepingComputer: "The underlying event related to yesterday's posting occurred in January 2024 when a credential to a cloud-based third-party code platform was inadvertently made available. The issue was quickly identified and we took appropriate measures in response at the time. There is no indication of unauthorized access to Times-owned systems nor impact to our operations related to this event. Our security measures include continuous monitoring for anomalous activity."
275 GB (Score:5, Interesting)
Why so large? Do they store every image for every story?
Re: (Score:3)
Yeah it does seem like R.O.U.S. (Repositories Of Unusual Size) -- with apologies to the Princess Bride -- what are they doing with 5,000 repositories??
Re: (Score:1)
LLMs are huge and they need many of those in order to censor things and push the correct narrative in order to influence their readers to vote for who they want. Back in the day, they were just reporting news but now things have changed!
Re: (Score:1)
They have every single "reporter" and "editor" in the entire organization working on that around the clock, thank you. Don't need no stinkin' AI!
Re: (Score:2)
I see. The world is filled with fools being blindly led by their hidden masters, but you are intelligent enough to see through their deception. Thank you for filling us in.
Re:275 GB (Score:5, Insightful)
Not so large. Google's source code repo is 85 TB. https://cacm.acm.org/research/... [acm.org]
Re: (Score:1)
I guess it's true that everyone's a software company these days. Even newspapers.
Re: (Score:1)
> Why so large?
if software binaries are not in main software repos, they will be in that github.
Re: (Score:3)
You sound like my employer, purchasing computers with 512GB hard drives like it's 2015 and then wondering why their engineers have run out of room after installing a few 120GB IDEs.
Re: (Score:2)
That's an extremely large IDE. How does it justify its size?
(OTOH, I use a souped-up text editor (geany), so take my opinions with a grain of salt, but 120GB???)
Re: (Score:1)
VS is that big if you install every feature, framework, and language with all their libraries. There are also a lot of testing VMs.
Re: (Score:2)
Lots of vendor specific tools, packages and IP files (I presume) for embedded firmware development.
Xilinx/AMD and Altera/Intel tools seem to continuously increase in size. Combine either Vivado or Quartus with offline packages/cache for a custom Linux build using Yocto, and you're basically out of space on a small hard disk these days.
Probably far larger than the typical high-level software IDEs though.
Re: (Score:1)
Pretty sure this is UnknownSoldier. He forgot to remove his .sig! :D
Re:275 GB (Score:5, Insightful)
that can be every version of the web edition, every inhouse tool, every inhouse business app, every inhouse library for those ... 5000 repos does indeed sound like a lot, but that can also include data or crowdsourced projects. nytimes has been around for a while, given today's ubiquitous bloat i'm surprised that's *only* 270gb.
Re: (Score:3)
I could create a human-level intelligence with far less data than that.
https://academic.oup.com/gigas... [oup.com]
an entire human genome of 3 GB can be compressed to 4 MB by referential compression
Re: (Score:2)
Try "partially analog". There are aspects of the neural system that are "digitally encoded", though not necessarily to base 2.
Basically, if you want a signal to be copied without degradation, then you use a digital encoding. If you're after fast and unmediated reaction, you use analog. But you often need to mix those reactions.
Re: (Score:2)
Re: (Score:3)
You would be surprised how far some developers will go to work around security because they just need things to work. While browsing GH a while back to look for examples of how to integrate GH actions to a cloud based service, I found hardcoded API tokens on a public repo. I informed the owners, they responded, and did nothing to correct the issue.
A few years ago I found out that an engineer sent the root password for self driving cars over the internet to remote control said vehicles. His response was that
Re: (Score:2)
Re: 275 GB (Score:2)
Re: (Score:1)
Seems like they create a new repo for every topic.
Re: (Score:1)
From what I've heard it includes the source to all the articles that had highly customized pages.
Re: (Score:1)
Yeah, that occurred to me after I submitted. I'm a "submit now, ask questions later" kind of guy.
Pay Wall (Score:5, Funny)
All fun and games until you build the git repo against your own site and you need to subscribe to the New York Times to access it...
Oh no, they got Wordle! (Score:2)
Uh... that's written in HTML and JavaScript. Anyone could grab that.
Re: (Score:2)
Say you don't understand how code works without saying you don't understand code...
Wardle's version of Wordle was 100% self-contained, and not too badly obfuscated. The daily puzzle word solutions, all the date math... everything. When the sale was announced I grabbed a copy on the off chance the New York Times did anything janky with it.
I haven't bothered with the Times' version because all they've "added" is to make the game paywall-friendly.
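For anyone wondering what "date math" means here, a self-contained daily puzzle can pick its word by counting days from a fixed start date. This is only a minimal sketch of that general idea, not the actual Wordle code: the function name `daily_index`, the start date, and the word list below are all made up for illustration.

```python
from datetime import date

def daily_index(today: date, epoch: date, n_words: int) -> int:
    """Map a calendar date to a position in a fixed word list.

    Counts whole days since `epoch` and wraps around at the end of the list.
    """
    return (today - epoch).days % n_words

# Hypothetical data: neither the start date nor the words come from the real game.
EPOCH = date(2022, 1, 1)
WORDS = ["crane", "slate", "pious", "mound", "brisk"]

todays_word = WORDS[daily_index(date(2022, 1, 3), EPOCH, len(WORDS))]
```

Because everything depends only on the date and a list shipped with the page, a saved copy keeps working with no server at all.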
GitHub token security? (Score:1)
I'm not surprised this happened. Many devs are under the gun, so a GitHub token that gets reused everywhere and stuffed into an app, perhaps hard-coded, is to be expected, just because it saves time. Most likely, the dev whose token got exposed is going to face zero consequences, while IT will likely have people fired because it happened "on their watch".
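Hard-coded tokens are at least easy to look for. Below is a rough sketch of a check that flags strings shaped like GitHub tokens in a blob of text. It assumes the prefixed token format GitHub uses (ghp_, gho_, ghu_, ghs_ or ghr_ followed by 36 alphanumeric characters); the name `find_tokens` is hypothetical, and a real secret scanner covers far more formats than this.

```python
import re

# Assumed format: a GitHub token prefix followed by 36 alphanumeric characters.
TOKEN_RE = re.compile(r"\b(?:ghp|gho|ghu|ghs|ghr)_[A-Za-z0-9]{36}\b")

def find_tokens(text: str) -> list[tuple[int, str]]:
    """Return (line_number, redacted_token) for each token-shaped string."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in TOKEN_RE.finditer(line):
            # Keep only the first 8 characters so the report does not leak the secret.
            hits.append((lineno, match.group(0)[:8] + "..."))
    return hits
```

Something like this could run as a pre-commit step over staged files. The less fragile fix is to keep the token out of the source entirely, for example by reading it from an environment variable at run time.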
Exactly when did they report it to the SEC? (Score:2, Interesting)
Would like to know the timeline:
- When did the NYTimes detect the source code theft?
- When did the NY Times report it to the SEC since they are a publicly traded company, and other government agencies?
- When did the NY Times report it to the general public, before or after the quarterly earnings were released?
- Did the NY Times cover it up and if so, for how long?
- Which executives and managers knew of this breach and when?
What are the effects on stock investors since millions of NY Times stock shares were
Re: (Score:3)
nytimes stock is doing fine.
it's embarrassing, but your worries are exaggerated. apparently no private info from customers or sensitive sources (if they even have those) was leaked, and i don't think the nytimes' source code has any value for anyone except the nytimes. the paywall maybe? for sure that wasn't rocket science and leaking the source won't contribute much to how it is probably already being bypassed. paywalls tend to only block the low hanging fruit, and if anything it can be changed in no time.
it's
Re: (Score:1)
> apparently no private info from customers or sensitive sources (if they even have those) was leaked
False. A list of freelancers and all their contact information, including phone numbers and home addresses, was included in the leak.
were you monitoring? really? (Score:2)
“Our security measures include continuous monitoring for anomalous activity”
perhaps they do now. was that system in place when ALL OF THE SOURCE CODE WAS DOWNLOADED???? is that a normal thing to do at NYT?
Re: (Score:2)
Do they really care? Why should they? Do you think that source code holds any important secrets? (I'd be surprised if it held anything more significant than a bunch of passwords or access tokens. Which they'd then need to change to prevent access to the working version.)
Re: (Score:2)
Re: (Score:2)
If you did, any chance you could share it? Thanks!
Re: (Score:2)
Leaked source code (Score:2)
Anyone have a copy? Anything cool in it? How much of it is curl commands to chatgpt or random sentence construction based on a target name?
Re: Leaked source code (Score:5, Informative)
Re: Leaked source code (Score:2)
Re: (Score:3)
Why should they care? It's not like their source code SHOULD be very important. That's not what a "news organization" is about.
Standard PR response says: (Score:2)
The issue was quickly identified and we took appropriate measures in response at the time.
Yes sir, we've surely secured that stable door a little better than before. Let's not talk about our source code that's out there now.
Re: (Score:3)
Deepest secret (Score:2, Funny)