How a Computer That 'Drunk Dials' Videos is Exposing YouTube's Secrets (bbc.co.uk) 55
An anonymous reader shares a report: How many YouTube videos are there? What are they about? What languages do YouTubers speak? As of 14 February 2025, the platform will have been running for 20 years. That is a lot of video. Yet we have no idea just how many there really are. Google knows the answers. It just won't tell you.
Experts say that's a problem. For all practical purposes, one of the most powerful communication systems ever created -- a tool that provides a third of the world's population with information and ideas -- is operating in the dark. In part that's because there's no easy way to get a random sampling of videos, according to Ethan Zuckerman, director of the Initiative for Digital Public Infrastructure at the University of Massachusetts at Amherst in the US. You can pick your videos manually or go with the algorithm's recommendations, but an unbiased selection that's worthy of real study is hard to come by.
A few years ago, however, Zuckerman and his team of researchers came up with a solution: they designed a computer program that pulls up YouTube videos at random, trying billions of URLs at a time. You might call the tool a bot, but that's probably overselling it, Zuckerman says. "A more technically accurate term would be 'scraper'," he says. The scraper's findings are giving us a first-time perspective on what's actually happening on YouTube.
[...] The first question was simple. How many videos have people uploaded to YouTube? [...] Zuckerman and his colleagues compared the number of videos they found to the number of guesses it took, and arrived at an estimate: in 2022, they calculated that YouTube housed more than nine billion videos. By mid-2024, that number had grown to 14.8 billion videos, a 60% jump.
Quis crawls ipsos crawlers? (Score:4, Funny)
Re: (Score:2, Troll)
Re: Quis crawls ipsos crawlers? (Score:2)
Google doesn't search randomly. It downloads known pages (home pages for domain names, for example), scrapes them for URLs and then downloads those. Rinse and repeat.
It is possible to keep pages out of Google's search by not placing links to them in pages that will be crawled. And keep the links out of Gmail and other stuff Google tends to exploit.
Re: (Score:3)
Google doesn't create random URLs, no, it builds lists of likely real URLs based upon root index pages and links from other sites.
For an idea of what this computer is trying to do, here's the same thing but using telephone numbers: https://www.youtube.com/watch?... [youtube.com]
Re: (Score:2)
Re: (Score:3)
My granddaughter is mildly autistic, and loves YouTube.
The stuff that she manages to find there is absolutely mind-blowing, even with all child safety controls in place.
I remind myself how I used to comb the family encyclopaedia as a youth, so I find some solace in that.
Re: (Score:2)
Isn't that just Google
If you take the trouble to read the summary, it says "we have no idea just how many [videos] there really are. Google knows the answers. It just won't tell you."
Re: (Score:1)
That is not a wild guess, that is an estimation. Often in life that is all you're going to get, still better than a "wild guess".
Re:Value of Wild Guesses? (Score:5, Interesting)
Maybe you should shut up and learn about statistics.
I strongly suspect these researchers used the same methodology that was used successfully in WW2 to estimate the total number of German tanks based on the serial numbers of ones that had been captured or destroyed.
Math works.
Re: (Score:1)
Maybe you should shut up and learn about statistics.
I strongly suspect these researchers used the same methodology that was used successfully in WW2 to estimate the total number of German tanks based on the serial numbers of ones that had been captured or destroyed. Math works.
The unique identifier that YouTube uses for videos is a randomly generated 11 character string, there's no sequence, so you can't extrapolate.
Re:Value of Wild Guesses? (Score:5, Informative)
Re: (Score:3)
And they haven't moved to 12. So you can generate random URLs and try them. The percentage of hits is a statistical estimate of how full the address space is.
Re: (Score:3)
The unique identifier that YouTube uses for videos is a randomly generated 11 character string, there's no sequence, so you can't extrapolate.
There are multiple chapters in my university statistics book that not only say you can extrapolate but tell you precisely how to, what information you can get from this, and how much you need to randomly generate in order to come to a statistically significant result.
Statistics is hard. There's a reason many books are written about the subject. I really recommend you read a basic one, though, before you comment on matters of statistics again.
Re: (Score:1)
Also, Monte Carlo methods - e.g. getting an area by checking if random points are inside or outside a shape
https://www.google.com/search?... [google.com]
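As a minimal sketch of the area example (the function names, the seed, and the choice of a unit circle are assumptions made for illustration, not anything from the thread), the idea could look like this in Python:

```python
import random


def estimate_area(inside, x_range, y_range, samples=100_000, seed=0):
    """Estimate the area of a shape by sampling random points in a bounding box.

    `inside(x, y)` returns True when the point lies within the shape.
    """
    rng = random.Random(seed)
    (x0, x1), (y0, y1) = x_range, y_range
    hits = 0
    for _ in range(samples):
        if inside(rng.uniform(x0, x1), rng.uniform(y0, y1)):
            hits += 1
    box_area = (x1 - x0) * (y1 - y0)
    # The fraction of points that land inside, scaled by the box area.
    return box_area * hits / samples


def in_unit_circle(x, y):
    """Illustrative shape: a circle of radius 1 centred on the origin."""
    return x * x + y * y <= 1.0


if __name__ == "__main__":
    # The true area is pi; the estimate should land close to it.
    print(estimate_area(in_unit_circle, (-1.0, 1.0), (-1.0, 1.0)))
```

More samples tighten the estimate, at the cost of more checks.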
Re:Value of Wild Guesses? (Score:5, Informative)
Similar type of thinking maybe, but not the same.
Basically, for those curious: the estimates of tank numbers were made by looking at tank serial numbers. If you had just one serial number, and no other intelligence to help, you simply doubled the serial number to get the number of tanks, as you could assume it was fairly unlikely you had a number at the extreme ends of the serial number spectrum.
But as you collected serial numbers, you'd start to see that they were appearing within a range, and could narrow that number considerably. It was still not a perfect answer, but if you have:
1, 7, 11, 22, and 23
The chances are slim that there are 100 tanks. I can't recall the exact calculations they used, but a quick "Take the mean and double it" gives me 25, which is a very credible number. If there are many, many, more than 25 tanks (say, 50), why haven't we seen any numbers higher than 23? It may be a small number of samples, but picking 5 items and never getting anything in the 24-50 range when you have a range of 1-50 is really statistically unlikely.
Of course, it does make presumptions and you have to account for that. For example, if the first 25 tanks were sent to Belgium, and the next 25 to the Russian front, and the Russians didn't share any data with you, then your metric may be skewed. But nonetheless, it's not a bad start.
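For anyone who wants to play with the numbers above, here is a rough Python sketch. It compares the "take the mean and double it" shortcut with the textbook estimator for this problem, m + m/k - 1 (m is the largest serial seen, k is the sample size). The function names are made up for illustration, and nothing here claims to be the exact calculation used during the war.

```python
def double_the_mean(serials):
    """The quick shortcut from the comment: twice the sample mean."""
    return 2 * sum(serials) / len(serials)


def max_plus_gap(serials):
    """Textbook estimator: largest serial plus the average gap between samples.

    Equivalent to m + m/k - 1, where m = max(serials) and k = len(serials).
    Assumes serials start at 1 and are sampled without replacement.
    """
    k = len(serials)
    m = max(serials)
    return m + m / k - 1


if __name__ == "__main__":
    observed = [1, 7, 11, 22, 23]
    print(round(double_the_mean(observed), 1))  # about 25.6
    print(round(max_plus_gap(observed), 1))     # about 26.6
```

Both land in the mid-20s for this sample, which matches the intuition that 50 or 100 tanks would be hard to square with never seeing a serial above 23.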
So... anyway, this is different. They're generating random URLs and seeing if they work. YouTube URLs generally fit certain patterns, all of the form https://www.youtube.com/watch?... [youtube.com]{SOME ID} where an ID might look like dQw4w9WgXcQ or ENiZ-mgkJqM.
Doing a basic crawl of the site will give you the ID formats and some useful figures like what characters are more popular in certain spots than others. That means you can then start generating random URLs that could be legitimate YouTube URLs.
At this point you'd have three figures to work with: the theoretical maximum number of URLs of the form above, the number you tested, and the number that actually had videos. The more samples you test, the more you should start to have a stable ratio of tested:working, and then you can use that ratio to extrapolate how many of the theoretical maximum number of URLs would be likely valid.
It's similar to guessing the number of, say, green M&Ms in a batch of 1,000 randomly coloured ones. You could count all 1,000, but you might instead grab a handful, notice that 30% of them are green, and extrapolate that there are probably 300 green M&Ms in the batch. If you're unsure you might shake the bag and grab another handful, but if 35% of those are green, you pretty much know you're in the right ballpark.
This is called the Monte Carlo method [wikipedia.org].
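The three figures described above (theoretical maximum, number tested, number that worked) can be sketched in Python as below. This is a toy simulation under stated assumptions: it never contacts YouTube, the address space and the true count are invented small numbers, and the function names are hypothetical. It also assumes valid IDs are spread uniformly over the space.

```python
import random


def estimate_population(hits, trials, address_space):
    """Scale the observed hit rate up to the whole address space."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    return hits * address_space / trials


def simulate(address_space, true_count, trials, seed=0):
    """Toy experiment: hide `true_count` valid IDs, guess at random, extrapolate."""
    rng = random.Random(seed)
    # Assumption: valid IDs are scattered uniformly across the space.
    valid = set(rng.sample(range(address_space), true_count))
    hits = sum(1 for _ in range(trials) if rng.randrange(address_space) in valid)
    return estimate_population(hits, trials, address_space)


if __name__ == "__main__":
    # Invented numbers: 100,000 "videos" hidden among 1,000,000 possible IDs.
    print(simulate(address_space=1_000_000, true_count=100_000, trials=20_000))
```

The catch with the real thing is how sparse the space is. With billions of videos in a vastly larger ID space, almost every guess misses, so you need an enormous number of trials before the ratio settles down.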
Statistics is cool. I wish more attention was paid to teaching it in schools.
Re: (Score:2)
In college my maths instructor was Rabi Bhattacharya [arizona.edu]
I found it hard to understand him at times, and the US students he faced often frustrated him, but the work he has done in statistical analysis is groundbreaking. I suggest that anybody who has the opportunity read his work.
Re: (Score:2)
School unfortunately teaches "if one has this problem, use this formula." The limitations of that method are not taught, and how to think about describing the problem is treated as irrelevant. This is why programming is difficult. Children are not given tools to abstract facts.
To that end, I read Use and abuse of statistics [archive.org], 1962. It is a junior school description of the limitations of statistics. There is also "How to lie with statistics", 1954.
Re: (Score:2)
To assume that one organization, ANY organization could spin up enough hardware that is just "sitting around" to "DOS attack" Google is just laughable.
I don't see captchas on any of my accounts doing anything with Google. What are you doing that is requiring captchas all the time?
Re: (Score:2)
AI Generated Garbage Content (Score:5, Interesting)
Re: (Score:2)
the flood of AI generated garbage content infesting YouTube.
"My cousin's best friend's roommate was making fun of me for being lazy by staying in my room all day, when I caught them conspiring to sell my stuff and kick me out I stopped paying the 10k per month keeping them afloat and left. Days later I had 32 missed calls and numerous panicked texts..."
Don't get me wrong, I don't mind stories and often listen to audiobooks to help me sleep - if the AI voice is good and the story remains somewhat coherent I'll throw one on if I'm having trouble sleeping, saves mone
Re: (Score:2)
Watching AI generated youtube content makes my eyes hurt, they need to work on the inter-frame stabilization imo
Re: (Score:3)
I'd hazard a guess that the 60% jump can be attributed mostly to the flood of AI generated garbage content infesting YouTube.
Let's also not overlook the obvious when looking at any trends from 2020 to 2024, since that also happens to be when a few billion humans were forced out of their place of employment (creating massive job loss), resulting in millions of amateur YouTubers creating several metric fucktons of content out of boredom and desperation.
Not really surprising that social media content spiked.
Re: (Score:2)
I haven't seen one of these AI generated videos, do you have an example?
I had this happen to me (Score:2)
I had a couple of exes that would drunk dial me in the middle of the night. Never underestimate the power of number block in today's cellphones.
So why... (Score:5, Insightful)
By mid 2024, that number had grown to 14.8 billion videos
If that's true, then why does YouTube only show me the same 40 or so videos over and over again in my feed?
Re:So why... (Score:4, Informative)
Here are some official stats shared last year:
The average number of views on YouTube videos is 5868. The median is 35
68%: the proportion of videos with zero views
38%: the number of YouTube videos with fewer than 5 views
44%: the number of YouTube videos with fewer than 100 views
93%: the number of YouTube videos with less than 1000 views
34% of YouTube videos concern gaming
Re: (Score:2)
Your stats don't stat.
How can you have 68% with zero views and only 38% with fewer than 5 views, since zero is *definitely* fewer than five?
Are you just makin' stuff up?
Source or it is xkcd stat trope time.
Re: So why... (Score:2)
I'm going to guess that's non-zero. I didn't write it
Re: (Score:2)
So those two categories account for 106% of the videos then.
Either way it appears to be full of shit.
Re: (Score:2)
From the article (yes, I clicked the link, I know I'm not supposed to): 4% had zero views, fewer than 5 views seems to be about 20% (the graph is not that good and a bit hard to read), and the median is 17 to 32 views. I don't know where the GP found the numbers it is quoting.
Re: (Score:2)
the same 40 or so videos over and over again in my feed?
Probably because, for many people, YouTube is used like cable TV was back in the day - something you put on for noise while you focus on something else. Feeding you something you'd already seen and possibly enjoyed lets you fulfil that need by providing something familiar to have on in the background so you can focus on the other task.
It's like throwing Star Wars on for the umpteenth time, sometimes you really want to watch it while others it's just something you can have on in the background while doing s
Re: (Score:2)
Are you kidding? I have to resist clicking videos knowing YouTube will flood my feed with more videos of a similar type. Click on any of these and find out:
Canadian Train Plowing A HUGE Snowdrift https://www.youtube.com/watch?... [youtube.com]
TOP 10 HARD LANDINGS https://www.youtube.com/watch?... [youtube.com]
Can 10,000 Lego Bricks Stop a 300-Ton Hydraulic Press? https://www.youtube.com/watch?... [youtube.com]
Re: (Score:2)
URL War Dialing (Score:3)
Just come to the point (Score:3)
How many cat videos are we talking about?
Asking for a friend.
Re: (Score:2)
How many uploaded? (Score:3)
Or how many still exist? I've bookmarked "interesting" videos for years. And found that a not insignificant number of them have just disappeared.
Re: (Score:2)
The number they got to is based on statistical analysis of today. They are trying to figure out how many videos are on YouTube, not how many have been historically uploaded.
Re: (Score:2)
Do you trust that to really be random? Why?
Is the uniform assumption valid? (Score:2)
The researcher mentioned the corresponding example of sampling the US phone number space. However, the distribution of phone numbers is not uniform, at least in part because most area codes are not allocated and it's not clear if phone numbers are uniformly distributed within allocated area codes. Similarly for YouTube videos, it's not clear if YouTube URLs are uniformly distributed. It wouldn't be at all surprising if Google stratified the URL space based on some semantic meaning.
Maybe the researchers d
Re: (Score:2)
That can be tested.
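One way such a test might look, as a sketch only: split the random guesses into buckets (say, by the first character of the ID), count hits per bucket, and compute a Pearson chi-square statistic against the hypothesis that every bucket has the same hit rate. The bucket scheme, the function name, and the sample counts below are all invented for illustration and are not taken from the researchers' work.

```python
def chi_square_equal_rates(hits, trials):
    """Pearson chi-square statistic for 'all buckets share one hit rate'.

    `hits[i]` and `trials[i]` are the successes and attempts in bucket i.
    Compare the result to a chi-square distribution with len(hits) - 1
    degrees of freedom.
    """
    total_hits = sum(hits)
    total_trials = sum(trials)
    pooled = total_hits / total_trials
    if pooled <= 0.0 or pooled >= 1.0:
        raise ValueError("need at least one hit and one miss overall")
    stat = 0.0
    for h, n in zip(hits, trials):
        expected_hits = n * pooled
        expected_misses = n * (1.0 - pooled)
        stat += (h - expected_hits) ** 2 / expected_hits
        stat += ((n - h) - expected_misses) ** 2 / expected_misses
    return stat


if __name__ == "__main__":
    # Invented counts: four buckets, 1,000 guesses each.
    even = chi_square_equal_rates([10, 10, 10, 10], [1000] * 4)
    skewed = chi_square_equal_rates([40, 0, 0, 0], [1000] * 4)
    # With 3 degrees of freedom, the 5% critical value is about 7.815.
    print(even, skewed)
```

A statistic far above the critical value would suggest the IDs are not spread evenly across the buckets, in which case a plain hit-rate extrapolation would need to be stratified.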
Google will tell you (Score:2)
The article says Google won't tell you and "Experts say that's a problem."
Experts in what? I asked Google and it told me:
"In 2025, about 2.6 million videos are uploaded to YouTube every day, which is equivalent to 518,400 hours of content."
That's an AI generated answer, of course. So who knows, it could be completely wrong.
Whatever the data may be, the data for recommendations is in need of a reset. At least I'd like to be able to reset it for my account.
Every time I look it's the same garbage I didn't want
Nine billion names (Score:2)
Latest Search Engine Feature (Score:2)
Google will buy this company and add a (beta) feature, which is a new button labeled "I Feel Drunk".
Well, maybe they already did that. But it's not a "random YouTube" and it's not even a button. It's just called search results, or sometimes, "Generative AI".
There is by the way a button (tab) on YouTube labeled "New To You" but I'm not sure exactly what it's supposed to be doing. Last time I clicked it, I got a video showing how to make Chloroform. How did they know I was going on a date tonight? Analytics and
No Swedish? (Score:2)