Have 100GB Free? Host Your Own Copy of Wikipedia, With Images
First time accepted submitter gnosygnu writes "Want your own copy of English Wikipedia with images? Got 100 GB of disk space? Then open-source app XOWA may be of interest to you. The project released torrents yesterday for the 2013-11-04 version of English Wikipedia. There's 100 GB of sqlite databases containing 13.9 million pages, and 3.7 million images — readable from any Windows, Linux, or Mac OS X system. Image downloads for other wikis are building, but you can still use XOWA to read the text-only version for other wikis like Wiktionary, Wikisource, Wikiquote and 660 more. Next time you find yourself stranded without the internet, you can pull out your own copy of Wikipedia for use."
Article Ownership (Score:5, Funny)
It comes with software that automatically reverts your edits and insults you.
Re:Article Ownership (Score:5, Funny)
It comes with software that automatically reverts your edits and insults you.
Citation Needed.
Re:Article Ownership (Score:5, Funny)
It comes with software that automatically reverts your edits and insults you.
Citation Needed.
1 http://news.slashdot.org/comments.pl?sid=4488409&cid=45527247 [slashdot.org]
CAN I HAZ LOCAL CP? (Score:2)
WANTZ UNCYCLOPEDIA [wikia.com]
Re: (Score:1)
Citation [wikipedia.org]
Re: (Score:1)
Ah yes, I was worried it wouldn't have the full wikipedia experience.
Without the abuse, it just isn't wikipedia man.
Re: Article Ownership (Score:2)
It's ok. You can still edit the articles to say whatever you want and then cite them on Slashdot. No one can defy you!
Re: Article Ownership (Score:2)
What really impresses me is that it downloads the entire Wikipedia, with "no internet connection required!" That's an engineering feat if I've ever heard of one!
Re: (Score:2)
Then send it on DVD via station wagon.
Finally! (Score:5, Funny)
Finally I can have my own version of wikipedia so I can correct all those changes I haven't been allowed to enter into the official version!
Re: (Score:3)
Ah, a Wikieditor/fanboy. Admit it: you will be torrenting this 100GB copy just so you can delete every article, then do it all again.
Re: (Score:2)
I hear this from a lot of slashdotters. No one bothers to give examples.
Here's one for you. In 2006 I needed eye surgery, an artificial lens for my left eye. My surgeon suggested a new design that had been out since 2003. Before the new design there were two types: monofocal and multifocal. Multifocal was like having bifocals in your eyeballs, with monofocals you needed reading glasses.
The new design is called an accommodating lens and sits on struts inside the lens capsule, so it will focus using the eye's
Re: (Score:2)
I have no idea about the internal mechanisms behind this though; I only edit/add a few lines at a time, so I might just be lucky.
Re: (Score:2)
I'm not doubting your story - especially since you're someone I generally trust well on slashdot (you're my "friend" here); however:
Go on, try to edit something. It can't be done.
A little while ago (back in 2008 looking at the article history), I found this article about MFPs [wikipedia.org] to be horribly weak and focused only on home devices with no mention of office or production devices at all.
Working in the MFP industry, I was able to add a lot of information and give good citations for it; so I did so. Other than the occasional spammer trying to advertise their
Re: (Score:2)
I've always suspected that it was one of Bausch&Lomb's competitors who removed my edits, since the new IOL was so superior (even if it was $1000 more expensive, being under patent). I can see why you would have been more successful with your attempt.
Re: (Score:1)
And then I read the webcomic Namesake, and went to Wikipedia to check one thing about Alice Liddell, who is featured in the webcomic (which is very good by the way, and is quite famous now).
I noticed that the Wikipedia article had a section called "Alice Liddell in fiction" which didn't mention Namesake, so I added Namesake.
It was immediately reverted, citing "no link".
So I reverted the revert and added a link.
It was i
Re: (Score:2)
You say that and laugh, but wait until someone that manages their own DNS, and with an evil intention gets a good idea...
That reminded me of the Upsidedownternet [ex-parrot.com]. ;-)
Re: (Score:3)
Finally I can have my own version of wikipedia so I can correct all those changes I haven't been allowed to enter into the official version!
Or you could just switch to using Conservapedia.
Re:Finally! (Score:5, Insightful)
That's pretty much impossible to get into now (as a new editor), because you're either banned for being too sane to pass ideological purity, or banned for being so insane you're mistaken for a troll.
Re: (Score:3)
You've always been able to download every page and image. Am I missing something?
http://dumps.wikimedia.org/ [wikimedia.org]
It's that time of year again (Score:2, Funny)
Does it include the seasonal donation nag banners?
Holidays are coming! Holidays are coming!
Rats. It won't QUITE fit on a microSD card... (Score:2)
...yet. But I guess most phones won't easily read sqlite databases yet, either. I suppose it won't kill me to lug around a full-sized SD card.
Still looking forward to the library-of-Congress-on-a-card from Rainbows End.
Re:Rats. It won't QUITE fit on a microSD card... (Score:5, Funny)
Rats. It won't QUITE fit on a microSD card...
Just exclude the star trek / star wars related entries; that should pare it down. And besides we all have it all committed to memory anyway right? :p
Re: (Score:3, Informative)
...yet. But I guess most phones won't easily read sqlite databases yet, either. I suppose it won't kill me to lug around a full-sized SD card.
Still looking forward to the library-of-Congress-on-a-card from Rainbows End.
Most phones _won't_? Four out of five smartphones today have sqlite preinstalled and ready for use: http://developer.android.com/reference/android/database/sqlite/package-summary.html
Re: (Score:2)
Nokia S40 phones are no longer being made, dude. It's 5/5 phones these days.
Re: (Score:1)
...yet. But I guess most phones won't easily read sqlite databases yet, either.
The structured storage for Android Apps is just SQLite databases. Of course Android doesn't include a database management tool for the end user, but in the background it can read SQLite just perfectly.
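To illustrate the point that any platform with an SQLite library can read such a file, here is a minimal sketch using Python's built-in sqlite3 module. The `page` table and its columns are hypothetical stand-ins; XOWA's actual schema isn't described in this thread, and an in-memory database is used in place of a real file.

```python
import sqlite3

# Hypothetical stand-in for one of the downloaded database files.
# A real file would be opened with sqlite3.connect("/path/to/file.sqlite3").
conn = sqlite3.connect(":memory:")

# Assumed, illustrative schema (not XOWA's real one).
conn.execute("CREATE TABLE page (title TEXT PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO page (title, body) VALUES (?, ?)",
    [
        ("SQLite", "SQLite is an embedded relational database engine."),
        ("Android", "Android is a mobile operating system."),
    ],
)


def lookup(connection, title):
    """Return the stored body for a title, or None if the page is absent."""
    row = connection.execute(
        "SELECT body FROM page WHERE title = ?", (title,)
    ).fetchone()
    return row[0] if row else None


print(lookup(conn, "SQLite"))
print(lookup(conn, "No such page"))
```

The same query would work from Android's bundled SQLite bindings; only the file path and the real table names would differ.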
Re: (Score:1)
The LoC is 10 TB (uncompressed) in "volume". Not 10 EB.
Compress the LoC with efficient algorithms and it fits on any modern HDD.
Re: (Score:3)
Yeah, I was misremembering the line:
"The British Museum and Library, as digitized and databased by the Chinese Informagical Coalition. The haptics and artifact data are lo-res, to make it all fit on one data card. But the library section is twenty times as big as what Max Huertas sucked out of UCSD. Leaving aside things that never got into a library, that's essentially the record of humanity up through 2000. The whole premodern world."
128PB, 97% in use.
Day after tomorrow... (Score:2)
When the supercold storm blasts through your town, your device will freeze. And I'll still be able to read the pages of my Universalis as I tear them to burn them for heat.
Quite a bit smaller than I'd have thought. (Score:4, Interesting)
I'd have put en.wikipedia at at least a couple of terabytes. Not inconceivably large, but with some housecleaning I could actually get 100GB free.
Re: (Score:2)
yeah, 3.7 million images under 100gb? Do I even want to look at these? I can't imagine how compressed and low res those would have to be.
Re: (Score:2)
Fortunately, you don't have to imagine. The simplest of arithmetic will reveal that's an average of about 20kB per image. If we assume as near-worst-case an uncompressed 16-bit pixmap format, that means 100 px x 100 px or so; realistically, most of them are probably jpegs, so search your hard drive...
find / -name '*.jpg' -size -25k -size +15k
And take a look at what you have in that range. Then keep in mind that that's an average -- there'll be some much better and some even smaller/compresseder.
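For anyone who wants to check that arithmetic, here is a rough sketch. It assumes decimal gigabytes and treats the whole 100 GB as image data, so the result is an upper bound on the average (the text takes up some of that space too). The function names are just illustrative.

```python
import math

TOTAL_BYTES = 100 * 10**9   # 100 GB from the summary (decimal GB assumed)
IMAGE_COUNT = 3_700_000     # 3.7 million images from the summary


def average_image_kb(total_bytes, image_count):
    """Upper bound on the mean image size in kB (ignores space used by text)."""
    return total_bytes / image_count / 1000


def pixmap_side_px(size_bytes, bytes_per_pixel=2):
    """Side length of a square uncompressed pixmap of the given size."""
    return math.isqrt(size_bytes // bytes_per_pixel)


avg_kb = average_image_kb(TOTAL_BYTES, IMAGE_COUNT)
side = pixmap_side_px(20_000)  # 20 kB at 16 bits (2 bytes) per pixel

print(f"upper bound on average: {avg_kb:.1f} kB per image")
print(f"20 kB uncompressed 16-bit pixmap: {side} x {side} px")
```

The upper bound comes out a little above 20 kB, which is consistent with "about 20kB" once the text databases are subtracted.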
Re: (Score:2)
that's an average of about 20kB per image. If we assume as near-worst-case an uncompressed 16-bit pixmap format, that means 100px x 100 px or so; realistically, most of them are probably jpegs
Exactly my point. :-)
Re: (Score:2)
I downloaded all of the (current revision) text a few years back from some of their public data dumps. Stored in a handful of massive XML files, it ended up only being around 3GB. I'd guess it isn't much bigger now, and that the vast majority of the 100GB is simply due to images.
legitimizing torrents (Score:5, Insightful)
That's a good thing. The more we use torrents for the distribution of legitimate content, the more such distribution methods will become legitimized.
Re: (Score:2)
It's already legitimate and doesn't need legitimizing.
Of course, that doesn't mean that your favorite popular zero-day movie/series/albums/ebooks/software site of rather unauthorized nature magically gains "but what about the copy of wikipedia!?" protection from the likes of MPAA/RIAA/Wiley?/BSA... at least not in most courts of law.
Re: (Score:2)
You seem to be confusing "legal" and "legitimate". It's legal, but not necessarily considered legitimate. In particular, many ISPs seem to interfere with torrent traffic. The more people use it for non-copyright-infringing purposes, the more pressure there is on ISPs to back off on their interference.
Re: (Score:2)
While popularly torrents get messed about with in terms of available bandwidth, the same applies to several other P2P protocols. It's the painted nature of the beast - lots of potentially high-bandwidth connections established for essentially low-priority purposes - that hurts it in that respect. (Yet) An(other) archive of wikipedia isn't going to change that - unless you can think of a convincing reason to submit to ISP decisionmakers that would cause them to believe that throttling the download and/or u
What? (Score:2)
> XOWA is a free, open-source application that lets you download Wikipedia to your computer. No internet connection required!
This is supremely impressive; download Wikipedia without an internet connection!
Re: (Score:2)
First, you tie your request to download Wikipedia to this pigeon's leg and let it fly off.
Next, you wait for the reply.
Finally, you load the reply into your computer.
NOTE: Reply will come in printed format - one article per pigeon. A few million pigeons may be required, but don't worry. We send them all at once to keep you from having to wait.
Re: (Score:2)
Sounds good to me, there's certainly no shortage of pigeons. It'll be good to put them to work doing something useful!
Re: (Score:2)
This is supremely impressive; download Wikipedia without an internet connection!
Someone's never heard of BD-R.
What? (Score:1)
Re: (Score:2)
Some of us have to do it. When the boat's connection goes down (e.g. because bad weather misaligns us with the satellite for days on end), that's it ; no internet. Also no emails, or phone calls except through the ship-to-shore radio set. It's bliss!
When was that version copied? (Score:3)
Offline/remote situations (Score:2)
This will be great for offline/remote/low speed situations. Imagine being on a merchant ship or even a cruise ship with a pricey connection package. Scientific expeditions etc.
How about preloading it on OLPC?
What if your high school kid can't do his homework without getting distracted online, but says he needs Wikipedia for research? Bam, here's your air-gapped PC, son.
Now this is truly (Score:2)
Don't Panic (Score:5, Interesting)
Truly amazing times we live in.
Re: (Score:2)
Next year or so 100GB phones will be commonplace...and you will have your Hitchhiker's Guide.
Pffth. I don't need that. I just need to remember that it's "mostly harmless".
Revisions? (Score:4, Interesting)
Presumably the wikipedia is under revision control.
Does this give you the whole thing so that you can forever after sync with the master?
Or just the most recent versions of the articles?
Should there be a bittorrent for syncing huge revision control data bases?
already did this ( today, text version only ) (Score:3)
What's new about this? (Score:1)
I've been mirroring a local copy of Wikipedia for a long time, with images. What's new about this app compared to the dozens of others that already do this?
Finally! (Score:1)
I was wondering when I could replace my CD of Encarta 96.
SQLite? (Score:2)
Not very entertaining (Score:2)
Wikipedia is only so entertaining if you are stranded somewhere with no other way to pass the time.
Now, if they give us a torrent of the complete TVTropes site....
Only 100GB? (Score:1)
That's ALL it takes up?? My goodness! Wikipedia can fit on my largest USB drive?? haha.. I expected it to be in the multi-TB range!
Re: (Score:2)
It's text only. Images and videos not included.
Re: (Score:2)
Hmm, I thought it was text only. Must've missed that part. I stand corrected.
Checksums anyone (Score:2)
Aside from all the jokes (Score:2)
This is really a cool thing to have as an option. 100G isn't that much today when a TB might cost you 30 bucks (rather surprised it's that small...), and with how 'vulnerable' everything is on the net today it wouldn't hurt to have an archive before the next take-down notice or commercial buy-out (or shut-down due to loss of funding).
For offline Wikipedia use, just use ZIM (Score:2)
If all you want is an offline Wikipedia reader, just use Kiwix [kiwix.org]. It uses the ZIM format [openzim.org] which was created specifically for offline use and runs on Win/Mac/Linux/Android or anything else if you want to compile it yourself.
While the full English Wikipedia ZIM sans pictures is a bit old (January 2012), it has the benefit of being only 10GB and split up into 2GB chunks so it will fit on a FAT32 device like your phone's SD card.
Re: (Score:1)
I suggest a website like say wikipedia.org
Re: (Score:3)
You are right. That's a silly summary they put on. They should say something like 'No Internet connection required while browsing/searching through the wiki' (one of their features).
Navigate between offline wikis. Click on "Look up this word in Wiktionary" and instantly view the page in Wiktionary.
Re:No internet connection required! (Score:5, Funny)
I prefer ZModem myself.
But if you don't have that you can probably use XModem.
Re: (Score:2)
Enjoy.
Re: (Score:2)
Direct Dialup connection. That is how I downloaded files before I had Internet access.
Re: (Score:2)
How do I download it if I don't have an internet connection? Does this require special hardware?
Order Wikipedia on DVD, from Wikipedia themselves. http://dumps.wikimedia.org/dvd.html [wikimedia.org]
Re: (Score:3)
And yet you commented only 16 minutes after the AC...
Re: (Score:3)
As a long long time editor...
Look at the quality of information.
I agree, you did a terrible job. Please, quit editing!
Re: (Score:2)
This is a news-for-nerds site. It’s reasonable to assume dates are in ISO format. :)
Re:2013-11-04 (Score:5, Informative)
Actually, ISO 8601 dates (YYYY-MM-DD) are unambiguous: far better than the ambiguous AA/BB/YYYY notation, since Americans interpret it as MM/DD/YYYY but in some other countries it's regarded as DD/MM/YYYY.
As an added plus, a lexical sorting of YYYY-MM-DD dates is also a temporal sorting. Not so with either of the other two formats.
http://en.wikipedia.org/wiki/ISO_8601 [wikipedia.org]
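The sorting claim is easy to check. A small sketch in Python, with sample dates made up for illustration: it sorts the same dates as plain strings in both notations and compares each result with a true chronological sort.

```python
from datetime import datetime

# Illustrative sample dates, written in ISO 8601 form.
iso_dates = ["2013-11-04", "1999-12-02", "1999-02-13", "2012-01-15"]

# The same dates rewritten in US-style MM/DD/YYYY notation.
us_dates = [
    datetime.strptime(d, "%Y-%m-%d").strftime("%m/%d/%Y") for d in iso_dates
]


def chronological(dates, fmt):
    """Sort date strings by the moment they denote, given their format."""
    return sorted(dates, key=lambda d: datetime.strptime(d, fmt))


# Plain string sort versus chronological sort, for each notation.
iso_lexical_matches = sorted(iso_dates) == chronological(iso_dates, "%Y-%m-%d")
us_lexical_matches = sorted(us_dates) == chronological(us_dates, "%m/%d/%Y")

print("ISO 8601 string sort is chronological:", iso_lexical_matches)
print("MM/DD/YYYY string sort is chronological:", us_lexical_matches)
```

With these samples the ISO strings sort correctly as text, while the MM/DD/YYYY strings put January 2012 ahead of February 1999.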
Re: (Score:2)
Yeah, even worse, the scripting runtime on Windows auto-parses AA/BB/YYYY into Date types, but it defaults to USA regardless of system locale... unless it can't be interpreted as a valid date.
If you enter
12/02/1999
That's the second of December, regardless of actual system locale...
13/02/1999
And that's the 13th of February (possibly just in locales like GB).
Not sure if this has ever been fixed, but it was a royal PITA when I used to do ASP classic pages.
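The behaviour described above can be imitated in a few lines of Python. This is only a sketch of the "month first, fall back to day first" idea, with a hypothetical helper name; it is not the Windows scripting runtime's actual parsing code.

```python
from datetime import date, datetime


def parse_month_first(text):
    """Parse AA/BB/YYYY as month-first, falling back to day-first.

    Illustrative only: mimics the fallback described in the comment above.
    """
    for fmt in ("%m/%d/%Y", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # not valid in this order, try the next one
    raise ValueError(f"not a valid date in either order: {text!r}")


print(parse_month_first("12/02/1999"))  # read month-first
print(parse_month_first("13/02/1999"))  # month 13 is invalid, so day-first
```

The trap is that the first input silently changes meaning depending on which order is tried first, while the second only ever has one valid reading.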
Re: (Score:2)
I agree that ISO 8601 is much better, but people will still put the year last in informal usage no matter how much you try to convince them otherwise. Among the countries that I've visited (not an exhaustive list obviously), only the US (usually) uses "/" as the separator. The others usually use "." or "-". And only the US has the month first. So an informal convention that usually works for me when there is ambiguity is to interpret "/" as meaning month first, anything else day first.
Re: (Score:2)
I've seen some online specifications in the format YY/MM/DD or maybe it's YY/DD/MM, practically impossible to determine for the past 14 years.
Re: (Score:2)
year month day, so it can be sorted easy.
Re: (Score:2)
Because all offline wikipedia readers require you to download the wikipedia dump, and the english wiki isn't dumped that often, and this is wiki converted to HTML with downscaled images as far as I can understand.
Re: (Score:2)
How is this different from wikitaxi which has been available for years. http://www.yunqa.de/delphi/doku.php/products/wikitaxi/index [yunqa.de]
Dumps for Wikitaxi typically don't have images. Though it is a great tool.
Re: (Score:2)
Yes. Plenty of ways to do that. Google is your friend.
You can even buy a handheld piece of hardware ( that runs forth! ) if you like. http://www.thewikireader.com/ [thewikireader.com]
Re: (Score:2)
Can I have a slightly smaller copy without the images and references?
Use Wikitaxi (Windows only, works in Wine): http://www.yunqa.de/delphi/doku.php/products/wikitaxi/index [yunqa.de]
Get dumps from here:
http://dumps.wikimedia.org/enwiki/ [wikimedia.org]
look for: pages-articles.xml.bz2
You have to process the dump. One I did earlier in the year resulted in a 15GB file.
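For a rough idea of what "processing the dump" involves, here is a minimal sketch that streams page titles out of a bz2-compressed MediaWiki XML export without unpacking it to disk first. It is not what Wikitaxi does internally. The tiny sample document and its namespace URI are illustrative; a real run would open the downloaded pages-articles.xml.bz2 file instead.

```python
import bz2
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, which differs between export versions."""
    return tag.rsplit("}", 1)[-1]


def iter_titles(fileobj):
    """Yield page titles from a MediaWiki XML export, one page at a time."""
    for _event, elem in ET.iterparse(fileobj, events=("end",)):
        if local_name(elem.tag) == "page":
            for child in elem:
                if local_name(child.tag) == "title":
                    yield child.text
            elem.clear()  # drop the page's children to keep memory use low


# Illustrative stand-in for a real dump, compressed in memory.
sample = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
<page><title>Alpha</title><revision><text>a</text></revision></page>
<page><title>Beta</title><revision><text>b</text></revision></page>
</mediawiki>"""

with bz2.open(io.BytesIO(bz2.compress(sample)), "rb") as dump:
    titles = list(iter_titles(dump))

print(titles)
```

For a real dump, replace the in-memory sample with `bz2.open("/path/to/pages-articles.xml.bz2", "rb")` (path hypothetical) and write the pages into whatever store your reader expects.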