Robust Hyperlinks: The End of 404s?
Tom Phelps writes, "URLs can be made robust so that if a Web page moves to another location anywhere on the Web, you can find it even if that page has been edited. Today's address-based URLs are augmented with a five or so word content-based lexical signature to make a Robust Hyperlink. When the URL's address-based portion breaks, the signature is fed into any Web search engine to find the new site of the page. Using our free, Open Source software (including source code), you can rewrite your Web pages and bookmarks files to make them robust, automatically. Although Web browser support is desirable for complete convenience, Robust Hyperlinks work now, as drop-in replacements of URLs in today's HTML, Web browsers, Web servers and search engines."
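In rough terms, the fallback being described is: try the address-based part of the URL, and if it 404s, feed the signature to a search engine. A minimal sketch (the "lexical-signature" parameter name follows the examples quoted in the comments below, and the choice of search engine is arbitrary - this is not the actual Robust Hyperlinks code):

# Sketch of the fallback behaviour described above: try the address-based
# part of the URL, and if it 404s, hand the lexical signature to a search
# engine. Parameter name and search URL are illustrative only.
import urllib.error
import urllib.parse
import urllib.request


def resolve_robust_url(url):
    parts = urllib.parse.urlsplit(url)
    try:
        return urllib.request.urlopen(url).geturl()   # address still works
    except urllib.error.HTTPError as err:
        if err.code != 404:
            raise
    # Address is dead: fall back to the content-based signature.
    signature = urllib.parse.parse_qs(parts.query).get("lexical-signature", [""])[0]
    return "http://www.google.com/search?q=" + urllib.parse.quote(signature)


print(resolve_robust_url(
    "http://example.com/old-page?lexical-signature=phelps+wilensky+robust+lexical+signature"))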
Re:Either that... (Score:1)
Oh, and holding down left-shift on my keyboard didn't seem to help any.
Robust until... (Score:1)
Re:Wasn't this what URI's were supposed to address (Score:1)
From what I'm reading here, the form of the URLs this guy is generating is actually illegal syntax. That is, the '?' character is intended to introduce a query, and any proper web server would attempt to run a CGI-type script with it.
If you want to know more about URNs, and my implementation of them in Java (replaces most of java.net) go to http://www.vlc.com.au/~justin/java/urn/ [vlc.com.au]
Re:Hijacking redirectors ?? (Score:1)
Some details (and a complaint) about how it works (Score:1)
<A HREF="http://my.outdatedsite.com/page?robusturlkeywords=farts+sandler+zippo+methane+boom">
So the "robust keywords" are just an HTTP query string attached to the usual URL. When the server goes to produce a 404, it presumably calls a CGI (the distribution's jar file probably contains a 404 servelet or some such beastie) which re-directs (301 or whichever) to google.com with an appropriate query string based on the keywords in "robusturlkeywords".
As an HTTP junkie, I have to say I'm not too fond of it; you're ruining the whole point of 404 semantics. (Kinda like sites that redirect you to their homepage when you give them a bogus URL - it irks me to no end.) It would be much more straightforward (and less prone to attacks and the general unreliability of search engines) for server administrators to start maintaining proper 301-Moved Permanently databases and perform lookups in those whenever the server hits a 404 condition.
Just MHO.
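For what it's worth, the 301-database approach I mean is trivial to sketch (the table entries here are invented):

# Sketch: a hand-maintained moved-URL table consulted when a request misses,
# so the server answers 301 Moved Permanently instead of guessing via a
# search engine. Paths are invented.
MOVED = {
    "/old/products.html": "/catalog/products.html",
    "/staff/tom.html": "/people/phelps.html",
}


def lookup(path):
    """Return (status, location) for a request that hit no real file."""
    if path in MOVED:
        return 301, MOVED[path]
    return 404, None                  # an honest 404, no guessing


print(lookup("/old/products.html"))   # (301, '/catalog/products.html')
print(lookup("/never/existed.html"))  # (404, None)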
Either that... (Score:1)
Re:Either that... (Score:1)
Now if only web sites/servers can be robust (Score:1)
Oops, I'm redundant.(OFF-Topic) (Score:1)
Re:Mirror It Please Somebody (Score:1)
I think the reply is (Score:1)
Of course, that doesn't stop many of those same people from complaining about lack of Java on Linux
Apparently de facto "standards" only count when they come from the Good Guys.
Irony (Score:1)
[OK, it was a server down or unreachable error, but it was funnier the other way]
Has it been Slashdotted already?
Too good to be true ;-) (Score:1)
the end of 404s, but what about /.s (Score:1)
Re:There's always a "but(t)." (Score:1)
I can just see it... pr0n sites will no longer need all those senseless keywords in their meta-tags to show up for the innocent-looking keywords you feed a search engine...
No - now all they have to do is stuff the "Robust Redirector" with some makeshift keywords they extracted by spidering a load of webpages, and presto! --- You've Got PR0N!1!
That's kind of like what they do now, with sitenames that are popular "speling" errors of other sites...
Also, who's going to prevent people using the same keywords for their page, and how is the process of choosing between n possible redirections going to be handled, as it should be "transparent" to the user?
I guess there's a lot of thought-work left before this reasonably can go live... and still, how many of you have Smart Browsing enabled in Netscape, and how does this differ, privacy-wise?
(Mmmmh... Portscan... ARGH)
np: Boards Of Canada - Unknown Track 2.mp3 (Live)
As always under permanent deconstruction.
Oh, the irony (Score:1)
Interesting, for unique keywords (Score:1)
How will it help me if my URL changes?
quack
Re:Sounds very iffy to me (Score:1)
On the other hand, it might be that the method uses JavaScript, at which point that nullifies any claim of "working on all existing browsers".
From freshmeat you can see that the appropriate file for it is called Robust.jar, so I think you're probably correct there :)
JavaScript has nothing to do with Java. The fact that the file ends in .jar points to Java, not JavaScript.
Re:Damn! (Score:1)
Re:Mirror It Please Somebody (Score:1)
The document is analyzed and a few unusual words are selected. These are used in a signature which is either put in links (within the anchor tag) or appended to the URL as a query string.
The advantage of putting it in the URL is that bookmarks may work. Implementation can be in server or client and there are advantages to both methods. If it's in a noninformative client then you might not be aware of redirection (unless the wrong page is retrieved and it is obvious).
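As a rough sketch of the word-selection step - the rarity heuristic here is invented; the real system presumably uses Web-wide word statistics:

# Sketch of picking a five-word lexical signature: keep the words that look
# rarest. A tiny invented common-word list plus word length stands in for
# real corpus statistics.
import re
from collections import Counter

COMMON = {"the", "and", "that", "with", "this", "page", "from", "have",
          "will", "your", "been", "which", "about"}


def lexical_signature(text, n=5):
    words = re.findall(r"[a-z]{4,}", text.lower())
    counts = Counter(w for w in words if w not in COMMON)
    ranked = sorted(counts, key=lambda w: (counts[w], -len(w)))
    return "+".join(ranked[:n])


doc = "Robust hyperlinks attach a lexical signature of unusual words to an ordinary URL."
print("?lexical-signature=" + lexical_signature(doc))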
Irony patrol (Score:1)
Re:Won't work with Linux, sorry. (Score:1)
Yet another small attempt to make Windows the 'better' OS for the Internet...
This is just *another* case of Linux falling behind due to its lack of support for common Internet standards. Where is our ActiveX? COM?
Falling behind? I'm grateful that Linux doesn't have ActiveX (read: a huge security hole).
Granted, I can occasionally watch as the Java ads on Slashdot cause Netscape for Linux to crash, but that seems to be the extent of Linux's so-called internet connectivity.
What? The extent of Linux's 'internet connectivity'? What crack have you been smoking lately? Linux is more intimately tied to the net than any other OS (except for other Unixen) due to the fact that TCP/IP is an integral part of Linux/*BSD/etc. Just because Linux doesn't support a Microsoft-developed technology, it's all of a sudden not suitable for the Internet?
And you wonder why people are forced to use windows+IE?
It has much more to do with the fact that there are no 'major' apps available for Linux (by major, I mean the industry standard - Photoshop, Illustrator, most M$-crap) than it does with ActiveX. Before anybody jumps at me and says 'What about The GIMP?', Adobe Photoshop is the industry standard for pixel-based graphics design and photo editing. Most professionals (including myself) are experienced with Photoshop. Retraining oneself for a different program is harder than learning it from scratch.
If they want to make use of the latest technologies, for example 'Robust URLs' (though maybe they should have invested in a Robust Server), then Linux, sadly, can't keep up. We as a community are being left behind in the Internet arms race.
Why? I'm sure that someone will develop a Linux/*BSD implementation of Robust URLs and the incompatibility will be solved. The Linux community is not being left behind at all, just because it can't use a few CraptiveX controls.
Fortunately, I have a few ideas:
Get a task force composed of Richard Stallman, Bruce Perens, and ESR to develop and debug ActiveX support for Linux. Estimated time: 2 months.
Bad idea! Supporting ActiveX on Linux is (in my eyes, FWIW) tantamount to giving out your root password. Anything that allows automatically downloaded/embedded code to have FULL ACCESS to my hardware is inherently evil and should be destroyed. And Authenticode? Give me a break...that only tells you who to blame if you get a trojan and not whether the control is safe or not...
Form an Open Source Browser Committee to create a new, Open Source web browser that supports all the latest standards (CSS, DOM, DNA) Estimated time: 3 months.
Well, we do have Mozilla [mozilla.org]; even though it is not GPL'ed, it's Open Source.
Push for Perl to be embedded in all new web browsers so that CGI programs can be run on the user's machine, which will reduce server loads. Estimated time: 1 month.
Should be quicker than that - just provide an interface to the existing Perl implementation.
Design a new, Internet-ready desktop for Linux. Give it a web browser, probably the new one I described above, and embed it in everything: file manager, word processor, start button, etc. Estimated time: 4 months.
This is a great idea, which will (if implemented correctly) make the barrier-to-entry much lower than it currently is. Graphical configuration tools are also needed (but don't change the underlying architecture, let those who want to use the console).
I think that with these items accomplished, Linux will truly begin to shine as a web platform, even for the newest users.
I fully agree, except for the ActiveX support. Just because Microsoft develops it doesn't mean that Linux should strive to be compatible (else we will eventually have another Windows).
Disclaimer - My comments do not represent the views of ABC19 WKPT and are my own.
_______
Scott Jones
Newscast Director / ABC19 WKPT
Commodore 64 Democoder
read before you write! (Score:1)
With Harvest [ed.ac.uk], indexing software that is several years old, an indexing engine that identifies documents by their MD5 signature is easy to build; I've done this. So what these people are proposing isn't exactly rocket science.
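Not Harvest's code, but the idea fits in a dozen lines (a sketch, with made-up URLs):

# Sketch: identify documents by content hash, so a page can be found again
# after it moves, as long as its bytes are unchanged.
import hashlib

index = {}                                   # digest -> list of known URLs


def add(url, body):
    index.setdefault(hashlib.md5(body).hexdigest(), []).append(url)


def find(body):
    return index.get(hashlib.md5(body).hexdigest(), [])


page = b"<html>the same bytes, wherever they live</html>"
add("http://old.example.com/a.html", page)
add("http://new.example.com/moved/a.html", page)
print(find(page))                            # both locations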
distribute the redirectors (Score:1)
All of the required technology is present in Harvest [ed.ac.uk], it just never became popular. My guess is that cool ideas have to be reinvented in Berkeley before the world gets to see them applied at large, see Yahoo! for another example.
Needed: database of cross-links to make this work (Score:1)
Not done alot of web programming I see (Score:1)
If you have an idea don't pass the buck and say all these "famous" OpenSourcers need to do this. Go do it yourself...then maybe you won't be so quick to say how easy and quick it would be.
May be answered.. (Score:1)
Re:Wasn't this what URI's were supposed to address (Score:1)
There are several different proposed URN systems being worked on right now (the document even mentions some, such as PURLs and handles). The big problem with these new specs is that there are a large number of conflicting requirements depending on what you really want to do, so they're unlikely to be able to settle on just one proposal (they've been trying for several years).
Still, after looking through the `robust Hyperlink' documents, basically all of the old URN specs that I've seen are better than this, so I hope it doesn't distract people too much.
Why can't we eliminate 404's already? (Score:1)
Re:finally! no more lost porn! (Score:1)
Maybe that was their motivation
I just wish I could stop all of these pop up windows.
May rely on typos--broken by spell checkers? (Score:1)
(I've already sent them an email about this.)
Chris
Re:Hijacking redirectors ?? (Score:1)
live free or die
pay toll ahead
Two Different Webs... (Score:1)
But what if I'm looking for something specific? The web has been nearly useless to me when I wanted to find information on ancient illuminated Arabic text, or pictures of Microsoft Bob in action (for a parody).
So do "robust hyperlinks" help me or hurt me? Say I get a dog who has certain unsavory habits with regards to my cats, and I want to look up links about "interspecific coprophagia". Also assume for a moment that the next Korn clone band names themselves "coprophagia". Good search engines allow me to exclude entries that have certain words, but what happens when "robust hyperlinks"-based software assures me that http://www.coprophagiaonline.com/new_releases/ive
...are we just using new technology to make search engines even more frustratingly inaccurate?
lexical-signature= "sex+mp3+porn+alissa%20milano+beanie%20baby+jesus
404 Error will never die... (Score:1)
Then again, the domain name just won't be funny anymore if 404 Errors go away. *sigh*
--Ruhk
More Porn. (Score:1)
Porn sites start copying the five-word signatures of large portal and news sites, and in the event of a 404 on one of those sites you automatically get redirected to the site you really "wanted" to visit anyway.
Does anybody know if this is going to be an actual standard or just something useful until a truly robust addressing system gets adopted? It might be on the site, but that's sort of unreachable right now.
Great idea: leave forwarding message (Score:1)
When you desert one host or modify your site, why don't you leave forwarding messages (or 302 responses) to tell people where to find your new content?
How's that for a great idea?
Re:Won't work with Linux, sorry. (Score:1)
Besides, even if you could succeed at making this happen, Micro$oft would be sure to change the code slightly so as to break your version, and if it hurts some of their customers in the process, well what do they care?
No, this is a fundamentally unworkable plan.
--
Brad Knowles
Berkeley? Pick up the cloo phone, it's for you! (Score:1)
Oh GREAT. Just what we need--so much for the whole "you can't 'accidentally' find porn on the Internet" argument. This just throws that out the window, because all a porn site needs to do is hijack the right search keywords and wait for cnn.com to have a broken link.. *poof* millions of users get sent to porn.
Not only that, but it makes site debugging a pain in the ass.
Thanks Berkeley!
we can only pray... (Score:1)
Re:404 Gallery (Score:1)
You like 404s ? Try this one: http://www.g-wizz.net/wibblewibblewibble.swf [g-wizz.net].
Yes, that file extension is a hint...
This is old news... (Score:1)
If a visitor couldn't reach the site because Geocities had taken it down, he just needed to feed "paer9udtzk6gn8modfi" (paraphrased, of course) into Altavista to be pointed to the new location.
Re:Not unlike Freenet (Score:1)
WOW.
This is a fantastically great idea.
How long before we get URLs like freenet://contraband_information.html ?
-k
Re:Not unlike Freenet (Score:1)
I took a look at this, and it looks quite neat. If Freenet manages to get this right, I hope it really takes off. I especially like the idea of not having to dole out tons of cash or make do with a free web service in order to get something published.
-RickHunter
--"We are gray. We stand between the candle and the star."
--Gray council, Babylon 5.
the true dead links (Score:1)
Re:Off topic, but interesting. (Score:1)
It's worse than that. That state tried to penalize someone for covering the slogan. When someone tried to exercise his freedom of (non)speech by putting electrical tape over the slogan, the state took him to court. I seem to recall the case going on for a long while through several appeal processes where the state tried to force people to spout slogans about freedom. The irony was apparently completely lost on the bureaucrats enforcing the slogan.
Re:Not unlike Freenet (Score:1)
FREENET [lights.com] is already a widespread term, referring to MANY local public-access community supported ISPs. A quick lookup gives 16 countries with 233 separate groups.
It is unfortunate that nobody told you of the name overlap before this, but using "freenet" for your web will only generate anger among people already familiar with the community free ISP usage.
Hmmmm - Is it possible that the socialists (free public access to whatever) and the libertarians (Where were you when they took our freedoms?) have really never heard of each other's Freenet until now? I'm only familiar with the ISP usage, where it is
Re: (Score:1)
Maybe I'm missing something but... (Score:1)
Even the best search engines only index a small percentage of the entire web and then they are hideously out of date.
Not to mention the problems of someone hijacking your unique id by stuffing the search engine with bogus words.
(Disclaimer - I haven't read the actual article due to it being slashdotted.)
Cool (Score:1)
Re:Either that... (Score:1)
Kidding. And here I was doing a left bitwise shift; then I looked at my keyboard for a sec. heehee
Unique tokens, or fetch by MD5 (Score:1)
Well, it would be much easier to include a token somewhere (e.g., in a comment) that would be unique to this page. A randomly generated string of 20 ASCII characters would do the job.
But this is prone to the same hijacking attack as the original scheme.
A much better solution would be to fetch by MD5: teach search engines to compute MD5 sums of every document they index, then include MD5 sum somewhere in the URL.
That would also allow for better caching!
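Both variants are a few lines each (a sketch; the comment format and the "#md5=" URL suffix are made up for illustration):

# Sketch of the two identifiers suggested above: a random 20-character token
# to hide in an HTML comment, and an MD5 digest of the page carried in the
# URL. Both formats are invented.
import hashlib
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def random_token(length=20):
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def md5_url(url, body):
    return url + "#md5=" + hashlib.md5(body).hexdigest()


body = b"<html>some page</html>"
print("<!-- robust-token:", random_token(), "-->")
print(md5_url("http://example.com/page.html", body))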
robust link apache mod (Score:1)
in fact, it wouldn't have to be an apache mod - any kind of executable that could be cron'd to check links every so often would have the same effect.
i'm not sure how this would fit in with the whole signature thing. i suppose we could just pgp-sign our web pages and put the signature in comments.
but as with most of my ideas, someone's probably already coded this.
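someone probably has - the cron-able checker part is only a sketch like this (no signing, just the broken-link pass):

# sketch: something cron could run nightly - fetch a page, pull out its
# links, and report the ones that no longer answer.
import re
import urllib.error
import urllib.request


def check_links(page_url):
    html = urllib.request.urlopen(page_url).read().decode("utf-8", "replace")
    for href in re.findall(r'href="(http[^"]+)"', html, re.IGNORECASE):
        try:
            urllib.request.urlopen(href, timeout=10)
        except (urllib.error.URLError, OSError) as err:
            print("BROKEN", href, err)


if __name__ == "__main__":
    check_links("http://example.com/")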
Re:Won't work with Linux, sorry. (Score:1)
That problem is with Netscape, not Linux. Yes, I often have problems with Java crashing Netscape, but that happens regardless of whether I am using the Windows version or the Linux version. Point is, Linux is great, Netscape is okay, but Netscape's implementation of Java leaves a lot to be desired in the way of stability.
=================================
Re:Won't work with Linux -- WRONG!!! (Score:1)
This is very important for a few reasons. (Score:2)
1) Growing sites that change servers or domain names (moving from an add-on to a dedicated URL, or changing the domain name for legal/incorporation/buyout reasons) will see the massive traffic bleed they suffer - until everyone realizes the site has moved - virtually disappear. Yes, putting a redirect page on your "old home" may help, but for things like RSS file addresses and other external connectors that affect your site, this is a problem.
Ultimately, of course, for this to TRULY work there needs to be technology like this built into not only browsers, but virtually any software that uses HTTP communication (XML parsers, bots, spiders, etc).
2) I want to start offering streaming video on my site, and the single biggest obstacle to doing that is COST. Bandwidth, unless you OWN the pipe, is NOT cheap. I can (albeit in a somewhat underhanded fashion) set up a script that registers, say, 24 different "free site" pages, rotating which one counts as the "correct" version of my page once an hour, and, unless the content is in VERY heavy demand, essentially have a free method of streaming video on my site.
Egads, I'm already feeling dirty about what I just said. Okay, maybe that's a little TOO unethical. But I guarantee someone will do it.
Sounds very iffy to me (Score:2)
That said, the concept seems iffy. Based on the above, the fact that it works in all existing browsers, suggests to me that the form of the URL is the following:
<a href="http://robusturl.server.com?http://my.outdatedsite.com&keyword1=whatever">
Namely, that anchors that use this URL will be sent to this server (apparently fixed in place), then redirected either to the working page or to the appropriate search engine results. This means that the robust server will be running scripts. While I don't believe that the intent as described here would be to catalog all matches, all you need is one unscrupulous company running this and, with a bit of modification, it can quite easily trace where you are and where you are going. I really don't like this potential, and personally I'll take a 404 any day over potential privacy problems.
On the other hand, it might be that the method uses JavaScript, at which point that nullifies any claim of "working on all existing browsers".
Wasn't this what URI's were supposed to address? (Score:2)
Re:Not unlike Freenet (Score:2)
--
404 Gallery (Score:2)
reinventing the wheel... (Score:2)
anyone who's looked at the http spec for more than a millisecond will see that it already handles this case quite gracefully with the 3xx series of responses, including:
301 Moved Permanently
302 Moved Temporarily
I think
Re:Hijacking redirectors ?? (Score:2)
Perhaps one of the keywords should be the previous URL? In fact, perhaps a better solution would be a new Meta tag of "Prev-URL" (or something similar) that search engines could look at and use to update their databases?
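Something like this on the search engine's side would be enough (a sketch; the Prev-URL tag is, of course, purely hypothetical):

# Sketch: a crawler-side pass that honours the proposed (hypothetical)
# Prev-URL meta tag by carrying the old index entry over to the new URL.
import re

index = {"http://old.example.com/rover.html": "Ottawa Valley Land Rovers"}


def apply_prev_url(new_url, html):
    m = re.search(r'<meta\s+name="prev-url"\s+content="([^"]+)"', html,
                  re.IGNORECASE)
    if m and m.group(1) in index:
        index[new_url] = index.pop(m.group(1))


page = '<meta name="Prev-URL" content="http://old.example.com/rover.html">'
apply_prev_url("http://new.example.com/club/rover.html", page)
print(index)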
On an anecdotal note (or is that redundant?), I remember searching once for the web site of a Land Rover owners club (I think it was Ottawa Valley Land Rovers in Canada) and being directed to an auto parts store in Australia - it turned out that the store's web pages had the names of lots of auto clubs in their meta tags. The idea was to get people searching for the clubs to go to the store's site.
smart and dumb (Score:2)
send flames > /dev/null
Good Idea but 90% of 404's are deleted pages (Score:2)
Why? Because 90% of 404s are a result of the page being taken down completely (especially if it's on Geocities or Xoom or some free provider).
A program you could install for your browser, like NetAccelerate (which loads links off the current page into cache when the bandwidth isn't being used), but which simply follows the links far enough to detect whether they're broken, would be very handy. Although it wouldn't solve any problems, it would at least stop you from getting your hopes up when you've finally found a link to a page that claims to be what you've been searching for for an hour.
Re:Sounds very iffy to me (Score:2)
Java != JavaScript, people!
--Earl
Nice idea, shame about the... (Score:2)
<ASSUMPTION>The 'word description' is going to be capable of describing a page adequately, and uniquely, per page, like an MD5 digest, rather than a simple text descriptor. The latter would just be silly.</ASSUMPTION>
I can see some value to this if the page is static and likely to be relocated rather than rewritten or deleted, but how is this going to work if the page is dynamically generated from a database and the whole site is prone to reorganisation (as Microsoft's seems to be)?
It might help more if there was a way to uniquely identify snippets of content within a page, and provide a universal look-up scheme based on unique fingerprints of these 'snippets'. Although I'm sure that puts it straight into XPointer territory, doesn't it...?
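A toy sketch of what snippet fingerprints could look like (normalisation and hash choice are arbitrary here):

# Sketch: fingerprint each paragraph of a page so a quoted snippet can be
# located again even after the page is reorganised. The normalisation is
# deliberately crude.
import hashlib
import re

lookup = {}                               # fingerprint -> (url, paragraph no.)


def fingerprint(snippet):
    normalised = " ".join(re.findall(r"[a-z0-9]+", snippet.lower()))
    return hashlib.md5(normalised.encode()).hexdigest()[:16]


def register(url, text):
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    for i, para in enumerate(paragraphs):
        lookup[fingerprint(para)] = (url, i)


register("http://example.com/doc.html", "First paragraph.\n\nSecond one.")
print(lookup[fingerprint("  second ONE ")])   # found at paragraph 1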
And an 'opt-out' system is necessary. There are lots of reasons one might want particular content to be transient.
Re:The real solution ... (Score:2)
The real solution ... (Score:2)
This will also allow site owners to see who's linking to them, but obviously it should be utterly transparent (so that you can still link in private, but then you wouldn't get updates).
At some point we'll get there, it's just a matter of time. Questionable schemes such as the topic of this story are just a kludge, and probably not worth the effort.
Damn! (Score:2)
PoC
It's down already (Score:2)
Well, it sounds like an interesting concept, but unfortunately I can't get to the site already. Surely it's too soon for the /. effect?
We need URLs first (Score:2)
This sounds great - practical solutions to a real problem.
OTOH, there are already far too many sites where there just isn't an accessible URL anyway. Some are frame-based, some are dynamically generated. They all have the problem of not being bookmarkable (from within the browser's normal "Bookmark Here" function). Some do try to solve this though, by separately publishing a bookmark that will take you back to the same content.
If this idea is to really work, then it needs to be supported by dynamic sites publishing their Robust Hyperlinks, even for pages that don't have a "traditional" URL to begin with.
Re:Wasn't this what URI's were supposed to address (Score:2)
Definitely a heads-up for anyone looking for a quick technical fix to the problem.
Here's another way to do it (Score:2)
What if the link tag in the HTML also contained the date/time it was created? That way the browser would know how old it was. If the browser sent this to the server as a header, then when the server couldn't find the page it could check some database or whatever to see what the directory structure was like at that time and work out what redirect to use. If bookmarks also contained this date/time, then surely the server could tell the browser to update the bookmark (after warning the user, of course).
This would be pretty cool on an interactive site where the server could rearrange query strings or whatever if the serverside scripting had been given a big overhaul/re-organization.
Basically, surely the server itself, and not some search engine, would best know how to fix a broken link, and it would only require a couple of new headers and should be easy to implement, at least on the client side.
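A sketch of the server-side half, under those assumptions (the header, the reorganisation log, and the paths are all invented):

# Sketch of the idea above: the server keeps a log of its reorganisations
# and, given a missing path plus the link-creation date the browser sent in
# a (hypothetical) header, replays every rename made since that date.
from datetime import date

# (date of the change, old prefix, new prefix) - invented history
REORGANISATIONS = [
    (date(1999, 6, 1), "/cgi-bin/shop.pl", "/shop/"),
    (date(2000, 1, 15), "/shop/", "/store/"),
]


def redirect_for(path, link_created):
    for changed_on, old, new in REORGANISATIONS:
        if changed_on > link_created and path.startswith(old):
            path = new + path[len(old):]
    return path


# A link minted in 1998 still resolves to today's location of the page.
print(redirect_for("/cgi-bin/shop.pl?id=3", date(1998, 3, 2)))   # /store/?id=3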
-----------------------------------------------
"If I can shoot rabbits then I can shoot fascists" -
thoughts (Score:2)
-----
Alexa's solution to "404 errors" (Score:2)
Alexa also collects detailed information about what you look at with your browser, although they of course claim to use it only in the aggregate.
I see a problem with this... (Score:2)
So you get a 404 and you want to use a search site to find where it went? That's fine if it's been long enough since the move to give the web crawlers time to find it... there's a lot of web space out there to search!
But here's the good one: what if someone decides to hijack your web site by simple keyword spamming? All they have to do is set up their own page with the right keywords, get it indexed, and anyone who uses an "old" link will get redirected to them instead! And if web pages can be defaced, they can be removed, too, thus forcing the 404 and the search!
Better yet, use wholesale keyword spamming to get all those "dead" web pages pointing to your e-commerce site!
Re:404 Gallery (Score:2)
You're in the midst of nowhere
a droplet in a mist,
you musta typed in something weird
this URL, it don't exist.
kwsNI
There's always a "but." (Score:2)
... as in, "It's a good idea, but!" As has been pointed out, there are potential privacy issues. For the "average" user, though, I don't think this is a terribly big deal. What becomes a problem, then, is access to the Robust URL redirector (as I understand it from posts, the site seems to either be simply down, or a victim of the /. effect). Since all Robust URLs have to pass through the redirector, what happens if the redirector is down? What happens if the redirector is unreachable?
Furthermore, simply feeding keywords to a search engine doesn't guarantee finding your page quickly, or even finding it at all. Designers would have to include unique keywords - words that might not even apply to their page - so that a Robust URL search would turn up only their page. Not only does this bloat HTML code, but it also confuses people using search engines in the usual way.
Certainly a good idea, as many people hate 404s (bah, they're just a fact of life), but it seems like it's got more than a few bugs left in it.
Not unlike Freenet (Score:3)
--
Re:Wasn't this what URI's were supposed to address (Score:3)
Dynamic content (Score:3)
Also, somebody else mentioned that they had a project on SourceForge which was basically like the Web, but in a completely distributed manner. This makes a lot more sense to me. The notion that my bits must cross a continent to retrieve data on a certain TOPIC seems a bit archaic. I shouldn't know or care where the data of the topic is stored...I just want it. Also, having a distributed web like this, as the person suggests, will make it a lot harder to invade privacy or censor material.
Hijacking redirectors ?? (Score:3)
Don't mean to be the Devil's Advocate; it's just my game programming / design skills kicking in. Whenever someone adds a useful feature, you must look at the ways people will try to exploit it.
"Live free or Die" - Ironically, seen on a license plate.
Replacing a broken link with a Google search? No. (Score:3)
Frankly, I'd rather just get the 404 than waste time digging through erroneous links.
By the way, there are hypertext systems that address this issue in ways that actually solve the problem - the now-defunct Hyper-G system was very intelligent about redirecting requests.
Try ftp'ing instead (Score:5)
Eric