Robust Hyperlinks: The End of 404s?

Tom Phelps writes, "URLs can be made robust so that if a Web page moves to another location anywhere on the Web, you can find it even if that page has been edited. Today's address-based URLs are augmented with a five or so word content-based lexical signature to make a Robust Hyperlink. When the URL's address-based portion breaks, the signature is fed into any Web search engine to find the new site of the page. Using our free, Open Source software (including source code), you can rewrite your Web pages and bookmarks files to make them robust, automatically. Although Web browser support is desirable for complete convenience, Robust Hyperlinks work now, as drop-in replacements of URLs in today's HTML, Web browsers, Web servers and search engines."
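
In outline, a Robust Hyperlink is an ordinary URL with a lexical-signature query parameter appended. Below is a minimal sketch of how such a link could be built, assuming the signature is simply the page's rarest words; the doc_freq word-to-document-frequency table is a stand-in for illustration, not part of the actual release, and the ranking heuristic here is an assumption rather than the tool's exact algorithm.

    import re

    def lexical_signature(text, doc_freq, n=5):
        # Rank the page's words by how rare they are in a reference
        # document-frequency table and keep the n rarest as the signature.
        words = set(re.findall(r"[a-z]{4,}", text.lower()))
        return sorted(words, key=lambda w: doc_freq.get(w, 0))[:n]

    def robust_url(url, page_text, doc_freq):
        # Append the signature to the existing address-based URL.
        sig = "+".join(lexical_signature(page_text, doc_freq))
        sep = "&" if "?" in url else "?"
        return url + sep + "lexical-signature=" + sig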
  • Not to mention that I'm on a Dvorak keyboard, and that makes his e-mail __y.'abaw@nmu.blw.'__ :)

    Oh, and holding down left-shift on my keyboard didn't seem to help any.

  • ... the porn servers start embedding shitloads of common 5-word phrases in their pages so every 404 takes you straight to "101 pussies for today" or wherever.
  • Already done. Most of the URN work is already hammered out, and a few of the older RFCs need to be updated a bit.

    From what I'm reading here, the form of the URLs this guy is generating is actually illegal syntax. That is, the '?' character is intended as a query, and any proper web server would attempt to run a CGI-type script with it.

    If you want to know more about URNs, and my implementation of them in Java (replaces most of java.net) go to http://www.vlc.com.au/~justin/java/urn/ [vlc.com.au]

  • Sadly NH still hasn't gotten around to changing it to "Live Free or Die, Punk"
  • Actually, a quick (15-second) perusal of the actual materials shows that their approach is more like:

    <A HREF="http://my.outdatedsite.com/page?robusturlkeywords=farts+sandler+zippo+methane+boom">

    So the "robust keywords" are just an HTTP query string attached to the usual URL. When the server goes to produce a 404, it presumably calls a CGI (the distribution's jar file probably contains a 404 servelet or some such beastie) which re-directs (301 or whichever) to google.com with an appropriate query string based on the keywords in "robusturlkeywords".

    As an HTTP junkie, I have to say I'm not too fond of it; you're ruining the whole point of 404 semantics. (Kinda like sites that redirect you to their homepage when you give them a bogus URL - it irks me to no end.) It would be much more straightforward (and less prone to attacks and the general unreliability of search engines) for server administrators to start maintaining proper 301-Moved Permanently databases and perform lookups in those whenever the server hits a 404 condition.
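
    A 404-time lookup against such a database takes only a few lines; here's a minimal sketch in Python (not the project's code, and the entries in MOVED are made up):

        from http.server import BaseHTTPRequestHandler, HTTPServer

        # Hypothetical table of old path -> new location, maintained by the admin.
        MOVED = {"/old/page.html": "http://www.example.com/new/page.html"}

        class Handler(BaseHTTPRequestHandler):
            def do_GET(self):
                path = self.path.split("?", 1)[0]
                if path in MOVED:
                    self.send_response(301)                   # Moved Permanently
                    self.send_header("Location", MOVED[path])
                    self.end_headers()
                else:
                    self.send_error(404)                      # an honest 404, no guessing

        if __name__ == "__main__":
            HTTPServer(("", 8080), Handler).serve_forever()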

    Just MHO.

  • or use relative addressing.

  • Yes, the drunk/misaligned typist school of cryptography. A little-known branch that died out soon after it was proposed in a Berlin beerhall.

  • Robust links may be great but I can't connect to the site to learn more about it.
  • After posting this, I notice the comment submitted before me said the same thing I just wrote. (The comment after me also says the same thing) It would be nice if we were allowed to delete our own comments to save some moderators some redundant moderating.
  • So much for robust hyperlinks ;-)
  • "ActiveX and COM aren't common Internet standards they are just the work of a proprietary company!"

    Of course, that doesn't stop many of those same people from complaining about lack of Java on Linux ;-). And personally, I don't see a whole lot of difference between ActiveX and Perl (or TeX or Python or TCL or...), neither are standardized in any even vaguely meaningful sense of the word.

    Apparently de facto "standards" only count when they come from the Good Guys.
  • I got a 404 Not Found error when I tried the link!

    [OK, it was a server down or unreachable error, but it was funnier the other way]

    Has it been Slashdotted already?
  • If something is too good to be true, it probably is. Besides, chances are someone'll patent it sooner or later. ;-(
  • love the idea of no more 404s, but, seeing as the server is appropriately and thoroughly slashdotted (or was when i looked), what about a url that's robust enough to survive the 'effect'?
  • I can just see it... pr0n sites will no longer need all those senseless keywords in their meta-tags to show up on innocent-looking keywords you feed a search engine...

    No - now all they have to do is to stuff the "Robust Redirector" with some makeshift-keywords they extracted by spidering over a load of webpages, and presto! --- You've Got PR0N!1!

    That's kind of like they do now, with sitenames that are popular "speling" errors of other sites...

    Also, who's going to prevent people using the same keywords for their page, and how is the process of choosing between n possible redirections going to be handled, as it should be "transparent" to the user?

    I guess there's a lot of thought-work left before this reasonably can go live... and still, how many of you have Smart Browsing enabled in Netscape, and how does this differ, privacy-wise?

    (Mmmmh... Portscan... ARGH)

    np: Boards Of Canada - Unknown Track 2.mp3 (Live)


    As always under permanent deconstruction.

  • So the link to Robust Hyperlinks doesn't work. Sigh.
  • This sounds interesting if you're using unique keywords for something like a family web site and you have a unique surname. However, what will happen when I have a site that needs to use "Hot", "Sex", "Babes", "XXX", "Nude" for my keywords? How many other sites are going to have the exact same keywords? Or more seriously, how about "Smith", "Family", "Web", "Page"?

    How will it help me if my URL changes?

    quack
  • On the other hand, it might be that the method uses JavaScript, at which point this nulls and voids any statement about "working on all existing browsers".

    From freshmeat you can see that the appropriate file for it is called Robust.jar, so I think you're probably correct there :)

    JavaScript has nothing to do with Java. The fact that the file ends in .jar implies that the system was implemented in Java. So, it probably uses Servlets which can quite easily produce content which will work on all browsers.
  • Look more carefully at that 404 message. It's a joke.
  • Summary:
    The document is analyzed and a few unusual words are selected. These are used in a signature which is either put in links (within the anchor tag) or in a URL like this:
    http://www.cs.berkeley.edu/~daf/?lexical-signature=bregler+interreflections+zisserman+cvpr+iccv

    The advantage of putting it in the URL is that bookmarks may work. Implementation can be in server or client and there are advantages to both methods. If it's in a noninformative client then you might not be aware of redirection (unless the wrong page is retrieved and it is obvious).
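
    If the redirection lives in the client, the 404-time behaviour could be as simple as this sketch (an assumption about how a signature-aware client might behave, not the actual implementation; the choice of Google is arbitrary):

        import urllib.error, urllib.request
        from urllib.parse import parse_qs, quote_plus, urlparse

        def fetch_robust(url):
            try:
                return urllib.request.urlopen(url)
            except urllib.error.HTTPError as err:
                if err.code != 404:
                    raise
                # Pull the lexical signature back out of the broken URL...
                sig = parse_qs(urlparse(url).query).get("lexical-signature", [""])[0]
                if not sig:
                    raise
                # ...and hand it to any web search engine.
                return urllib.request.urlopen(
                    "http://www.google.com/search?q=" + quote_plus(sig))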

  • Hm. The site seems to be slashdotted.
  • Read the page. It's based on ActiveX and requires IE 4 or better.

    Yet another small attempt to make Windows the 'better' OS for the Internet...

    This is just *another* case of Linux falling behind due to its lack of support for common Internet standards. Where is our ActiveX? COM?

    Falling behind? I'm grateful that Linux doesn't have ActiveX (read: a huge security hole).

    Granted, I can occasionally watch as the Java ads on Slashdot cause Netscape for Linux to crash, but that seems to be the extent of Linux's so-called internet connectivity.

    What? The extent of Linux's 'internet connectivity'? What crack have you been smoking lately? Linux is more intimately tied to the net than any other OS (except for other Unixen) due to the fact that TCP/IP is an integral part of Linux/*BSD/etc. Just because Linux doesn't support a Microsoft-developed technology, it's all of a sudden not suitable for the Internet?

    And you wonder why people are forced to use windows+IE?

    It has much more to do with the fact that there are no 'major' apps available for Linux (by major, I mean the industry standard - Photoshop, Illustrator, most M$-crap) than it does about ActiveX. Before anybody jumps at me and says 'What about The GIMP?', Adobe Photoshop is the industry standard for pixel-based graphics design and photo editing. Most professionals (including myself) are experienced with Photoshop. To retrain oneself for a different program is harder than learning it from scratch.

    If they want to make use of the latest technologies, for example 'Robust URLs' (though maybe they should have invested in a Robust Server), then Linux, sadly, can't keep up. We as a community are being left behind in the Internet arms race.

    Why? I'm sure that someone will develop a Linux/*BSD implementation of Robust URLs and the incompatibility is solved. The Linux community is not being left behind at all, just because they can't use a few CraptiveX controls.

    Fortunately, I have a few ideas:
    Get a task force composed of Richard Stallman, Bruce Perens, and ESR to develop and debug ActiveX support for Linux. Estimated time: 2 months.


    Bad idea! Supporting ActiveX on Linux is (in my eyes, FWIW) tantamount to giving out your root password. Anything that allows automatically downloaded/embedded code to have FULL ACCESS to my hardware is inherently evil and should be destroyed. And Authenticode? Give me a break...that only tells you who to blame if you get a trojan and not whether the control is safe or not...

    Form an Open Source Browser Committee to create a new, Open Source web browser that supports all the latest standards (CSS, DOM, DNA) Estimated time: 3 months.

    Well, we do have Mozilla [mozilla.org], even though it is not GPL'ed, it's Open Source.

    Push for Perl to be embedded in all new web browsers so that CGI programs can be run on the user's machine, which will reduce server loads. Estimated time: 1 month.

    Should be quicker than that - just provide an interface to the existing Perl implementation.

    Design a new, Internet-ready desktop for Linux. Give it a web browser, probably the new one I described above, and embed it in everything: file manager, word processor, start button, etc. Estimated time: 4 months.

    This is a great idea, which will (if implemented correctly) make the barrier-to-entry much lower than it currently is. Graphical configuration tools are also needed (but don't change the underlying architecture, let those who want to use the console).

    I think that with these items accomplished, Linux will truly begin to shine as a web platform, even for the newest users.

    I fully agree, except for the ActiveX support. Just because Microsoft develops it doesn't mean that Linux should strive to be compatible (else we will eventually have another Windows).

    Disclaimer - My comments do not represent the views of ABC19 WKPT and are my own.
    _______
    Scott Jones
    Newscast Director / ABC19 WKPT
    Commodore 64 Democoder
  • Please take the trouble to read the first paragraph of their article before making such comments. What they want to do is append a signature, something like an MD5 hash that depends only on the document content.

    With Harvest [ed.ac.uk], indexing software that is several years old, an indexing engine that identifies documents by their MD5 signature is easy to build; I've done this. So what these people are proposing isn't exactly rocket science.
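
    The core of such an indexer really is tiny; here's a sketch of the idea (not Harvest's actual code, and the index/record names are just illustrative):

        import hashlib

        def content_signature(data: bytes) -> str:
            # The signature depends only on the document's bytes, not on its URL.
            return hashlib.md5(data).hexdigest()

        index = {}  # signature -> set of URLs where that exact document was seen

        def record(url: str, data: bytes) -> None:
            index.setdefault(content_signature(data), set()).add(url)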

  • If documents are identified by their digital signatures, the indexing space (of possible signatures) can be divided up among a whole network of redirectors, each responsible for a small subspace of signatures. Each redirector would have to be replicated, of course.

    All of the required technology is present in Harvest [ed.ac.uk], it just never became popular. My guess is that cool ideas have to be reinvented in Berkeley before the world gets to see them applied at large, see Yahoo! for another example.

  • Seems to me what is needed to make this (robust 404s) work is a database that all the URLs referring to your pages go through first, to be redirected to your page. When you change a page, you notify the database and off-site redirections follow the moved page. If you're careful on your own site, you never kill off a URL but instead have it refer forward... which doesn't work, of course, if your main URL itself moves. Of course, this may be what the original article talks about, but it's either 404 or /.ed so I can't read it.
  • Well, I would say something like ActiveX is bad and not common at all. But I prefer to address the embedded Perl statement, since I program in Perl. Why would I want the Perl to run on the client machine? The problem with JavaScript, JScript or any other client-side technology is the client. Major vendors refuse to follow any sort of standard, forcing me to write four different versions of the code to do the same thing and detect what platform and browser it's running on. Cross-platform does not mean, to me, writing it for each platform and then choosing the correct one to run. Server-side technologies such as Perl, PHP, C++, etc. allow me to access databases and generate dynamic code, but still spit out plain ole HTML.

    If you have an idea don't pass the buck and say all these "famous" OpenSourcers need to do this. Go do it yourself...then maybe you won't be so quick to say how easy and quick it would be.
  • I can't get through to the site to see if they address the most common 404 problem I have. The problem is that I do a search, find a page I've never been to before, but it has since moved. How am I supposed to get this extended data about the page if the page moved before I ever saw it, without web search engines storing this information too... Sure, Google can do it because of caching, but the others would be out of luck. In any case, 404 can never go away; things come up, things go down, things move. It may be possible to fix moving problems, but once a page goes down, it goes down :) Maybe forcing everyone to chmod directories so we get 403s instead, then 404s wouldn't be around so much :)
  • URI's are the generic term; you mean `URN'.

    There are several different proposed URN systems being worked on right now (the document even mentions some, such as PURLs and handles). The big problem with these new specs is that there is a large number of conflicting requirements depending on what you really want to do, so they're unlikely to be able to settle on just one proposal (they've been trying for several years).

    Still, after looking through the `Robust Hyperlink' documents, basically all of the old URN specs that I've seen are better than this, so I hope it doesn't distract people too much.
  • IANA 404 research scientist [plinko.net], but... Why can't my browser just open a connection to the web page, and if the heading starts with "404", not load the page and simply flash a warning that the page is not available?
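
    Something like the check described above, done with a HEAD request before committing to the page, is all it would take; a sketch of the idea in Python (not how any real browser is implemented):

        import urllib.error, urllib.request

        def check_before_loading(url):
            # Ask for headers only; warn instead of loading if the server says 404.
            req = urllib.request.Request(url, method="HEAD")
            try:
                urllib.request.urlopen(req)
                return True
            except urllib.error.HTTPError as err:
                print("Warning: %s returned %d; not loading." % (url, err.code))
                return False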


  • Maybe that was their motivation :)

    I just wish I could stop all of these pop up windows.
  • The software seems to pick out the most unusual words in a page. Typos can get quite unusual. One of their papers [berkeley.edu] gives an example that uses "peroperties" as an index word. On the target page, it's clearly a typo for "properties". If the authors of that page ever bothered to spell-check it, that word would go away, and the paper would be that much harder to find.

    (I've already sent them an email about this.)

    Chris
  • actually, the welcome signs are even better

    live free or die
    pay toll ahead
  • The web is great for the sorts of things that lots of people (particularly fellow geeks) are interested in: software, OS issues, MP3s, goat pornography, and Mahir Cagri.

    But what if I'm looking for something specific? The web has been nearly useless to me when I wanted to find information on ancient illuminated Arabic text, or pictures of Microsoft Bob in action (for a parody).

    So do "robust hyperlinks" help me or hurt me? Say I get a dog who has certain unsavory habits with regards to my cats, and I want to look up links about "interspecific coprophagia". Also assume for a moment that the next Korn clone band names themselves "coprophagia". Good search engines allow me to exclude entries that have certain words, but what happens when "robust hyperlinks"-based software assures me that http://www.coprophagiaonline.com/new_releases/ive_ got_the_word_yo.asp is a document on canine interspecific coprophagia based on the presence of several uncommon words...

    ...are we just using new technology to make search engines even more frustratingly inaccurate?

    lexical-signature="sex+mp3+porn+alissa%20milano+beanie%20baby+jesus%20christ+coprophagia+free%20pics+online%20investing"
  • ... Not so long as I own the domain name! *muahahahahahahahahhaaaaaaa*

    Then again, the domain name just won't be funny anymore if 404 Errors go away. *sigh*

    --Ruhk
  • I can see it now.
    Porn sites start copying the five words of large portal and news sites and in the event of a 404 for one of those sites you automatically get redirected to the site you really "wanted" to visit anyway.

    Anybody know if this is going to be an actual standard or just something useful until a new, truly robust addressing system gets adopted? It might be on the site but that's sort of unreachable right now.
  • I've just had a really great idea!

    When you desert one host or modify your site, why don't you leave forwarding messages (or 302 responses) to tell people where to find your new content?

    How's that for a great idea?

  • My understanding is that ActiveX is actual binary code, so you'd basically have to incorporate Windows95/98/NT/2000 into your OS in order to succeed at doing ActiveX. In fact, I believe that Linux already has something along these lines -- it's called "Wine".

    Besides, even if you could succeed at making this happen, Micro$oft would be sure to change the code slightly so as to break your version, and if it hurts some of their customers in the process, well what do they care?

    No, this is a fundamentally unworkable plan.
    --
    Brad Knowles
  • Oh GREAT. Just what we need--so much for the whole "you can't 'accidentally' find porn on the Internet" argument. This just throws that out the window, because all a porn site needs to do is hijack the right search keywords and wait for cnn.com to have a broken link.. *poof* millions of users get sent to porn.

    Not only that, but it makes site debugging a pain in the ass.

    Thanks Berkeley!

  • ...just what it says... We can only pray. I hate 404's and i am assuming you do too. I hope with all my heart and soul this works...
  • You like 404s ? Try this one: http://www.g-wizz.net/wibblewibblewibble.swf [g-wizz.net].

    Yes, that file extension is a hint...

  • I think it was back in 1995 when I saw a warez page (on Geocities) which used a feature like that.
    If a visitor couldn't reach the site because Geocities had taken it down, he just needed to feed "paer9udtzk6gn8modfi" (paraphrased, of course) into Altavista to be pointed to the new location.
  • WOW.

    This is a fantastically great idea.

    How long before we get URLs like freenet://contraband_information.html ?

    -k

  • I took a look at this, and it looks quite neat. If Freenet manages to get this right, I hope it really takes off. I especially like the idea of not having to dole out tons of cash or make do with a free web service in order to get something published.


    -RickHunter
    --"We are gray. We stand between the candle and the star."
    --Gray council, Babylon 5.
  • by far the largest number of problems i have with chasing information down (information that was not removed, but simply moved to another location) is because it has been moved OFF the world wide web and into the INVISIBLE WEB, meaning that it is accessible through a query to some database. the thing is, that the final location of these content pieces is generally known in advance to the site that is hosting them - and then the easiest way for users to relocate content would be to attach to it tags that define its location as a function of time.
  • "Live free or Die" - Ironically, seen on a license plate.

    It's worse than that. That state tried to penalize someone for covering the slogan. When someone tried to exercise his freedom of (non)speech by putting electrical tape over the slogan, the state took him to court. I seem to recall the case going on for a long while through several appeal processes where the state tried to force people to spout slogans about freedom. The irony was apparently completely lost on the bureaucrats enforcing the slogan.
  • I'll re-iterate what the AC said, only without the flamebait.

    FREENET [lights.com] is already a widespread term, referring to MANY local public-access community supported ISPs. A quick lookup gives 16 countries with 233 separate groups.

    It is unfortunate that nobody told you of the name overlap before this, but using "freenet" for your web will only generate anger among people already familiar with the community free ISP usage.

    Hmmmm - Is it possible that the socialists (free public access to whatever) and the libertarians (Where were you when they took our freedoms?) have really never heard of each other's Freenet until now? I'm only familiar with the ISP usage, where it is
  • If you are relying on a search engine to "reconnect" the link you are going to have problems.

    Even the best search engines only index a small percentage of the entire web and then they are hideously out of date.

    Not to mention the problems of someone hijacking your unique id by stuffing the search engine with bogus words.

    (Disclaimer - I haven't read the actual article due to it being /.ed so I probably am missing the point entirely!)
  • by code0 ( 122493 )
    This would be cool. No more 404s! That's the problem with the web. I also like the idea that this is being put in open source so that we can all benefit. At least it isn't Microsoft...
  • Your email thing is just too hard to figure out!

    Kidding. And here I was doing a left bitwise shift; then I looked at my keyboard for a sec. heehee

  • Well, it would be much easier to include a token somewhere (e.g., in a comment) that would be unique to this page. A randomly generated string of 20 ASCII characters would do the job.

    But this is prone to the same hijacking attack as the original scheme.

    A much better solution would be to fetch by MD5: teach search engines to compute MD5 sums of every document they index, then include MD5 sum somewhere in the URL.

    That would also allow for better caching!
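
    A sketch of what fetch-by-MD5 might look like on the client side (the md5= parameter name here is made up, and this is only one half of the idea; the search-engine side would index by the same digest):

        import hashlib, urllib.request
        from urllib.parse import parse_qs, urlparse

        def fetch_by_md5(url):
            # Assumed URL form: http://host/page.html?md5=<hex digest of the content>
            want = parse_qs(urlparse(url).query).get("md5", [None])[0]
            data = urllib.request.urlopen(url).read()
            if want and hashlib.md5(data).hexdigest() != want:
                raise ValueError("content does not match the MD5 named in the URL")
            return data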

  • how about an apache mod that automatically checks the urls as they are sent and changes them? then there'd be no need for any browser modifications.
    in fact, it wouldn't have to be an apache mod - any kind of executable that could be cron'd to check links every so often could have the same effect.
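
    something like this could be cron'd nightly (a rough sketch only - it assumes pages live under public_html, and it just reports broken links rather than rewriting them):

        import pathlib, re, urllib.error, urllib.request

        def dead_links(html_dir="public_html"):
            # walk the local pages and HEAD every absolute link; yield the broken ones
            href = re.compile(r'href="(https?://[^"]+)"', re.I)
            for page in pathlib.Path(html_dir).glob("**/*.html"):
                for url in href.findall(page.read_text(errors="ignore")):
                    try:
                        urllib.request.urlopen(
                            urllib.request.Request(url, method="HEAD"))
                    except (urllib.error.HTTPError, urllib.error.URLError):
                        yield page, url

        if __name__ == "__main__":
            for page, url in dead_links():
                print("%s: broken link %s" % (page, url))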

    i'm not sure how this would fit in with the whole signature thing. i suppose we could just pgp sign our web pages and put the signature in comments.

    but as with most of my ideas, someone's probably already coded this.
  • >Granted, I can occasionally watch as the Java ads on Slashdot cause Netscape for Linux to crash, but that seems to be the extent of Linux's so-called internet connectivity.

    That problem is with Netscape, not Linux. Yes, I often have problems with Java crashing Netscape, but that happens regardless of whether I am using the Windows version or the Linux version. Point is, Linux is great, Netscape is okay, but Netscape's implementation of Java leaves a lot to be desired in the way of stability.


    =================================
  • I went through the entire site, including the white papers. I looked at the actual Java code. Not a lick of ActiveX anywhere. Whoever posted this anonymously is either smoking crack, working for M$, or both. Robust Hyperlinks is pure Java.
  • I'll explain the 2 that come to mind right away:

    1) Growing sites that may change servers or domain names (moving from an add-on to a dedicated URL, or changing the domain name for legal/incorporation/buyout reasons) will see the massive traffic bleed they suffer, until everyone realizes their site has changed, virtually disappear. Yes, putting a redirect page on your "old home" may help, but for things like RSS file addresses and other external connectors, which may have an effect on your site, this is a problem.

    Ultimately, of course, for this to TRULY work there needs to be technology like this built into not only browsers, but virtually any software that uses HTTP communication (XML parsers, bots, spiders, etc).

    2) I want to start offering streaming video on my site, and the single biggest obstacle for doing that is COST. Bandwidth, unless you OWN the pipe, is NOT cheap. I can (albeit in a somewhat underhanded fashion) set up a script to register, say, 24 different "free site" pages with the content to be the "correct" version of my page once an hour, and, unless the content is in VERY heavy demand, essentially have a free method of streaming video on my site.

    Egads, I'm already feeling dirty about what I just said. Okay, maybe that's a little TOO unethical. But I guarantee someone will do it.

  • First, I did try to access the link in the article, but the berkeley server appears to be down or slow.

    That said, the concept seems iffy. Based on the above, the fact that it works in all existing browsers suggests to me that the form of the URL is the following:

    >a href="http://robusturl.server.com?http://my.outdat edsite.com&keyword1="whatever"<

    Namely, that anchors that use this URL will be sent to this server (apparently fixed in place), then redirected either to the working page, or to the appropriate search engine results. This means that the robust server will be running scripts. While I don't believe that the intent as described here would be to catalog all matches, all you need is one unscrupulous company that uses this, and it can now trace where you are and where you are going quite easily with a bit of modification. I really don't like this potential, and personally I'll take a 404 any day over potential privacy problems.

    On the other hand, it might be that the method uses JavaScript, at which point this nulls and voids any statement about "working on all existing browsers".

  • I'm pretty sure URLs were just a makeshift URI, and some day the IETF was going to figure out how to do URIs right. Am I wrong?
  • This has been discussed to death on our mailing lists. Basically our view is that if Freenet is as popular as we hope it will be, then "Freenet" is the perfect term for it, it is possibly more deserving of the term than the other projects which currently use it. If, on the other hand, Freenet is not a success, then this won't affect anyone and it won't matter.

    --

  • Some 404's [attaway.org] are just a way to pass time. Sometimes I go from site to site looking for pages that don't exist just to see what happens.
  • ...poorly.

    anyone who's looked at the http spec for more than a millisecond will see that it already handles this case quite gracefully with the 3xx series of responses, including:

    301 Moved Permanently
    302 Moved Temporarily

    I think /. even uses these once a story has been archived.

  • A very valid point:
    Will this still work even if someone tries to add lots of context words to the search engines so it comes to their page instead?

    Perhaps one of the keywords should be the previous URL? In fact, perhaps a better solution would be a new Meta tag of "Prev-URL" (or something similar) that search engines could look at and use to update their databases?
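
    A search engine could pick such a tag up with a couple of lines; a sketch (the "Prev-URL" name is only a suggestion here, not any existing standard):

        import re

        def prev_url(html):
            # Look for the suggested (hypothetical, non-standard) Prev-URL meta tag.
            m = re.search(r'<meta\s+name="prev-url"\s+content="([^"]+)"', html, re.I)
            return m.group(1) if m else None

        # A crawler that finds prev_url(page) already in its index could simply
        # move that old entry's records over to the page's new address.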

    On an anecdotal note (or is that redundant?), I remember searching once for the web site of a Land Rover owners club (I think it was Ottawa Valley Land Rovers in Canada) and was directed to an auto parts store in Australia -- turned out that the web pages had the names of lots of auto clubs in meta tags. The idea was to get people searching for the clubs to go to the store's site.

  • I guess for those of us who don't want to make that move just yet, we can have our 404 document say, "Sorry, I am just a dumb server and don't know where the page has gone. Come back later when I get smarter."

    send flames > /dev/null

  • This sounds like a good idea but you'll still see plenty of 404s if this gets into action.
    Why? Because 90% of 404s are a result of the page being taken down completely (especially if it's on Geocities or Xoom or some free provider).

    A program that you could install for your browser like NetAccelerate (loads links off the current page into cache when the bandwidth isn't being used), but which simply loads the links far enough to detect whether or not they're broken, would be very handy. Although it wouldn't solve any problems, it would at least stop you from getting your hopes up when you've finally found a link to a page that claims to be what you've been searching for for an hour.
  • Ummm, .jar files do *NOT* indicate JavaScript.

    Java != JavaScript, people!

    --Earl
  • <ASSUMPTION>The 'word description' is going to be capable of describing a page adequately, and uniquely, per page, like an MD5 digest, rather than a simple text descriptor. The latter would just be silly.</ASSUMPTION>

    I can see some value to this if the page is static and likely to be relocated rather than rewritten or deleted, but how is this going to work if the page is dynamically generated from a database and the whole site is prone to reorganisation (as Microsoft's seems to be)?

    It might help more if there was a way to uniquely identify snippets of content within a page, and provide a universal look-up scheme based on unique fingerprints of these 'snippets'. Although I'm sure that puts it straight into XPointers territory, doesn't it...?

    And an 'opt-out' system is necessary. There are lots of reasons one might want particular content to be transient.

  • Yes, but that's only one side of it, the pull side. Eventually systems will evolve to the point where a push model exists alongside the pull model for robustness. Unfortunately data structures change, companies reorganize, and no type of pointer will really ever suffice. It will have to change at some point. The robustness of a push model will facilitate these scenarios. It's not a question of if; it will happen, eventually.
  • And the logical next step is inter-server communication. At some point we'll end up with a defined way for servers to communicate with each other, so that when an object is moved or removed, the server that "owns" that object can notify other servers that own objects with links to it. The worst case would be better than what we have now: if the object has been removed, the other server could mark it as unavailable and notify the site owner that it needs to be updated. Some site management utils already have a process for checking broken links (pull model); we need a push model.

    This will also allow site owners to see who's linking to them, but obviously it should be utterly transparent (so that you can still link in private, but then you wouldn't get updates).

    At some point we'll get there, it's just a matter of time. Questionable schemes such as the topic of this story are just a kludge, and probably not worth the effort.

  • I am getting a 404 Not Found on the site's homepage.

    PoC

  • Well, it sounds like an interesting concept but unfortunately I can't get to the site already. Surely it's too soon for the /. effect?

  • On the other hand, it might be that the method uses JavaScript, at which point this nulls and voids any statement about "working on all existing browsers".

    From freshmeat you can see that the appropriate file for it is called Robust.jar, so I think you're probably correct there :)

  • This sounds great - practical solutions to a real problem.

    OTOH, there are already far too many sites where there just isn't an accessible URL anyway. Some are frame-based, some are dynamically generated. They all have the problem of not being bookmarkable (from within the browser's normal "Bookmark Here" function). Some do try to solve this though, by separately publishing a bookmark that will take you back to the same content.

    If this idea is to really work, then it needs to be supported by dynamic sites publishing their Robust Hyperlinks, even for pages that don't have a "traditional" URL to begin with.

  • There is a good paper [w3.org] by the man himself on the problem of URL persistence.

    Definitely a heads-up for anyone looking for a quick technical fix to the problem.

  • Simply having a search string included seems a bit of a kludge to me.
    What if the link tag in the HTML also contained the date/time it was created? This way the browser would know how old it was. If the browser sent this to the server as a header, then if the server couldn't find the page it could check some database or whatever to see what the directory structure was like at that time and work out what redirect to use. If bookmarks also contained this date/time then surely the server could tell the browser to update the bookmark (after warning the user, of course).

    This would be pretty cool on an interactive site where the server could rearrange query strings or whatever if the serverside scripting had been given a big overhaul/re-organization.

    Basically, surely the server itself, and not some search engine, would best know how to fix a broken link, and it would only require a couple of new headers and should be easy to implement, at least on the client side.


    ------------------------------------------------ -
    "If I can shoot rabbits then I can shoot fascists" -
  • The situation:
    • My page has been moved for some reason or another.
    • The old page no longer exists at all, i.e. I don't have a redirect on it. (Side note: surprisingly enough, many providers will be happy to keep your redirects around for an almost infinite length of time. It's not like they take up a lot of space or bandwidth.)
    • I built the first page with a specific set of keywords and I kept those keywords on the new page
    • The search engines FINALLY got around to spidering/accepting my site. (Note that it can currently take up to 6 months to be spidered and Yahoo may not reaccept your site.)
    And this allows us what?
    • Well, it means we have to make sure we register with all the possible search engines, including the ones we usually don't care about.
    • It means someone will come up with a "find that 404" search engine that you'll have to submit to as well.
    • Meanwhile, people will notice that you've moved and will create redirect porn pages with your keywords and register them with the 404 search engine.
    • Microsoft will add something to Front page to create default keywords that send your 404 to microsoft.com
    • The new standards are not part of the official Web Standards, so Mozilla will not support it and w3.org will barf errors out about your HTML code.
    • Someone will figure out how to use this technology so that they can set up emergency /. effect mirror sites.
    • Someone will get smart and figure that trick out really quick and take advantage of it. "I'm sorry, the page you want has been slashdotted, welcome to Geocities."


    -----
  • Alexa [alexa.com] has had a solution to 404 errors for years. They have a large archive of the web, and will give you a copy of a deleted page. Unfortunately, the Alexa client has ballooned into a combination advertising delivery system and portal. They're just now adding Amazon's shopping system. It's turning into a piece of bloatware.

    Alexa also collects detailed information about what you look at with your browser, although they of course claim to use it only in the aggregate.

  • This makes one big whopper of an assumption: that the web page has moved and still exists somewhere. Well, the major cause of 404s that I know of is web sites simply going away.

    So you get a 404 and you want to use a search site to find where it went? That's fine if it's been long enough since the move to give the web crawlers time to find it... there's a lot of web space out there to search!

    But here's the good one: what if someone decides to hijack your web site by simple keyword spamming? All they have to do is set up their own page with the right keywords, get it indexed, and anyone who uses an "old" link will get redirected to them instead! And if web pages can be defaced, they can be removed, too, thus forcing the 404 and the search!

    Better yet, use wholesale keyword spamming to get all those "dead" web pages pointing to your e-commerce site!
  • Yeah, like Userfriendly [userfriendly.org]. I love their 404. [userfriendly.org]

    You're in the midst of nowhere
    a droplet in a mist,
    you musta typed in something weird
    this URL, it don't exist.


    kwsNI

  • ... as in, "It's a good idea, but!" As has been pointed out, there are potential privacy issues. For the "average" user, though, I don't think this is a terribly big deal. What becomes a problem, then, is access to the Robust URL redirector (as I understand it from posts, the site seems to either be simply down, or a victim of the /. effect). Since all Robust URLs have to pass through the redirector, what happens if the redirector is down? What happens if the redirector is unreachable?

    Furthermore, simply feeding keywords to a search engine doesn't guarantee finding your page quickly, or even finding it at all. Designers would have to include unique keywords - words that might not even apply to their page - so that a Robust URL search would turn up only their page. Not only does this bloat HTML code, but it also confuses people using search engines in the usual way.

    Certainly a good idea, as many people hate 404s (bah, they're just a fact of life), but it seems like it's got more than a few bugs left in it.

  • by Sanity ( 1431 ) on Thursday March 02, 2000 @03:58AM (#1232562) Homepage Journal
    I am working on a project that will do something like this - and a whole lot more. The primary intention is to create an information publication system similar to the world wide web, but where censorship is much more difficult or impossible. However, there is more to the system than that: it incorporates intelligent decentralised caching, making it much more efficient than the world wide web, and also intelligent mirroring, meaning that information on the system will never be slashdotted as this site appears to be! The homepage may be found at http://freenet.sourceforge.net/ [sourceforge.net]. We are looking for testers and developers right now in preparation for our first release, which will happen in the next few weeks.

    --

  • by SimonK ( 7722 ) on Thursday March 02, 2000 @04:26AM (#1232563)
    You're not wrong. There is in fact a proposal about the form and resolution of URNs (which are location independent) from the IETF. I don't know its status.
  • by Hard_Code ( 49548 ) on Thursday March 02, 2000 @07:37AM (#1232564)
    As far as I can tell this scheme relies on checksums of the static content of web pages to find the correct web page. So what does this do to dynamically generated content?

    Also, somebody else mentioned that they had a project on SourceForge which was basically like the Web, but in a completely distributed manner. This makes a lot more sense to me. The notion that my bits must cross a continent to retrieve data on a certain TOPIC seems a bit archaic. I shouldn't know or care where the data of the topic is stored...I just want it. Also, having a distributed web like this, as the person suggests, will make it a lot harder to invade privacy or censor material.
  • by UnknownSoldier ( 67820 ) on Thursday March 02, 2000 @03:46AM (#1232565)
    Will this still work even if someone tries to add lots of context words to the search engines so it comes to their page instead?

    Don't mean to be the Devil's Advocate; it is just my game programming / design skills kicking in. Whenever someone adds a useful feature, you must look at the ways people will try to exploit it.

    "Live free or Die" - Ironically, seen on a license plate.
  • by rambone ( 135825 ) on Thursday March 02, 2000 @04:32AM (#1232566)
    Like any search, the search that tries to reunite your 404 error with the correct address is going to be wrong quite often.

    Frankly, I'd rather just get the 404 than waste time digging through erroneous links.

    By the way, there are hypertext systems that address this issue in ways that actually solve the problem - the now defunct HyperG system was very intelligent about redirecting requests.

  • by EricWright ( 16803 ) on Thursday March 02, 2000 @03:46AM (#1232567) Journal
    From the freshmeat announcement [freshmeat.net], you can ftp it from here [berkeley.edu]. I was able to connect just fine...

    Eric
