OSD Database Downloadable As XML
Providing a list of applications stable enough to recommend to non-gurus is a worthy endeavor, so it's great to see this project slowly becoming more useful. There are gaps to plug going forward, though. The default text strings can be ambiguous, and the information provided on individual projects doesn't always give much to go on. For instance, look at the Mosix page, where you'll find that "This product has no Latest version yet," "This product doesn't fix anything," and "This product is not like any other," but no email contact information for Mosix authors. Similarly ambiguous pages are provided for Gnucleus and OpenOffice.
I exchanged some email with Steve on the state of the entries in the database, and asked about how the missing information could be filled in. He told me that while project maintainers (and site administrators) are the only ones who can update entries, users can contact the administrators of individual projects directly through the OSD site to suggest changes or clarification.
"We're trying to make things easier for the maintainers. ... I think there is a serious lack of product maintainers to help authors," he said. To that end, Mallet may soon provide example projects for software authors to emulate, and is in the early stages of a unified project-listing tool which would update listings on various web sites. Given the number of sites that offer downloads or simply track various software projects, that could be a boon to developers.
Hopefully, this will turn into the sort of tool that you can show a boss or teacher to answer the bugaboo of Free / Open Source being unready for prime time (or just overwhelming and undifferentiated).
Unreadable (Score:1)
Somewhat amusing (Score:2)
Hmmm, where is CVS? (Score:1)
Re:Fixed XML Files (Score:1)
Re:embedding ODP (& OSD) content in your web site. (Score:2)
Someone could leverage XML RDBMS like DBXML [dbxml.org] which is based on the "XML:DB" [xmldb.org] standard.
If enough people are interested, I could try downloading dmoz myself, "massage" it into some dbxml store on my own system, and build a web-based interface to query it. I've just been really busy with other stuff lately, though.
If you happen to read this and are interested, shoot me an email at valmont@wildstar.net and we can take it from there.
Re:That document is not valid XML! Standards? Anyo (Score:2)
But this document is not even well-formed XML. In other words, it is not XML at all. It's plain text with some tags.
For details on what it means for an XML document to be well-formed or valid, see the spec at the W3C [w3.org]
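To make the distinction concrete, here is a minimal sketch of a well-formedness check using Python's standard library. The sample strings are hypothetical and are not taken from the OSD file. Note that this only tests well-formedness; checking validity would additionally need a DTD or schema and a validating parser, which the standard library does not provide.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples for illustration only.
good = "<group><group_name>audio</group_name></group>"
bad = "<group><group_name>audio</group>"  # group_name is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```

Any conforming XML parser should refuse the second string outright, which is the sense in which a document that is not well-formed is "not XML at all".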
embedding ODP (& OSD) content in your web site... (Score:4)
Is there a PHP class or something that everyone's using for this? I saw a couple of offerings at freshmeat that relate to ODP, and some tools and code are here [dmoz.org], but I'm curious what most people are using.
W
-------------------
Re: (Score:2)
Gripe -lousy directory due to dictatorial policies (Score:3)
The real shame was watching the categories I created with TLC lie fallow for months and months without any one to update them.
With inane policies like these, is it any wonder that this directory lacks up-to-date information and is in general disarray? Methinks not.
I was thinking of the immortal words of Socrates, who said, "I drank what?"
Re:Somewhat amusing (Score:3)
The OSD is not meant to be a definitive archive. Its mission is to provide a resource for users. I think it has done a good job in this regard.
Re:General XML question from a newbie. (Score:1)
Re:Very good point (Re:perhaps a bit off topic) (Score:1)
Sure. The point is that XML is only a file format. The data it represents is vaguely semi-structured. Of course, one needs a query/update language (and some other good stuff) on top of that to make a database; there have been many proposed. In the relational world there is no standard file format. One could represent relational data in XML pretty trivially, though.
- XML Schema is also very poor on data modelling, because it has no separation between a structural schema (which element goes inside the other) and a semantic schema (what each element means, when placed inside another)
DTDs are problematic because they just provide a grammar for the structure of an XML document. XML Schema tries hard to provide a strong notion of type. For example, I could define a type called Person and let several tags, say manager and employee both have that type.
- How do you represent shared resource in XML; such as an author of several modules ?
Sort-of. It's hard to represent graphs in XML. Unlike semi-structured data, which is a graph, XML is, at its core, a tree description language. One can define graphs with IDs and IDREFS, but it's a pain.
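As a rough illustration of the ID/IDREF approach and why it is a pain, the sketch below shares one author between several modules and keeps two same-named authors apart by their ids. The element and attribute names (catalog, author, module, id) are made up for the example. Without a DTD the parser treats the ids as ordinary attributes, so the application has to resolve every reference by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two distinct authors share a name, and one
# author is referenced by two modules.
DOC = """<catalog>
  <author id="a1"><name>J. Smith</name></author>
  <author id="a2"><name>J. Smith</name></author>
  <module name="parser" author="a1"/>
  <module name="lexer" author="a1"/>
  <module name="gui" author="a2"/>
</catalog>"""

root = ET.fromstring(DOC)

# Build the id -> element index ourselves; the tree has no notion of it.
authors = {a.get("id"): a for a in root.findall("author")}

modules_by_author = {}
for module in root.findall("module"):
    ref = module.get("author")
    if ref not in authors:
        raise KeyError(f"dangling author reference: {ref}")
    modules_by_author.setdefault(ref, []).append(module.get("name"))

for ref, names in modules_by_author.items():
    print(ref, authors[ref].findtext("name"), names)
```

The graph edge from module to author exists only in the lookup dictionary, not in the tree itself, which is the awkwardness being described.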
- How do you distinguish such an author from another author with the same name ?
This seems like a key problem---you'd have the same issue in the relational world. To separate two authors with the same name you need more information to make a key.
The biggest problem I see for XML as a data description language is that it's way too complicated for what it does. To represent semi-structured data, which is what we seem to want here, all you need is a simple graph description language. XML, however, does this with three types of edges (subelement, attribute, and IDREF) and has other features (eg., mixed content) that are hard to figure out what to do with from a database perspective. (From a document description point of view things like mixed content make a lot of sense.)
The other problem, one induced partly from the inherent complexity of XML, is that the standards that are growing up are horrendously complicated. For example, the 300 page monster that is XML Schema [w3.org].
Thank you (Score:1)
--
Re:That document is not valid XML! Standards? Anyo (Score:2)
I agree your way of structuring the data is better, but I would add that many of the data items should be attributes. I mean you have elements and attributes available, why not use both? It would have made things much faster and cleaner to keep up to date and ensure all parsers can validate it quickly. Can you really see a SAX parser making use of that xml? And a DOM parser would consume an enormous amount of memory needlessly. Oh well, I'm sure they were in a big hurry to get this info available. And it'll get well cleaned up in the next few months.
Wrong directory (Score:1)
The OSD isn't the same as DMOZ. Similar idea though.
I also share your concern over DMOZ. I was declined as an editor for an editor-less category that I know inside and out.
Re:General XML question from a newbie. (Score:2)
Some are available for both Java and C++.
Sorry I don't have a more detailed answer to your question but I'm sure something can be built from the Apache XML stuff.
Mike [goingware.com]
Re: (Score:1)
The post you recommend is being AC-posted to practically every /. thread these days, practically unmodified. There's a similar version dooming & glooming *BSD.
IMHO, this is just tomorrow's goatse / body thetans / frist p0st, and the troll moderation it got is all that it deserves.
--
Very good point (Re:perhaps a bit off topic) (Score:1)
but xml was not designed to replace databases
XML doesn't replace databases. It _can't_ do, because it has no query mechanism. If you want to compare something to an RDBMS, then you have to look at the combination of XML + XPath. This is actually quite a good choice for some small systems (although it has no large-volume performance).
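For a small data set, the XML + XPath combination might look like the sketch below. It uses the limited XPath subset built into Python's ElementTree rather than a full XPath engine, and the product and license element names are assumptions made for the example, not the OSD file's actual layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical product list; the real OSD element names may differ.
DOC = """<products>
  <product><name>alpha</name><license>GPL</license></product>
  <product><name>beta</name><license>BSD</license></product>
  <product><name>gamma</name><license>GPL</license></product>
</products>"""

root = ET.fromstring(DOC)

# Rough equivalent of: SELECT name FROM product WHERE license = 'GPL'
gpl_names = [n.text for n in root.findall("./product[license='GPL']/name")]
print(gpl_names)
```

The whole document is held in memory and scanned on every query, which is fine at this size and is also why the approach does not scale to large volumes.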
What's a more important issue (and this is one of my personal hobby-horses) is to separate the data model from the serialisation. XML does serialisations, and it does them quite well. It's poor though on data modelling. XML Schema is also very poor on data modelling, because it has no separation between a structural schema (which element goes inside the other) and a semantic schema (what each element means, when placed inside another). As a result, it's possible to serialise XML documents to represent "One view of the data, for one context" but it's really not possible to build an XML representation of a large data modelling problem for anything beyond the trivial.
How do you distinguish such an author from another author with the same name ?
Now (obviously) people have built XML solutions that work around these problems, but XML itself doesn't support them. It doesn't have a portable solution to such commonplace problems that a generic parser (like SiRPAC) could understand, and it doesn't support the development of particularly good solutions to them.
Teaching RDF, one of the hardest (and most important) lessons to communicate is that there's an underlying data model, and there's a serialisation, and that the serialisation is only one usage-dependent view onto what ought to be a much better structured and flexible internal model. For RDF it is, but for XML it isn't.
Re:That document is not valid XML! Standards? Anyo (Score:1)
That's some of the worst XML schema design I've seen in years (OK, I know it wasn't yours).
Secondly, which pair of moronic moderators moderated this down as a troll ?
Re:Very good point (Re:perhaps a bit off topic) (Score:1)
XML Schema tries hard to provide a strong notion of type
Although that's a valid point (and I haven't written DTDs in over 2 years, in favour of schema) it's not the issue I was talking about. Look at the Infoset [w3.org] draft or the recent Processing Model [w3.org] workshop. You can barely tell the difference between reading infoset and the syntax spec, because XML just doesn't put enough distance between semantics of the content and its representation in a document.
XML doesn't "represent" anything. It never has done, it never will, and all attempts to pretend that it does will end in failure. XML (and XML Schema) is a low-level transport and manipulation platform, but it doesn't have the ability to do any form of abstract representation. Its structure and implied semantic meaning are so closely fastened together that it's impossible to squeeze a gap between them. "Representation" is the act of stretching this gap, between structure and implied meaning, so as to infer a higher level meaning.
The problem is fundamental to XML, and won't be fixed by tools at this level. There's no abstraction in XML; any attempt to indicate semantics also drags along its structural baggage, because that's the only way XML-Schema allows you to work. No number of "sideways" solutions (namespacing to allow parallel co-existence, BizTalk to allow sharing of schemas) will fix this; XML just doesn't offer any "upwards" in a semantic direction.
To separate two authors with the same name you need more information to make a key.
Again, I agree with you in general, but that's not quite the issue I was thinking of. Clearly we need more structure to distinguish them, although in fact we don't need any more information (RDF can do this entirely within the document structure, with no need to start "allocating author indexes" or similar).
The symptom of this problem, in the XML world, though is an over-dependence on flat text comparisons. It's like search engines that only compare at the text level and can't tell "goat sex" from animal husbandry or a Slashdot Troll. Because XML has nothing useful beyond the text node, that's what gets used. If it's easy to do it all just by comparing author names, then that's what lazy coders do. Disambiguation between resources like this needs a simple and lightweight mechanism, because if it isn't, no-one will use it. RDF manages it with rdf:resource and rdf:about attributes. In XML then you'd have to build some identifying system at the application level (so a generic parser can't understand it) and impose its use on your data. No wonder people stick with just using the names and ignoring truly identifying relationships with resources.
ID & IDREF are just broken. If you want to do it that way, build a proper architecture for doing it and join the RDF WG.
Tell me about it 8-(
Compare the XML Schema spec, the SMIL spec, and the even more gargantuan MPEG-7 spec. Now take a look at DAML [daml.org] and see that complexity can be described, without needing a spec like a phone book.
Re:XPath (Score:1)
XPath is a partial query language for XML - it can read, but it has no way of updating the document.
There's also the issue that XPath is very much an XML tool, with a tight binding between semantics and structure (which is the whole thing that I'm saying about XML in the first place). If you have a graph represented in XML, then it's hard to write XPath expressions that can traverse it. If you have RDF stored in XML (which has several possible serialisations for the same semantic content) then it's possible to write XPath that expands these, but it's hard, error-prone, and generally unworkable.
There's still a lot of thought out there that XSLT can translate magically between schemas. Some groups see XML Schema as improving this (Hunter & Lagoze, WWW10 [dstc.edu.au]), although Alison Cawsey's paper from WWW 9 [hw.ac.uk] shows just why this approach doesn't work. I've abandoned my own work in this field for similar reasons; even though I managed to build something workable, I just never trusted it to be reliable.
Re:embedding ODP (& OSD) content in your web site. (Score:2)
Someone should create an HTTP interface to a dmoz XML database, which would allow users to place XPATH queries which would return XML nodesets to the requesting client.
That's an interesting idea, but it's not quite the same problem. You describe a good solution to a "pull" scenario, which is great for queries instigated by a client, but it's not as good as a "push" for providing a newsfeed from a site.
I'd suggest RSS 1.0 as a good format to produce (possibly based on the same XPath-based pull that you describe). Once it's in RSS 1.0, then it's trivial to make it appear on any number of sites, or to aggregate it into other more generalised newsfeeds.
For implementing the "pull" side, then XPath encapsulated in SOAP is an easy way to build clients, and not too hard for the endpoint server. I've been doing this recently, so that a UI component (DHTML in Javascript) could selectively retrieve pieces of a big taxonomy document that was >MB in total.
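The selective-retrieval step on the server side could be sketched roughly as follows. This leaves out the transport entirely (no HTTP or SOAP envelope), uses ElementTree's XPath subset, and the taxonomy layout, the query_nodeset function, and the nodeset wrapper element are all hypothetical names chosen for the example.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical taxonomy document standing in for the large one.
DOC = (
    "<taxonomy>"
    "<category name='audio'><product>glame</product></category>"
    "<category name='office'><product>openoffice</product></category>"
    "</taxonomy>"
)


def query_nodeset(root: ET.Element, path: str) -> str:
    """Run a path query and return the matching nodes as one XML string."""
    result = ET.Element("nodeset")
    for node in root.findall(path):
        clone = copy.deepcopy(node)  # leave the source tree untouched
        clone.tail = None            # drop whitespace that followed the node
        result.append(clone)
    return ET.tostring(result, encoding="unicode")


root = ET.fromstring(DOC)
print(query_nodeset(root, "./category[@name='audio']/product"))
```

A client only ever receives the fragment it asked for, so the full document never has to cross the wire. A real endpoint would also need to restrict which expressions it accepts.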
My one concern (and my own personal bias) is that I see many of these items as running off the limits of what XML (and XMLDB) is good at, and being better handled in RDF. Certainly RSS 0.91 (which is XML) couldn't do this, but RSS 1.0 (which is RDF) could easily. Of course, that then makes XPath unworkable as a query language and there's not yet a stable "RDFPath" equivalent for RDF.
I'm also interested in working on this. Anyone else, drop me a mail if you are too.
Ready-made database of licences... (Score:2)
The more important one would be for the licencing info- I was about to face the task of building up a database of (L)GPL'd applications manually. I'd say they've definitely saved me some effort... sure they're not all there but it's a start... thanks, guys.
On the topic of the GPL, anybody notice they've licenced this XML document under the GNU Free Document License? [gnu.org] I can see the press release now: 'Argh! Viral pac-men documents!'
How do they validate the entries? (Score:3)
I use my computer to write music, so I went to see their listing of stable audio software. The only things listed there are a crossfade plugin for xmms, GLAME and a soundfont editor. I've tried GLAME. Listing it as "stable" is a joke. And to top it all off, they have these things listed multiple times in categories they shouldn't be in. I'm fairly certain a soundfont editor doesn't qualify as "sound synthesis".
I want to stress I'm not trying to discredit the GLAME team or any of these software packages. But what good is OSD if its categories are a mess? I might as well just use freshmeat.
c.
Re:perhaps a bit off topic (Score:1)
Re:perhaps a bit off topic (Score:1)
Just write a script to import it into your choice of databases.
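Such an import script could be as short as the sketch below, which loads product records into SQLite using only the standard library. The product, name and license element names and the table layout are assumptions for illustration; the real file would need its own mapping.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical input; the real OSD file uses its own element names.
DOC = """<products>
  <product><name>alpha</name><license>GPL</license></product>
  <product><name>beta</name><license>BSD</license></product>
</products>"""


def import_products(xml_text: str, conn: sqlite3.Connection) -> int:
    """Insert one row per <product> element and return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS product (name TEXT, license TEXT)"
    )
    rows = [
        (p.findtext("name"), p.findtext("license"))
        for p in ET.fromstring(xml_text).iter("product")
    ]
    conn.executemany("INSERT INTO product VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
count = import_products(DOC, conn)
print(count, conn.execute("SELECT name, license FROM product").fetchall())
```

Swapping the in-memory connection for a file path, or for another database driver with the same DB-API shape, should need few changes.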
Great idea, but maybe not best approach. (Score:3)
huh? (Score:2)
Sounds to me like the point of this project is a global infinite loop. I don't know much about this, but if that's what it is...count me out. I have it bad enough as it is. (I run windows
chances are, this is a joke.
Re:Hmmm, where is CVS?--answer (Score:1)
We haven't talked to the CVS folks yet.
-Steve Mallett of OSD
New XML release. (Score:1)
-Steve Mallett of OSD
Re:perhaps a bit off topic--ANSWER (Score:2)
Re:embedding ODP (& OSD) content in your web site. (Score:2)
Steve Mallett of OSD
Re:How do they validate the entries? (Score:4)
Ultimately the info is open for catching bugs like this one. If it is a bug it will get weeded out.
-Steve Mallett of OSD
unified project-listing tool (Score:5)
-Steve Mallett of OSD
whatever (Score:1)
Re:Ready-made database of licences... (Score:1)
Anyone else notice that when an article on something related to (even remotely) Microsoft, or some other favorite whipping boy,
I'm sure there's a significant insight into the
Re:That document is not valid XML! Standards? Anyo (Score:3)
The authors of the XML file have written it like this:
<group_name></group_name>
<!-- properties of group -->
<group_name></group_name>
<!-- properties of group -->
whereas a more clever structure would have been:
<group>
<group_name></group_name>
<!-- properties of group -->
</group>
This way the different groups would have been separated in a more logical manner, and it would be "easier" to parse the information in the XML file.
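To show what the flat layout costs a consumer, here is a small sketch that converts the flat sibling sequence into the nested form. The reader has to carry state ("which group am I in?") across siblings, which the nested structure would make unnecessary. The groups root element and the description property are hypothetical stand-ins for whatever the real file contains.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat input: each group_name is followed by its properties.
FLAT = """<groups>
  <group_name>audio</group_name>
  <description>sound tools</description>
  <group_name>office</group_name>
  <description>office suites</description>
</groups>"""


def nest_groups(flat_root: ET.Element) -> ET.Element:
    """Wrap each group_name and the siblings that follow it in a <group>."""
    nested = ET.Element(flat_root.tag)
    current = None
    for child in flat_root:
        if child.tag == "group_name":
            # A new group starts here; everything until the next
            # group_name is assumed to belong to it.
            current = ET.SubElement(nested, "group")
        if current is None:
            raise ValueError("property found before any group_name")
        current.append(child)
    return nested


nested = nest_groups(ET.fromstring(FLAT))
for group in nested.findall("group"):
    print(group.findtext("group_name"), group.findtext("description"))
```

With the nested form, a single query per group retrieves its name and properties together, with no dependence on sibling order.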
Re:Ready-made database of licences... (Score:1)
__________________
Re:why modded down? (Score:1)
Re: (Score:1)
perhaps a bit off topic (Score:3)
-
sean
Re:embedding ODP (& OSD) content in your web site. (Score:1)
(I really like being able to do XPATH querying documents instead of insane SQL querying documents disguised as records.
XML? (Score:2)
Offtopic but interesting. (Score:1)
http://www.fsf.org/philosophy/luispo-rms-intervie
About stallman himself,
"A short list of his coding accomplishments would include Emacs as well as most of the components of the GNU/Linux system, which he either wrote or helped write. "
Re:perhaps a bit off topic (XML) (Score:2)
Fixed XML Files (Score:3)
http://www.o-r-g.org/~cheshire/osd/osd.tgz [o-r-g.org]
Also, a search engine (Cheshire2 [berkeley.edu]) running over the XML with a very simple interface/display is available at:
http://www.o-r-g.org/~cheshire/osd/ [o-r-g.org]
Enjoy =)
-- Azaroth