Australian Stats Agency Goes Open Source
jimboh2k writes "The Australian Bureau of Statistics will use the 2011 Census of Population and Housing as a dry run for XML-based open source standards DDI and SDMX in a bid to make machine-to-machine data exchange easier, allowing users to better search for and access census datasets. The census will become the first time the open standards are used by an Australian Federal Government agency."
XML? that's so 1990 (Score:5, Insightful)
I'm perplexed why people continue to use XML when there is YAML. What is it that makes XML so attractive as a durable format? It's not human readable in a practical sense, and YAML very much is. Since its delimiters are complicated and variable, it's harder to parse in ad hoc ways than YAML (line and white space), which means that for rapidly extracting things there are no shortcuts to instantiating a whole document. It's hard to grep. And both formats can fully do the other one's job, so they are interchangeable.
Re:XML? that's so 1990 (Score:5, Informative)
To see how clean YAML is for humans to read and for machines to parse, look at a Sample Document [wikipedia.org]. And here's something truly impressive: a YAML quick reference card [yaml.org] written entirely in YAML itself. Not only is it a marvelously short card, it's human and machine readable. YAML is a superset of JSON too.
Re: (Score:2, Informative)
Interesting. How does YAML handle validation and user defined grammars?
Multiple ways, of varying stringency. For the simple case you can define types (e.g. floats, ints, or user defined types). For the vast majority of uses that's all you need for validation. Now if you want to define a schema there are several different ones that are used. Kwalify and Rx are two. Finally, there are YAML-to-XML converters. So you can just convert the YAML to XML and use your favorite XML validator. Thus the validation itself other than the types is not baked into the definition and thus
Re: (Score:1)
Great for human readability. Terrible (due to some python-like indent rules) for humans to add content to.
Meanwhile, XML might not be quite as nice as YAML for reading, but it is easier to figure out where you made a mistake, assuming you're pretty printing it (but the best thing is that pretty printing it is unnecessary).
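A rough sketch of the "pretty printing is unnecessary" point, using only Python's standard `xml.etree.ElementTree`. The element names and the `shape` helper are made up for illustration, and this assumes data-style XML where whitespace between elements carries no meaning (that is not true of mixed content):

```python
import xml.etree.ElementTree as ET

# Hypothetical record, once on a single line and once indented by hand.
compact = "<census><year>2011</year><agency>ABS</agency></census>"
pretty = """<census>
  <year>2011</year>
  <agency>ABS</agency>
</census>"""

def shape(elem):
    """Reduce an element to (tag, stripped text, children), ignoring layout whitespace."""
    return (elem.tag, (elem.text or "").strip(), [shape(child) for child in elem])

# Both spellings describe the same tree once layout whitespace is ignored.
same = shape(ET.fromstring(compact)) == shape(ET.fromstring(pretty))
print(same)
```

The indentation only affects the whitespace text nodes the parser hands back, not which elements exist or how they nest.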
Re:XML? that's so 1990 (Score:5, Insightful)
Great for human readability. Terrible (due to some python-like indent rules) for humans to add content to.
Oh come on man. This is like the ancient discarded whitespace lament about python. I was once like you before I started writing python. Then I saw the huge huge light of why white space indenting is so great. I could explain but I'm not sure I could have convinced even myself before trying it.
Bottom line: it's freakin' easy to get the white space right, and any decent editor with context sensitive tabs does it for you. emacs, vim, bbedit, eclipse. Are there any that don't?
This is a NON ISSUE
Meanwhile, XML might not be quite as nice as YAML for reading, but it is easier to figure out where you made a mistake, assuming you're pretty printing it (but the best thing is that pretty printing it is unnecessary).
Ha! You make me laugh. So now we need special editors and printers for XML reading. Were we not just complaining about white space? Now you pretty print to put perfect white space in XML?
Re: (Score:2)
Notepad, which is so often used by the technically non-clueful. Of which, I seem to work with a few.
Of course, you should use a real editor. This somehow doesn't prevent people from using notepad b/c they don't know better, or using vim but not knowing HOW to use vim and still we lose all indenting.
And I never said you needed a special editor for XML. Not even that you need one for JSON or YAML.
Pretty printing isn't MANDATORY for XML... which is really the point. With it NOT necessary, means you can fuck up
Re: (Score:3)
When is it ever desirable for indentation to not match the logical structure of a program?
The only possible reason I can come up with is if you're intentionally attempting to obfuscate your code.
Re: (Score:2)
You're getting close, it's definitely to allow an intentional expression and it's going to be a bug for all people who use white space to express more than just the {}'ness.
I wonder why a language has to enforce something that could have been enforced by the editor for those that value it.
Strictness on this is what kept back so many perl coders and stopped python from ruling the world.
But... I don't mind... and python-ites prefer white space to world domination, so that's good too!
Re: (Score:1)
bullshit, idiot. indentation is simple.
Re: (Score:2)
I'm with you on the python whitespace thing, but for YAML it's different. We're not talking about writing code here. It can be tricky to get the whitespace right but it's a damn sight easier than learning and reading XML syntax. Remember that 99% of the time machines process these files and we only care to make reading easy (where YAML whitespace is a non-issue) and human editing easy, where it isn't too bad. Composing from scratch by hand isn't really something you're going to be doing with YAML (or XML).
Re: (Score:1)
Great for human readability. Terrible (due to some python-like indent rules) for humans to add content to.
Apparently you are not aware that YAML, being a superset of JSON, can be written entirely in JSON, or a mix of the two. In JSON you don't need to use white space. So you use the white space in YAML when it makes sense (nearly always), and when you get into absurd edge cases you toss in a little JSON syntax when apropos.
So sorry, you just don't have a case to make here unless you want to say something bad about JSON as well.
Re: (Score:2)
I use JSON (and occasionally YAML), but only for data interchange formats where I don't expect a human to need to modify it.
Yes, I am aware that JSON and YAML are largely related. And a few times I tried to write up files in JSON, just as a mockup of my intended data structure. Yes, I used a real editor with proper tab indenting. It still got to be pretty unreadable. I use Data::Dumper whenever I want the data format to be as explicit as possible, but only for debugging.
But it's so much worse than that. XML
Re: (Score:2)
Does anyone else feel like they just looked at some COBOL source when looking at the YAML example?
Re: (Score:2)
I'm perplexed why people continue to use XML when there is YAML. What is it that makes XML so attractive as a durable format? It's not human readable in a practical sense, and YAML very much is. Since its delimiters are complicated and variable, it's harder to parse in ad hoc ways than YAML (line and white space), which means that for rapidly extracting things there are no shortcuts to instantiating a whole document. It's hard to grep. And both formats can fully do the other one's job, so they are interchangeable.
I would actually dispute all of your comments, but picking up on the last point in bold, one of XML's key features is "mixed content [w3schools.com]", which is apparently (according to http://yaml.org/xml.html [yaml.org]) not possible in YAML.
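For anyone unsure what "mixed content" means in practice, here is a minimal sketch with Python's standard `xml.etree.ElementTree`. The fragment and the `flatten` helper are invented for illustration; the point is only that text and child elements interleave inside one parent and the parser keeps their order:

```python
import xml.etree.ElementTree as ET

# Hypothetical mixed-content fragment: running text with elements embedded in it.
doc = "<p>The census runs in <year>2011</year> and uses <std>SDMX</std> for output.</p>"
root = ET.fromstring(doc)

def flatten(elem):
    """Yield (kind, value) pairs in document order, keeping the text between children."""
    if elem.text:
        yield ("text", elem.text)
    for child in elem:
        yield ("element", child.tag)
        yield from flatten(child)
        if child.tail:  # text that follows the child, still inside the parent
            yield ("text", child.tail)

pieces = list(flatten(root))
for kind, value in pieces:
    print(kind, repr(value))
```

A plain mapping-and-sequence data model has no direct slot for the text that sits between `<year>` and `<std>`, which is roughly the limitation the linked page describes.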
YAML is not for standards (Score:2)
XML is perfectly suitable for long term data storage and exchange. You have namespaces, schemas, and millions of tools to handle it.
YAML is OK for storing configuration data. It's not even that good for anything else.
Also anyone who "parses in ad hoc ways" deserves to be slapped in the face.
Re: (Score:2)
I'm perplexed why people continue to use XML when there is YAML.
Can you point me, please, to the reference on how one can define in YAML the equivalent of a schema?
You know, to act as the "contract" for the data exchange protocol... extensions (to allow 3rd party custom data sections) and namespaces (to isolate the 3rd party extensions that I'm not interested in) would be a real bonus.
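To make the namespace part concrete, a small sketch with Python's standard `xml.etree.ElementTree`. Both namespace URIs and all element names are made up; the idea is just that a consumer can keep the elements from the namespace it understands and skip a third party's extensions without knowing anything about them:

```python
import xml.etree.ElementTree as ET

# Both namespace URIs below are hypothetical, chosen only for this example.
doc = """<record xmlns="urn:example:census" xmlns:ext="urn:example:vendor">
  <population>1200</population>
  <ext:score>0.7</ext:score>
</record>"""

# ElementTree expands qualified names to "{namespace-uri}localname".
CORE = "{urn:example:census}"
root = ET.fromstring(doc)

# Keep only children from the core namespace; vendor extensions are ignored.
core_fields = {
    child.tag[len(CORE):]: child.text
    for child in root
    if child.tag.startswith(CORE)
}
print(core_fields)
```

This only shows isolation by namespace; actually enforcing the "contract" would still need a schema validator, which the standard library does not provide.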
Re: (Score:2)
I'm perplexed why people continue to use XML when there is YAML...
The real answer is: who cares? They're both easy [enough] to parse data formats. It's about as interesting as arguing about what your favorite editor is and why. Or your favorite database. Everyone knows the ins and outs, and nobody cares (except maybe you and the person you're arguing with). We all have libraries. We all have parsers. It really doesn't matter.
The trivial answer to your question is: because YAML is very new in the grand scheme of things. And it's not so different that it's really in
The First Time? (Score:2)
Really?
http://xena.sourceforge.net/ [sourceforge.net]
Re: (Score:1)
http://naa.gov.au/about-us/director-general/index.aspx [naa.gov.au]
British Commonwealth Apples & Oranges (Score:3)
Australia is openly embracing census data and enhancing its availability.
Canada's government is going out of its way to prevent census data collection.
Re: (Score:1)
Seems logical - as a Tax Payer, the data should be available to me.
Although I hope it's not leveraged too heavily by the commercial sector.
Re: (Score:1)
Take action to change that!
http://www.liberal.ca/open/ [liberal.ca]
The Liberal Open Government Initiative will:
* Immediately restore the long-form census;
* Make as many government datasets as possible available to the public online free of
charge at opendata.gc.ca in an open and searchable format, starting with Statistics
Canada data, including data from the long-form census;
Re: (Score:1)
Australia is openly embracing census data and extending its availability.
Canada's government is going out of its way to extinguish census data collection.
FTFY
Re: (Score:2)
but the govt still thinks sharing is bad
This is why an Australian invented Wikileaks... I mean... "information wants to be free" and such...
and open source you share your code freely to help everyone
Hey, where does it say that they'll share the code? TFA quote:
with the ABS directing software developer Space-Time Research to utilise the standards for both input and output of all data collected next year.
So:
1. it is the data that will be shared (govt takes preemptive - still legal - actions against Wikileaks? ;) )
2. the guys that are doing the software are Space Time Research [spacetimeresearch.com] - as far as I know, a bit far from an open source establishment (note: I have no affiliation with them)
Re: (Score:2)
it looks like they want all data up, the only data not collected is names and addresses, you can use any of the questions to define your sets.
"DDI and SDMX are good at describing things, and we're testing the very notion that you can actually consume this stuff and make it discoverable metadata for your search engines."
"We definitely want to see who's keen, who's interested in statistics and metadata, open data, data linking and what people can do with it as well."
Meanwhile, in other agencies and private (Score:2)
Closed file formats are an "innovation" of Microsoft and similar companies. It's not really any different from the bastards that write unreadable code in an attempt to provide job security.
hopefully in the future some of the practices of elements of Microsoft and man
74% of people don't believe in statistics anyway (Score:1)
We should find out what percentage of the population thinks that this is a good idea....
Hope they've studied Munich's woes... (Score:2)
...and here's why:
It's official - Munich Linux migration is "dead - abandoned in all but name." - Linux
Yes, you read right: "Dead - abandoned in all but name". [fixunix.com]
Actually open source seems fine in Munich (Score:3)
Munich Linux migration is "dead - abandoned in all but name."
Last I heard it was a migration to open source and they were successfully using open source desktop applications. The operating system may be Windows rather than Linux but this still seems to be a victory for open source. On the desktop the applications are far more important than the operating system.
Re: (Score:2)
Open source standards, not open source code. Very different issue.
Open source, or open standards? (Score:1)
There is some difference. I'm not clear from the summary exactly what's going on.
I'm curious to see.. (Score:2)
How many Jedi currently live in Australia.
Re: (Score:2)
How many Jedi currently live in Australia.
None: for the moment, Assange is retained by the dark side of the force and too dry Australia is for master Yoda.
YAML is not the answer (Score:1)
As the author of the Perl module YAML::Tiny, and the current maintainer of the original YAML.pm, I call troll on the parent.
YAML as a specification is way more complex than XML and it's way harder to implement.
And who in their right mind is going to read the raw census statistical quads directly? The point is moot.
XML is ideal for machine to machine communication. It's easily machine readable, and easily debuggable by nerds (which is the bit of "readable" that really matters here). And machine readable is wh
Incorrect Summary (Score:1)
The census will become the first time the open standards are used by an Australian Federal Government agency.
What the hell are you talking about? We use a variety of open standards every minute of every day across every department with any modern IT assets. I think what you meant to say was the first time that open standards are being used by an Australian Federal Government agency to communicate with the general public. Even then, it's not exactly news; it was going to happen eventually.