'Openwashing'
An anonymous reader quotes a report from The New York Times: There's a big debate in the tech world over whether artificial intelligence models should be "open source." Elon Musk, who helped found OpenAI in 2015, sued the startup and its chief executive, Sam Altman, on claims that the company had diverged from its mission of openness. The Biden administration is investigating the risks and benefits of open source models. Proponents of open source A.I. models say they're more equitable and safer for society, while detractors say they are more likely to be abused for malicious intent. One big hiccup in the debate? There's no agreed-upon definition of what open source A.I. actually means. And some are accusing A.I. companies of "openwashing" -- using the "open source" term disingenuously to make themselves look good. (Accusations of openwashing have previously been aimed at coding projects that used the open source label too loosely.)
In a blog post on Open Future, a European think tank supporting open sourcing, Alek Tarkowski wrote, "As the rules get written, one challenge is building sufficient guardrails against corporations' attempts at 'openwashing.'" Last month the Linux Foundation, a nonprofit that supports open-source software projects, cautioned that "this 'openwashing' trend threatens to undermine the very premise of openness -- the free sharing of knowledge to enable inspection, replication and collective advancement." Organizations that apply the label to their models may be taking very different approaches to openness. [...]
The main reason is that while open source software allows anyone to replicate or modify it, building an A.I. model requires much more than code. Only a handful of companies can fund the computing power and data curation required. That's why some experts say labeling any A.I. as "open source" is at best misleading and at worst a marketing tool. "Even maximally open A.I. systems do not allow open access to the resources necessary to 'democratize' access to A.I., or enable full scrutiny," said David Gray Widder, a postdoctoral fellow at Cornell Tech who has studied use of the "open source" label by A.I. companies.
It helps (Score:5, Informative)
(Accusations of openwashing have previously been aimed at coding projects that used the open source label too loosely.)
It helps if you know what Open Source means [archive.org]. It means you can see the source.
If you can get access to the training data and the code that turns it into a model, it's open source regardless of what you're allowed to do with it, or whether you can afford the computer time to build the model from the data. If you can't see the sources, then it's not open source. Not even every definition of Free Software ensures that you will actually be able to use the code in question. That's why there is a GPLv3, with an anti-Tivoization clause; GPLv2 wasn't Free enough. But even the GPLv3 doesn't mandate that you be able to make meaningful use of the code for reasons beyond artificial restrictions, like not owning a supercluster.
Re: (Score:3, Informative)
Open source should really NOT be used to describe anything that isn't copyleft.
The concept of copyleft was literally created because open source wasn't open enough for its creator.
If someone can take the code, modify it, and close it because of a deficient license like BSD or MIT, it's not really open, and never really was.
"Open" meant "documented and interoperable" in UNIXland for many years. Open Source's origins are in the security community's use of the same phrase to mean an intelligence source anyone could get information from. The first programmable computers came from military efforts, and the bulk of computers were military until they became inexpensive, so this relationship was well-established and fundamental, there
Re: (Score:2)
also why their lawyers told them that they couldn't copyright it
Whoops. I mean, Trademark. That's what I get for not using Preview.
Re: (Score:2)
Open washing (Score:2)
Re: (Score:2)
Almost nobody in the indie AI community cares whether the training data for a model is open source. We care about the license restrictions on the model itself. We can re-finetune or further train a foundation model however we want; the question is what we're allowed to do with it.
A lot of people just ignore the licenses, but that can come back to bite you, and I don't recommend it.
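A minimal sketch of that point (the metadata format and license identifiers here are hypothetical, not any registry's real schema): gate what you do with a model on the license that ships with the weights, rather than assuming "open" means unrestricted.

```python
import json

# Hypothetical metadata as it might ship alongside a weights file; the
# field names and license identifiers are made up for illustration.
card = json.loads('{"model": "example-7b", "license": "research-only"}')

# Licenses this sketch treats as allowing unrestricted commercial use.
PERMISSIVE = {"apache-2.0", "mit"}

def may_use_commercially(license_id: str) -> bool:
    """Return True only if the model's license is on the permissive list."""
    return license_id.lower() in PERMISSIVE

print(may_use_commercially(card["license"]))  # prints False: read the terms first
```

The point of the sketch is only that the check happens before fine-tuning or shipping, not after a cease-and-desist arrives.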
On Funding Digital Public Works & Self-Dealing (Score:2)
I wrote most of this in 2001 (hard to believe that is almost a quarter century ago):
https://pdfernhout.net/on-fund... [pdfernhout.net]
"Consider again the self-driving cars mentioned earlier which now cruise some streets in small numbers. The software "intelligence" doing the driving was primarily developed by public money given to universities, which generally own the copyrights and patents as the contractors. Obviously there are related scientific publications, but in practice these fail to do justice to the complexity of
Re: (Score:2)
Elon Musk doesn't want it open (Score:2)
He's simply seen that they're ahead on AI research, and wants access to their tech.
Re:Elon Musk doesn't want it open (Score:5, Interesting)
Nobody is actually ahead in AI, because they're all solving the wrong problem, as indeed AI researchers have consistently done since the 1960s.
I'm not the least bit worried about the possibility of superintelligence, not until they actually figure out what intelligence is as opposed to what is convenient to solve.
As for Musk, he's busy trying to kill all engineering projects in America.
Re: (Score:2)
noun: the ability to acquire and apply knowledge and skills.
There are probably better definitions, but that simple first dictionary result covers it pretty well. We know what intelligence is. We're on the part about how it works, and we appear to be solving that quite rapidly.
Re: Elon Musk doesn't want it open (Score:1)
That seems a bit like a circular definition. What does it mean to acquire and apply knowledge and skills, what are the parameters for success or even a positive result?
Basically if you follow the definitions of knowledge/skills you get right back to perception, information and synonyms of intelligence. So intelligence (in the human sense) is the capacity to acquire and apply intelligence (in the military sense), but we haven't defined concretely what that is, what components it has, etc.
Re: (Score:2)
Re: (Score:1)
I have yet to see a measurement or task. If it is merely the collection of data, 1950s databases were the first artificial (electronic) intelligence, and by that definition so would be any library, going back literally the entirety of written history. But just storing a book in a memory (mechanically, electronically, etc.) does not make someone intelligent.
Re: (Score:2)
Re: (Score:1)
Dude, I have been involved in "AI" and ML for at least 15 years now. It is not intelligent. It feigns intelligence by regurgitating string sentences from a database.
Re: (Score:2)
Re: (Score:1)
I said it is not intelligent. We know what 'not intelligent' means despite not knowing what intelligence is. Regurgitating data from a database is not intelligent.
We do need to apply reduction to distill what it means to be intelligent, that is how you make definitions. We don't just repeat string sentences copied in our brains, again, you don't know what you are talking about. That is why in most cases people find examinations where students copy/remember stuff not to be a good sample of their intelligence
Re: (Score:2)
I didn't say we need to apply reduction. I pointed out your misapplication of it.
You've demonstrated you're the one without a clue here. You've contradicted yourself. Your assertions are akin to stating the earth is flat and sky is brown while also trying to claim we don't know what brown is. You're not making a
Re: (Score:1)
Then feel free to provide a non-self-referential definition of what intelligence is; a Nobel Prize is awaiting you.
Re: (Score:2)
Re:Elon Musk doesn't want it open (Score:5, Informative)
Musk doesn't care about openness. Look at what he's done to shitter. If anyone says anything mean about him they get their account suspended [imgur.com]. When his Nazi-loving supporters get revealed he can't move fast enough to prevent people from seeing it [mashable.com] while at the same time allowing his Nazi-loving supporters to do the same to others.
As we know from his flailing car company [jalopnik.com], if you criticize him or the company you lose your ability to buy a car [motoringresearch.com].
And finally, there's a reason none of his companies have a PR department. He wants everything to go through him, including any announcements about fake products used to inflate the price of company stock. When Musk talks about openness, you can be sure he has no idea what that term means.
Re: (Score:2, Troll)
Eli's suspension was temporary, something the previous Twitter wouldn't do if you were guilty of wrongthink. Suspending people for doxxing is now your line in the sand for openness? Get real. You don't actually hold this standard.
You're deluding yourself about Tesla. Those cars are everywhere and there's only going to be more. You
Re: (Score:1)
There's always something... (Score:2)
There are those who always bring up the wrong thing.
Don't you get tired of those "glass half empty" naysayers?
Or worse, disguise their desire to control by seemingly bringing up issues to be solved?
Re: (Score:2)
If there's an issue that needs resolving, it's best to acknowledge it. Hiding away, like Microsoft does with their abysmal records on reliability and security, achieves nothing.
If honesty is a problem, then neither IT nor science seem good professions. Politics and economics might be better.
The data is the code. (Score:5, Interesting)
In neural nets, the network software is not the algorithm that is running. The net software is playing the same role as the CPU in a conventional software system. It is merely the platform on which the code is run.
The topology of the network plus the state of that network (the data) corresponds to an algorithm. That is the actual software that is being run. AI cannot be considered open until this is released.
But I flat-out guarantee no AI vendor is going to do that.
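The comment's point can be made concrete with a toy example (an assumed illustration, not any vendor's code): the same forward-pass code computes completely different functions depending on the weights, so releasing the network code without the weights reveals almost nothing about what the model actually does.

```python
import numpy as np

def forward(x, weights):
    """Generic 2-layer net: the code is fixed, the behavior is not."""
    h = np.tanh(x @ weights["w1"])  # hidden layer
    return h @ weights["w2"]        # output layer

rng = np.random.default_rng(0)
# Two "models" sharing identical code but different parameters.
net_a = {"w1": rng.normal(size=(4, 8)), "w2": rng.normal(size=(8, 2))}
net_b = {"w1": rng.normal(size=(4, 8)), "w2": rng.normal(size=(8, 2))}

x = rng.normal(size=(1, 4))
# Identical code path, different outputs: the weights are the program.
print(forward(x, net_a))
print(forward(x, net_b))
```

In this framing, publishing `forward()` alone is like publishing a CPU datasheet and calling the software that runs on it "open".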
Re: The data is the code. (Score:4)
Re: (Score:2)
What's being "open sourced" in a model like Llama is just the model architecture + weights. Some people prefer the term "open weights".
The "algorithm" - what the transformer is actually doing - is defined by the weights, which are derived from the training set plus the pre-training procedure (very tricky - not a turn-key process), and maybe a bunch of post-training (even more tricky) which is where a lot of the final model functionality/behavior comes from.
Even if a company did provide the training data (wh
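What such an "open weights" release typically contains can be sketched as follows (the file names, sizes, and config fields are hypothetical): an architecture description plus trained parameters, with no training data or training recipe included.

```python
# Sketch of an "open weights" release (contents are hypothetical):
# everything needed to run the model, nothing about how it was made.
release = {
    "config.json": {"architecture": "decoder-only", "layers": 32, "d_model": 4096},
    "weights.bin": b"\x00" * 16,  # stand-in for billions of parameters
}

# Everything needed to *run* the model is present...
config = release["config.json"]
print(config["layers"])  # prints 32

# ...but nothing that produced weights.bin, which is why
# "open weights" is not the same thing as "open source".
print("training_data" in release)  # prints False
```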
Re: (Score:2)
A bunch of them have already. Facebook, for example, although they did have some help from a leaker initially.
Re: The data is the code. (Score:1)
We know a lot of what OpenAI does (most of it, actually) is shaped by hand. The actual base math models are open; they're not even OpenAI's product. It's the filtering and keyword matching and other things OpenAI does (e.g. the way it builds its databases) that is considered 'the algorithm'. Just like Twitter is not 'the algorithm': we all know how databases work, and anyone can apply simple mathematical models to see what *should* be promoted or is viral, the questi
Also, copyright infringement (Score:4, Interesting)
I have yet to see the blanket license agreements that will be needed to let AI companies legally create derivative works from training data. I have seen copyright license holders such as Sony issue warnings. If these agreements are still being negotiated, no AI company would ever let their data be open for fear of inviting a suit.
Re: (Score:2)
I have yet to see the blanket license agreements that will be needed to let AI companies legally create derivative works from training data.
Derivative works contain recognizably copied elements. There are many uses which don't meet that standard. Merely looking like the thing doesn't qualify, either; it has to be obviously directly copied (though possibly manipulated) and not recreated.
Re: (Score:2)
I see nothing in copyright law that uses "recognizable" as a criterion. If a book originally written in English is translated into French, it's a derivative work regardless of whether anyone (especially those who don't know French) recognizes it as being translated from the original.
There are two parts to a derivative work:
I don't care (Score:2)
If bags of weights are available to everyone to mess with and use as they please that's good enough in my book. Good luck to anyone seeking to assert any legal restrictions on any bags of weights that do happen to make their way onto the Internet.
How can AI be open source? (Score:2)
If you don't know how or why the model comes up with what it comes up with you can't reveal it to anyone else. Perhaps I am not understanding correctly -- wouldn't be the first time. I suppose you can release the initial program but once it has started to do its thing with the training data it becomes pretty much a black box, doesn't it?
Regulation and control, is the agenda (Score:1)
The ulterior motive is it would be very, very convenient for massive "regulation" to be enacted, allowing for ultimate control by the large corporate entities developing these. Wouldn't that be profitable, in many ways. Ugh.
I don't think their efforts will work. The cat is out of the bag. The world is now changed.
DPRK (Score:2)
Democratic People's Republic of Korea, also known as the North Korean hereditary dictatorship.