Science

A New Image File Format Efficiently Stores Invisible Light Data (arstechnica.com) 11

An anonymous reader quotes a report from Ars Technica: Imagine working with special cameras that capture light your eyes can't even see -- ultraviolet rays that cause sunburn, infrared heat signatures that reveal hidden writing, or specific wavelengths that plants use for photosynthesis. Or perhaps using a special camera designed to distinguish the subtle visible differences that make paint colors appear just right under specific lighting. Scientists and engineers do this every day, and they're drowning in the resulting data. A new compression format called Spectral JPEG XL might finally solve this growing problem in scientific visualization and computer graphics. Researchers Alban Fichet and Christoph Peters of Intel Corporation detailed the format in a recent paper published in the Journal of Computer Graphics Techniques (JCGT). It tackles a serious bottleneck for industries working with these specialized images. These spectral files can contain 30, 100, or more data points per pixel, causing file sizes to balloon into multi-gigabyte territory -- making them unwieldy to store and analyze.

[...] The current standard format for storing this kind of data, OpenEXR, wasn't designed with these massive spectral requirements in mind. Even with built-in lossless compression methods like ZIP, the files remain unwieldy for practical work as these methods struggle with the large number of spectral channels. Spectral JPEG XL utilizes a technique used with human-visible images, a math trick called a discrete cosine transform (DCT), to make these massive files smaller. Instead of storing the exact light intensity at every single wavelength (which creates huge files), it transforms this information into a different form. [...]
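The idea is easy to see in miniature. Below is a toy sketch (not the authors' implementation; it assumes NumPy and SciPy are available) that applies a DCT along the spectral axis of a single pixel and keeps only the low-frequency coefficients:

```python
import numpy as np
from scipy.fft import dct, idct

# Toy spectrum for one pixel: intensity sampled at 64 wavelengths,
# modeled as a smooth peak plus a little measurement noise.
rng = np.random.default_rng(0)
spectrum = np.exp(-np.linspace(-2, 2, 64) ** 2) + 0.01 * rng.standard_normal(64)

# Transform to the DCT domain: for smooth spectra, most of the
# energy concentrates in the first few coefficients.
coeffs = dct(spectrum, norm="ortho")

# Lossy step: discard all but the 8 lowest-frequency coefficients.
compressed = np.zeros_like(coeffs)
compressed[:8] = coeffs[:8]

# Reconstruct and measure the error from dropping 56 of 64 values.
reconstructed = idct(compressed, norm="ortho")
print(f"max abs error: {np.max(np.abs(spectrum - reconstructed)):.4f}")
```

Roughly speaking, that is the bet the format makes: real-world spectra are smooth, so most of the information lands in a handful of coefficients, which can be kept at high precision while the rest are quantized aggressively before being handed to the standard JPEG XL codec.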

According to the researchers, the massive file sizes of spectral images have been a real barrier to adoption in industries that would benefit from their accuracy. Smaller files mean faster transfer times, reduced storage costs, and the ability to work with these images more interactively without specialized hardware. The results reported by the researchers seem impressive -- with their technique, spectral image files shrink by 10 to 60 times compared to standard OpenEXR lossless compression, bringing them down to sizes comparable to regular high-quality photos. They also preserve key OpenEXR features like metadata and high dynamic range support.
The report notes that broader adoption "hinges on the continued development and refinement of the software tools that handle JPEG XL encoding and decoding."

Some scientific applications may also see JPEG XL's lossy approach as a drawback. "Some researchers working with spectral data might readily accept the trade-off for the practical benefits of smaller files and faster processing," reports Ars. "Others handling particularly sensitive measurements might need to seek alternative methods of storage."
Science

Inside arXiv - the Most Transformative Platform in All of Science (wired.com) 13

Paul Ginsparg, a physics professor at Cornell University, created arXiv nearly 35 years ago as a digital repository where researchers could share their findings before peer review. Today, the platform hosts more than 2.6 million papers, receives 20,000 new submissions monthly, and serves 5 million active users, Wired writes in a profile of the platform.

"Just when I thought I was out, they pull me back in!" Ginsparg quotes from The Godfather, reflecting his inability to fully hand over the platform despite numerous attempts. If arXiv stopped functioning, scientists worldwide would face immediate disruption. "Everybody in math and physics uses it," says Scott Aaronson, a computer scientist at the University of Texas at Austin. "I scan it every night."

ArXiv revolutionized academic publishing, previously dominated by for-profit giants like Elsevier and Springer, by allowing instant and free access to research. Many significant discoveries, including the "transformers" paper that launched the modern AI boom, first appeared on the platform. Initially a collection of shell scripts on Ginsparg's NeXT machine in 1991, arXiv followed him from Los Alamos National Laboratory to Cornell, where it found an institutional home despite administrative challenges. Recent funding from the Simons Foundation has enabled a hiring spree and long-needed technical updates.
Math

JPMorgan Says Quantum Experiment Generated Truly Random Numbers (financialpost.com) 111

JPMorgan Chase used a quantum computer from Honeywell's Quantinuum to generate and mathematically certify truly random numbers -- an advancement that could significantly enhance encryption, security, and financial applications. The breakthrough was validated with help from U.S. national laboratories and has been published in the journal Nature. From a report: Between May 2023 and May 2024, cryptographers at JPMorgan wrote an algorithm for a quantum computer to generate random numbers, which they ran on Quantinuum's machine. The US Department of Energy's supercomputers were then used to test whether the output was truly random. "It's a breakthrough result," Marco Pistoia, project lead and head of Global Technology Applied Research at JPMorgan, told Bloomberg in an interview. "The next step will be to understand where we can apply it."

Applications could ultimately include more energy-efficient cryptocurrency, online gambling, and any other activity hinging on complete randomness, such as deciding which precincts to audit in elections.

Education

'Kids Are Spending Too Much Class Time on Laptops' (bloomberg.com) 77

Over the past two decades, school districts have spent billions equipping classrooms with laptops, yet students have fallen further behind on essential skills, Michael Bloomberg argues. With about 90% of schools now providing these devices, test scores hover near historic lows -- only 28% of eighth graders proficient in math and 30% in reading.

Bloomberg notes technology's classroom push came from technologists and government officials who envisioned tailored curricula. Computer manufacturers, despite good intentions, had financial interests and profited substantially. The Google executive who questioned why children should learn equations when they could Google answers might now ask why they should write essays when chatbots can do it for them.

Studies confirm traditional methods -- reading and writing on paper -- remain superior to screen-based approaches. Devices distract students, with research showing up to 20 minutes needed to refocus after nonacademic activities. As some districts ban smartphones during school hours, Bloomberg suggests reconsidering classroom computer policies, recommending locked carts for more purposeful use and greater transparency for parents about screen time. Technology's promise has failed while imposing significant costs on children and taxpayers, he writes. Bloomberg calls for a return to books and pens over laptops and tablets.
AI

'There's a Good Chance Your Kid Uses AI To Cheat' (msn.com) 98

Long-time Slashdot reader theodp writes: Wall Street Journal K-12 education reporter Matt Barnum has a heads-up for parents: There's a Good Chance Your Kid Uses AI to Cheat. Barnum writes:

"A high-school senior from New Jersey doesn't want the world to know that she cheated her way through English, math and history classes last year. Yet her experience, which the 17-year-old told The Wall Street Journal with her parent's permission, shows how generative AI has rooted in America's education system, allowing a generation of students to outsource their schoolwork to software with access to the world's knowledge. [...] The New Jersey student told the Journal why she used AI for dozens of assignments last year: Work was boring or difficult. She wanted a better grade. A few times, she procrastinated and ran out of time to complete assignments. The student turned to OpenAI's ChatGPT and Google's Gemini, to help spawn ideas and review concepts, which many teachers allow. More often, though, AI completed her work. Gemini solved math homework problems, she said, and aced a take-home test. ChatGPT did calculations for a science lab. It produced a tricky section of a history term paper, which she rewrote to avoid detection. The student was caught only once."

Not surprisingly, AI companies play up the idea that AI will radically improve learning, while educators are more skeptical. "This is a gigantic public experiment that no one has asked for," said Marc Watkins, assistant director of academic innovation at the University of Mississippi.

Python

Codon Python Compiler Gets Faster - and Changes to Apache 2 License (usenix.org) 4

Slashdot reader rikfarrow summarizes an article they wrote for Usenix.org about the Open Source Python compiler Codon: In 2023 I tried out Codon. At the time I had difficulty compiling the scripts I most commonly used, but was excited by the prospect. Python is essentially single threaded and checks the shape (type) of each variable as it interprets scripts. Codon fixes types and compiles Python into compact, executable binaries that execute much faster.

Several things have changed with their latest release: I now get successful compiles, the committers have added a compiled version of NumPy (high-performance math algorithms), and they have changed their open source license to Apache 2.
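For readers who haven't tried it: Codon consumes ordinary, type-annotated Python source and emits a native executable. A minimal sketch, assuming the codon CLI from Exaloop's releases:

```python
# fib.py -- plain Python that Codon can also compile ahead of time.
def fib(n: int) -> int:
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib(90))  # still fits in a 64-bit integer
```

Building with something like `codon build -release fib.py` should produce a standalone binary, while the identical file still runs under CPython -- which is what makes incremental adoption plausible.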

"The other big news is that Exaloop, the company that is behind Codon, has changed their license to Apache 2..." according to the article, so "commercial use and derivations of Codon are now permitted without licensing."
Google

Google is Adding More AI Overviews and a New 'AI Mode' To Search (theverge.com) 33

Google announced Wednesday it is expanding its AI Overviews to more query types and users worldwide, including those not logged into Google accounts, while introducing a new "AI Mode" chatbot feature. AI Mode, which resembles competitors like Perplexity or ChatGPT Search, will initially be limited to Google One AI Premium subscribers who enable it through the Labs section of Search.

The feature delivers AI-generated answers with supporting links interspersed throughout, powered by Google's search index. "What we're finding from people who are using AI Overviews is that they're really bringing different kinds of questions to Google," said Robby Stein, VP of product on the Search team. "They're more complex questions, that may have been a little bit harder before." Google is also upgrading AI Overviews with its Gemini 2.0 model, which Stein says will improve responses for math, coding and reasoning-based queries.
Math

India's 'Human Calculator Kid' Shatters 6 World Records In a Single Day (gizmodo.com) 39

An anonymous reader quotes a report from Gizmodo: Fourteen-year-old Aaryan Shukla cruised through six mental math calculation world records in a single day, according to a Guinness World Records statement published on February 12, earning the well-deserved nickname, "human calculator kid." Specifically, it took Shukla:

- 30.9 seconds to mentally add 100 four-digit numbers
- One minute and 9.68 seconds to mentally add 200 four-digit numbers
- 18.71 seconds to mentally add 50 five-digit numbers
- Five minutes and 42 seconds to mentally divide a 20-digit number by a ten-digit number ten times
- 51.69 seconds to mentally multiply two five-digit numbers ten times
- Two minutes and 35.41 seconds to mentally multiply two eight-digit numbers ten times

According to the statement, these are among the most difficult mental calculation world records ever attempted. Shukla's frankly mind-boggling achievement also comes in the wake of another world record he broke in April 2024 at the age of 13: fastest time to mentally add 50 five-digit numbers. It took him just 25.19 seconds. That's an addition every half a second. I wouldn't be surprised if students seeking "shortcuts" in their math homework started phoning up Shukla instead of reaching for their ChatGPT browser tab.
Guinness World Records published a video about Shukla's accomplishments on YouTube.
AI

OpenAI Cancels Its o3 AI Model In Favor of a 'Unified' Next-Gen Release 10

OpenAI has canceled the release of o3 in favor of a "simplified" product lineup. CEO Sam Altman said in a post on X that, in the coming months, OpenAI will release a model called GPT-5 that "integrates a lot of [OpenAI's] technology," including o3. TechCrunch reports: The company originally said in December that it planned to launch o3 sometime early this year. Just a few weeks ago, Kevin Weil, OpenAI's chief product officer, said in an interview that o3 was on track for a "February-March" launch. "We want to do a better job of sharing our intended roadmap, and a much better job simplifying our product offerings," Altman wrote in the post. "We want AI to 'just work' for you; we realize how complicated our model and product offerings have gotten. We hate the model picker [in ChatGPT] as much as you do and want to return to magic unified intelligence."

Altman also announced that OpenAI plans to offer unlimited chat access to GPT-5 at the "standard intelligence setting," subject to "abuse thresholds," once the model is generally available. (Altman declined to provide more detail on what this setting -- and these abuse thresholds -- entail.) Subscribers to ChatGPT Plus will be able to run GPT-5 at a "higher level of intelligence," Altman said, while ChatGPT Pro subscribers will be able to run GPT-5 at an "even higher level of intelligence."

"These models will incorporate voice, canvas, search, deep research, and more," Altman said, referring to a range of features OpenAI has launched in ChatGPT over the past few months. "[A] top goal for us is to unify [our] models by creating systems that can use all our tools, know when to think for a long time or not, and generally be useful for a very wide range of tasks." Before GPT-5 launches, OpenAI plans to release its GPT-4.5 model, code-named "Orion," in the next several weeks, according to Altman's post on X. Altman says this will be the company's last "non-chain-of-thought model." Unlike o3 and OpenAI's other so-called reasoning models, non-chain-of-thought models tend to be less reliable in domains like math and physics.
Math

Children's Arithmetic Skills Do Not Transfer Between Applied and Academic Mathematics (nature.com) 100

Children working in India's fruit and vegetable markets can perform complex mental calculations with ease, yet struggle with basic written math tests that determine their academic future, according to new research that raises troubling questions about mathematics education worldwide.

The study, published in Nature, reveals how traditional education systems are failing to tap into the mathematical talents of students who develop practical skills outside the classroom, particularly those from lower-income families. MIT economist Abhijit Banerjee, who grew up watching young market vendors deftly handle complicated transactions, led the research. His team found that while these children could rapidly perform mental arithmetic, they performed poorly on standard written assessments like long division problems.

The findings come at a critical moment when mathematics education must evolve to meet modern demands, incorporating data literacy and computational skills alongside traditional mathematics. The research points to systemic issues, including a global shortage of trained mathematics teachers and assessment systems that reward memorization over reasoning. Without addressing these challenges, researchers warn, naturally talented students from disadvantaged backgrounds may never reach their potential in fields like research, entrepreneurship, or teaching.
AI

Researchers Created an Open Rival To OpenAI's o1 'Reasoning' Model for Under $50 23

AI researchers at Stanford and the University of Washington were able to train an AI "reasoning" model for under $50 in cloud compute credits, according to a research paper. From a report: The model, known as s1, performs similarly to cutting-edge reasoning models, such as OpenAI's o1 and DeepSeek's R1, on tests measuring math and coding abilities. The s1 model is available on GitHub, along with the data and code used to train it.

The team behind s1 said they started with an off-the-shelf base model, then fine-tuned it through distillation, a process to extract the "reasoning" capabilities from another AI model by training on its answers. The researchers said s1 is distilled from one of Google's reasoning models, Gemini 2.0 Flash Thinking Experimental. Distillation is the same approach Berkeley researchers used to create an AI reasoning model for around $450 last month.
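Mechanically, this kind of distillation is mostly data plumbing: sample prompts, record the teacher's answers, then fine-tune the student on the pairs. A hypothetical sketch of the collection step (teacher_generate is a labeled stand-in for whichever API serves the teacher model, not a real client):

```python
import json

def teacher_generate(prompt: str) -> str:
    """Stand-in for the teacher model's API; a real pipeline would call
    the teacher here and capture its full reasoning trace."""
    return f"<reasoning elided> Final answer for: {prompt}"

math_prompts = [
    "Prove that the sum of two odd integers is even.",
    "How many primes are there below 100?",
]

# Build a supervised fine-tuning dataset from teacher outputs: the student
# is later trained to reproduce these completions given the prompts.
with open("distill.jsonl", "w") as f:
    for prompt in math_prompts:
        record = {"prompt": prompt, "completion": teacher_generate(prompt)}
        f.write(json.dumps(record) + "\n")
```

The resulting file is what a standard supervised fine-tuning job consumes; the notable part of s1 is how small the dataset and compute bill could be, not any novelty in the pipeline itself.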
The Almighty Buck

'Magical' Efficient-Market Theory Rebuked in Era of Passive Investing (yahoo.com) 57

An anonymous reader shares a report: At first blush, stock trading this week is hardly a paragon of the market-efficiency theory, an oft-romanticized idea in Economics 101. After all, big equity gauges plunged on Monday, spurred by fears of an AI model released a week earlier, before swiftly rebounding. A fresh academic paper suggests the rise of passive investing may be fueling these kinds of fragile market moves.

According to a study to be published in the prestigious American Economic Review, evidence is building that active managers are slow to scoop up stocks en masse when prices move away from their intrinsic worth. Thanks to this lethargic trading behavior and the relentless boom in benchmark-tracking index funds, the impact of each trade on prices gets amplified, explaining how sell orders, like on Monday perhaps, can induce broader equity gyrations. As a result, the financial landscape is proving less dynamic and more volatile in the era of Big Passive, according to authors at the UCLA Anderson School of Management, the Stockholm School of Economics and the University of Minnesota Carlson School of Management.

Power

Could New Linux Code Cut Data Center Energy Use By 30%? (datacenterdynamics.com) 65

Two computer scientists at the University of Waterloo in Canada believe changing 30 lines of code in Linux "could cut energy use at some data centers by up to 30 percent," according to the site Data Centre Dynamics.

It's the code that processes packets of network traffic, and Linux "is the most widely used OS for data center servers," according to the article: The team tested their solution's effectiveness and submitted it to Linux for consideration, and the code was published this month as part of Linux's newest kernel, release version 6.13. "All these big companies — Amazon, Google, Meta — use Linux in some capacity, but they're very picky about how they decide to use it," said Martin Karsten [professor of Computer Science in the Waterloo's Math Faculty]. "If they choose to 'switch on' our method in their data centers, it could save gigawatt hours of energy worldwide. Almost every single service request that happens on the Internet could be positively affected by this."

The University of Waterloo is building a green computer server room as part of its new mathematics building, and Karsten believes sustainability research must be a priority for computer scientists. "We all have a part to play in building a greener future," he said. The Linux Foundation, which oversees the development of the Linux OS, is a founder member of the Green Software Foundation, an organization set up to look at ways of developing "green software" — code that reduces energy consumption.

Karsten "teamed up with Joe Damato, distinguished engineer at Fastly" to develop the 30 lines of code, according to an announcement from the university. "The Linux kernel code addition developed by Karsten and Damato was based on research published in ACM SIGMETRICS Performance Evaluation Review" (by Karsten and grad student Peter Cai).

Their paper "reviews the performance characteristics of network stack processing for communication-heavy server applications," devising an "indirect methodology" to "identify and quantify the direct and indirect costs of asynchronous hardware interrupt requests (IRQ) as a major source of overhead...

"Based on these findings, a small modification of a vanilla Linux system is devised that improves the efficiency and performance of traditional kernel-based networking significantly, resulting in up to 45% increased throughput..."
AI

Cutting-Edge Chinese 'Reasoning' Model Rivals OpenAI o1 55

An anonymous reader quotes a report from Ars Technica: On Monday, Chinese AI lab DeepSeek released its new R1 model family under an open MIT license, with its largest version containing 671 billion parameters. The company claims the model performs at levels comparable to OpenAI's o1 simulated reasoning (SR) model on several math and coding benchmarks. Alongside the release of the main DeepSeek-R1-Zero and DeepSeek-R1 models, DeepSeek published six smaller "DeepSeek-R1-Distill" versions ranging from 1.5 billion to 70 billion parameters. These distilled models are based on existing open source architectures like Qwen and Llama, trained using data generated from the full R1 model. The smallest version can run on a laptop, while the full model requires far more substantial computing resources.

The releases immediately caught the attention of the AI community because most existing open-weights models -- which can often be run and fine-tuned on local hardware -- have lagged behind proprietary models like OpenAI's o1 in so-called reasoning benchmarks. Having these capabilities available in an MIT-licensed model that anyone can study, modify, or use commercially potentially marks a shift in what's possible with publicly available AI models. "They are SO much fun to run, watching them think is hilarious," independent AI researcher Simon Willison told Ars in a text message. Willison tested one of the smaller models and described his experience in a post on his blog: "Each response starts with a <think> pseudo-XML tag containing the chain of thought used to help generate the response," noting that even for simple prompts, the model produces extensive internal reasoning before output.
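That convention makes the models easy to script against, since the reasoning arrives in-band and can be split from the final answer. A minimal sketch, assuming the <think>...</think> tags Willison describes:

```python
import re

def split_reasoning(response: str) -> tuple[str, str]:
    """Separate the chain-of-thought block from the final answer."""
    match = re.match(r"\s*<think>(.*?)</think>(.*)", response, re.DOTALL)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    return "", response.strip()  # no tag found: treat it all as the answer

thoughts, answer = split_reasoning("<think>2+2 is 4.</think>The answer is 4.")
print(thoughts)  # 2+2 is 4.
print(answer)    # The answer is 4.
```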
Although the benchmarks have yet to be independently verified, DeepSeek reports that R1 outperformed OpenAI's o1 on AIME (a mathematical reasoning test), MATH-500 (a collection of word problems), and SWE-bench Verified (a programming assessment tool).

TechCrunch notes that three Chinese labs -- DeepSeek, Alibaba, and Moonshot AI (maker of Kimi) -- have released models that match o1's capabilities.
AI

AI Benchmarking Organization Criticized For Waiting To Disclose Funding from OpenAI (techcrunch.com) 6

An anonymous reader shares a report: An organization developing math benchmarks for AI didn't disclose that it had received funding from OpenAI until relatively recently, drawing allegations of impropriety from some in the AI community.

Epoch AI, a nonprofit primarily funded by Open Philanthropy, a research and grantmaking foundation, revealed on December 20 that OpenAI had supported the creation of FrontierMath. FrontierMath, a test with expert-level problems designed to measure an AI's mathematical skills, was one of the benchmarks OpenAI used to demo its upcoming flagship AI, o3.

In a post on the forum LessWrong, a contractor for Epoch AI going by the username "Meemi" says that many contributors to the FrontierMath benchmark weren't informed of OpenAI's involvement until it was made public. "The communication about this has been non-transparent," Meemi wrote. "In my view Epoch AI should have disclosed OpenAI funding, and contractors should have transparent information about the potential of their work being used for capabilities, when choosing whether to work on a benchmark."

AI

OpenAI's AI Reasoning Model 'Thinks' In Chinese Sometimes, No One Really Knows Why 104

OpenAI's "reasoning" AI model, o1, has exhibited a puzzling behavior of "thinking" in Chinese, Persian, or some other language -- "even when asked a question in English," reports TechCrunch. While the exact cause remains unclear, as OpenAI has yet to provide an explanation, AI experts have proposed a few theories. From the report: Several on X, including Hugging Face CEO Clement Delangue, alluded to the fact that reasoning models like o1 are trained on datasets containing a lot of Chinese characters. Ted Xiao, a researcher at Google DeepMind, claimed that companies including OpenAI use third-party Chinese data labeling services, and that o1 switching to Chinese is an example of "Chinese linguistic influence on reasoning."

"[Labs like] OpenAI and Anthropic utilize [third-party] data labeling services for PhD-level reasoning data for science, math, and coding," Xiao wrote in a post on X. "[F]or expert labor availability and cost reasons, many of these data providers are based in China." [...] Other experts don't buy the o1 Chinese data labeling hypothesis, however. They point out that o1 is just as likely to switch to Hindi, Thai, or a language other than Chinese while teasing out a solution.

Other experts don't buy the o1 Chinese data labeling hypothesis, however. They point out that o1 is just as likely to switch to Hindi, Thai, or a language other than Chinese while teasing out a solution. Rather, these experts say, o1 and other reasoning models might simply be using languages they find most efficient to achieve an objective (or hallucinating). "The model doesn't know what language is, or that languages are different," Matthew Guzdial, an AI researcher and assistant professor at the University of Alberta, told TechCrunch. "It's all just text to it."

Tiezhen Wang, a software engineer at AI startup Hugging Face, agrees with Guzdial that reasoning models' language inconsistencies may be explained by associations the models made during training. "By embracing every linguistic nuance, we expand the model's worldview and allow it to learn from the full spectrum of human knowledge," Wang wrote in a post on X. "For example, I prefer doing math in Chinese because each digit is just one syllable, which makes calculations crisp and efficient. But when it comes to topics like unconscious bias, I automatically switch to English, mainly because that's where I first learned and absorbed those ideas."

[...] Luca Soldaini, a research scientist at the nonprofit Allen Institute for AI, cautioned that we can't know for certain. "This type of observation on a deployed AI system is impossible to back up due to how opaque these models are," they told TechCrunch. "It's one of the many cases for why transparency in how AI systems are built is fundamental."
Math

Rational or Not? This Basic Math Question Took Decades To Answer. (quantamagazine.org) 49

Three mathematicians have developed a breakthrough method for proving whether numbers can be written as fractions, solving a problem that has puzzled researchers for decades. Frank Calegari, Vesselin Dimitrov and Yunqing Tang proved the irrationality of an infinite collection of numbers related to the Riemann zeta function, building on Roger Apery's landmark 1978 proof about a single such number.

The new approach, which relies on 19th-century mathematical techniques, has already helped settle a 50-year-old conjecture about modular forms and could lead to more advances in number theory.
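For reference, the object at the center of Apery's 1978 result, in standard notation:

```latex
% Apery (1978): \zeta(3) is irrational. The new work extends irrationality
% arguments to an infinite family of related zeta-like values.
\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^{s}}, \qquad
\zeta(3) = \sum_{n=1}^{\infty} \frac{1}{n^{3}} \notin \mathbb{Q}
```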
AI

OpenAI's Next Big AI Effort GPT-5 is Behind Schedule and Crazy Expensive (msn.com) 120

"From the moment GPT-4 came out in March 2023, OpenAI has been working on GPT-5..." reports the Wall Street Journal. [Alternate URL here.] But "OpenAI's new artificial-intelligence project is behind schedule and running up huge bills. It isn't clear when — or if — it'll work."

"There may not be enough data in the world to make it smart enough." OpenAI's closest partner and largest investor, Microsoft, had expected to see the new model around mid-2024, say people with knowledge of the matter. OpenAI has conducted at least two large training runs, each of which entails months of crunching huge amounts of data, with the goal of making Orion smarter. Each time, new problems arose and the software fell short of the results researchers were hoping for, people close to the project say... [And each one costs around half a billion dollars in computing costs.]

The $157 billion valuation investors gave OpenAI in October is premised in large part on [CEO Sam] Altman's prediction that GPT-5 will represent a "significant leap forward" in all kinds of subjects and tasks.... It's up to company executives to decide whether the model is smart enough to be called GPT-5 based in large part on gut feelings or, as many technologists say, "vibes."

So far, the vibes are off...

OpenAI wants to use its new model to generate high-quality synthetic data for training, according to the article. But OpenAI's researchers also "concluded they needed more diverse, high-quality data," since "The public internet didn't have enough, they felt." OpenAI's solution was to create data from scratch. It is hiring people to write fresh software code or solve math problems for Orion to learn from, along with theoretical physics experts. The workers, some of whom are software engineers and mathematicians, also share explanations for their work with Orion... Having people explain their thinking deepens the value of the newly created data. It's more language for the LLM to absorb; it's also a map for how the model might solve similar problems in the future... The process is painfully slow. GPT-4 was trained on an estimated 13 trillion tokens. A thousand people writing 5,000 words a day would take months to produce a billion tokens.
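The Journal's back-of-envelope math holds up, assuming a common rule-of-thumb ratio of roughly four tokens per three words:

```python
# Back-of-envelope check of the data-generation arithmetic.
writers = 1_000
words_per_day = 5_000
tokens_per_word = 4 / 3  # assumed rule of thumb (~0.75 words per token)

tokens_per_day = writers * words_per_day * tokens_per_word  # ~6.7 million
days_for_1b = 1_000_000_000 / tokens_per_day                # ~150 days
print(f"{days_for_1b:.0f} days to write one billion tokens")

# For scale: GPT-4's estimated 13 trillion training tokens would take
# about 13,000 times longer at this rate.
```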

OpenAI's already-difficult task has been complicated by internal turmoil and near-constant attempts by rivals to poach its top researchers, sometimes by offering them millions of dollars... More than two dozen key executives, researchers and longtime employees have left OpenAI this year, including co-founder and Chief Scientist Ilya Sutskever and Chief Technology Officer Mira Murati. This past Thursday, Alec Radford, a widely admired researcher who served as lead author on several of OpenAI's scientific papers, announced his departure after about eight years at the company...

OpenAI isn't the only company worrying that progress has hit a wall. Across the industry, a debate is raging over whether improvement in AIs is starting to plateau. Sutskever, who recently co-founded a new AI firm called Safe Superintelligence or SSI, declared at a recent AI conference that the age of maximum data is over. "Data is not growing because we have but one internet," he told a crowd of researchers, policy experts and scientists. "You can even go as far as to say that data is the fossil fuel of AI."

And that fuel was starting to run out.

AI

OpenAI Unveils o3, a Smarter AI Model With Improved Reasoning Skills (openai.com) 27

OpenAI has unveiled a new AI model that it says takes longer to solve problems but gets better results, following Google's similar announcement a day earlier. The model, called o3, replaces o1 from September and spends extra time working through questions that need step-by-step reasoning.

It scores three times higher than o1 on ARC-AGI, a test measuring how well AI handles complex math and logic problems it hasn't seen before. "This is the beginning of the next phase of AI," CEO Sam Altman said during a livestream Friday.

The Microsoft-backed startup is keeping o3 under wraps for now but plans to let outside researchers test it.
AI

Google Releases Its Own 'Reasoning' AI Model (techcrunch.com) 5

An anonymous reader quotes a report from TechCrunch: Google has released what it's calling a new "reasoning" AI model -- but it's in the experimental stages, and from our brief testing, there's certainly room for improvement. The new model, called Gemini 2.0 Flash Thinking Experimental (a mouthful, to be sure), is available in AI Studio, Google's AI prototyping platform. A model card describes it as "best for multimodal understanding, reasoning, and coding," with the ability to "reason over the most complex problems" in fields such as programming, math, and physics. [...]

Built on Google's recently announced Gemini 2.0 Flash model, Gemini 2.0 Flash Thinking Experimental appears to be similar in design to OpenAI's o1 and other so-called reasoning models. Unlike most AI, reasoning models effectively fact-check themselves, which helps them avoid some of the pitfalls that normally trip up AI models. As a drawback, reasoning models often take longer -- usually seconds to minutes longer -- to arrive at solutions. Given a prompt, Gemini 2.0 Flash Thinking Experimental pauses before responding, considering a number of related prompts and "explaining" its reasoning along the way. After a while, the model summarizes what it considers to be the most accurate answer.
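For readers who want to poke at it outside AI Studio's web interface, the model should also be reachable through Google's google-generativeai Python SDK. A minimal sketch; the model id string is an assumption based on the coverage and may change while the model remains experimental:

```python
# pip install google-generativeai
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # key issued via AI Studio

# Model id assumed from press coverage of the experimental release.
model = genai.GenerativeModel("gemini-2.0-flash-thinking-exp")
response = model.generate_content(
    "In a room of 23 people, what is the probability two share a birthday?"
)
print(response.text)
```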
