Science

A New Image File Format Efficiently Stores Invisible Light Data (arstechnica.com) 11

An anonymous reader quotes a report from Ars Technica: Imagine working with special cameras that capture light your eyes can't even see -- ultraviolet rays that cause sunburn, infrared heat signatures that reveal hidden writing, or specific wavelengths that plants use for photosynthesis. Or perhaps using a special camera designed to distinguish the subtle visible differences that make paint colors appear just right under specific lighting. Scientists and engineers do this every day, and they're drowning in the resulting data. A new compression format called Spectral JPEG XL might finally solve this growing problem in scientific visualization and computer graphics. Researchers Alban Fichet and Christoph Peters of Intel Corporation detailed the format in a recent paper published in the Journal of Computer Graphics Techniques (JCGT). It tackles a serious bottleneck for industries working with these specialized images. These spectral files can contain 30, 100, or more data points per pixel, causing file sizes to balloon into multi-gigabyte territory -- making them unwieldy to store and analyze.

[...] The current standard format for storing this kind of data, OpenEXR, wasn't designed with these massive spectral requirements in mind. Even with built-in lossless compression methods like ZIP, the files remain unwieldy for practical work as these methods struggle with the large number of spectral channels. Spectral JPEG XL utilizes a technique used with human-visible images, a math trick called a discrete cosine transform (DCT), to make these massive files smaller. Instead of storing the exact light intensity at every single wavelength (which creates huge files), it transforms this information into a different form. [...]
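The idea is easy to see in miniature. Below is a toy sketch (not the authors' implementation; it assumes NumPy and SciPy are available) that applies a DCT along the spectral axis of a single pixel and keeps only the low-frequency coefficients:

```python
import numpy as np
from scipy.fft import dct, idct

# Toy spectrum for one pixel: intensity sampled at 64 wavelengths,
# modeled as a smooth peak plus a little measurement noise.
rng = np.random.default_rng(0)
spectrum = np.exp(-np.linspace(-2, 2, 64) ** 2) + 0.01 * rng.standard_normal(64)

# Transform to the DCT domain: for smooth spectra, most of the
# energy concentrates in the first few coefficients.
coeffs = dct(spectrum, norm="ortho")

# Lossy step: discard all but the 8 lowest-frequency coefficients.
compressed = np.zeros_like(coeffs)
compressed[:8] = coeffs[:8]

# Reconstruct and measure the error from dropping 56 of 64 values.
reconstructed = idct(compressed, norm="ortho")
print(f"max abs error: {np.max(np.abs(spectrum - reconstructed)):.4f}")
```

Roughly speaking, that is the bet the format makes: real-world spectra are smooth, so most of the information lands in a handful of coefficients, which can be kept at high precision while the rest are quantized aggressively before being handed to the standard JPEG XL codec.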

According to the researchers, the massive file sizes of spectral images have been a real barrier to adoption in industries that would benefit from their accuracy. Smaller files mean faster transfer times, reduced storage costs, and the ability to work with these images more interactively without specialized hardware. The results reported by the researchers seem impressive -- with their technique, spectral image files shrink by 10 to 60 times compared to standard OpenEXR lossless compression, bringing them down to sizes comparable to regular high-quality photos. They also preserve key OpenEXR features like metadata and high dynamic range support.
The report notes that broader adoption "hinges on the continued development and refinement of the software tools that handle JPEG XL encoding and decoding."

Some scientific applications may also see JPEG XL's lossy approach as a drawback. "Some researchers working with spectral data might readily accept the trade-off for the practical benefits of smaller files and faster processing," reports Ars. "Others handling particularly sensitive measurements might need to seek alternative methods of storage."
Science

Inside arXiv - the Most Transformative Platform in All of Science (wired.com) 13

Paul Ginsparg, a physics professor at Cornell University, created arXiv nearly 35 years ago as a digital repository where researchers could share their findings before peer review. Today, the platform hosts more than 2.6 million papers, receives 20,000 new submissions monthly, and serves 5 million active users, Wired writes in a profile of the platform.

"Just when I thought I was out, they pull me back in!" Ginsparg quotes from The Godfather, reflecting his inability to fully hand over the platform despite numerous attempts. If arXiv stopped functioning, scientists worldwide would face immediate disruption. "Everybody in math and physics uses it," says Scott Aaronson, a computer scientist at the University of Texas at Austin. "I scan it every night."

ArXiv revolutionized academic publishing, previously dominated by for-profit giants like Elsevier and Springer, by allowing instant and free access to research. Many significant discoveries, including the "transformers" paper that launched the modern AI boom, first appeared on the platform. Initially a collection of shell scripts on Ginsparg's NeXT machine in 1991, arXiv followed him from Los Alamos National Laboratory to Cornell, where it found an institutional home despite administrative challenges. Recent funding from the Simons Foundation has enabled a hiring spree and long-needed technical updates.
Math

JPMorgan Says Quantum Experiment Generated Truly Random Numbers (financialpost.com) 111

JPMorgan Chase used a quantum computer from Honeywell's Quantinuum to generate and mathematically certify truly random numbers -- an advancement that could significantly enhance encryption, security, and financial applications. The breakthrough was validated with help from U.S. national laboratories and has been published in the journal Nature. From a report: Between May 2023 and May 2024, cryptographers at JPMorgan wrote an algorithm for a quantum computer to generate random numbers, which they ran on Quantinuum's machine. The US Department of Energy's supercomputers were then used to test whether the output was truly random. "It's a breakthrough result," Marco Pistoia, project lead and head of Global Technology Applied Research at JPMorgan, told Bloomberg in an interview. "The next step will be to understand where we can apply it."

Applications could ultimately include more energy-efficient cryptocurrency, online gambling, and any other activity hinging on complete randomness, such as deciding which precincts to audit in elections.

Education

'Kids Are Spending Too Much Class Time on Laptops' (bloomberg.com) 77

Over the past two decades, school districts have spent billions equipping classrooms with laptops, yet students have fallen further behind on essential skills, Michael Bloomberg argues. With about 90% of schools now providing these devices, test scores hover near historic lows -- only 28% of eighth graders proficient in math and 30% in reading.

Bloomberg notes technology's classroom push came from technologists and government officials who envisioned tailored curricula. Computer manufacturers, despite good intentions, had financial interests and profited substantially. The Google executive who questioned why children should learn equations when they could Google answers might now ask why they should write essays when chatbots can do it for them.

Studies confirm traditional methods -- reading and writing on paper -- remain superior to screen-based approaches. Devices distract students, with research showing up to 20 minutes needed to refocus after nonacademic activities. As some districts ban smartphones during school hours, Bloomberg suggests reconsidering classroom computer policies, recommending locked carts for more purposeful use and greater transparency for parents about screen time. Technology's promise has failed while imposing significant costs on children and taxpayers, he writes. Bloomberg calls for a return to books and pens over laptops and tablets.
AI

'There's a Good Chance Your Kid Uses AI To Cheat' (msn.com) 98

Long-time Slashdot reader theodp writes: Wall Street Journal K-12 education reporter Matt Barnum has a heads-up for parents: There's a Good Chance Your Kid Uses AI to Cheat. Barnum writes:

"A high-school senior from New Jersey doesn't want the world to know that she cheated her way through English, math and history classes last year. Yet her experience, which the 17-year-old told The Wall Street Journal with her parent's permission, shows how generative AI has rooted in America's education system, allowing a generation of students to outsource their schoolwork to software with access to the world's knowledge. [...] The New Jersey student told the Journal why she used AI for dozens of assignments last year: Work was boring or difficult. She wanted a better grade. A few times, she procrastinated and ran out of time to complete assignments. The student turned to OpenAI's ChatGPT and Google's Gemini, to help spawn ideas and review concepts, which many teachers allow. More often, though, AI completed her work. Gemini solved math homework problems, she said, and aced a take-home test. ChatGPT did calculations for a science lab. It produced a tricky section of a history term paper, which she rewrote to avoid detection. The student was caught only once."

Not surprisingly, AI companies play up the idea that AI will radically improve learning, while educators are more skeptical. "This is a gigantic public experiment that no one has asked for," said Marc Watkins, assistant director of academic innovation at the University of Mississippi.

Python

Codon Python Compiler Gets Faster - and Changes to Apache 2 License (usenix.org) 4

Slashdot reader rikfarrow summarizes an article they wrote for Usenix.org about the Open Source Python compiler Codon: In 2023 I tried out Codon. At the time I had difficulty compiling the scripts I most commonly used, but was excited by the prospect. Python is essentially single threaded and checks the shape (type) of each variable as it interprets scripts. Codon fixes types and compiles Python into compact, executable binaries that execute much faster.

Several things have changed with their latest release: I now get successful compiles, the committers have added a compiled version of NumPy (high-performance math algorithms), and they have changed their open source license to Apache 2.
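For readers who haven't tried it: Codon consumes ordinary, type-annotated Python source and emits a native executable. A minimal sketch, assuming the codon CLI from Exaloop's releases:

```python
# fib.py -- plain Python that Codon can also compile ahead of time.
def fib(n: int) -> int:
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib(90))  # still fits in a 64-bit integer
```

Building with something like `codon build -release fib.py` should produce a standalone binary, while the identical file still runs under CPython -- which is what makes incremental adoption plausible.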

"The other big news is that Exaloop, the company that is behind Codon, has changed their license to Apache 2..." according to the article, so "commercial use and derivations of Codon are now permitted without licensing."
Google

Google is Adding More AI Overviews and a New 'AI Mode' To Search (theverge.com) 33

Google announced Wednesday it is expanding its AI Overviews to more query types and users worldwide, including those not logged into Google accounts, while introducing a new "AI Mode" chatbot feature. AI Mode, which resembles competitors like Perplexity or ChatGPT Search, will initially be limited to Google One AI Premium subscribers who enable it through the Labs section of Search.

The feature delivers AI-generated answers with supporting links interspersed throughout, powered by Google's search index. "What we're finding from people who are using AI Overviews is that they're really bringing different kinds of questions to Google," said Robby Stein, VP of product on the Search team. "They're more complex questions, that may have been a little bit harder before." Google is also upgrading AI Overviews with its Gemini 2.0 model, which Stein says will improve responses for math, coding and reasoning-based queries.
Math

India's 'Human Calculator Kid' Shatters 6 World Records In a Single Day (gizmodo.com) 39

An anonymous reader quotes a report from Gizmodo: Fourteen-year-old Aaryan Shukla cruised through six mental math calculation world records in a single day, according to a Guinness World Records statement published on February 12, earning the well-deserved nickname, "human calculator kid." Specifically, it took Shukla:

- 30.9 seconds to mentally add 100 four-digit numbers
- One minute and 9.68 seconds to mentally add 200 four-digit numbers
- 18.71 seconds to mentally add 50 five-digit numbers
- Five minutes and 42 seconds to mentally divide a 20-digit number by a ten-digit number ten times
- 51.69 seconds to mentally multiply two five-digit numbers ten times
- Two minutes and 35.41 seconds to mentally multiply two eight-digit numbers ten times

According to the statement, these are among the most difficult mental calculation world records ever attempted. Shukla's frankly mind-boggling achievement also comes in the wake of another world record he broke in April 2024 at the age of 13: fastest time to mentally add 50 five-digit numbers. It took him just 25.19 seconds. That's an addition every half a second. I wouldn't be surprised if students seeking "shortcuts" in their math homework started phoning up Shukla instead of reaching for their ChatGPT browser tab.
Guinness World Records published a video about Shukla's accomplishments on YouTube.
AI

OpenAI Cancels Its o3 AI Model In Favor of a 'Unified' Next-Gen Release 10

OpenAI has canceled the release of o3 in favor of a "simplified" product lineup. CEO Sam Altman said in a post on X that, in the coming months, OpenAI will release a model called GPT-5 that "integrates a lot of [OpenAI's] technology," including o3. TechCrunch reports: The company originally said in December that it planned to launch o3 sometime early this year. Just a few weeks ago, Kevin Weil, OpenAI's chief product officer, said in an interview that o3 was on track for a "February-March" launch. "We want to do a better job of sharing our intended roadmap, and a much better job simplifying our product offerings," Altman wrote in the post. "We want AI to 'just work' for you; we realize how complicated our model and product offerings have gotten. We hate the model picker [in ChatGPT] as much as you do and want to return to magic unified intelligence."

Altman also announced that OpenAI plans to offer unlimited chat access to GPT-5 at the "standard intelligence setting," subject to "abuse thresholds," once the model is generally available. (Altman declined to provide more detail on what this setting -- and these abuse thresholds -- entail.) Subscribers to ChatGPT Plus will be able to run GPT-5 at a "higher level of intelligence," Altman said, while ChatGPT Pro subscribers will be able to run GPT-5 at an "even higher level of intelligence."

"These models will incorporate voice, canvas, search, deep research, and more," Altman said, referring to a range of features OpenAI has launched in ChatGPT over the past few months. "[A] top goal for us is to unify [our] models by creating systems that can use all our tools, know when to think for a long time or not, and generally be useful for a very wide range of tasks." Before GPT-5 launches, OpenAI plans to release its GPT-4.5 model, code-named "Orion," in the next several weeks, according to Altman's post on X. Altman says this will be the company's last "non-chain-of-thought model." Unlike o3 and OpenAI's other so-called reasoning models, non-chain-of-thought models tend to be less reliable in domains like math and physics.
Math

Children's Arithmetic Skills Do Not Transfer Between Applied and Academic Mathematics (nature.com) 100

Children working in India's fruit and vegetable markets can perform complex mental calculations with ease, yet struggle with basic written math tests that determine their academic future, according to new research that raises troubling questions about mathematics education worldwide.

The study, published in Nature, reveals how traditional education systems are failing to tap into the mathematical talents of students who develop practical skills outside the classroom, particularly those from lower-income families. MIT economist Abhijit Banerjee, who grew up watching young market vendors deftly handle complicated transactions, led the research. His team found that while these children could rapidly perform mental arithmetic, they performed poorly on standard written assessments like long division problems.

The findings come at a critical moment when mathematics education must evolve to meet modern demands, incorporating data literacy and computational skills alongside traditional mathematics. The research points to systemic issues, including a global shortage of trained mathematics teachers and assessment systems that reward memorization over reasoning. Without addressing these challenges, researchers warn, naturally talented students from disadvantaged backgrounds may never reach their potential in fields like research, entrepreneurship, or teaching.
AI

Researchers Created an Open Rival To OpenAI's o1 'Reasoning' Model for Under $50 23

AI researchers at Stanford and the University of Washington were able to train an AI "reasoning" model for under $50 in cloud compute credits, according to a research paper. From a report: The model, known as s1, performs similarly to cutting-edge reasoning models, such as OpenAI's o1 and DeepSeek's R1, on tests measuring math and coding abilities. The s1 model is available on GitHub, along with the data and code used to train it.

The team behind s1 said they started with an off-the-shelf base model, then fine-tuned it through distillation, a process to extract the "reasoning" capabilities from another AI model by training on its answers. The researchers said s1 is distilled from one of Google's reasoning models, Gemini 2.0 Flash Thinking Experimental. Distillation is the same approach Berkeley researchers used to create an AI reasoning model for around $450 last month.
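Mechanically, this kind of distillation is mostly data plumbing: sample prompts, record the teacher's answers, then fine-tune the student on the pairs. A hypothetical sketch of the collection step (teacher_generate is a labeled stand-in for whichever API serves the teacher model, not a real client):

```python
import json

def teacher_generate(prompt: str) -> str:
    """Stand-in for the teacher model's API; a real pipeline would call
    the teacher here and capture its full reasoning trace."""
    return f"<reasoning elided> Final answer for: {prompt}"

math_prompts = [
    "Prove that the sum of two odd integers is even.",
    "How many primes are there below 100?",
]

# Build a supervised fine-tuning dataset from teacher outputs: the student
# is later trained to reproduce these completions given the prompts.
with open("distill.jsonl", "w") as f:
    for prompt in math_prompts:
        record = {"prompt": prompt, "completion": teacher_generate(prompt)}
        f.write(json.dumps(record) + "\n")
```

The resulting file is what a standard supervised fine-tuning job consumes; the notable part of s1 is how small the dataset and compute bill could be, not any novelty in the pipeline itself.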
The Almighty Buck

'Magical' Efficient-Market Theory Rebuked in Era of Passive Investing (yahoo.com) 57

An anonymous reader shares a report: At first blush, stock trading this week is hardly a paragon of the market-efficiency theory, an oft-romanticized idea in Economics 101. After all, big equity gauges plunged on Monday, spurred by fears of an AI model released a week earlier, before swiftly rebounding. A fresh academic paper suggests the rise of passive investing may be fueling these kinds of fragile market moves.

According to a study to be published in the prestigious American Economic Review, evidence is building that active managers are slow to scoop up stocks en masse when prices move away from their intrinsic worth. Thanks to this lethargic trading behavior and the relentless boom in benchmark-tracking index funds, the impact of each trade on prices gets amplified, explaining how sell orders, like on Monday perhaps, can induce broader equity gyrations. As a result, the financial landscape is proving less dynamic and more volatile in the era of Big Passive, according to authors at the UCLA Anderson School of Management, the Stockholm School of Economics and the University of Minnesota Carlson School of Management.

Power

Could New Linux Code Cut Data Center Energy Use By 30%? (datacenterdynamics.com) 65

Two computer scientists at the University of Waterloo in Canada believe changing 30 lines of code in Linux "could cut energy use at some data centers by up to 30 percent," according to the site Data Centre Dynamics.

It's the code that processes packets of network traffic, and Linux "is the most widely used OS for data center servers," according to the article: The team tested their solution's effectiveness and submitted it to Linux for consideration, and the code was published this month as part of Linux's newest kernel, release version 6.13. "All these big companies — Amazon, Google, Meta — use Linux in some capacity, but they're very picky about how they decide to use it," said Martin Karsten [professor of Computer Science in the Waterloo's Math Faculty]. "If they choose to 'switch on' our method in their data centers, it could save gigawatt hours of energy worldwide. Almost every single service request that happens on the Internet could be positively affected by this."

The University of Waterloo is building a green computer server room as part of its new mathematics building, and Karsten believes sustainability research must be a priority for computer scientists. "We all have a part to play in building a greener future," he said. The Linux Foundation, which oversees the development of the Linux OS, is a founder member of the Green Software Foundation, an organization set up to look at ways of developing "green software" — code that reduces energy consumption.

Karsten "teamed up with Joe Damato, distinguished engineer at Fastly" to develop the 30 lines of code, according to an announcement from the university. "The Linux kernel code addition developed by Karsten and Damato was based on research published in ACM SIGMETRICS Performance Evaluation Review" (by Karsten and grad student Peter Cai).

Their paper "reviews the performance characteristics of network stack processing for communication-heavy server applications," devising an "indirect methodology" to "identify and quantify the direct and indirect costs of asynchronous hardware interrupt requests (IRQ) as a major source of overhead...

"Based on these findings, a small modification of a vanilla Linux system is devised that improves the efficiency and performance of traditional kernel-based networking significantly, resulting in up to 45% increased throughput..."
AI

Cutting-Edge Chinese 'Reasoning' Model Rivals OpenAI o1 55

An anonymous reader quotes a report from Ars Technica: On Monday, Chinese AI lab DeepSeek released its new R1 model family under an open MIT license, with its largest version containing 671 billion parameters. The company claims the model performs at levels comparable to OpenAI's o1 simulated reasoning (SR) model on several math and coding benchmarks. Alongside the release of the main DeepSeek-R1-Zero and DeepSeek-R1 models, DeepSeek published six smaller "DeepSeek-R1-Distill" versions ranging from 1.5 billion to 70 billion parameters. These distilled models are based on existing open source architectures like Qwen and Llama, trained using data generated from the full R1 model. The smallest version can run on a laptop, while the full model requires far more substantial computing resources.

The releases immediately caught the attention of the AI community because most existing open-weights models -- which can often be run and fine-tuned on local hardware -- have lagged behind proprietary models like OpenAI's o1 in so-called reasoning benchmarks. Having these capabilities available in an MIT-licensed model that anyone can study, modify, or use commercially potentially marks a shift in what's possible with publicly available AI models. "They are SO much fun to run, watching them think is hilarious," independent AI researcher Simon Willison told Ars in a text message. Willison tested one of the smaller models and described his experience in a post on his blog: "Each response starts with a <think> pseudo-XML tag containing the chain of thought used to help generate the response," noting that even for simple prompts, the model produces extensive internal reasoning before output.
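That convention makes the models easy to script against, since the reasoning arrives in-band and can be split from the final answer. A minimal sketch, assuming the <think>...</think> tags Willison describes:

```python
import re

def split_reasoning(response: str) -> tuple[str, str]:
    """Separate the chain-of-thought block from the final answer."""
    match = re.match(r"\s*<think>(.*?)</think>(.*)", response, re.DOTALL)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    return "", response.strip()  # no tag found: treat it all as the answer

thoughts, answer = split_reasoning("<think>2+2 is 4.</think>The answer is 4.")
print(thoughts)  # 2+2 is 4.
print(answer)    # The answer is 4.
```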
Although the benchmarks have yet to be independently verified, DeepSeek reports that R1 outperformed OpenAI's o1 on AIME (a mathematical reasoning test), MATH-500 (a collection of word problems), and SWE-bench Verified (a programming assessment tool).

TechCrunch notes that three Chinese labs -- DeepSeek, Alibaba, and Moonshot AI (maker of Kimi) -- have released models that match o1's capabilities.
AI

AI Benchmarking Organization Criticized For Waiting To Disclose Funding from OpenAI (techcrunch.com) 6

An anonymous reader shares a report: An organization developing math benchmarks for AI didn't disclose that it had received funding from OpenAI until relatively recently, drawing allegations of impropriety from some in the AI community.

Epoch AI, a nonprofit primarily funded by Open Philanthropy, a research and grantmaking foundation, revealed on December 20 that OpenAI had supported the creation of FrontierMath. FrontierMath, a test with expert-level problems designed to measure an AI's mathematical skills, was one of the benchmarks OpenAI used to demo its upcoming flagship AI, o3.

In a post on the forum LessWrong, a contractor for Epoch AI going by the username "Meemi" says that many contributors to the FrontierMath benchmark weren't informed of OpenAI's involvement until it was made public. "The communication about this has been non-transparent," Meemi wrote. "In my view Epoch AI should have disclosed OpenAI funding, and contractors should have transparent information about the potential of their work being used for capabilities, when choosing whether to work on a benchmark."

AI

OpenAI's AI Reasoning Model 'Thinks' In Chinese Sometimes, No One Really Knows Why 104

OpenAI's "reasoning" AI model, o1, has exhibited a puzzling behavior of "thinking" in Chinese, Persian, or some other language -- "even when asked a question in English," reports TechCrunch. While the exact cause remains unclear, as OpenAI has yet to provide an explanation, AI experts have proposed a few theories. From the report: Several on X, including Hugging Face CEO Clement Delangue, alluded to the fact that reasoning models like o1 are trained on datasets containing a lot of Chinese characters. Ted Xiao, a researcher at Google DeepMind, claimed that companies including OpenAI use third-party Chinese data labeling services, and that o1 switching to Chinese is an example of "Chinese linguistic influence on reasoning."

"[Labs like] OpenAI and Anthropic utilize [third-party] data labeling services for PhD-level reasoning data for science, math, and coding," Xiao wrote in a post on X. "[F]or expert labor availability and cost reasons, many of these data providers are based in China." [...] Other experts don't buy the o1 Chinese data labeling hypothesis, however. They point out that o1 is just as likely to switch to Hindi, Thai, or a language other than Chinese while teasing out a solution.

Other experts don't buy the o1 Chinese data labeling hypothesis, however. They point out that o1 is just as likely to switch to Hindi, Thai, or a language other than Chinese while teasing out a solution. Rather, these experts say, o1 and other reasoning models might simply be using languages they find most efficient to achieve an objective (or hallucinating). "The model doesn't know what language is, or that languages are different," Matthew Guzdial, an AI researcher and assistant professor at the University of Alberta, told TechCrunch. "It's all just text to it."

Tiezhen Wang, a software engineer at AI startup Hugging Face, agrees with Guzdial that reasoning models' language inconsistencies may be explained by associations the models made during training. "By embracing every linguistic nuance, we expand the model's worldview and allow it to learn from the full spectrum of human knowledge," Wang wrote in a post on X. "For example, I prefer doing math in Chinese because each digit is just one syllable, which makes calculations crisp and efficient. But when it comes to topics like unconscious bias, I automatically switch to English, mainly because that's where I first learned and absorbed those ideas."

[...] Luca Soldaini, a research scientist at the nonprofit Allen Institute for AI, cautioned that we can't know for certain. "This type of observation on a deployed AI system is impossible to back up due to how opaque these models are," they told TechCrunch. "It's one of the many cases for why transparency in how AI systems are built is fundamental."
Math

Rational or Not? This Basic Math Question Took Decades To Answer. (quantamagazine.org) 49

Three mathematicians have developed a breakthrough method for proving whether numbers can be written as fractions, solving a problem that has puzzled researchers for decades. Frank Calegari, Vesselin Dimitrov and Yunqing Tang proved the irrationality of an infinite collection of numbers related to the Riemann zeta function, building on Roger Apery's landmark 1978 proof about a single such number.

The new approach, which relies on 19th-century mathematical techniques, has already helped settle a 50-year-old conjecture about modular forms and could lead to more advances in number theory.
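For reference, the object at the center of Apery's 1978 result, in standard notation:

```latex
% Apery (1978): \zeta(3) is irrational. The new work extends irrationality
% arguments to an infinite family of related zeta-like values.
\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^{s}}, \qquad
\zeta(3) = \sum_{n=1}^{\infty} \frac{1}{n^{3}} \notin \mathbb{Q}
```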
AI

OpenAI's Next Big AI Effort GPT-5 is Behind Schedule and Crazy Expensive (msn.com) 120

"From the moment GPT-4 came out in March 2023, OpenAI has been working on GPT-5..." reports the Wall Street Journal. [Alternate URL here.] But "OpenAI's new artificial-intelligence project is behind schedule and running up huge bills. It isn't clear when — or if — it'll work."

"There may not be enough data in the world to make it smart enough." OpenAI's closest partner and largest investor, Microsoft, had expected to see the new model around mid-2024, say people with knowledge of the matter. OpenAI has conducted at least two large training runs, each of which entails months of crunching huge amounts of data, with the goal of making Orion smarter. Each time, new problems arose and the software fell short of the results researchers were hoping for, people close to the project say... [And each one costs around half a billion dollars in computing costs.]

The $157 billion valuation investors gave OpenAI in October is premised in large part on [CEO Sam] Altman's prediction that GPT-5 will represent a "significant leap forward" in all kinds of subjects and tasks.... It's up to company executives to decide whether the model is smart enough to be called GPT-5 based in large part on gut feelings or, as many technologists say, "vibes."

So far, the vibes are off...

OpenAI wants to use its new model to generate high-quality synthetic data for training, according to the article. But OpenAI's researchers also "concluded they needed more diverse, high-quality data," since "The public internet didn't have enough, they felt." OpenAI's solution was to create data from scratch. It is hiring people to write fresh software code or solve math problems for Orion to learn from, along with theoretical physics experts. The workers, some of whom are software engineers and mathematicians, also share explanations for their work with Orion... Having people explain their thinking deepens the value of the newly created data. It's more language for the LLM to absorb; it's also a map for how the model might solve similar problems in the future... The process is painfully slow. GPT-4 was trained on an estimated 13 trillion tokens. A thousand people writing 5,000 words a day would take months to produce a billion tokens.
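The Journal's back-of-envelope math holds up, assuming a common rule-of-thumb ratio of roughly four tokens per three words:

```python
# Back-of-envelope check of the data-generation arithmetic.
writers = 1_000
words_per_day = 5_000
tokens_per_word = 4 / 3  # assumed rule of thumb (~0.75 words per token)

tokens_per_day = writers * words_per_day * tokens_per_word  # ~6.7 million
days_for_1b = 1_000_000_000 / tokens_per_day                # ~150 days
print(f"{days_for_1b:.0f} days to write one billion tokens")

# For scale: GPT-4's estimated 13 trillion training tokens would take
# about 13,000 times longer at this rate.
```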

OpenAI's already-difficult task has been complicated by internal turmoil and near-constant attempts by rivals to poach its top researchers, sometimes by offering them millions of dollars... More than two dozen key executives, researchers and longtime employees have left OpenAI this year, including co-founder and Chief Scientist Ilya Sutskever and Chief Technology Officer Mira Murati. This past Thursday, Alec Radford, a widely admired researcher who served as lead author on several of OpenAI's scientific papers, announced his departure after about eight years at the company...

OpenAI isn't the only company worrying that progress has hit a wall. Across the industry, a debate is raging over whether improvement in AIs is starting to plateau. Sutskever, who recently co-founded a new AI firm called Safe Superintelligence or SSI, declared at a recent AI conference that the age of maximum data is over. "Data is not growing because we have but one internet," he told a crowd of researchers, policy experts and scientists. "You can even go as far as to say that data is the fossil fuel of AI."

And that fuel was starting to run out.

AI

OpenAI Unveils o3, a Smarter AI Model With Improved Reasoning Skills (openai.com) 27

OpenAI has unveiled a new AI model that it says takes longer to solve problems but gets better results, following Google's similar announcement a day earlier. The model, called o3, replaces o1 from September and spends extra time working through questions that need step-by-step reasoning.

It scores three times higher than o1 on ARC-AGI, a test measuring how well AI handles complex math and logic problems it hasn't seen before. "This is the beginning of the next phase of AI," CEO Sam Altman said during a livestream Friday.

The Microsoft-backed startup is keeping o3 under wraps for now but plans to let outside researchers test it.
AI

Google Releases Its Own 'Reasoning' AI Model (techcrunch.com) 5

An anonymous reader quotes a report from TechCrunch: Google has released what it's calling a new "reasoning" AI model -- but it's in the experimental stages, and from our brief testing, there's certainly room for improvement. The new model, called Gemini 2.0 Flash Thinking Experimental (a mouthful, to be sure), is available in AI Studio, Google's AI prototyping platform. A model card describes it as "best for multimodal understanding, reasoning, and coding," with the ability to "reason over the most complex problems" in fields such as programming, math, and physics. [...]

Built on Google's recently announced Gemini 2.0 Flash model, Gemini 2.0 Flash Thinking Experimental appears to be similar in design to OpenAI's o1 and other so-called reasoning models. Unlike most AI, reasoning models effectively fact-check themselves, which helps them avoid some of the pitfalls that normally trip up AI models. As a drawback, reasoning models often take longer -- usually seconds to minutes longer -- to arrive at solutions. Given a prompt, Gemini 2.0 Flash Thinking Experimental pauses before responding, considering a number of related prompts and "explaining" its reasoning along the way. After a while, the model summarizes what it considers to be the most accurate answer.
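For readers who want to poke at it outside AI Studio's web interface, the model should also be reachable through Google's google-generativeai Python SDK. A minimal sketch; the model id string is an assumption based on the coverage and may change while the model remains experimental:

```python
# pip install google-generativeai
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # key issued via AI Studio

# Model id assumed from press coverage of the experimental release.
model = genai.GenerativeModel("gemini-2.0-flash-thinking-exp")
response = model.generate_content(
    "In a room of 23 people, what is the probability two share a birthday?"
)
print(response.text)
```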
