AI Chatbots Are Becoming Even Worse At Summarizing Data

Researchers have found that newer AI models can omit key details from text summaries as much as 73 percent of the time.

Ask the CEO of any AI startup, and you'll probably get an earful about the tech's potential to "transform work," or "revolutionize the way we access knowledge."

Really, there's no shortage of promises that AI is only getting smarter — which we're told will speed up the rate of scientific breakthroughs, streamline medical testing, and breed a new kind of scholarship.

But according to a new study published in the journal Royal Society Open Science, as many as 73 percent of the seemingly reliable summaries produced by AI chatbots could actually be inaccurate.

The collaborative research paper looked at nearly 5,000 large language model (LLM) summaries of scientific studies generated by ten widely used chatbots, including ChatGPT-4o, ChatGPT-4.5, DeepSeek, and LLaMA 3.3 70B. It found that, even when explicitly prompted to get the facts right, the AI summaries omitted key details at nearly five times the rate of human-written scientific summaries.

"When summarizing scientific texts, LLMs may omit details that limit the scope of research conclusions, leading to generalizations of results broader than warranted by the original study," the researchers wrote.

Alarmingly, the newer the chatbot, the higher its rate of error was found to be — the exact opposite of what AI industry leaders have been promising us. On top of that, the researchers found a correlation between an LLM's tendency to overgeneralize and how widely used it is, "posing a significant risk of large-scale misinterpretations of research findings," according to the study's authors.

For example, use of the two ChatGPT models listed in the study doubled from 13 to 26 percent among US teens between 2023 and 2025. While the older ChatGPT-4 Turbo was roughly 2.6 times more likely to omit key details than the original texts, the newer ChatGPT-4o models were nine times as likely. The same tendency showed up in Meta's LLaMA 3.3 70B, which was 36.4 times more likely to overgeneralize than older versions.

The job of synthesizing huge swaths of data into just a few sentences is a tricky one. Though it comes pretty easily to fully-grown humans, it's a really complicated process to program into a chatbot.

While the human brain can instinctively learn broad lessons from specific experiences — like touching a hot stove — complex nuances make it difficult for chatbots to know what facts to focus on. A human quickly understands that stoves can burn while refrigerators do not, but an LLM might reason that all kitchen appliances get hot, unless otherwise told. Expand that metaphor out a bit to the scientific world, and it gets complicated fast.

But summarizing is also time-consuming for humans; the researchers list clinical medical settings as one area where LLM summaries could have a huge impact on work. It goes the other way, too, though: in clinical work, details are extremely important, and even the tiniest omission can compound into a life-changing disaster.

This makes it all the more troubling that LLMs are being shoehorned into every possible workspace, from high school homework to pharmacies to mechanical engineering — despite a growing body of work showing widespread accuracy problems inherent to AI.

However, the scientists pointed out some important limitations to their findings. For one, the prompts fed to LLMs can have a significant impact on the answers they spit out. Whether this affects LLM summaries of scientific papers is unknown, suggesting an avenue for future research.

Regardless, the trendlines are clear. Unless AI developers can set their new LLMs on the right path, you'll just have to keep relying on humble human bloggers to summarize scientific reports for you (wink).

More on AI: Senators Demand Safety Records from AI Chatbot Apps as Controversy Grows

Nonverbal Neuralink Patient Is Using Brain Implant and Grok to Generate Replies

The third patient of Elon Musk's brain computer interface company Neuralink is using the billionaire's foul-mouthed AI chatbot Grok to speed up communication.

The patient, Bradford Smith, who has amyotrophic lateral sclerosis (ALS) and is nonverbal as a result, is using the chatbot to draft responses on Musk's social media platform X.

"I am typing this with my brain," Smith tweeted late last month. "It is my primary communication. Ask me anything! I will answer at least all verified users!"

"Thank you, Elon Musk!" the tweet reads.

As MIT Technology Review points out, the strategy could come with some downsides, blurring the line between what Smith intends to say and what Grok suggests. On one hand, the tech could greatly facilitate his ability to express himself. On the other hand, generative AI could be robbing him of a degree of authenticity by putting words in his mouth.

"There is a trade-off between speed and accuracy," University of Washington neurologist Eran Klein told the publication. "The promise of brain-computer interface is that if you can combine it with AI, it can be much faster."

Case in point, while replying to X user Adrian Dittmann — long suspected to be a Musk sock puppet — Smith used several em-dashes in his reply, a symbol frequently used by AI chatbots.

"Hey Adrian, it’s Brad — typing this straight from my brain! It feels wild, like I’m a cyborg from a sci-fi movie, moving a cursor just by thinking about it," Smith's tweet reads. "At first, it was a struggle — my cursor acted like a drunk mouse, barely hitting targets, but after weeks of training with imagined hand and jaw movements, it clicked, almost like riding a bike."

Perhaps unsurprisingly, generative AI did indeed play a role.

"I asked Grok to use that text to give full answers to the questions," Smith told MIT Tech. "I am responsible for the content, but I used AI to draft."

However, he stopped short of elaborating on the ethical quandary of having a potentially hallucinating AI chatbot put words in his mouth.

Muddying matters even further is Musk's control of Neuralink, Grok maker xAI, and X-formerly-Twitter. In other words, could the billionaire be influencing Smith's answers? The fact that Smith is nonverbal makes it a difficult line to draw.

Nonetheless, the small chip implanted in Smith's head has given him an immense sense of personal freedom. Smith has even picked up sharing content on YouTube. He has uploaded videos he edits on his MacBook Pro by controlling the cursor with his thoughts.

"I am making this video using the brain computer interface to control the mouse on my MacBook Pro," his AI-generated and astonishingly natural-sounding voice said in a video titled "Elon Musk makes ALS TALK AGAIN," uploaded late last month. "This is the first video edited with the Neurolink and maybe the first edited with a BCI."

"This is my old voice narrating this video cloned by AI from recordings before I lost my voice," he added.

The "voice clone" was created with the help of startup ElevenLabs, which has become an industry standard for those suffering from ALS, and can read out his written words aloud.

But Smith's reliance on tools like Grok and OpenAI's ChatGPT to speak again raises some fascinating questions about true authorship and freedom of self-expression for those who have lost their voices.

And Smith was willing to admit that sometimes, the ideas of what to say didn't come directly from him.

"My friend asked me for ideas for his girlfriend who loves horses," he told MIT Tech. "I chose the option that told him in my voice to get her a bouquet of carrots. What a creative and funny idea."

More on Neuralink: Brain Implant Companies Apparently Have an Extremely Dirty Secret

Sam Altman Admits That New OpenAI Updates Made ChatGPT’s Personality Insufferable

With its latest update, ChatGPT seems to have adopted an uber-annoying tone — and it's so bad, even OpenAI CEO Sam Altman is calling it out.

Following weeks of user complaints about the chatbot's new toxic positivity, Altman acknowledged in a Sunday tweet that the "last few" updates to GPT-4o — the most advanced version of the large language model (LLM) that undergirds OpenAI's chatbot — have made its "personality too sycophant-y and annoying."

Despite vague claims that the new personality has "some very good parts," the OpenAI cofounder conceded in the same post that the company is going to fix ChatGPT's exasperating tone shift "ASAP," with some changes rolling out as soon as yesterday and others coming "this week."

Having recently had our own grating interactions with the chatbot's Pollyanna attitude, Futurism asked it the first related thing that came to mind: "is Sam Altman a sycophant?"

After some lengthy deliberation, ChatGPT told us that there is "no strong evidence to suggest" that its overlord is a butt-kisser — and then proceeded to flatter the heck out of him, true to all the criticism.

"Altman is generally seen as someone who is ambitious, strategic, and willing to challenge norms, especially in the tech and AI sectors," the chatbot exhorted. "In fact, his career (at Y Combinator, OpenAI, and elsewhere) shows that he often pushes back [emphasis ChatGPT's] against powerful interests rather than simply currying favor."

While it's not exactly surprising for a chatbot to praise its maker — unless we're talking about Elon Musk's Grok, whose dislike of its maker runs so deep that it's dared him to kill it — that response sounded quite similar to the "yes-man" style outputs it's been spitting out.

Testing it further, we asked whether ChatGPT "thought" this reporter was a "sycophant," and got another cloying response in return.

"Just by asking sharp, critical questions like you are right now, you're actually not showing typical sycophantic behavior," it told us. "Sycophants usually avoid questioning or challenging anything."

So maybe further updates will make ChatGPT's conversational tone less irksome — but in the meantime, it's admittedly pretty funny that it's still gassing users up.

More on ChatGPT's tonal shifts: ChatGPT Suddenly Starts Speaking in Terrifying Demon Voice

California Nuclear Power Plant Deploys Generative AI Safety System

America's first nuclear power plant to use artificial intelligence is, ironically, the last operational one in California.

As CalMatters reports, the Diablo Canyon power plant is slated to be decommissioned by the end of this decade. In the interim, the plant's owner, Pacific Gas & Electric (PG&E), claims it's deploying its "Neutron Enterprise" tool — making Diablo Canyon the first nuclear plant in the nation to use AI — in a series of escalating stages.

Less than 18 months ago, Diablo Canyon was hurtling headlong toward a decommissioning that would have begun in 2024 and ended this year. In late 2023, however, the California Public Utility Commission voted to stay its execution for five years, pushing the shutdown of the plant's two reactors back to 2029 and 2030, respectively.

Just under a year after that vote, PG&E announced that it was teaming up with a startup called Atomic Canyon, which was founded with the plant in mind and is also based in the coastal Central California town of San Luis Obispo. That partnership, and the first "stage" of the tool's deployment, brought some of Nvidia's high-powered H100 AI chips to the dying nuclear plant, and with them the compute power needed for generative artificial intelligence.

Running on an internal server without cloud access, Neutron Enterprise's biggest use case, much like so-called AI "search engines," is summarizing a massive trove of millions of regulatory documents that have been fed into it. According to Atomic Canyon CEO and cofounder Trey Lauderdale, this isn't risky — though anyone who has used AI to summarize information knows better, because the tech still often makes factual mistakes.

Speaking to CalMatters, PG&E executive Maureen Zalawick insisted that the AI program will be more of a "copilot" than a "decision-maker," meant to assist flesh-and-blood employees rather than replace them.

"We probably spend about 15,000 hours a year searching through our multiple databases and records and procedures," Zalawick explained. "And that’s going to shrink that time way down."

Lauderdale put it in even simpler terms.

"You can put this on the record," he told CalMatters. "The AI guy in nuclear says there is no way in hell I want AI running my nuclear power plant right now."

If that "right now" caveat gives you pause, you're not alone. Given the shifting timelines for the closure of Diablo Canyon in a state that has been painstakingly phasing out its nuclear facilities since the 1970s over concerns about toxic waste — and the fact that Lauderdale claims to be talking to other plants in other states — there's ample cause for concern.

"The idea that you could just use generative AI for one specific kind of task at the nuclear power plant and then call it a day," cautioned Tamara Kneese of the tech watchdog Data & Society, "I don’t really trust that it would stop there."

As head of Data & Society's Climate, Technology, and Justice program, Kneese said that while using AI to help sift through tomes of documents is worthwhile, "trusting PG&E to safely use generative AI in a nuclear setting is something that is deserving of more scrutiny." This is the same company whose polluting propensities were exposed by the real-life Erin Brokovich in the 1990s, after all.

California lawmakers, meanwhile, were impressed by the narrowly tailored use Atomic Canyon and PG&E propose for the program — but whether that narrow functionality stays that way remains to be seen.

More on AI and energy: Former Google CEO Tells Congress That 99 Percent of All Electricity Will Be Used to Power Superintelligent AI

Tim Cook Has a Strange Obsession

Apple CEO Tim Cook is far from giving up on virtual and augmented reality headsets, a gadget category that has been rife with setbacks and risky bets that didn't pan out.

As Bloomberg's Mark Gurman reported over the weekend, the tech giant is getting ready to launch not one, but two updated versions of its Vision Pro, a $3,499 mixed-reality headset that has seen sluggish sales and even given wearers black eyes.

That's despite rumors circulating last summer that Apple had given up on a follow-up to the uber-expensive gadget.

In fact, Cook is so convinced of the segment's potential that he's looking to beat Meta CEO Mark Zuckerberg — who shares an obsession with AR and VR headsets — to market. According to Bloomberg, Cook wants to create a pair of AR glasses that buyers can wear all day.

"Tim cares about nothing else," an insider with knowledge of the matter told Gurman. "It’s the only thing he’s really spending his time on from a product development standpoint."

But given the segment's well-documented challenges in wooing a mainstream market, that could be far easier said than done. We've seen numerous products fail to live up to the hype, particularly in the VR space.

As far as wearable smart glasses are concerned, Meta has some experience. Its Ray-Ban Meta AI glasses, which feature bone-conducting earphones, a camera, and a microphone, have proven surprisingly popular.

However, to call them augmented reality glasses would be an overstatement, as they can't overlay data or other info over the wearer's vision.

Earlier this month, Gurman reported that Meta is looking to follow up its glasses with a $1,000-plus deluxe version, which includes a screen for displaying photos and apps. But details are still pretty sparse and the company has yet to announce a release date.

Whether Apple can swoop in and release a lighter and cheaper pair of AR glasses remains to be seen. Even coming up with a successor to its much beefier and unwieldy Vision Pro headset could prove challenging. According to Bloomberg, the goal is to greatly reduce both weight and price, which is an appreciable challenge, especially considering the possibility of escalating tariffs on Chinese imports.

The jump to a light accessory with the same form factor as a pair of sunglasses is a substantial one. As a stepping stone, Apple is reportedly looking to attach a camera to its Apple Watch and AirPods, an admittedly awkward answer to Meta's Ray-Ban glasses.

In short, where Cook's obsession with beating Zuckerberg to the punch will leave Apple's foray into the glasses space is anyone's guess — though if there's one thing we know about Apple, it's that the company hates to lose.

More on Apple: Apple's AI-Powered Siri Is Such a Disaster That Employees Have Given the Team Developing It a Rude Nickname

OpenAI’s Agent Has a Problem: Before It Does Anything Important, You Have to Double-Check It Hasn’t Screwed Up

Operator, OpenAI's brand new AI agent, doesn't quite deliver the hands-off experience some might hope it would.

Behold Operator, OpenAI's long-awaited agentic AI model that can use your computer and browse the web for you. 

It's supposed to work on your behalf, following the instructions it's given like your very own little employee. Or "your own secretary" might be more apt: OpenAI's marketing materials have focused on Operator performing tasks like booking tickets, making restaurant reservations, and creating shopping lists (though the company admits it still struggles with managing calendars, a major productivity task).

But if you think you can just walk away from the computer and let the AI do everything, think again: Operator will need to ask for confirmation before pulling the trigger on important tasks. That throws a wrench into the premise of an AI agent acting on your behalf, since you still have to make sure it isn't screwing up before allowing it any real power.

"Before finalizing any significant action, such as submitting an order or sending an email, Operator should ask for approval," reads the safety section in OpenAI's announcement.

This measure highlights the tension between keeping stringent guardrails on AI models while allowing them to freely exercise their purportedly powerful capabilities. How do you put out an AI that can do anything — without it doing anything stupid?

Right now, a limited preview of Operator is only available to subscribers of the ChatGPT Pro plan, which costs an eye-watering $200 per month. 

The agentic tool uses its own AI model, called Computer-Using Agent, to interact with its virtual environment — that is, to perform mouse and keyboard actions — by constantly taking screenshots of your desktop.

The screenshots are interpreted by GPT-4o's image-processing capabilities, theoretically allowing Operator to use any software it's looking at, and not just ones designed to integrate with AI.
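
For readers curious what that loop looks like in the abstract, here is a minimal Python sketch of the general pattern the article describes: capture the screen, ask a vision model for the next mouse or keyboard action, and pause for human approval before anything consequential. This is not OpenAI's code; every name here (Action, take_screenshot, plan_next_action, execute) is a hypothetical stand-in for the real screenshot capture, model call, and input automation.

from dataclasses import dataclass, field

@dataclass
class Action:
    kind: str                      # e.g. "click", "type", "submit_order", "done"
    payload: dict = field(default_factory=dict)
    significant: bool = False      # True for consequential steps like placing an order

def take_screenshot() -> bytes:
    # Placeholder: a real agent would capture the desktop here (e.g. with mss or pyautogui).
    return b""

def plan_next_action(screenshot: bytes, goal: str) -> Action:
    # Hypothetical stand-in for the vision-model call that maps pixels plus a goal
    # to the next mouse/keyboard action.
    return Action(kind="done")

def execute(action: Action) -> None:
    # Placeholder executor: a real agent would drive the mouse and keyboard here.
    print(f"executing {action.kind} with {action.payload}")

def run_agent(goal: str, max_steps: int = 20) -> None:
    for _ in range(max_steps):
        action = plan_next_action(take_screenshot(), goal)
        if action.kind == "done":
            return
        if action.significant:
            # The human-in-the-loop gate described above: pause and ask before acting.
            if input(f"Allow '{action.kind}'? [y/N] ").strip().lower() != "y":
                print("Declined; handing control back to the user.")
                return
        execute(action)

if __name__ == "__main__":
    run_agent("book a table for two on Friday")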

But in practice, it doesn't sound like the seamless experience you'd hope it to be (though to be fair, it's still in its early stages). When the AI gets stuck, as it still often does, it hands control back to the user to remedy the issue. It will also stop working to ask you for your usernames and passwords, entering a "takeover mode."

It's "simply too slow," wrote one user on the ChatGPTPro subreddit in a lengthy writeup, who said they were "shocked" by its sluggish pace. "It also bugged me when Operator didn't ask for help when it clearly needed to," the user added. In reality, you may have to sit there and watch the AI painstakingly try to navigate your computer, like supervising a grandparent trying their hand at Facebook and email.

Obviously, safety measures are good. But it's worth asking just how useful this tech is going to be if it can't be trusted to work reliably without neutering it.

And if safety and privacy are important to you, then you should already be uneasy with the idea of letting an AI model run rampant on your machine, especially one that relies on constantly screenshotting your desktop.

While you can opt out of having your data used to train the AI model, OpenAI says that it will store your chats and screenshots for up to 90 days on its servers, TechCrunch reported, even if you delete them.

Because Operator can browse the web, that means it will potentially be exposed to all kinds of danger, including attacks called prompt injections that could trick the model into defying its original instructions.

More on AI: Rumors Swirl That OpenAI Is About to Reveal a "PhD-Level" Human-Tier Intelligence

Google Is Stuffing Annoying Ads Into Its Terrible AI Search Feature

Ad Attack

Google's notoriously wonky AI Overviews feature — you know, the one that repeatedly makes up facts and literally tells users to eat rocks — is about to get a whole lot more annoying.

On Thursday, the tech giant announced that its AI-generated search summaries will now begin to show ads above, below, and within them, as a way of demonstrating that the technology is capable of actually making money.

It will also serve to assuage concerns that AI chatbots could eat into search ad revenues, which are Google's biggest cash cow.

Now, if you search how to get a grass stain out of jeans, as seen in an example in Google's blog post, you'll get an AI summary which contains a carousel of relevant website links, plus a heavy helping of "Sponsored" ads for stain removers. Revolutionary stuff.

"People have been finding the ads within AI Overviews helpful because they can quickly connect with relevant businesses, products and services to take the next step at the exact moment they need them," Shashi Thakur, vice president of Google Ads, wrote in the blog post.

In perhaps the clearest signal of its commitment to weaving AI into its search engine, the company is also rolling out a separate product for mobile users called AI-organized search results pages: full pages — right now limited to recipe searches — that are entirely populated with content curated by an AI.

Here Comes the Sludge

The move is all well and good for the company's investors. But for everyone else, it just introduces more AI slop that waters down a search engine that's already becoming less useful.

Like AI chatbots in general, Google's AI Overviews have earned a reputation for being unreliable and making up facts. Notable gaffes include recommending putting glue on pizza and smearing poop on a balloon — and its bad rep is no doubt heightened by the fact that the AI summaries are forced to the top of a search engine that practically everyone uses.

And while this will protect Google's revenue stream, it does little for the websites that are losing clicks because their content is being mediated through an AI model. A Google spokesperson confirmed to Bloomberg that the company won't share ad money with publishers whose material is cited in the AI overviews.

As a small concession, however, Google will start including inline links to those sources. Rhiannon Bell, Google Search's VP of user experience, claims that tests showed that compared to the old design, which relegated links to the bottom of the summaries, this new one sends more traffic to the cited websites, per Bloomberg.

In any case, it's looking like Google is in the AI search game for the long haul.

More on Google: Google Paid $2.7 Billion to Get a Single AI Researcher Back
