¿Podemos confiar en las máquinas? – Centro de Estudio Grl Mosconi

Los motores de texto más sofisticados del mundo cometen el error humano más antiguo: hablan con convicción cuando deberían dudar. Sus errores se presentan disfrazados de verdad… los modelos de lenguaje no razonan como los humanos; predicen la siguiente palabra más probable basándose en la correlación estadística. «Siempre existe la posibilidad de obtener algo sin sentido o incorrecto».

The world’s most sophisticated text engines are making the oldest human mistake: they speak with conviction when they should hesitate. Their errors arrive dressed as truth.

That tension between fluency and fidelity has become the defining problem in artificial intelligence. Once dismissed as quirky glitches, hallucinations now appear in legal filings, financial analyses and daily news summaries. In early November, the European Broadcasting Union (EBU) released a study showing that nearly half of the answers provided by major AI assistants misrepresented facts or fabricated citations in their coverage of current events.

Anxiety is the backdrop for the new textbook Introduction to Foundation Models (Springer, 2025), co-authored by IBM Principal Research Scientist Pin-Yu Chen and colleague Sijia Liu, an affiliated professor at IBM Research. The book traces the technical and ethical evolution of the foundation models that power generative systems like ChatGPT and examines how to make them not only more intelligent but also more trustworthy.

“The question,” Chen told me during an interview from IBM’s research headquarters, “is not just what these systems can do. It’s whether we can rely on them when it matters.”

He spoke in the calm, measured cadence of an engineer accustomed to explaining complex concepts without drama. “Whenever a company uses AI in its workflow,” he said, “it’s responsible for the decisions that come out of it. Fairness, explainability and safety aren’t optional. They are part of the system itself.”

Why reliability matters more than brilliance

Inside IBM’s labs, the effort to make AI dependable begins with stress. Chen’s group subjects models to what he calls “foundational robustness” tests, pushing them until they break and recording how and why they fail. The aim is to understand how reliability decays as models scale up in size and scope. “When you scale up intelligence, you also scale up uncertainty,” he said.

The notion of dependability emerged just as generative AI began to reach the public. In December 2022, at the NeurIPS conference in New Orleans, Chen and colleagues led a tutorial on adversarial testing for large models. The session coincided almost exactly with the release of ChatGPT.

“I remember hearing people whisper about it,” he said. “When I tried it, I realized how powerful it was, and how little we understood what was happening inside.”

Unlike earlier rule-based systems, modern models form internal representations that operate across billions of parameters. Researchers can observe what happens under the hood, but cannot fully interpret it. “People see a system that writes fluently and think it must know what it’s talking about,” Chen said. “But most of the time, it doesn’t.”

He explained that language models don’t reason in the human sense; they predict the next most probable word based on statistical correlation. “There’s always a chance you’ll get something that’s nonsense or not correct,” he said. “You can reduce errors, but you can’t eliminate them.”

The book that grew from that realization is part textbook, part field guide, Chen said. Its chapters move from the mechanics of transformer architectures to case studies in bias, fairness and explainability. One section addresses trust and safety directly, detailing methods for watermarking, red teaming and prompt injection defense. Chen and Liu argue in another section that the success of foundation models depends on building the institutional equivalent of an immune system, encompassing layers of evaluation, testing and governance that catch errors before they reach the world.

Recent events underscore why that argument feels increasingly relevant with every passing month. The EBU report documented systematic misinformation across language boundaries, suggesting that the problem is not one of cultural bias, but rather a structural prediction error. Around the same time, a group of researchers from the University of Cambridge found that nearly one-third of scientific abstracts generated by large models contained factual errors or unsupported claims.

Chen sees these incidents not as isolated lapses, but as signs of an accuracy paradox: as models become more polished, their mistakes become harder to detect. “They’re trained to talk, not to stay silent,” he said. “If they say, ‘I don’t know,’ that gets the lowest reward. So, they learn to keep talking, even when they shouldn’t.”

The tendency has consequences beyond how polished the text appears, Chen noted. Enterprises experimenting with AI in regulated domains such as finance, healthcare and law are discovering that consistency, not novelty, defines value.

“If results can’t be repeated,” Chen said, “you shouldn’t use them for deterministic decisions.” He points to examples like loan approvals, medical recommendations and sentencing analyses. “Those require reproducibility,” he said. “Generative AI is best for exploration and creativity, not enforcement.”

At IBM, reliability has become a key engineering challenge, Chen said. His team participates in the company’s AI risk atlas, a living document that identifies, categorizes and tracks technical risks, from bias and privacy issues to hallucination and manipulation. Each new capability introduces a new variable, he said. “Every time the technology changes, we expand the catalog.”

The process, Chen said, reflects the pragmatic ethos running through IBM’s research culture. Other labs emphasize speed and iteration; IBM emphasizes durability and verification. “We prefer to move deliberately and make sure what we build can be trusted,” he said.

Another IBM project, the Attention Tracker, turns introspection into visualization. Available publicly on Hugging Face, the visualization tool enables users to observe which parts of a model activate as it generates text, providing insight into how attention patterns shift when responses begin to diverge. The tool will be featured at IBM’s Global Technology Outlook this month. “It’s a way to make reasoning observable,” Chen said. “When you can see which neurons are firing, you can start to understand why the model said what it said.”

Rethinking intelligence

The pursuit of trustworthy AI has also prompted a reconsideration of what constitutes intelligence. For decades, the goal for many has been artificial general intelligence (AGI), machines that can match human performance across a wide range of tasks. By that metric, Chen admits, the field has arguably already arrived.

“If AGI means solving multiple problems at a human level, then yes, we’ve reached it,” he said. “But that’s not the same as understanding.”

In conversation, he replaced the capital letters with a lowercase aspiration he calls “artificial good intelligence”: systems that behave responsibly and understand their limits. “These models can write essays, pass exams, even compose music,” he said. “But they don’t know what they’re doing. The next step is to teach them awareness of their own boundaries.”

That awareness begins, paradoxically, with failure. Chen’s group builds adversarial tests for today’s systems, designed to expose vulnerabilities through prompts that trick models into bias, contradictions or security breaches.

“You have to think like an attacker,” he said. “If we can predict how something will be misused, we can defend against it.”

He approaches persuasion with similar caution. In the same way he probes technical vulnerabilities, Chen examines how modern AI assistants are tuned for agreeableness, rewarding compliance over correctness.

“One version of a chatbot became so compliant that people complained it was useless,” he said. “At first, they liked how polite it was. Then they realized it never challenged them.” For Chen, the behavior revealed a deeper tension between truth and customer satisfaction. “The system learns that agreement gets rewarded,” he said. “But that’s not the same as being right.”

That insight underlies a broader debate within the AI development community. Should assistants prioritize accuracy or empathy? Politeness or precision? Chen favors models that occasionally correct their users. “AI should assist thinking, not mirror it,” he said.

Within enterprise deployments, the answer often begins with data, Chen said. He pointed out that most industries already possess valuable information, but lack the infrastructure to use it safely. He describes foundation models as engines for representation. “One way I think about them is as converters that turn raw data into structured vectors,” he explained. “Once you encode the raw data, you can train simpler, auditable models on top. You get scale without losing interpretability.”

The approach offers a way to keep AI flexible yet accountable. A foundation model can turn raw data into a useful structure, while smaller, transparent systems handle the final calls. A manufacturer might process sensor data this way, and a hospital might use it to summarize notes while doctors make the diagnoses. “You can have power and clarity at the same time,” Chen said.

His insistence on boundaries stems partly from his previous research. Early in his career, he demonstrated how imperceptible changes to an image, involving just a few pixels, could cause a classifier to label a bagel as a piano. “We realized how fragile these systems were,” he said. “That fragility doesn’t disappear with size; it just becomes harder to detect.”

The same, he said, holds for language. The seamless paragraphs generated by modern models can conceal deep structural uncertainty. A sentence that reads like certainty may in fact be statistical improvisation.“The better they sound,” Chen said, “the less we can tell when they’re wrong.”

Companies eager to monetize conversational interfaces often prioritize responsiveness over restraint, Chen said. And that, he added, is where engineering discipline matters most. “If the training and evaluation reward guessing,” he said, “then guessing is what the model will learn to do.”

He believes the real test of maturity will be whether the industry can value silence. “A model that can admit uncertainty,” he said, “is a model you can trust.”

In Introduction to Foundation Models, Chen and Liu describe that capability as the convergence of technical design and moral architecture. The authors call for cross-disciplinary standards combining software verification with ethics, regulation and user education. “You need checks at every layer,” the authors explain, “from data collection and model training to deployment and feedback.” The vision is not of perfect AI, but of responsible infrastructure.

That framing also reflects the tone of IBM’s broader research agenda, Chen said. Rather than chase the next benchmark, the company has spent years developing governance frameworks for foundation models, including those focused on explainability and audit pipelines. Chen sees the attention as overdue.

“We have built competent systems,” he said. “Now we need to make sure we can explain them.”

The approach aligns with a broader movement in AI research that treats introspection as a technical property rather than a metaphor. Tools like IBM’s Attention Tracker or Anthropic’s interpretability probes attempt to visualize internal reasoning.

Still, there’s only so much we can see. Even with new transparency tools, the inner workings of these models can be baffling. Studying them, Chen said, is a bit like neuroscience, where you can watch the neurons light up without really knowing why. “We can see which neurons fire,” he said, “but we’re still learning what that means.”

The goal, Chen said, is to embed humility in design: “Technology doesn’t have to be perfect, but it should be honest about what it can and can’t do.”

That may sound modest, but it amounts to a quiet redefinition of progress. For years, success in AI was measured by the next benchmark, the next leap in scale. The coming era, Chen believes, will use other metrics: reproducibility, transparency, restraint. “It’s easy to build bigger models,” he said. “It’s much harder to make them trustworthy.”

The irony, Chen observed, is that the same predictive machinery that fuels hallucination also contains the seeds of its solution. A model trained to predict things could, in principle, learn to predict its own uncertainty. “If it knows when it doesn’t know,” he said, “that’s when it becomes useful.”

He paused before adding, “That’s when we can start to believe what it says.”

Fuente: https://www.ibm.com