{"id":16473,"date":"2025-02-04T08:00:30","date_gmt":"2025-02-04T11:00:30","guid":{"rendered":"https:\/\/www.fie.undef.edu.ar\/ceptm\/?p=16473"},"modified":"2025-02-04T08:00:30","modified_gmt":"2025-02-04T11:00:30","slug":"para-que-la-inteligencia-artificial-sea-una-mision-critica-debe-alucinar-menos","status":"publish","type":"post","link":"https:\/\/www.fie.undef.edu.ar\/ceptm\/?p=16473","title":{"rendered":"For artificial intelligence to be mission critical, it must hallucinate less"},"content":{"rendered":"<p>The US Department of Defense has redoubled its efforts toward a data-driven military empowered by artificial intelligence. However, the use of AI in mission-critical national defense applications has been hindered by one problem in particular: hallucinations. Hallucinations occur when large language models (LLMs) such as ChatGPT generate information that sounds plausible but is factually incorrect. It is not uncommon for LLMs to hallucinate in as many as 1 in every 10 responses, according to a study by researchers at Carnegie Mellon University. It is that 10 percent error rate that has held back the potential of AI in the Department of Defense.<\/p>\n<hr \/>\n<p>The Defense Department has doubled down on a data-driven military empowered by artificial intelligence. The use of AI for mission-critical applications for national defense, however, has been hindered by one problem in particular \u2013 hallucinations.<\/p>\n<p>Hallucinations happen when large language models (LLMs) such as ChatGPT generate plausible-sounding but factually incorrect information. It\u2019s not uncommon for LLMs to hallucinate as often as 1 in every 10 responses, according to a study by researchers at Carnegie Mellon University. 
It\u2019s that 10-percent error rate that\u2019s kept AI from reaching its fuller potential in the DoD.<\/p>\n<p>Now, however, there\u2019s a new software solution called Retrieval Augmented Generation Verification (RAG-V) that drastically reduces how often LLMs hallucinate. Introduced by\u00a0<a href=\"https:\/\/bit.ly\/4g7K4Xx\" target=\"_blank\" rel=\"noopener\">Primer<\/a>, which builds practical and trusted AI for complex enterprise environments, RAG-V nearly eliminates hallucinations by adding a novel verification stage.<\/p>\n<figure id=\"attachment_16475\" aria-describedby=\"caption-attachment-16475\" style=\"width: 150px\" class=\"wp-caption alignright\"><img loading=\"lazy\" class=\" wp-image-16475\" src=\"https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2025\/02\/John-Bohanon-is-vice-president-for-data-science-at-Primer-231x300.jpg\" alt=\"\" width=\"150\" height=\"195\" srcset=\"https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2025\/02\/John-Bohanon-is-vice-president-for-data-science-at-Primer-231x300.jpg 231w, https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2025\/02\/John-Bohanon-is-vice-president-for-data-science-at-Primer.jpg 582w\" sizes=\"(max-width: 150px) 100vw, 150px\" \/><figcaption id=\"caption-attachment-16475\" class=\"wp-caption-text\">John Bohannon is vice president for data science at Primer.<\/figcaption><\/figure>\n<p>\u201cRAG-V makes it possible to take a Large Language Model and put it into a mission-critical setting so warfighters can rely on it; that\u2019s the heart of the matter,\u201d said John Bohannon, vice president for data science at Primer. \u201cIn some settings you want hallucinations; it\u2019s called creativity, such as when you want ideas for throwing a party. 
These models are very creative and come up with stuff out of thin air.<\/p>\n<p>\u201cThe dark side is if you\u2019re using it in a setting where factuality matters in getting answers to questions, hallucinations can be bad \u2013 especially when they\u2019re subtle. When it\u2019s an obviously incorrect piece of information, it\u2019s easy for a human to catch it. The dangerous thing is when the model\u2019s confident and it looks correct, but it\u2019s a hallucination saying it in such a plausible, credible way that it can fool you.\u201d<\/p>\n<p>RAG-V works by incorporating a verification step akin to fact checking that grounds the LLM\u2019s responses in substantiated data sources, reducing the above-mentioned error rate from around 5-10 percent \u2013 the state of the art for leading LLMs \u2013 to just 0.1 percent. This dramatic improvement in reliability is critical for applications where factual accuracy is paramount, enabling warfighters and analysts to trust the AI-generated insights and make time-sensitive decisions with increased confidence.<\/p>\n<p>Beyond detecting hallucinations, RAG-V also provides detailed explanations of any errors, allowing the system to iteratively improve and further enhance trust. Primer\u2019s approach represents a significant advancement in responsible AI development, addressing key challenges around transparency and accountability that are essential for the adoption of these transformative technologies in the defense sector.<\/p>\n<p><strong>RAG-V reduces hallucinations<\/strong><\/p>\n<p>High-profile AI hallucinations have already made the news, notably in the legal profession, where a lawyer submitted a court brief, partially created with ChatGPT, that cited invented case law.<\/p>\n<p>It\u2019s not a mystery why hallucinations happen, though. They\u2019re caused by the way LLMs are trained. 
Even though LLMs seem to be a mysterious piece of new science, they\u2019re trained in a relatively simple manner that\u2019s like a game of fill-in-the-blank. The LLM is fed text with some of the words hidden, for example, and the model fills in the blanks.<\/p>\n<p>Do that billions of times and the LLM becomes very good at putting words together \u2013 just not always the right words. The model is trying to generate the words that are most probable and that read as if a human had written them.<\/p>\n<p>A partial solution for hallucinations, called Retrieval Augmented Generation (RAG), came on the scene in 2021. RAG, which lacks the verification step created by Primer, brought hallucinations down to the level they\u2019re at today. It works by retrieving relevant information from a trusted system of record and then including it in the prompt for the generative model. The prompt also instructs the model to answer the user\u2019s question based only on that retrieved data, without filling in the blanks.<\/p>\n<p>Even with RAG, these models don\u2019t always follow directions and still make errors and hallucinate. 
To bring hallucinations down to the point where users can trust the output again, Primer added a new, final step that fact-checks the model\u2019s output against verifiable sources \u2013 hence the verification in RAG-V.<\/p>\n<p>With RAG-V, warfighters executing mission-critical applications can now add a trustworthy LLM capability to their toolkit at a time when major competitors are also developing advanced AI capabilities and when operations are needed across multiple domains and at the edge.<\/p>\n<figure id=\"attachment_16476\" aria-describedby=\"caption-attachment-16476\" style=\"width: 150px\" class=\"wp-caption alignright\"><img loading=\"lazy\" class=\" wp-image-16476\" src=\"https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2025\/02\/Matthew-Macnak-is-senior-vice-president-of-customer-solutions-engineering-at-Primer-300x300.jpeg\" alt=\"\" width=\"150\" height=\"150\" srcset=\"https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2025\/02\/Matthew-Macnak-is-senior-vice-president-of-customer-solutions-engineering-at-Primer-300x300.jpeg 300w, https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2025\/02\/Matthew-Macnak-is-senior-vice-president-of-customer-solutions-engineering-at-Primer-150x150.jpeg 150w, https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2025\/02\/Matthew-Macnak-is-senior-vice-president-of-customer-solutions-engineering-at-Primer.jpeg 512w\" sizes=\"(max-width: 150px) 100vw, 150px\" \/><figcaption id=\"caption-attachment-16476\" class=\"wp-caption-text\">Matthew Macnak is senior vice president of customer solutions engineering at Primer.<\/figcaption><\/figure>\n<p>\u201cIn order for us to stay ahead of competitors like China, Russia, Iran, and North Korea \u2013 all of whom are using LLMs \u2013 our AI at its core is about helping the end users do something more efficiently by doing it with less effort in a faster manner,\u201d said Matthew Macnak, senior vice president of customer solutions 
engineering at Primer. \u201cPrimer is a full platform; it\u2019s not just powered by an LLM so we can actually take a version of that into the field and still have the ability to process massive amounts of unstructured information with fewer personnel and more reliability due to RAG-V.<\/p>\n<p>\u201cImagine that we can put our AI systems into something the size of a suitcase and then place it on an aircraft that\u2019s doing ISR missions. Now you have a few operators able to ingest, let\u2019s say, thousands of voice or text collects and analyze that information in real time rather than just collecting the data, going back, and waiting for the data to be analyzed. Now they can have a result in situ rather than waiting on someone else to tell them what to do.\u201d<\/p>\n<p><strong>More trust, fewer hallucinations<\/strong><\/p>\n<p>Even with RAG, Large Language Models have error rates so high \u2013 typically 5-10 percent of responses \u2013 that they can\u2019t be trusted for many defense applications. The exact rate depends on the data and on the questions being asked of the LLM, but at that level the models are unacceptable for intelligence operations, for example.<\/p>\n<p>\u201cIf you have something that\u2019s going to fib one out of every 10 times you ask it a question, that means that you have to check its work every single time,\u201d noted Bohannon. \u201cYou have to be hypervigilant and therefore that\u2019s a deal breaker in a setting where you\u2019re using this to actually augment a human. When the stakes are high, you can\u2019t afford to have to use a tool that you can\u2019t trust.\u201d<\/p>\n<p>When it comes to LLMs, Bohannon suggests being a skeptical buyer. An analyst who only needs to brainstorm with ChatGPT, for example, is likely in safe territory regarding hallucinations. 
But if the need is mission critical and the stakes are high, users should be wary of LLMs that lack a final verification and fact-checking step.<\/p>\n<p>\u201cEven when they\u2019re used with best practices like RAG and other guardrails for Large Language Models, they still have an unacceptable error rate. Potential users should come in with the mindset that this technology is new and there\u2019s only a few shops out there like Primer that are taking that problem seriously.\u201d<\/p>\n<p>Added Macnak: \u201cWe\u2019ve all heard \u2018trust but verify.\u2019 The valuable part of RAG-V is that it allows the user to see exactly why it made a decision and more importantly how that output was verified. Our focus remains on the end user, whether that\u2019s an analyst, operator, or warfighter.<\/p>\n<p>\u201cTo that end, we\u2019re building products and technology around those individuals that actually work in order to serve them. The 0.1 percent error rate we\u2019ve reached with RAG-V is a great example of why we\u2019re doing this for users making mission-critical decisions.\u201d<\/p>\n<p><strong>Source:<\/strong> <a href=\"https:\/\/breakingdefense.com\/2024\/12\/for-artificial-intelligence-to-be-mission-critical-it-must-hallucinate-less\/\" target=\"_blank\" rel=\"noopener\"><em>https:\/\/breakingdefense.com<\/em><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The US Department of Defense has redoubled its efforts toward a data-driven military empowered by artificial intelligence.&hellip; 
<\/p>\n","protected":false},"author":1,"featured_media":16474,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[2,37,23],"tags":[],"_links":{"self":[{"href":"https:\/\/www.fie.undef.edu.ar\/ceptm\/index.php?rest_route=\/wp\/v2\/posts\/16473"}],"collection":[{"href":"https:\/\/www.fie.undef.edu.ar\/ceptm\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.fie.undef.edu.ar\/ceptm\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.fie.undef.edu.ar\/ceptm\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.fie.undef.edu.ar\/ceptm\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=16473"}],"version-history":[{"count":1,"href":"https:\/\/www.fie.undef.edu.ar\/ceptm\/index.php?rest_route=\/wp\/v2\/posts\/16473\/revisions"}],"predecessor-version":[{"id":16477,"href":"https:\/\/www.fie.undef.edu.ar\/ceptm\/index.php?rest_route=\/wp\/v2\/posts\/16473\/revisions\/16477"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.fie.undef.edu.ar\/ceptm\/index.php?rest_route=\/wp\/v2\/media\/16474"}],"wp:attachment":[{"href":"https:\/\/www.fie.undef.edu.ar\/ceptm\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=16473"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.fie.undef.edu.ar\/ceptm\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=16473"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.fie.undef.edu.ar\/ceptm\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=16473"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}