{"id":2391,"date":"2017-10-17T10:21:17","date_gmt":"2017-10-17T13:21:17","guid":{"rendered":"https:\/\/www.nachodelatorre.com.ar\/mosconi\/?p=2391"},"modified":"2017-10-17T10:21:17","modified_gmt":"2017-10-17T13:21:17","slug":"big-data-data-mining-seguridad-grandes-preguntas","status":"publish","type":"post","link":"https:\/\/www.fie.undef.edu.ar\/ceptm\/?p=2391","title":{"rendered":"Big Data &#8211; Data Mining &#8211; Security &#8211; Big Questions?"},"content":{"rendered":"<p>RAND-Lex is a computer program that can scan millions of lines of text and identify what people are talking about. The program has shed light on how terrorists communicate, how the public thinks \u2026 about health, and more. Bill Marcellino spent six months with a company of Marines, getting through obstacle courses and marching 25 km, just to understand how they talk. \u2026 He came to RAND as a social and behavioral scientist, where he found himself sharing an office with a computer scientist named Zev Winkelman\u2026<!--more--><\/p>\n<p><img loading=\"lazy\" class=\" alignright\" title=\"Speech bubbles imposed over a world map, image by Olena_T\/Getty Images\" src=\"https:\/\/wwwassets.rand.org\/content\/rand\/blog\/rand-review\/2017\/10\/big-data-big-questions\/_jcr_content\/par\/blogpost.aspectcrop.868x455.cm.jpg\/x1508190816319.jpg.pagespeed.ic.04Y-e-nC6w.jpg\" alt=\"Speech bubbles imposed over a world map\" width=\"389\" height=\"204\" \/>Researchers at RAND made a surprising discovery as they sifted through millions of Arabic tweets. For all its vaunted social-media savvy, the Islamic State was losing the war of words on Twitter. Its opponents outnumbered its supporters six to one. 
They were calling the group and its fighters the \u201cdogs of fire.\u201d<\/p>\n<p><a href=\"https:\/\/www.rand.org\/pubs\/research_reports\/RR1328.html\" target=\"_blank\" rel=\"noopener noreferrer\">The report<\/a> last year received widespread attention. What got less notice was how the researchers did it. A team at RAND had built a computer program that could scan millions of lines of text and identify what people were talking about, how they fit into communities, and how they saw the world.<\/p>\n<p>The program, known as RAND-Lex, has since shed light on <a href=\"https:\/\/www.rand.org\/pubs\/research_reports\/RR1742.html\" target=\"_blank\" rel=\"noopener noreferrer\">how al-Qa&#8217;ida affiliates communicate<\/a>, how Russian internet trolls operate, and <a href=\"https:\/\/www.rand.org\/blog\/2017\/01\/what-32-million-tweets-tell-us-about-health-and-the.html\" target=\"_blank\" rel=\"noopener noreferrer\">how the American public thinks about health<\/a>. It has helped carry an old lesson of linguistics into the digital age: How people speak speaks volumes about them\u2014even when it&#8217;s 140 characters at a time.<\/p>\n<p id=\"a-holistic-approach-to-text-an-\"><strong>A Holistic Approach to Text Analytics<\/strong><\/p>\n<p><a href=\"https:\/\/www.rand.org\/about\/people\/m\/marcellino_william.html\" target=\"_blank\" rel=\"noopener noreferrer\">Bill Marcellino<\/a> once spent six months with a company of Marines, slogging through obstacle courses and gutting out 15-mile hikes, just to understand how they talk. He came to RAND in 2010 as a social and behavioral scientist, where he found himself sharing an office with a computer scientist named <a href=\"https:\/\/www.rand.org\/about\/people\/w\/winkelman_zev.html\" target=\"_blank\" rel=\"noopener noreferrer\">Zev Winkelman<\/a>. Winkelman had left a job in the financial industry after the Sept. 
11 terrorist attacks to work on big-data approaches to national security.<\/p>\n<div class=\"photo photo-left\"><img loading=\"lazy\" class=\" alignright\" title=\"Bill Marcellino (left) and Zev Winkelman, photo by Dori Gordon Walker\/RAND Corporation\" src=\"https:\/\/wwwassets.rand.org\/content\/rand\/blog\/rand-review\/2017\/10\/big-data-big-questions\/_jcr_content\/par\/blogpost\/par-blog\/imagewithclass.aspectfit.0x0.jpg\/x1507656115997.jpg.pagespeed.ic.SKyaV4p1lc.jpg\" alt=\"Bill Marcellino (left) and Zev Winkelman\" width=\"382\" height=\"215\" \/><\/div>\n<p>They soon realized they were working on the same kinds of puzzles, just from different perspectives. Marcellino was using what he knew about the big picture of language to understand what was unique or telling about individual pieces of text. Winkelman was looking at text, too\u2014but he was using computers to identify the distinct pieces first, and then work back to the bigger picture.<\/p>\n<p>\u201cWe realized we could bring together social science and computer science to make meaning out of huge data sets of text,\u201d Winkelman says. \u201cWe could build something more holistic, something that people could use, a center of gravity for text analytics.\u201d<\/p>\n<p id=\"distinguishing-isis-supporters-\"><strong>Distinguishing ISIS Supporters and Opponents on Twitter<\/strong><\/p>\n<p>Marcellino and Winkelman started coming in early and staying late to turn their ideas into computer code. Their first version, RAND-Lex 1.0, could scroll through millions of lines of text and compare them against a linguistic baseline. It was looking for surprises\u2014words or phrases that appeared more often than expected, statistical outliers. 
It might flag the words \u201csingle-payer,\u201d \u201cpreexisting,\u201d and \u201cObamacare\u201d in a transcript for a health-care debate, for example\u2014not necessarily the most frequent words, but the most distinct.<\/p>\n<div class=\"pull-quote right\">\n<p>ISIS opponents preferred to belittle the group by abbreviating its name in Arabic to Daesh.<\/p>\n<\/div>\n<p>That&#8217;s how researchers at RAND were able to get an unprecedented look at the online messaging battle between ISIS supporters and opponents. They found that supporters almost always referred to the group by its full name, the Islamic State. Opponents preferred to belittle the group by abbreviating its name in Arabic to Daesh.<\/p>\n<p>But when the researchers fed only those Daesh tweets into RAND-Lex, they found that, for all their numbers, <a href=\"https:\/\/www.rand.org\/pubs\/perspectives\/PE227.html\" target=\"_blank\" rel=\"noopener noreferrer\">opponents often were speaking past each other<\/a>. Gulf State Shia blamed Saudi Arabia for the rise of ISIS; Saudi Arabia and its Sunni neighbors blamed Shia Iran. And none of them matched up with the Syrian mujahideen, who sometimes applauded ISIS fighters even while denouncing the group&#8217;s brutality.<\/p>\n<p><a href=\"https:\/\/www.rand.org\/pubs\/research_reports\/RR1328.html\" target=\"_blank\" rel=\"noopener noreferrer\">The study<\/a> revealed fierce opposition to ISIS across communities on Arabic Twitter. 
But it also showed that a one-size-fits-all approach to <a href=\"https:\/\/www.rand.org\/blog\/2016\/10\/fighting-the-islamic-state-on-social-media.html\" target=\"_blank\" rel=\"noopener noreferrer\">countering ISIS&#8217;s online message<\/a> would fall flat.<\/p>\n<p id=\"following-the-linguistic-finge-\"><strong>Following the Linguistic Fingerprints of ISIS<\/strong><\/p>\n<p>The RAND-Lex team narrowed its focus to <a href=\"https:\/\/www.rand.org\/pubs\/research_reports\/RR1742.html\" target=\"_blank\" rel=\"noopener noreferrer\">Egypt in another study<\/a>. The researchers wanted to see if they could measure how well ISIS&#8217;s message resonated with people far outside its home turf of Iraq and Syria.<\/p>\n<p>To do that, they ran ISIS speeches, proclamations, and articles through RAND-Lex, looking for distinct words\u2014the group&#8217;s linguistic fingerprints. Then they looked for those same words in more than 6 million Egyptian tweets, to see whether people were starting to talk like ISIS.<\/p>\n<p>They found that only around 1 or 2 percent of the population was borrowing words from ISIS. They were much more likely to describe the world in terms taken from the Muslim Brotherhood. But the number of ISIS-imitating accounts grew in the months the researchers followed, especially in poorer places like the Sinai, a sign that its message was starting to stick with some Egyptians.<\/p>\n<p>The next update to RAND-Lex helped researchers understand why. It was able to not just pull out distinct words and phrases, but also assign values to them\u2014to discern angry words from happy words, for example, or future-facing words from backward-looking words. It could start to get a feel for the text.<\/p>\n<div class=\"pull-quote left\">\n<p>ISIS speech was often intense, future-oriented, focused on social values and relationships\u2014a rallying cry.<\/p>\n<\/div>\n<p>It found that ISIS speech was not as hateful and negative as some might expect. 
Instead, it was often intense, future-oriented, focused on social values and relationships\u2014a rallying cry. It used \u201cwe\u201d phrases, but not so much \u201cthem\u201d phrases.<\/p>\n<p>The language of al-Qa&#8217;ida in the Arabian Peninsula, by comparison, was informational, even technical\u2014less a call to action than a report of how and why something had happened. It often read, the researchers noted, like a how-to manual.<\/p>\n<p>\u201cWe can show what is unique about how different people talk about the world, and how they tackle the world,\u201d Marcellino said. \u201cThey&#8217;re inextricably linked. We talk about the world in ways that reflect how we see the world.\u201d<\/p>\n<p id=\"more-needles-more-haystacks-\"><strong>More Needles, More Haystacks<\/strong><\/p>\n<p>In more recent months, RAND researchers have scanned tens of millions of American tweets into RAND-Lex to understand <a href=\"https:\/\/www.rand.org\/blog\/2017\/01\/what-32-million-tweets-tell-us-about-health-and-the.html\" target=\"_blank\" rel=\"noopener noreferrer\">how people talk about health and wellness<\/a>. They found that people are more likely to talk about being sick than about staying well\u2014a possible opening for healthy-living campaigns to change the conversation.<\/p>\n<p>Researchers have also used RAND-Lex to examine Russian propaganda on Twitter\u2014and found a running online battle between Russian propagandists and Ukrainian activists. Marcellino ran hundreds of blog items through RAND-Lex to see how Americans were talking about privilege; most addressed \u201cwhite privilege\u201d or \u201cmale privilege,\u201d he found, but almost none mentioned class privilege.<\/p>\n<p>The computer program he and Winkelman built by hand, with help and support from across RAND, has expanded beyond keyword testing and value comparisons. 
It can search through volumes of text and pull out the major themes; it can learn from small samples of text how to classify much larger collections. It can tease out the overall stance of a text in English or Arabic, with Russian in the works. RAND recently made it available to outside researchers as a subscription service.<\/p>\n<p>\u201cWe live in a world where the amount of data is increasing all the time,\u201d Marcellino said. \u201cIt&#8217;s not just that the haystacks are getting bigger and bigger. The number of haystacks is increasing exponentially. We need new ways to find the needles.<\/p>\n<p>\u201cWe&#8217;ve realized that if you leverage what machines are good at and what humans are good at, you can do really, really important work, at massive scales.\u201d<\/p>\n<p>It is, if anything, a growth industry. In the time it takes you to finish this sentence, 6,000 new messages will have whistled across Twitter alone.<\/p>\n<p><strong>Source:<\/strong> <em><a href=\"https:\/\/www.rand.org\/blog\/rand-review\/2017\/10\/big-data-big-questions.html\" target=\"_blank\" rel=\"noopener noreferrer\">https:\/\/www.rand.org<\/a><\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>RAND-Lex is a computer program that can scan millions of lines of text and identify what people are talking about. 
The program has shed&hellip; <\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[23,29],"tags":[],"_links":{"self":[{"href":"https:\/\/www.fie.undef.edu.ar\/ceptm\/index.php?rest_route=\/wp\/v2\/posts\/2391"}],"collection":[{"href":"https:\/\/www.fie.undef.edu.ar\/ceptm\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.fie.undef.edu.ar\/ceptm\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.fie.undef.edu.ar\/ceptm\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.fie.undef.edu.ar\/ceptm\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2391"}],"version-history":[{"count":0,"href":"https:\/\/www.fie.undef.edu.ar\/ceptm\/index.php?rest_route=\/wp\/v2\/posts\/2391\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.fie.undef.edu.ar\/ceptm\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2391"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.fie.undef.edu.ar\/ceptm\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=2391"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.fie.undef.edu.ar\/ceptm\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=2391"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}