{"id":2644,"date":"2018-02-07T09:14:11","date_gmt":"2018-02-07T12:14:11","guid":{"rendered":"https:\/\/www.nachodelatorre.com.ar\/mosconi\/?p=2644"},"modified":"2018-02-07T09:14:11","modified_gmt":"2018-02-07T12:14:11","slug":"procesadores-de-aprendizaje-profundo-para-dispositivos-inteligentes-de-iot","status":"publish","type":"post","link":"https:\/\/www.fie.undef.edu.ar\/ceptm\/?p=2644","title":{"rendered":"Procesadores de aprendizaje profundo para dispositivos inteligentes de IoT"},"content":{"rendered":"<p>En solo unos pocos a\u00f1os, Inteligencia Artificial (AI) \/ Aprendizaje Profundo (Deep Learning (DL) \/ Reinforcement Learning (RL) \/ Machine Learning (ML) se han convertido en herramientas importantes para muchas industrias y ahora contin\u00faan en un r\u00e1pido ciclo de innovaci\u00f3n.<!--more--><\/p>\n<p>In the past few years, the Artificial Intelligence field has entered a high growth phase, driven largely by advancements in Machine Learning methodologies like Deep Learning (DL) and Reinforcement Learning (RL). Combinations of those techniques demonstrate unprecedented performance in solving a wide range of problems, from\u00a0<a href=\"https:\/\/www.iotforall.com\/creativity-and-artificial-intelligence\/\">playing Go at super-human<\/a>\u00a0level to diagnosing cancer like a specialist.<\/p>\n<p>In our previous blogs,\u00a0<a href=\"https:\/\/www.iotforall.com\/intelligent-iot-fog-computing-trends\/\">Intelligent IoT and Fog Computing Trends<\/a>\u00a0and\u00a0<a href=\"https:\/\/www.iotforall.com\/computer-vision-iot\/\">The Rise of Ubiquitous Computer Vision In IoT<\/a>, we talked about some interesting use cases of DL in IoT. The applications will be both broad and deep. 
They are going to fuel the demand for new breeds of processors in the coming decades.<\/p>\n<h2>Deep Learning Workflow Overview<\/h2>\n<p>DL\/RL innovations are happening at an astonishing pace (thousands of papers with new algorithms are presented at numerous\u00a0<a href=\"https:\/\/www.iotforall.com\/iot-data-and-ai-summit-recap-part-two\/\">AI-related conferences<\/a>\u00a0every year). Though it is premature to predict the final winning solutions, hardware companies are racing to build processors, tools, and frameworks. They are trying to identify pain points and bottlenecks in DL workflows (Fig. 1), leveraging researchers\u2019 years of experience.<\/p>\n<h2>Deep Learning Workflow<\/h2>\n<figure id=\"attachment_7550\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" class=\"wp-image-7550 size-full td-animation-stack-type2-2\" src=\"https:\/\/www.iotforall.com\/wp-content\/uploads\/2018\/01\/DL-Workflow.png\" sizes=\"(max-width: 720px) 100vw, 720px\" srcset=\"https:\/\/www.iotforall.com\/wp-content\/uploads\/2018\/01\/DL-Workflow.png 720w, https:\/\/www.iotforall.com\/wp-content\/uploads\/2018\/01\/DL-Workflow-300x225.png 300w, https:\/\/www.iotforall.com\/wp-content\/uploads\/2018\/01\/DL-Workflow-600x450.png 600w, https:\/\/www.iotforall.com\/wp-content\/uploads\/2018\/01\/DL-Workflow-200x150.png 200w, https:\/\/www.iotforall.com\/wp-content\/uploads\/2018\/01\/DL-Workflow-80x60.png 80w, https:\/\/www.iotforall.com\/wp-content\/uploads\/2018\/01\/DL-Workflow-265x198.png 265w, https:\/\/www.iotforall.com\/wp-content\/uploads\/2018\/01\/DL-Workflow-696x522.png 696w, https:\/\/www.iotforall.com\/wp-content\/uploads\/2018\/01\/DL-Workflow-560x420.png 560w\" alt=\"\" width=\"720\" height=\"540\" \/><figcaption class=\"wp-caption-text\">Fig. 1: Basic Deep Learning Workflow<\/figcaption><\/figure>\n<h2>Platforms For Training DL Models<\/h2>\n<p>Let\u2019s start with training platforms. 
Graphics Processing Unit (GPU) based systems are usually the choice for training advanced DL models. Nvidia long ago recognized the advantages of using GPUs for general-purpose high-performance computing.<\/p>\n<p>A GPU has hundreds of compute cores that support a large number of hardware threads and high-throughput floating-point computation. Nvidia developed the Compute Unified Device Architecture (CUDA) programming framework to make GPUs easy for scientists and machine learning experts to use.<\/p>\n<p>The CUDA toolchain has improved over time, providing researchers with a flexible and friendly way to realize highly complex algorithms. A few years ago, Nvidia aptly identified the DL opportunity and persistently developed CUDA support for most DL operations. Standard frameworks like Caffe, Torch, and TensorFlow all support CUDA.<\/p>\n<p>In cloud services like AWS, developers have a choice between CPUs and GPUs (more specifically, Nvidia GPUs). Platform choice depends on the complexity of the neural networks, budget, and time. GPU-based systems can usually cut training time several-fold compared to CPUs but are more expensive (Fig. 
2)<\/p>\n<figure id=\"attachment_7551\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" class=\"wp-image-7551 size-full td-animation-stack-type2-2\" src=\"https:\/\/www.iotforall.com\/wp-content\/uploads\/2018\/01\/AWS_P2.jpg\" sizes=\"(max-width: 921px) 100vw, 921px\" srcset=\"https:\/\/www.iotforall.com\/wp-content\/uploads\/2018\/01\/AWS_P2.jpg 921w, https:\/\/www.iotforall.com\/wp-content\/uploads\/2018\/01\/AWS_P2-300x205.jpg 300w, https:\/\/www.iotforall.com\/wp-content\/uploads\/2018\/01\/AWS_P2-768x524.jpg 768w, https:\/\/www.iotforall.com\/wp-content\/uploads\/2018\/01\/AWS_P2-600x409.jpg 600w, https:\/\/www.iotforall.com\/wp-content\/uploads\/2018\/01\/AWS_P2-200x136.jpg 200w, https:\/\/www.iotforall.com\/wp-content\/uploads\/2018\/01\/AWS_P2-218x150.jpg 218w, https:\/\/www.iotforall.com\/wp-content\/uploads\/2018\/01\/AWS_P2-696x475.jpg 696w, https:\/\/www.iotforall.com\/wp-content\/uploads\/2018\/01\/AWS_P2-616x420.jpg 616w\" alt=\"\" width=\"921\" height=\"628\" \/><figcaption class=\"wp-caption-text\">Fig. 2: AWS EC2 GPU Instances<\/figcaption><\/figure>\n<h3>Alternatives to GPU \/ CPU<\/h3>\n<p>Alternatives are coming. Khronos proposed\u00a0<a href=\"https:\/\/www.khronos.org\/opencl\/\">OpenCL<\/a>\u00a0in 2009, an open standard for parallel computing across a wide range of hardware such as CPUs, GPUs, DSPs, and FPGAs. It enables other processors, such as AMD GPUs, to enter the DL training market, providing developers with more choices.<\/p>\n<p>However, it is still behind CUDA in DL library support. Hopefully, that situation will improve in the next few years. Intel is also developing processors customized for DL training through its Nervana acquisition.<\/p>\n<h2>Competitive Landscape of DL Inference<\/h2>\n<p>DL inference is a very competitive market. 
Applications can be deployed at multiple levels, usually depending on the requirements of the use case:<\/p>\n<ul>\n<li>Cloud \/ Enterprise: Image classification, Cybersecurity, Text Analytics, NLP, etc.<\/li>\n<li>Smart Gateways: Biometrics, Speech Recognition, Smart Agents, etc.<\/li>\n<li>Edge endpoints: Mobile devices, Smart cameras, etc.<\/li>\n<\/ul>\n<h3>Cloud Inference<\/h3>\n<p>The cloud inference market will see tremendous growth, with a strong push from internet giants like Google, Facebook, Baidu, and Alibaba. For example, Google Cloud and Microsoft Azure offer very strong image classification, natural language processing, and face recognition APIs that developers can easily integrate into their cloud applications.<\/p>\n<p>Cloud inference platforms will need to support millions of simultaneous users reliably. The ability to scale throughput is critical. Cutting energy consumption is another top priority, in order to control the operating cost of these services.<\/p>\n<p>In the cloud inference space, in addition to GPUs, data centers are using FPGAs or customized processors to make cloud inference applications more cost-effective and power-efficient. 
For example,\u00a0<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/blog\/microsoft-unveils-project-brainwave\/\">Microsoft Project Brainwave<\/a>\u00a0uses Intel FPGAs to demonstrate strong performance and flexibility in running DL algorithms like CNNs, LSTMs, etc.<\/p>\n<figure id=\"attachment_7552\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" class=\"wp-image-7552 size-full td-animation-stack-type2-2\" src=\"https:\/\/www.iotforall.com\/wp-content\/uploads\/2018\/01\/IntelStratix.jpg\" sizes=\"(max-width: 331px) 100vw, 331px\" srcset=\"https:\/\/www.iotforall.com\/wp-content\/uploads\/2018\/01\/IntelStratix.jpg 331w, https:\/\/www.iotforall.com\/wp-content\/uploads\/2018\/01\/IntelStratix-300x203.jpg 300w, https:\/\/www.iotforall.com\/wp-content\/uploads\/2018\/01\/IntelStratix-200x135.jpg 200w\" alt=\"\" width=\"331\" height=\"224\" \/><figcaption class=\"wp-caption-text\">Fig. 3: Intel 14nm Stratix FPGA<\/figcaption><\/figure>\n<p>FPGAs have advantages. Their hardware logic, compute kernels, and memory configurations can be customized for a specific type of neural network, making them more efficient at running a pre-trained model. However, one drawback is that they are harder to program than CPUs or CUDA GPUs. As mentioned in the previous section, OpenCL will help make FPGAs more software-developer friendly.<\/p>\n<p>Besides FPGAs, Google is also making a customized processor called the TPU.\u00a0<a href=\"https:\/\/www.barrons.com\/articles\/intel-can-beat-nvidia-in-inference-of-a-i-says-morningstar-1504298887\">It is an ASIC that focuses on highly efficient matrix calculations<\/a>. 
However, it is only supported within Google\u2019s own services.<\/p>\n<p>Here are some of the players in DL cloud inference.<\/p>\n<table>\n<tbody>\n<tr>\n<td>Categories<\/td>\n<td>Processors<\/td>\n<td>Remarks<\/td>\n<\/tr>\n<tr>\n<td>Customized DL processors<\/td>\n<td>Google TPU, Intel Nervana<\/td>\n<td>\n<ul>\n<li>Google TPU is for Google Cloud internal use only<\/li>\n<li><a href=\"https:\/\/wccftech.com\/intel-lake-crest-chip-detailed-32-gb-hbm2-1-tb\/\">Intel Nervana Lake Crest Processor<\/a>\u00a0with Tensor-based architecture<\/li>\n<\/ul>\n<\/td>\n<\/tr>\n<tr>\n<td>GPU with DL accelerator<\/td>\n<td>Nvidia Volta (V100)<\/td>\n<td>\n<ul>\n<li>Added Tensor Cores supporting common matrix operations<\/li>\n<\/ul>\n<\/td>\n<\/tr>\n<tr>\n<td>FPGA<\/td>\n<td>Xilinx, Intel<\/td>\n<td><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<h3>Embedded DL Inference For Intelligent Edge Computing<\/h3>\n<p>On the edge, DL inference solutions need to address a diverse set of requirements for different use cases and markets.<\/p>\n<h4>Autonomous Driving Platforms<\/h4>\n<p>Autonomous vehicle platforms are currently the hottest market, where state-of-the-art DL and RL methods are being applied to achieve the highest level of autonomous driving. Nvidia has been leading the market with several classes of DL SoCs, from Tegra to Xavier. For example, the Xavier SoC is built into Nvidia\u2019s Drive PX platforms, which can achieve up to 320 TOPS and target Level 5 autonomous driving.<\/p>\n<h4>Mobile Processors<\/h4>\n<p>Another rapid-growth area is mobile application processors. DL enables new features on smartphones that were not possible before. One example is the neural engine integrated into Apple\u2019s A11 Bionic chip, which enables high-accuracy face unlocking on the iPhone X.<\/p>\n<p>Chinese chipmaker HiSilicon has also released its Kirin 970 processor, which features a Neural Processing Unit (NPU). Some of Huawei\u2019s latest smartphones (Fig. 
4) are already designed with the new DL processors. For example, using the NPU, the smartphone camera \u201cknows\u201d what it is looking at and adjusts the camera settings automatically depending on the subject of the scene (e.g., people, plants, landscape).<\/p>\n<figure id=\"attachment_7553\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" class=\"wp-image-7553 size-full td-animation-stack-type2-2\" src=\"https:\/\/www.iotforall.com\/wp-content\/uploads\/2018\/01\/HuaweiMate.jpg\" sizes=\"(max-width: 331px) 100vw, 331px\" srcset=\"https:\/\/www.iotforall.com\/wp-content\/uploads\/2018\/01\/HuaweiMate.jpg 331w, https:\/\/www.iotforall.com\/wp-content\/uploads\/2018\/01\/HuaweiMate-161x300.jpg 161w, https:\/\/www.iotforall.com\/wp-content\/uploads\/2018\/01\/HuaweiMate-200x373.jpg 200w, https:\/\/www.iotforall.com\/wp-content\/uploads\/2018\/01\/HuaweiMate-225x420.jpg 225w\" alt=\"\" width=\"331\" height=\"618\" \/><figcaption class=\"wp-caption-text\">Fig. 4: Huawei Mate 10 Pro \u2013 Subject Aware Camera<\/figcaption><\/figure>\n<p>The following table lists some of the processors for DL inference applications.<\/p>\n<table>\n<tbody>\n<tr>\n<td>Company<\/td>\n<td>Chip<\/td>\n<td>Remarks<\/td>\n<\/tr>\n<tr>\n<td rowspan=\"2\">Nvidia<\/td>\n<td>Tegra<\/td>\n<td>Jetson TX1, TX2<\/td>\n<\/tr>\n<tr>\n<td>Xavier<\/td>\n<td>Volta architecture with Tensor Cores specifically for DL operations. 
Drive PX Xavier, PX Pegasus Platforms<\/td>\n<\/tr>\n<tr>\n<td rowspan=\"2\">Intel<\/td>\n<td>Movidius Myriad<\/td>\n<td>Vision Processing Unit (VPU) targeting computer vision for drones, robotics, etc.<\/td>\n<\/tr>\n<tr>\n<td>Mobileye<\/td>\n<td>Mobileye EyeQ is built specifically for the autonomous driving market<\/td>\n<\/tr>\n<tr>\n<td>Qualcomm<\/td>\n<td>Snapdragon 600\/800<\/td>\n<td>Neural Network Engine SDK uses the Hexagon DSP + Adreno GPU for building efficient DL inference for edge devices<\/td>\n<\/tr>\n<tr>\n<td>Samsung<\/td>\n<td>Exynos 9 Series 9810<\/td>\n<td>Target smartphones: e.g. Galaxy S9<\/td>\n<\/tr>\n<tr>\n<td>HiSilicon\/Huawei<\/td>\n<td>Kirin 970<\/td>\n<td>Target smartphones: e.g. Huawei Mate and Honor<\/td>\n<\/tr>\n<tr>\n<td>Rockchip<\/td>\n<td>RK 3399Pro<\/td>\n<td>Target security monitoring, drones, etc.<\/td>\n<\/tr>\n<tr>\n<td>Mediatek<\/td>\n<td>Helio P and X series<\/td>\n<td>Target smartphones: e.g. Oppo and Meizu.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2>New Architectures<\/h2>\n<p>It is worth mentioning that there is a new category of processors, called neuromorphic processors, which closely mimic the mechanisms of neurons and synapses in the human brain. 
They can realize a type of neural network called a Spiking Neural Network (SNN), which learns in both the spatial and temporal domains.<\/p>\n<p>In principle, they are much more power-efficient than existing DL architectures and have advantages in tackling online machine learning problems.<\/p>\n<p>IBM\u2019s TrueNorth and Intel\u2019s Loihi are based on neuromorphic architectures. Researchers are exploring the capabilities of these chips,\u00a0<a href=\"http:\/\/www.pnas.org\/content\/113\/41\/11441.abstract\">showing some potential<\/a>. It is unclear when the new types of processors will be ready for broad commercial use. 
A number of startups like\u00a0<a href=\"http:\/\/appliedbrainresearch.com\/projects\/\">Applied Brain Research<\/a>\u00a0and\u00a0<a href=\"http:\/\/www.electronicdesign.com\/embedded-revolution\/brainchip-enters-ai-territory-spiking-neural-network\">Brainchip<\/a>\u00a0are also focusing on this area, developing tools and IP.<\/p>\n<figure id=\"attachment_7555\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" class=\"wp-image-7555 size-full td-animation-stack-type2-2\" src=\"https:\/\/www.iotforall.com\/wp-content\/uploads\/2018\/01\/IntelLoihi.jpg\" sizes=\"(max-width: 738px) 100vw, 738px\" srcset=\"https:\/\/www.iotforall.com\/wp-content\/uploads\/2018\/01\/IntelLoihi.jpg 738w, https:\/\/www.iotforall.com\/wp-content\/uploads\/2018\/01\/IntelLoihi-300x225.jpg 300w, https:\/\/www.iotforall.com\/wp-content\/uploads\/2018\/01\/IntelLoihi-600x450.jpg 600w, https:\/\/www.iotforall.com\/wp-content\/uploads\/2018\/01\/IntelLoihi-200x150.jpg 200w, https:\/\/www.iotforall.com\/wp-content\/uploads\/2018\/01\/IntelLoihi-80x60.jpg 80w, https:\/\/www.iotforall.com\/wp-content\/uploads\/2018\/01\/IntelLoihi-265x198.jpg 265w, https:\/\/www.iotforall.com\/wp-content\/uploads\/2018\/01\/IntelLoihi-696x522.jpg 696w, https:\/\/www.iotforall.com\/wp-content\/uploads\/2018\/01\/IntelLoihi-559x420.jpg 559w\" alt=\"\" width=\"738\" height=\"554\" \/><figcaption class=\"wp-caption-text\">Fig. 5: Intel Loihi<\/figcaption><\/figure>\n<h2>It\u2019s an Interesting Time<\/h2>\n<p>In just a few years, AI\/DL\/RL\/ML have become important tools for many industries. The underlying ecosystem, from IP, processors, and system designs to toolchains and software methodologies, has entered a rapid innovation cycle. New processors will enable many new IoT use cases that were not feasible before.<\/p>\n<p>However, IoT and Machine Learning use cases are still evolving. 
It will take generations of processors for chip designers and developers to arrive at the right mix of architectures to address the needs of various markets. We will take a deeper look into compute platforms for various verticals in future articles.<\/p>\n<p><strong>Source:<\/strong>\u00a0<em><a href=\"https:\/\/www.iotforall.com\/deep-learning-processors-for-iot\" target=\"_blank\" rel=\"noopener noreferrer\">https:\/\/www.iotforall.com<\/a><\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In just a few years, Artificial Intelligence (AI), Deep Learning (DL), Reinforcement Learning (RL), and Machine Learning (ML) have become&hellip; <\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[23,29],"tags":[],"_links":{"self":[{"href":"https:\/\/www.fie.undef.edu.ar\/ceptm\/index.php?rest_route=\/wp\/v2\/posts\/2644"}],"collection":[{"href":"https:\/\/www.fie.undef.edu.ar\/ceptm\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.fie.undef.edu.ar\/ceptm\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.fie.undef.edu.ar\/ceptm\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.fie.undef.edu.ar\/ceptm\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2644"}],"version-history":[{"count":0,"href":"https:\/\/www.fie.undef.edu.ar\/ceptm\/index.php?rest_route=\/wp\/v2\/posts\/2644\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.fie.undef.edu.ar\/ceptm\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2644"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.fie.undef.edu.ar\/ceptm\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=2644"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.fie.undef.edu.ar\/ceptm\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&p
ost=2644"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}