AI accelerator

An AI accelerator is a class of specialized hardware accelerator^[1] or computer system^[2]^[3] designed to accelerate artificial intelligence applications, especially artificial neural networks, machine vision and machine learning. Typical applications include algorithms for robotics, the Internet of Things and other data-intensive or sensor-driven tasks.^[4] They are often manycore designs and generally focus on low-precision arithmetic, novel dataflow architectures or in-memory computing capability.^[5] A typical AI integrated circuit chip now contains billions of MOSFET transistors.^[6] A number of vendor-specific terms exist for devices in this category, and it is an emerging technology without a dominant design.

History

Computer systems have long complemented the CPU with special-purpose accelerators for specialized tasks, known as coprocessors. Notable application-specific hardware units include video cards for graphics, sound cards, graphics processing units and digital signal processors. As deep learning and artificial intelligence workloads rose to prominence in the 2010s, specialized hardware units were developed or adapted from existing products to accelerate these tasks.

Early attempts

Digital signal processors were used as neural network accelerators as early as 1993, for example to accelerate optical character recognition software.^[7] In the 1990s there were also attempts to build parallel high-throughput systems for workstations, aimed at various applications including neural network simulations.^[8]^[9]^[10] FPGA-based accelerators were also first explored in that decade, for both inference^[11] and training.^[12] Yann LeCun developed a CMOS neural network accelerator called ANNA.^[13]

Heterogeneous computing

Heterogeneous computing refers to incorporating a number of specialized processors in a single system, or even a single chip, each optimized for a specific type of task. Architectures such as the Cell microprocessor^[14] have features that overlap significantly with AI accelerators, including support for packed low-precision arithmetic, a dataflow architecture, and prioritizing throughput over latency. The Cell microprocessor was subsequently applied to a number of tasks,^[15]^[16]^[17] including AI.^[18]^[19]^[20] In the 2000s, CPUs also gained increasingly wide SIMD units, driven by video and gaming workloads, as well as support for packed low-precision data types.^[21]

Use of GPUs

Graphics processing units (GPUs) are specialized hardware for the manipulation of images and the calculation of local image properties. The mathematical bases of neural networks and of image manipulation are similar: both are embarrassingly parallel tasks involving matrices, which has led GPUs to be used increasingly for machine learning tasks.^[22]^[23]^[24] By 2016, GPUs were popular for AI work, and they continue to evolve toward facilitating deep learning, both for training and for inference^[25] in devices such as self-driving cars.^[26] GPU developers such as Nvidia are developing additional connective capability, such as NVLink, for the kind of dataflow workloads AI benefits from.^[27] As GPUs have been increasingly applied to AI, GPU manufacturers have incorporated neural-network-specific hardware to further accelerate these tasks.^[28]^[29] Tensor cores, for example, are intended to speed up the training of neural networks.^[29]
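To illustrate why neural-network workloads are embarrassingly parallel, the sketch below computes a fully connected layer, y = ReLU(Wx + b), dispatching one output element per task. It is a plain-Python illustration under simplifying assumptions (dense weights, a single input vector, a ReLU activation); the function names are hypothetical, and this is not how any GPU library is actually implemented.

```python
from concurrent.futures import ThreadPoolExecutor


def neuron(weights_row, inputs, bias):
    """One output of a dense layer: dot product plus bias, then ReLU.

    It reads only its own weight row and the shared inputs, so it does not
    depend on any other output element.
    """
    z = sum(w * x for w, x in zip(weights_row, inputs)) + bias
    return max(0.0, z)


def dense_layer(weights, inputs, biases, workers=4):
    """Compute ReLU(W x + b), submitting each row as an independent task.

    A GPU applies the same pattern with thousands of hardware threads; Python
    threads here only demonstrate the independence, not a speedup.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(neuron, weights, [inputs] * len(weights), biases)
        return list(results)  # map preserves row order
```

Because no task waits on another, the rows can be computed in any order or all at once, which is the property GPUs exploit.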

Use of FPGAs

Deep learning frameworks are still evolving, which makes it hard to design custom hardware. Reconfigurable devices such as field-programmable gate arrays (FPGAs) make it easier to evolve hardware, frameworks and software alongside each other.^[11]^[12]^[30]^[31] Microsoft has used FPGA chips to accelerate inference.^[32] The application of FPGAs to AI also motivated Intel to acquire Altera, with the aim of integrating FPGAs into server CPUs that would be capable of accelerating AI as well as general-purpose tasks.^[33]

Emergence of dedicated AI accelerator ASICs

While GPUs and FPGAs perform far better than CPUs for AI-related tasks, a factor of up to 10 in efficiency^[34]^[35] may be gained with a more specific design, via an application-specific integrated circuit (ASIC). These accelerators employ strategies such as optimized memory use and lower-precision arithmetic to accelerate calculation and increase computational throughput.^[36]^[37] Low-precision floating-point formats adopted for AI acceleration include half precision and the bfloat16 floating-point format.^[38]^[39]^[40]^[41]^[42]^[43]^[44] Companies such as Facebook, Amazon and Google are designing their own AI ASICs.^[45]^[46]
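One reason bfloat16 is cheap to support is that it is simply the upper 16 bits of an IEEE 754 single-precision value: the sign bit, the full 8-bit exponent and 7 mantissa bits. The sketch below converts a float32 value to bfloat16 with round-to-nearest-even. It assumes the input is representable as a float32; the helper names are hypothetical and the NaN handling is deliberately simplified.

```python
import struct


def float32_bits(x):
    """Return the IEEE 754 single-precision bit pattern of x as an int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float32(bits):
    """Interpret a 32-bit integer as an IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def to_bfloat16_bits(x):
    """Round a float32 value to a 16-bit bfloat16 pattern (nearest, ties to even)."""
    bits = float32_bits(x)
    if (bits & 0x7F800000) == 0x7F800000:
        # Inf or NaN: truncate; simplified assumption: force a mantissa bit so
        # a NaN whose payload sat only in the low 16 bits does not become Inf.
        upper = bits >> 16
        if bits & 0x007FFFFF:
            upper |= 0x0040
        return upper
    lsb = (bits >> 16) & 1          # lowest bit that survives truncation
    rounding_bias = 0x7FFF + lsb    # rounds ties toward an even result
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bfloat16_to_float(b16):
    """Widen bfloat16 back to float32 by appending 16 zero mantissa bits."""
    return bits_to_float32(b16 << 16)
```

For example, 3.14159 becomes 3.140625 in bfloat16: the exponent range of float32 is preserved, but only about two to three significant decimal digits remain, a trade-off that neural-network training often tolerates.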

In-memory computing architectures

In June 2017, IBM researchers announced an architecture in contrast to the von Neumann architecture, based on in-memory computing and phase-change memory arrays applied to temporal correlation detection, and intended to generalize the approach to heterogeneous computing and massively parallel systems.^[47] In October 2018, IBM researchers announced an architecture based on in-memory processing and modeled on the human brain's synaptic network to accelerate deep neural networks.^[48] The system is based on phase-change memory arrays.^[49]

In-memory computing with analog resistive memories

In 2019, researchers from the Politecnico di Milano found a way to solve systems of linear equations in a few tens of nanoseconds via a single operation. Their algorithm is based on in-memory computing with analog resistive memories, which is highly efficient in time and energy because it performs matrix-vector multiplication in a single step using Ohm's and Kirchhoff's laws. The researchers showed that a feedback circuit with cross-point resistive memories can solve algebraic problems such as systems of linear equations, matrix eigenvectors and differential equations in just one step. Such an approach drastically improves computation times compared with conventional algorithms.^[50]
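The single-step matrix-vector multiplication can be illustrated with an idealized crossbar model: each cell's conductance G[i][j] stores a matrix entry, voltages V[i] applied to the rows encode the vector, Ohm's law gives each cell current G[i][j] * V[i], and Kirchhoff's current law sums those currents on each column wire. The sketch below is a digital simulation under stated assumptions (ideal wires, linear devices, no noise, non-negative conductances); it does not model the feedback circuit the researchers used to solve linear systems, and the function name is hypothetical.

```python
def crossbar_mvm(conductances, voltages):
    """Simulate an ideal resistive crossbar computing column currents.

    conductances[i][j]: conductance of the cell at row i, column j (siemens).
    voltages[i]: voltage applied to row i (volts).
    Returns the current collected on each column (amperes), i.e. G^T V.
    Assumption: ideal devices and wires; real arrays add noise and IR drop.
    """
    n_cols = len(conductances[0])
    currents = [0.0] * n_cols
    for g_row, v in zip(conductances, voltages):
        for j, g in enumerate(g_row):
            # Ohm's law per cell; the += mirrors Kirchhoff's current law,
            # which sums all cell currents flowing into column j.
            currents[j] += g * v
    return currents
```

In the physical array all column sums form simultaneously as currents settle, which is why the multiplication takes one step regardless of matrix size; the loop here only reproduces the arithmetic.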

Atomically thin semiconductors

In 2020, Marega et al. published experiments with a large-area active channel material for developing logic-in-memory devices and circuits based on floating-gate field-effect transistors (FGFETs). Such atomically thin semiconductors are considered promising for energy-efficient machine learning applications, in which the same basic device structure is used for both logic operations and data storage. The authors used two-dimensional materials such as the semiconductor molybdenum disulfide.^[51]

Nomenclature

As of early 2016, the field is still in flux, and vendors are pushing their own marketing terms for what amounts to an "AI accelerator", in the hope that their designs and APIs will become the dominant design. There is no consensus on the boundary between these devices, nor on the exact form they will take; however, several examples clearly aim to fill this new space, with a fair amount of overlap in capabilities. In the past, when consumer graphics accelerators emerged, the industry eventually adopted Nvidia's self-assigned term, "the GPU",^[52] as the collective noun for "graphics accelerators", which had taken many forms before settling on an overall pipeline implementing a model presented by Direct3D.

Potential applications

•Autonomous vehicles: Nvidia has targeted its Drive PX-series boards at this space.^[53]

•Military robots.

•Agricultural robots: for example, pesticide-free weed control.^[54]

•Voice control: for example in mobile phones, a target for Qualcomm's Zeroth platform.^[55]

•Machine translation.

•Unmanned aerial vehicles: for example navigation systems; the Movidius Myriad 2 has been demonstrated successfully guiding autonomous drones.^[56]

•Industrial robots: increasing the range of tasks that can be automated by adding adaptability to variable situations.

•Healthcare: assisting with medical diagnosis.

•Search engines: increasing the energy efficiency of data centers and the ability to handle increasingly advanced queries.

•Natural language processing.

References

[1] "Intel unveils Movidius Compute Stick USB AI Accelerator". July 21, 2017. Archived from the original on August 11, 2017. Retrieved August 11, 2017.

[2] "Inspurs unveils GX4 AI Accelerator". June 21, 2017.

[3] Wiggers, Kyle (November 6, 2019). "Neural Magic raises $15 million to boost AI inferencing speed on off-the-shelf processors". Archived from the original on March 6, 2020. Retrieved March 14, 2020.

[4] "Google Developing AI Processors". Google using its own AI accelerators.

[5] "A Survey of ReRAM-based Architectures for Processing-in-memory and Neural Networks", S. Mittal, Machine Learning and Knowledge Extraction, 2018

[6] "13 Sextillion & Counting: The Long & Winding Road to the Most Frequently Manufactured Human Artifact in History". Computer History Museum. April 2, 2018. Retrieved July 28, 2019.

[7] "convolutional neural network demo from 1993 featuring DSP32 accelerator".

[8] "design of a connectionist network supercomputer".

[9] "The end of general purpose computers (not)". This presentation covers a past attempt at neural net accelerators, notes the similarity to the modern SLI GPGPU processor setup, and argues that general-purpose vector accelerators are the way forward (in relation to the RISC-V Hwacha project). It argues that NNs are just dense and sparse matrices, one of several recurring algorithms.

[10] Ramacher, U.; Raab, W.; Hachmann, J.A.U.; Beichter, J.; Bruls, N.; Wesseling, M.; Sicheneder, E.; Glass, J.; Wurz, A.; Manner, R. (1995). Proceedings of 9th International Parallel Processing Symposium. pp. 774–781. CiteSeerX 10.1.1.27.6410. doi:10.1109/IPPS.1995.395862. ISBN 978-0-8186-7074-9.

[11] "Space Efficient Neural Net Implementation".

[12] Gschwind, M.; Salapura, V.; Maischberger, O. (1996). "A Generic Building Block for Hopfield Neural Networks with On-Chip Learning". 1996 IEEE International Symposium on Circuits and Systems. Circuits and Systems Connecting the World. ISCAS 96. pp. 49–52. doi:10.1109/ISCAS.1996.598474. ISBN 0-7803-3073-0. S2CID 17630664.

[13] "Application of the ANNA Neural Network Chip to High-Speed Character Recognition" (PDF).

[14] Gschwind, Michael; Hofstee, H. Peter; Flachs, Brian; Hopkins, Martin; Watanabe, Yukio; Yamazaki, Takeshi (2006). "Synergistic Processing in Cell's Multicore Architecture". IEEE Micro. 26 (2): 10–24. doi:10.1109/MM.2006.41. S2CID 17834015.

[15] De Fabritiis, G. (2007). "Performance of Cell processor for biomolecular simulations". Computer Physics Communications. 176 (11–12): 660–664. arXiv:physics/0611201. doi:10.1016/j.cpc.2007.02.107.

[16] "Video Processing and Retrieval on Cell architecture". CiteSeerX 10.1.1.138.5133.

[17] Benthin, Carsten; Wald, Ingo; Scherbaum, Michael; Friedrich, Heiko (2006). 2006 IEEE Symposium on Interactive Ray Tracing. pp. 15–23. CiteSeerX 10.1.1.67.8982. doi:10.1109/RT.2006.280210. ISBN 978-1-4244-0693-7.

[18] "Development of an artificial neural network on a heterogeneous multicore architecture to predict a successful weight loss in obese individuals" (PDF).

[19] Kwon, Bomjun; Choi, Taiho; Chung, Heejin; Kim, Geonho (2008). 2008 5th IEEE Consumer Communications and Networking Conference. pp. 1030–1034. doi:10.1109/ccnc08.2007.235. ISBN 978-1-4244-1457-4.

[20] Duan, Rubing; Strey, Alfred (2008). Euro-Par 2008 – Parallel Processing. Lecture Notes in Computer Science. 5168. pp. 665–675. doi:10.1007/978-3-540-85451-7_71. ISBN 978-3-540-85450-0.

[21] "Improving the performance of video with AVX". February 8, 2012.

[22] "microsoft research/pixel shaders/MNIST".

[23] "How GPU came to be used for general computation".

[24] "imagenet classification with deep convolutional neural networks" (PDF).

[25] "nvidia driving the development of deep learning". May 17, 2016.

[26] "nvidia introduces supercomputer for self driving cars". January 6, 2016.

[27] "how nvlink will enable faster easier multi GPU computing". November 14, 2014.

[28] "A Survey on Optimized Implementation of Deep Learning Models on the NVIDIA Jetson Platform", 2019

[29] Harris, Mark (May 11, 2017). "CUDA 9 Features Revealed: Volta, Cooperative Groups and More". Retrieved August 12, 2017.

[30] Sefat, Md Syadus; Aslan, Semih; Kellington, Jeffrey W; Qasem, Apan (August 2019). "Accelerating HotSpots in Deep Neural Networks on a CAPI-Based FPGA". 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS): 248–256. doi:10.1109/HPCC/SmartCity/DSS.2019.00048.

[31] "FPGA Based Deep Learning Accelerators Take on ASICs". The Next Platform. August 23, 2016. Retrieved September 7, 2016.

[32] "Project Brainwave". Microsoft Research. Retrieved June 16, 2020.

[33] "A Survey of FPGA-based Accelerators for Convolutional Neural Networks", Mittal et al., NCAA, 2018

[34] "Google boosts machine learning with its Tensor Processing Unit". May 19, 2016. Retrieved September 13, 2016.

[35] "Chip could bring deep learning to mobile devices". www.sciencedaily.com. February 3, 2016. Retrieved September 13, 2016.

[36] "Deep Learning with Limited Numerical Precision" (PDF).

[37] Rastegari, Mohammad; Ordonez, Vicente; Redmon, Joseph; Farhadi, Ali (2016). "XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks". arXiv:1603.05279 [cs.CV].

[38] Khari Johnson (May 23, 2018). "Intel unveils Nervana Neural Net L-1000 for accelerated AI training". VentureBeat. Retrieved May 23, 2018. ...Intel will be extending bfloat16 support across our AI product lines, including Intel Xeon processors and Intel FPGAs.

[39] Michael Feldman (May 23, 2018). "Intel Lays Out New Roadmap for AI Portfolio". TOP500 Supercomputer Sites. Retrieved May 23, 2018. Intel plans to support this format across all their AI products, including the Xeon and FPGA lines

[40] Lucian Armasu (May 23, 2018). "Intel To Launch Spring Crest, Its First Neural Network Processor, In 2019". Tom's Hardware. Retrieved May 23, 2018. Intel said that the NNP-L1000 would also support bfloat16, a numerical format that’s being adopted by all the ML industry players for neural networks. The company will also support bfloat16 in its FPGAs, Xeons, and other ML products. The Nervana NNP-L1000 is scheduled for release in 2019.

[41] "Available TensorFlow Ops | Cloud TPU | Google Cloud". Google Cloud. Retrieved May 23, 2018. This page lists the TensorFlow Python APIs and graph operators available on Cloud TPU.

[42] Elmar Haußmann (April 26, 2018). "Comparing Google's TPUv2 against Nvidia's V100 on ResNet-50". RiseML Blog. Archived from the original on April 26, 2018. Retrieved May 23, 2018. For the Cloud TPU, Google recommended we use the bfloat16 implementation from the official TPU repository with TensorFlow 1.7.0. Both the TPU and GPU implementations make use of mixed-precision computation on the respective architecture and store most tensors with half-precision.

[43] Tensorflow Authors (February 28, 2018). "ResNet-50 using BFloat16 on TPU". Google. Retrieved May 23, 2018.[permanent dead link]

[44] Joshua V. Dillon; Ian Langmore; Dustin Tran; Eugene Brevdo; Srinivas Vasudevan; Dave Moore; Brian Patton; Alex Alemi; Matt Hoffman; Rif A. Saurous (November 28, 2017). TensorFlow Distributions (Report). arXiv:1711.10604. Bibcode:2017arXiv171110604D. Accessed 2018-05-23. All operations in TensorFlow Distributions are numerically stable across half, single, and double floating-point precisions (as TensorFlow dtypes: tf.bfloat16 (truncated floating point), tf.float16, tf.float32, tf.float64). Class constructors have a validate_args flag for numerical asserts

[45] "Facebook has a new job posting calling for chip designers".

[46] "Subscribe to read | Financial Times". www.ft.com.

[47] Abu Sebastian; Tomas Tuma; Nikolaos Papandreou; Manuel Le Gallo; Lukas Kull; Thomas Parnell; Evangelos Eleftheriou (2017). "Temporal correlation detection using computational phase-change memory". Nature Communications. 8. arXiv:1706.00511. doi:10.1038/s41467-017-01481-9. PMID 29062022.

[48] "A new brain-inspired architecture could improve how computers handle data and advance AI". American Institute of Physics. October 3, 2018. Retrieved October 5, 2018.

[49] Carlos Ríos; Nathan Youngblood; Zengguang Cheng; Manuel Le Gallo; Wolfram H.P. Pernice; C. David Wright; Abu Sebastian; Harish Bhaskaran (2018). "In-memory computing on a photonic platform". arXiv:1801.06228 [cs.ET].

[50] Zhong Sun; Giacomo Pedretti; Elia Ambrosi; Alessandro Bricalli; Wei Wang; Daniele Ielmini (2019). "Solving matrix equations in one step with cross-point resistive arrays". Proceedings of the National Academy of Sciences. 116 (10): 4123–4128.

[51] Marega, Guilherme Migliato; Zhao, Yanfei; Avsar, Ahmet; Wang, Zhenyu; Tripathi, Mukesh; Radenovic, Aleksandra; Kis, Andras (2020). "Logic-in-memory based on an atomically thin semiconductor". Nature. 587 (2): 72–77. doi:10.1038/s41586-020-2861-0.

[52] "NVIDIA launches the World's First Graphics Processing Unit, the GeForce 256".

[53] "Self-Driving Cars Technology & Solutions from NVIDIA Automotive". NVIDIA.

[54] "design of a machine vision system for weed control" (PDF). Archived from the original (PDF) on June 23, 2010. Retrieved June 17, 2016.

[55] "qualcomm research brings server class machine learning to every data devices". October 2015.

[56] "movidius powers worlds most intelligent drone". March 16, 2016.