In recent years, more and more hardware accelerators have been customized for deep learning (DL) computation to improve inference throughput and performance per watt. Convolutions are the dominant computation in DL applications. A convolution can be regarded as a special matrix multiplication, which is implemented in hardware with multiplier-accumulators. In a multiplier-accumulator, the critical path and other long-delay paths are seldom activated, and the data distribution of DL applications makes those paths activate even more rarely. Exploiting this characteristic, we propose an Approximate Systolic Array Processor (ASAP), which combines approximate computing with variable-latency design. With a technique similar to randomly dropping partial connections within a deep neural network, we apply voltage under-scaling to our proposed DL accelerator to reduce the power consumption of systolic arrays with negligible accuracy loss. In our experiments on handwritten digit recognition and image classification applications, ASAP obtains 47%~51% power savings over a baseline systolic array based on the architecture of Google's TPU, with a 1% loss in classification accuracy.
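The reduction of convolution to matrix multiplication mentioned above is commonly done via im2col lowering, after which every output element is produced by a chain of multiply-accumulate operations, the primitive a systolic-array processing element implements. A minimal sketch of that reduction (function names and the single-channel, stride-1 setting are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def im2col(x, kh, kw):
    # Unroll each kh x kw patch of a 2-D input into one row,
    # so the whole convolution becomes a single matrix multiplication.
    h, w = x.shape
    rows = []
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            rows.append(x[i:i + kh, j:j + kw].ravel())
    return np.array(rows)

def conv2d_as_matmul(x, k):
    # Convolution (as cross-correlation, stride 1, no padding) expressed
    # as a matmul: each output element is a dot product, i.e. a sequence
    # of multiply-accumulates mapped onto a systolic array's MAC units.
    kh, kw = k.shape
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    return (im2col(x, kh, kw) @ k.ravel()).reshape(out_h, out_w)
```

In this view, approximating the MAC (e.g. by voltage under-scaling that occasionally corrupts rarely activated long paths) perturbs individual dot-product terms, which is why the effect resembles randomly dropping connections in the network.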
