In recent years, more and more hardware accelerators have been customized for deep learning (DL) computation to improve inference throughput and performance per watt. Convolutions are the dominant computation in DL applications. A convolution can be regarded as a special matrix multiplication, which is implemented in hardware with multiplier-accumulators. In a multiplier-accumulator, the critical path and other long-delay paths are seldom activated, and the data distribution of DL applications makes those paths activate even more rarely. Exploiting this characteristic, we propose an Approximate Systolic Array Processor (ASAP), which combines approximate computing with variable-latency design. With a technique similar to randomly dropping partial connections within a deep neural network, we apply voltage under-scaling to our proposed DL accelerator to reduce the power consumption of systolic arrays with negligible accuracy loss. In our experiments on handwritten digit recognition and image classification applications, ASAP obtains 47%~51% power savings over a baseline systolic array based on the architecture of Google's TPU, with a 1% loss in classification accuracy.
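The reduction of convolution to matrix multiplication mentioned above is commonly done via im2col lowering, after which every output element is produced by a chain of multiply-accumulate operations, the primitive a systolic-array processing element implements. A minimal sketch of that reduction (function names and the single-channel, stride-1 setting are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def im2col(x, kh, kw):
    # Unroll each kh x kw patch of a 2-D input into one row,
    # so the whole convolution becomes a single matrix multiplication.
    h, w = x.shape
    rows = []
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            rows.append(x[i:i + kh, j:j + kw].ravel())
    return np.array(rows)

def conv2d_as_matmul(x, k):
    # Convolution (as cross-correlation, stride 1, no padding) expressed
    # as a matmul: each output element is a dot product, i.e. a sequence
    # of multiply-accumulates mapped onto a systolic array's MAC units.
    kh, kw = k.shape
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    return (im2col(x, kh, kw) @ k.ravel()).reshape(out_h, out_w)
```

In this view, approximating the MAC (e.g. by voltage under-scaling that occasionally corrupts rarely activated long paths) perturbs individual dot-product terms, which is why the effect resembles randomly dropping connections in the network.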
