Towards energy-efficient convolutional neural network inference
Deep learning, and particularly convolutional neural networks (CNNs), has become the method of choice for most computer vision tasks. The resulting leap in accuracy has dramatically widened the range of applications and created a demand for running these compute- and memory-intensive algorithms on embedded and mobile devices. In this thesis, we evaluate the capabilities of software-programmable hardware, dive into specialized accelerators, and explore the potential of extremely quantized CNNs, paying particular attention to external memory bandwidth, which dominates the overall energy cost. We establish that, including I/O, software-programmable platforms can achieve 10–40 GOp/s/W; our specialized accelerator for fixed-point CNNs achieves 630 GOp/s/W; binary-weight CNNs can be implemented with up to 5.9 TOp/s/W; and very small binarized neural networks, implementable with purely combinational logic, could run directly on the sensor at 670 TOp/s/W.
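To illustrate why binary-weight CNNs map so efficiently to hardware, the following is a minimal sketch, not taken from the thesis: it assumes XNOR-Net-style binarization, where a real-valued filter is approximated by a per-filter scaling factor times its element-wise signs, so every multiply-accumulate in the convolution degenerates into an addition or subtraction. All names, shapes, and the 1-D setting are illustrative assumptions chosen for brevity.

```python
import numpy as np

def binarize_weights(w):
    # Approximate a real-valued filter w by alpha * sign(w)
    # (XNOR-Net-style; alpha is the mean absolute weight).
    alpha = np.abs(w).mean()
    return alpha, np.sign(w)

def binary_conv1d(x, w_bin, alpha):
    # "Valid" 1-D correlation with weights in {-1, +1}: each tap
    # only adds or subtracts an input sample, so the inner loop
    # needs no real multiplications (zero-sign weights are treated
    # as -1 here, which is fine for a sketch).
    k = len(w_bin)
    out = np.empty(len(x) - k + 1)
    for i in range(len(out)):
        window = x[i:i + k]
        out[i] = alpha * np.sum(np.where(w_bin > 0, window, -window))
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
w = rng.standard_normal(5)
alpha, w_bin = binarize_weights(w)
print(binary_conv1d(x, w_bin, alpha))
```

In hardware, the same structure means each weight costs a single bit of storage and each tap a sign-controlled add, which is what drives the large gap between the fixed-point and binary-weight efficiency figures quoted above.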