# Block-Sparse Recurrent Neural Networks

```bibtex
@article{Narang2017BlockSparseRN,
  title   = {Block-Sparse Recurrent Neural Networks},
  author  = {Sharan Narang and Eric Undersander and Gregory Frederick Diamos},
  journal = {ArXiv},
  year    = {2017},
  volume  = {abs/1711.02782}
}
```

Recurrent Neural Networks (RNNs) are used in state-of-the-art models in domains such as speech recognition, machine translation, and language modelling. [...] Using these techniques, we can create block-sparse RNNs with sparsity ranging from 80% to 90% with a small loss in accuracy. This technique allows us to reduce the model size by roughly 10x. Additionally, we can prune a larger dense network to recover this loss in accuracy while maintaining high block sparsity and reducing the overall parameter…
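The core idea of block pruning can be sketched in a few lines: score each block of a weight matrix by its largest-magnitude entry and zero out the blocks below a sparsity-determined threshold. This is a minimal one-shot illustration, assuming a square matrix divisible by the block size; the paper itself prunes gradually during training.

```python
import numpy as np

def block_sparse_prune(weights, block_size=16, sparsity=0.9):
    """Zero entire blocks of a weight matrix, keeping only the
    (1 - sparsity) fraction of blocks with the largest max-magnitude.
    Hypothetical one-shot sketch of block pruning."""
    rows, cols = weights.shape
    assert rows % block_size == 0 and cols % block_size == 0
    # View the matrix as a grid of (block_size x block_size) blocks.
    blocks = weights.reshape(rows // block_size, block_size,
                             cols // block_size, block_size)
    # Representative score per block: maximum absolute weight.
    scores = np.abs(blocks).max(axis=(1, 3))
    # Threshold so that roughly `sparsity` of the blocks fall below it.
    threshold = np.quantile(scores, sparsity)
    mask = (scores > threshold).astype(weights.dtype)
    # Broadcast the per-block mask back over individual weights.
    pruned = blocks * mask[:, None, :, None]
    return pruned.reshape(rows, cols), mask
```

Because whole blocks are zeroed, the surviving weights can be stored and multiplied with block-sparse kernels rather than element-wise sparse formats, which is what enables the hardware efficiency the paper targets.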


#### 85 Citations

Block-wise Dynamic Sparseness

- Computer Science, Mathematics
- ArXiv
- 2020

A new method for dynamic sparseness in which part of the computations are omitted dynamically based on the input; the method achieves language modeling perplexities similar to the dense baseline at half the computational cost at inference time.
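The idea of input-dependent block sparseness can be sketched as follows: a small gating function scores each block-row of a weight matrix from the current input, and only the top-scoring blocks are actually multiplied. The gating matrix `gate_w` here is a hypothetical learned parameter, not the cited paper's exact mechanism.

```python
import numpy as np

def dynamic_block_matvec(W, x, gate_w, keep=0.5, block=8):
    """Input-dependent block sparseness sketch: score each block-row
    of W from the input x and compute only the top `keep` fraction of
    block-rows; the remaining outputs stay zero."""
    n_blocks = W.shape[0] // block
    scores = gate_w @ x                   # one score per block-row
    k = max(1, int(keep * n_blocks))
    active = np.argsort(scores)[-k:]      # block-rows worth computing
    y = np.zeros(W.shape[0])
    for b in active:
        rows = slice(b * block, (b + 1) * block)
        y[rows] = W[rows] @ x             # skip the inactive blocks
    return y
```

Since the skipped blocks are contiguous rows, the saved work maps directly onto dense sub-matrix multiplies, which is why block-level (rather than element-level) dynamic sparseness translates into real speedups.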

Dynamic Block Sparse Reparameterization of Convolutional Neural Networks

- Computer Science
- 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW)
- 2019

This work focuses on block sparsity and generates efficient block-sparse convolutional neural networks using DBSR (dynamic block sparse reparameterization), which decreases the parameters and FLOPS of ResNeXt50 by a factor of 2x with only an increase of 0.48 in Top-1 error.

Rethinking Full Connectivity in Recurrent Neural Networks

- Computer Science, Mathematics
- ArXiv
- 2019

Structurally sparse RNNs are studied, showing that they are well suited for acceleration on parallel hardware, with a greatly reduced cost of the recurrent operations as well as orders of magnitude fewer recurrent weights.

Hierarchical Block Sparse Neural Networks

- Mathematics, Computer Science
- ArXiv
- 2018

This work jointly addresses both the accuracy and the performance of sparse DNNs using a proposed class of sparse neural networks called HBsNN (Hierarchical Block sparse Neural Networks).

Structured Pruning of Recurrent Neural Networks through Neuron Selection

- Computer Science, Mathematics
- Neural Networks
- 2020

This work proposes a structured pruning method through neuron selection which can remove independent neurons of RNNs, and introduces two sets of binary random variables that can be interpreted as gates or switches on the input neurons and the hidden neurons, respectively.

Compressing LSTM Networks with Hierarchical Coarse-Grain Sparsity

- Computer Science
- INTERSPEECH
- 2020

A new LSTM training technique based on hierarchical coarse-grain sparsity (HCGS), which enforces hierarchical structured sparsity by randomly dropping static block-wise connections between layers to reduce weight storage for both training and inference hardware systems.

Structured in Space, Randomized in Time: Leveraging Dropout in RNNs for Efficient Training

- Computer Science
- ArXiv
- 2021

This work proposes to structure dropout patterns by dropping out the same set of physical neurons within a batch, resulting in column (row) level hidden-state sparsity, which is amenable to computation reduction at run time on general-purpose SIMD hardware as well as systolic arrays.

One-Shot Pruning of Recurrent Neural Networks by Jacobian Spectrum Evaluation

- Computer Science, Mathematics
- ICLR
- 2020

A new recurrent pruning objective derived from the spectrum of the recurrent Jacobian is introduced, which is data efficient, easy to implement, and produces 95% sparse GRUs that significantly improve on existing baselines.

Accelerating Sparse Deep Neural Networks

- Computer Science
- ArXiv
- 2021

The design and behavior of Sparse Tensor Cores are presented, which exploit a 2:4 (50%) sparsity pattern that leads to twice the math throughput of dense matrix units.
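The 2:4 pattern is simple to state: in every contiguous group of four weights, at most two may be nonzero. A magnitude-based way to enforce it can be sketched as below; this is an illustration of the pattern, not NVIDIA's pruning tool.

```python
import numpy as np

def enforce_2_4_sparsity(weights):
    """Keep the 2 largest-magnitude weights in every contiguous group
    of 4, zeroing the rest: the 2:4 (50%) pattern that Sparse Tensor
    Cores accelerate. Assumes the weight count is divisible by 4."""
    flat = weights.reshape(-1, 4)
    # Indices of the 2 smallest |w| in each group of 4.
    drop = np.argsort(np.abs(flat), axis=1)[:, :2]
    out = flat.copy()
    np.put_along_axis(out, drop, 0.0, axis=1)
    return out.reshape(weights.shape)
```

Because the pattern is fixed (2 of every 4), the hardware can store only the surviving values plus 2-bit position metadata per group, which is what yields the doubled math throughput.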

CSB-RNN: a faster-than-realtime RNN acceleration framework with compressed structured blocks

- Computer Science, Engineering
- ICS
- 2020

This paper presents CSB-RNN, an optimized full-stack RNN framework with a novel compressed structured block (CSB) pruning technique, and proposes a novel hardware architecture with a dedicated compiler to address the challenging workload-imbalance issue and significantly improve hardware efficiency.

#### References

Showing 1–10 of 32 references.

Exploring Sparsity in Recurrent Neural Networks

- Computer Science, Mathematics
- ICLR
- 2017

This work proposes a technique to reduce the parameters of a network by pruning weights during the initial training of the network, which reduces the size of the model and can also help achieve significant inference-time speed-up using sparse matrix multiply.
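Pruning during initial training amounts to periodically zeroing the smallest-magnitude weights while the sparsity target ramps up over the course of training. The sketch below uses a generic cubic ramp as the schedule; the cited paper defines its own threshold schedule, so treat the ramp shape as an assumption for illustration.

```python
import numpy as np

def prune_step(weights, step, total_steps, final_sparsity=0.9):
    """Magnitude pruning during training: zero the smallest-magnitude
    weights up to a sparsity target that ramps from 0 to
    `final_sparsity` as training progresses (cubic ramp assumed)."""
    frac = min(step / total_steps, 1.0)
    target = final_sparsity * (1 - (1 - frac) ** 3)
    k = int(target * weights.size)          # number of weights to zero
    if k == 0:
        return weights
    flat = np.abs(weights).ravel()
    # k-th smallest magnitude becomes the pruning threshold.
    threshold = np.partition(flat, k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)
```

Ramping the target gradually lets the surviving weights adapt as their neighbors are removed, which is why pruning during training typically loses less accuracy than one-shot pruning of a converged model.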

Exploiting sparseness in deep neural networks for large vocabulary speech recognition

- Computer Science
- 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2012

Enforcing sparseness is formulated both as soft regularization and as convex constraint optimization problems; solutions under the stochastic gradient ascent setting are proposed, and novel data structures are introduced to exploit the random sparseness patterns to reduce model size and computation time.

Learning Structured Sparsity in Deep Neural Networks

- Computer Science, Mathematics
- NIPS
- 2016

The results show that for CIFAR-10, regularization on layer depth can reduce a Deep Residual Network from 20 layers to 18 while improving the accuracy from 91.25% to 92.60%, which is still slightly higher than that of the original ResNet with 32 layers.

Improving the speed of neural networks on CPUs

- Computer Science
- 2011

This paper uses speech recognition as an example task, and shows that a real-time hybrid hidden Markov model / neural network (HMM/NN) large-vocabulary system can be built with a 10× speedup over an unoptimized baseline and a 4× speedup over an aggressively optimized floating-point baseline at no cost in accuracy.

Exploring the Regularity of Sparse Structure in Convolutional Neural Networks

- Computer Science, Mathematics
- ArXiv
- 2017

This paper quantitatively measures the trade-off between sparsity regularity and prediction accuracy, providing insights into how to maintain accuracy with a more structured sparsity pattern.

Sparse Convolutional Neural Networks

- Computer Science
- 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2015

This work shows how to reduce the redundancy in these parameters using a sparse decomposition, and proposes an efficient sparse matrix multiplication algorithm on CPU for Sparse Convolutional Neural Network (SCNN) models.

Learning Intrinsic Sparse Structures within Long Short-term Memory

- Computer Science, Mathematics
- ICLR
- 2018

This work aims to learn structurally-sparse Long Short-Term Memory by reducing the sizes of basic structures within LSTM units, including input updates, gates, hidden states, cell states and outputs, by proposing Intrinsic Sparse Structures (ISS) in LSTMs.

Mixed Precision Training

- Computer Science, Mathematics
- ICLR
- 2018

This work introduces a technique to train deep neural networks using half-precision floating-point numbers, and demonstrates that this approach works for a wide variety of models including convolutional neural networks, recurrent neural networks, and generative adversarial networks.

Compressing Low Precision Deep Neural Networks Using Sparsity-Induced Regularization in Ternary Networks

- Computer Science
- ICONIP
- 2017

A low-precision deep neural network training technique for producing sparse, ternary neural networks is presented. The technique incorporates hardware implementation costs during training to…

Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding

- Computer Science
- ICLR
- 2016

This work introduces "deep compression", a three-stage pipeline of pruning, trained quantization, and Huffman coding that work together to reduce the storage requirement of neural networks by 35x to 49x without affecting their accuracy.