oBERT: Compound Sparsification Delivers Faster Accurate Models for NLP
Discover “compound sparsification” and learn the way to use it to BERT fashions for 10x compression and GPU-level latency on commodity CPUs.
Discover “compound sparsification” and learn the way to use it to BERT fashions for 10x compression and GPU-level latency on commodity CPUs.