- TorchSparse++: A Unified Framework for Efficient Inference and Training for Sparse Point Cloud on GPUs
H. Tang, S. Yang, Z. Liu, K. Hong, Z. Yu, X. Li, G. Dai, Y. Wang, S. Han
MICRO’23
- Tiny Training Engine: Pocket-sized Engine for Efficient On-device Training
L. Zhu, L. Hu, J. Lin, W. Chen, W. Wang, C. Gan, S. Han
MICRO’23
- Retrospective: EIE: Efficient Inference Engine on Sparse and Compressed Neural Network
S. Han, X. Liu, H. Mao, J. Pu, A. Pedram, M. A. Horowitz, W. J. Dally
Invited paper at the 50th ISCA; EIE is among the top-5 most cited papers in 50 years of ISCA
- AWQ: Activation-aware Weight-only Quantization for LLM Compression and Acceleration
J. Lin, J. Tang, H. Tang, S. Yang, X. Dang, S. Han
arXiv
paper / code / TinyChat
- EfficientViT: Lightweight Multi-Scale Attention for On-Device Semantic Segmentation
H. Cai, J. Li, M. Hu, C. Gan, S. Han
ICCV’23
paper / code
- SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
G. Xiao*, J. Lin*, M. Seznec, H. Wu, J. Demouth, S. Han
ICML’23
paper / code / integration by NVIDIA / integration by Intel
- FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer
Z. Liu*, X. Yang*, H. Tang, S. Yang, S. Han
CVPR’23
paper / code
- SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer
X. Chen*, Z. Liu*, H. Tang, L. Yi, H. Zhao, S. Han
CVPR’23
paper / code
- BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird’s-Eye View Representation
Z. Liu*, H. Tang*, A. Amini, X. Yang, H. Mao, D. L. Rus, S. Han
ICRA’23
paper / code / website / demo
- On-Device Training Under 256KB Memory
J. Lin*, L. Zhu*, W. Chen, W. Wang, C. Gan, S. Han
NeurIPS’22
paper / website / demo / code / slides / poster
- Efficient Spatially Sparse Inference for Conditional GANs and Diffusion Models
M. Li, J. Lin, C. Meng, S. Ermon, S. Han, J. Zhu
NeurIPS’22
- Network Augmentation for Tiny Deep Learning
H. Cai, C. Gan, J. Lin, S. Han
ICLR’22
paper / code
- LitePose: Efficient Architecture Design for 2D Human Pose Estimation
Y. Wang, M. Li, H. Cai, W. Chen, S. Han
CVPR’22
paper / code
- TorchSparse: Efficient Point Cloud Inference Engine
H. Tang, Z. Liu, X. Li, Y. Lin, S. Han
MLSys’22
paper / website / code
- Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications
H. Cai, J. Lin, Y. Lin, Z. Liu, H. Tang, H. Wang, L. Zhu, S. Han
TODAES’22
paper
- QuantumNAS: Noise-Adaptive Search for Robust Quantum Circuits
H. Wang, Y. Ding, J. Gu, Z. Li, Y. Lin, D. Pan, F. Chong, S. Han
HPCA’22
paper / qmlsys website / TorchQuantum / MIT News / video
- QuantumNAT: Quantum Noise-Aware Training with Noise Injection, Quantization and Normalization
H. Wang, J. Gu, Y. Ding, Z. Li, F. T. Chong, D. Z. Pan, S. Han
DAC’22
paper / qmlsys website / code
- QOC: Quantum On-Chip Training with Parameter Shift and Gradient Pruning
H. Wang, Z. Li, J. Gu, Y. Ding, D. Z. Pan, S. Han
DAC’22
paper / qmlsys website / code
- QuEst: Graph Transformer for Quantum Circuit Reliability Prediction
H. Wang, P. Liu, J. Cheng, Z. Liang, J. Gu, Z. Li, Y. Ding, W. Jiang, Y. Shi, X. Qian, D. Z. Pan, F. T. Chong, S. Han
ICCAD’22, invited
- Delayed Gradient Averaging: Tolerate the Communication Latency for Federated Learning
L. Zhu, H. Lin, Y. Lu, Y. Lin, S. Han
NeurIPS’21
paper / website / slides / poster
- PointAcc: Efficient Point Cloud Accelerator
Y. Lin, Z. Zhang, H. Tang, H. Wang, S. Han
MICRO’21
paper / website / slides / talk / lightning talk
- LocTex: Learning Data-Efficient Visual Representations from Localized Textual Supervision
Z. Liu, S. Stent, J. Li, J. Gideon, S. Han
ICCV’21
paper / website
- SemAlign: Annotation-Free Camera-LiDAR Calibration with Semantic Alignment Loss
Z. Liu, H. Tang, S. Zhu, S. Han
IROS’21
paper / website
- NAAS: Neural Accelerator Architecture Search
Y. Lin, M. Yang, S. Han
DAC’21
paper / website / slides / video
- Anycost GANs for Interactive Image Synthesis and Editing
J. Lin, R. Zhang, F. Ganz, S. Han, J. Zhu
CVPR’21
website / video / code
- Efficient and Robust LiDAR-Based End-to-End Navigation
Z. Liu, A. Amini, S. Zhu, S. Karaman, S. Han, D. Rus
ICRA’21
website / video / news
- IOS: Inter-Operator Scheduler For CNN Acceleration
Y. Ding, L. Zhu, Z. Jia, G. Pekhimenko, S. Han
MLSys’21
code / video / slides / poster
- SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
H. Wang, Z. Zhang, S. Han
HPCA’21
Paper / Slides / Intro Video / Short Video / Project Page
- MCUNet: Tiny Deep Learning on IoT Devices
J. Lin, W. Chen, Y. Lin, J. Cohn, C. Gan, S. Han
NeurIPS’20, spotlight
MIT News / website / Wired / Stacey on IoT / Morning Brew / IBM / Analytics Insight
- Tiny Transfer Learning: Reduce Activations, not Trainable Parameters for Efficient On-Device Learning
H. Cai, C. Gan, L. Zhu, S. Han
NeurIPS’20
website / slides / code
- Differentiable Augmentation for Data-Efficient GAN Training
S. Zhao, Z. Liu, J. Lin, J. Zhu, S. Han
NeurIPS’20
code / website / talk / VentureBeat / blog
- Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution
H. Tang*, Z. Liu*, S. Zhao, Y. Lin, J. Lin, H. Wang, S. Han
ECCV’20
[website][video][tutorial][code][TorchSparse]
- HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
H. Wang, Z. Wu, Z. Liu, H. Cai, L. Zhu, C. Gan, S. Han
ACL’20
Paper / Slides / Video / Code / Project Page
- GAN Compression: Learning Efficient Architectures for Conditional GANs
M. Li, J. Lin, Y. Ding, Z. Liu, J. Zhu, S. Han
CVPR’20
paper / website / demo / tutorial
- APQ: Joint Search for Network Architecture, Pruning and Quantization Policy
T. Wang, K. Wang, H. Cai, J. Lin, Z. Liu, S. Han
CVPR’20
Code / Video
- GCN-RL Circuit Designer: Transferable Transistor Sizing with Graph Neural Networks and Reinforcement Learning
H. Wang, K. Wang, J. Yang, L. Shen, N. Sun, H. S. Lee, S. Han
Design Automation Conference (DAC), 2020.
Paper / Slides / Poster / Video / Project Page
- SpArch: Efficient Architecture for Sparse Matrix Multiplication
Z. Zhang, H. Wang, S. Han, W.J. Dally
International Symposium on High-Performance Computer Architecture (HPCA), 2020.
Paper / 2-min Intro / Intro / Talk / Slides / Project Page
- Once For All: Train One Network and Specialize It for Efficient Deployment
H. Cai, C. Gan, T. Wang, Z. Zhang, S. Han
International Conference on Learning Representations (ICLR), 2020.
also appeared at TinyML summit, SysML workshop and CVPR workshop 2020.
[paper][website][code][tutorial][poster][ICLR talk][tinyML talk][CVPR talk][MIT news][Qualcomm news][Venture Beat]
- Lite Transformer with Long-Short Range Attention
Z. Wu, Z. Liu, J. Lin, Y. Lin, S. Han
International Conference on Learning Representations (ICLR), 2020.
[paper][website][code][slides]
- Point-Voxel CNN for Efficient 3D Deep Learning
Z. Liu, H. Tang, Y. Lin, S. Han
Neural Information Processing System (NeurIPS), 2019. Spotlight
[paper][deployment on MIT Driverless][playlist][talk][slides][code][website]
- Deep Leakage from Gradients
L. Zhu, Z. Liu, S. Han
Neural Information Processing System (NeurIPS), 2019.
[paper][website][code][poster][colab]
- TSM: Temporal Shift Module for Efficient Video Understanding
J. Lin, C. Gan, S. Han.
International Conference on Computer Vision (ICCV), 2019.
[paper][demo][code][poster][MIT News][Engadget][MIT Technology Review][NVIDIA News][NVIDIA Jetson Developer Forum]
[industry integration: @NVIDIA, @Baidu]
- HAQ: Hardware-Aware Automated Quantization with Mixed Precision
K. Wang, Z. Liu, Y. Lin, J. Lin, S. Han.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019. Oral presentation.
[paper][slides][poster][code][video][BibTex]
- ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware
H. Cai, L. Zhu, S. Han
International Conference on Learning Representations (ICLR), 2019.
[paper][code][demo][poster][MIT news][IEEE Spectrum][BibTex]
[industry integration: @AWS, @Facebook]
- Defensive Quantization: When Efficiency Meets Robustness
J. Lin, C. Gan, S. Han
International Conference on Learning Representations (ICLR), 2019.
[paper][poster][MIT News][BibTex]
- AMC: AutoML for Model Compression and Acceleration on Mobile Devices
Y. He, J. Lin, Z. Liu, H. Wang, J. Li, S. Han
European Conference on Computer Vision (ECCV), 2018
[paper][poster][website][code][BibTex]
Past publications at Stanford:
- Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training
Yujun Lin, Song Han, Huizi Mao, Yu Wang, William J. Dally
International Conference on Learning Representations (ICLR), April 2018.
- Efficient Sparse-Winograd Convolutional Neural Networks
Xingyu Liu, Jeff Pool, Song Han, William J. Dally
International Conference on Learning Representations (ICLR), April 2018.
- DSD: Dense-Sparse-Dense Training for Deep Neural Networks
Song Han, Jeff Pool, Sharan Narang, Huizi Mao, Shijian Tang, Erich Elsen, Bryan Catanzaro, John Tran, William J. Dally
International Conference on Learning Representations (ICLR), April 2017.
- Trained Ternary Quantization
Chenzhuo Zhu, Song Han, Huizi Mao, William J. Dally
International Conference on Learning Representations (ICLR), April 2017.
- Software-Hardware Co-Design for Efficient Neural Network Acceleration
Kaiyuan Guo, Song Han, Song Yao, Yu Wang, Yuan Xie, Huazhong Yang
Hot Chips special issue of IEEE Micro, March/April 2017
- ESE: Efficient Speech Recognition Engine for Sparse LSTM on FPGA
Song Han, Junlong Kang, Huizi Mao, Yubin Li, Dongliang Xie, Hong Luo, Yu Wang, Huazhong Yang, William J. Dally
NIPS workshop on Efficient Methods for Deep Neural Networks (EMDNN), Dec 2016, Best Paper Honorable Mention.
International Symposium on Field-Programmable Gate Arrays (FPGA), Feb 2017, Best Paper Award.
- Compressing and Regularizing Deep Neural Networks, Improving Prediction Accuracy Using Deep Compression and DSD Training
Song Han
O’Reilly, Nov 2016.
- EIE: Efficient Inference Engine on Compressed Deep Neural Network
Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark Horowitz, William J. Dally
International Symposium on Computer Architecture (ISCA), June 2016; Hot Chips, Aug 2016.
- Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
Song Han, Huizi Mao, William J. Dally
NIPS Deep Learning Symposium, December 2015.
International Conference on Learning Representations (ICLR), May 2016, Best Paper Award.
- Learning both Weights and Connections for Efficient Neural Networks
Song Han, Jeff Pool, John Tran, William J. Dally
Advances in Neural Information Processing Systems (NIPS), December 2015.
- SqueezeNet: AlexNet-Level Accuracy with 50x Fewer Parameters and < 0.5MB Model Size
Forrest Iandola, Song Han, Matthew Moskewicz, Khalid Ashraf, William J. Dally, Kurt Keutzer
arXiv 2016.
- Angel-Eye: A Complete Design Flow for Mapping CNN onto Customized Hardware
Kaiyuan Guo, Lingzhi Sui, Jiantao Qiu, Song Yao, Song Han, Yu Wang, Huazhong Yang
IEEE Computer Society Annual Symposium on VLSI (ISVLSI), July 2016.
- Hardware-friendly Convolutional Neural Network with Even-number Filter Size
Song Yao, Song Han, Kaiyuan Guo, Jianqiao Wangni, Yu Wang, William J. Dally
International Conference on Learning Representations Workshop, May 2016.