Song Han is an assistant professor at MIT’s EECS. He received his PhD degree from Stanford University. His research focuses on efficient deep learning computing. He proposed the “deep compression” technique, which can reduce neural network size by an order of magnitude without losing accuracy, and the hardware implementation “efficient inference engine,” which first exploited pruning and weight sparsity in deep learning accelerators. His team’s work on hardware-aware neural architecture search (ProxylessNAS, Once-for-All Network (OFA), MCUNet) was highlighted by MIT News, Wired, Qualcomm News, VentureBeat, and IEEE Spectrum; integrated into PyTorch and AutoGluon; received six low-power computer vision contest awards at flagship AI conferences; and set a world record in the open division of the MLPerf inference benchmark (1.078M Img/s). Song received Best Paper awards at ICLR’16 and FPGA’17, the Amazon Machine Learning Research Award, the SONY Faculty Award, the Facebook Faculty Award, and the NVIDIA Academic Partnership Award. Song was named to the “35 Innovators Under 35” list by MIT Technology Review for his contribution to the “deep compression” technique that “lets powerful artificial intelligence (AI) programs run more efficiently on low-power mobile devices.” Song received the NSF CAREER Award for “efficient algorithms and hardware for accelerated machine learning” and the IEEE “AIs 10 to Watch: The Future of AI” award.
TinyML, putting AI on a diet, efficient algorithms and hardware for computation-intensive AI applications.
We actively collaborate with industry partners, and many of our research projects have influenced industry products. Feel free to email me about collaboration.
Model Compression / AutoML / NAS: [MLSys’21][NeurIPS’20, spotlight][NeurIPS’20][ICLR’20][CVPR’20][CVPR’20][ICLR’19][CVPR’19, oral][ECCV’18][ICLR’16, BP][NIPS’15]
Efficient AI on edge devices: Video / Point Cloud / NLP / GAN: [NeurIPS’20][ACL’20][CVPR’20][ECCV’20][ICLR’20][NeurIPS’19, spotlight][ICCV’19]
HW for ML: [HPCA’21][HPCA’20][FPGA’17, BP][ISCA’16]
ML for HW: [DAC’21][DAC’20][NeurIPS’19 W]
Efficiency and privacy: [ECCV’20][NeurIPS’19][ICLR’19]
- IEEE “AIs 10 to Watch: The Future of AI” Award, 2020
- NSF CAREER Award, 2020
- NVIDIA Academic Partnership Award, 2020
- MIT Technology Review list of 35 Innovators Under 35, 2019
- SONY Faculty Award, 2017/2018/2020
- Amazon Machine Learning Research Award, 2018/2019
- Facebook Research Award, 2019
- Best paper award, FPGA’2017
- Best paper award, ICLR’2016
- First place, 5th Low-Power Computer Vision Challenge, CPU detection track & FPGA track, Aug 2020 [OFA]
- First place, 3D semantic segmentation on SemanticKITTI, July 2020 [SPVNAS]
- First place, 4th Low-Power Computer Vision Challenge, both CPU classification and detection tracks, Jan 2020 [OFA]
- First place, 3rd Low-Power Computer Vision Challenge, DSP track, @ICCV 2019 [OFA]
- First place, MicroNet Challenge, NLP track (WikiText-103), @NeurIPS 2019 [paper]
- First place, Visual Wake Words Challenge, TF-lite track, @CVPR 2019 [ProxylessNAS][demos]
- MCUNet [NeurIPS’20 spotlight]:
- Wired, AI Algorithms Are Slimming Down to Fit in Your Fridge
- MIT News, System brings deep learning to “internet of things” devices
- Stacey on IoT, Researchers take a 3-pronged approach to Edge AI
- IBM, New IBM-MIT system brings AI to microcontrollers – paving the way to ‘smarter’ IoT
- Analytics Insight, Amalgamating ML and IoT in Smart Home Devices
- Techable, MITがIoTデバイス向けのコンパクトなAIシステムを開発！ (MIT develops a compact AI system for IoT devices!)
- Tendencias, El aprendizaje profundo impulsa el Internet de las cosas (Deep learning boosts the Internet of Things)
- DiffAugment [NeurIPS’20]:
- VentureBeat, MIT researchers claim augmentation technique can train GANs with less data
- Once-For-All Network [ICLR’20]:
- VentureBeat, MIT aims for energy efficiency in AI model training
- MIT News, Reducing the carbon footprint of artificial intelligence
- AI Daily, New MIT Architecture May Lead To Smaller Carbon Footprints For Neural Networks
- TechHQ, How MIT is making ground towards ‘greener’ AI
- Singularity Hub, This ‘Once-For-All’ Neural Network Could Slash AI’s Carbon Footprint
- Inhabitat, MIT moves toward greener, more sustainable artificial intelligence
- Qualcomm, Research from MIT shows promising results for on-device AI
- Temporal Shift Module [ICCV’19]:
- NVIDIA, New MIT Video Recognition Model Dramatically Improves Latency on Edge Devices
- MIT Technology Review, Powerful computer vision algorithms are now small enough to run on your phone
- Engadget, MIT-IBM developed a faster way to train video recognition AI
- MIT News, Faster video recognition for the smartphone era
- ProxylessNAS [ICLR’19]:
- IEEE Spectrum, Using AI to Make Better AI
- MIT News, Kicking neural network design automation into high gear
- April 2021: Once-for-All (OFA) Network set a world record in the open division of the MLPerf Inference Benchmark: 1.078M inferences per second on 8 A100 GPUs. [Github]
- March 2021: HAQ: Hardware-Aware Automated Quantization with Mixed Precision is integrated into the Intel OpenVINO Toolkit.
- Feb 2021: Efficient and Robust LiDAR-Based End-to-End Navigation is accepted by ICRA’21. We introduce Fast-LiDARNet, which is based on sparse GPU kernel optimization and hardware-aware neural architecture search and improves the speed from 5 fps to 47 fps, together with Hybrid Evidential Fusion, which directly estimates the uncertainty and fuses the control predictions, reducing the number of takeovers in road tests.
- Feb 2021: Anycost GANs for Interactive Image Synthesis and Editing is accepted by CVPR’21. GANs are big. GANs are slow. It takes seconds to edit a single image on edge devices, prohibiting an interactive user experience. Anycost GAN can be executed at various cost budgets (up to 10× computation reduction) and adapts to a wide range of hardware and latency requirements. When deployed on edge devices, our model achieves 6-12× speedup, enabling interactive image editing.
- Jan 2021: IOS: Inter-operator Scheduler For CNN Acceleration is accepted by MLSys’21. Existing deep learning frameworks focus on optimizing intra-operator parallelization. However, a single operator cannot fully utilize the available parallelism of a GPU, especially with small batch sizes. We extensively study the parallelism between operators and propose Inter-Operator Scheduler (IOS) to automatically schedule the execution of multiple operators in parallel.
- Oct 2020: MCUNet: Tiny Deep Learning on IoT Devices is covered by MIT News: System brings deep learning to “internet of things” devices and [Wired][Stacey on IoT][Morning Brew][IBM][Analytics Insight]
- Oct 2020: SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning is accepted by HPCA’21. (sister project: SpArch: Efficient Architecture for Sparse Matrix Multiplication, HPCA’20)
- Sep 2020: MCUNet: Tiny Deep Learning on IoT Devices is accepted by NeurIPS’20 as a spotlight presentation.
- Sep 2020: Tiny Transfer Learning: Reduce Memory, not Parameters for Efficient On-Device Learning is accepted by NeurIPS’20.
- Sep 2020: Differentiable Augmentation for Data-Efficient GAN Training is accepted by NeurIPS’20.
- Aug 2020: The OnceForAll team won first place in the Low-Power Computer Vision Challenge, mobile CPU detection track.
- Aug 2020: The OnceForAll team won first place in the Low-Power Computer Vision Challenge, FPGA track.
- July 2020: SPVNAS ranks first on SemanticKITTI.
- July 2020: Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution is accepted by ECCV’20.
- July 2020: DataMix: Efficient Privacy-Preserving Edge-Cloud Inference is accepted by ECCV’20.
- June 2020: Once-For-All Network (OFA) for on-device AI is highlighted by Qualcomm.
- June 2020: We open sourced Data-Efficient GAN Training with DiffAugment on Github. Covered by VentureBeat.
- May 2020: HAT: Hardware-Aware Transformer for Efficient Natural Language Processing to appear at ACL’2020 [paper][code][website]. This is our second paper on efficient NLP on edge devices, together with Lite Transformer [ICLR’20 paper][code][website][slides]
- April 2020: Slides for the ICLR’20 NAS workshop and the TinyML webinar “AutoML for TinyML with Once-for-All Network” are available.
- April 2020: Once-For-All Network (OFA) is covered by MIT News and VentureBeat: Reducing the carbon footprint of artificial intelligence: MIT system cuts the energy required for training and running neural networks.
- Mar 2020: Point-Voxel CNN for Efficient 3D Deep Learning is highlighted by NVIDIA Jetson Community Project Spotlight.
- Mar 2020: Point-Voxel CNN for Efficient 3D Deep Learning is deployed on MIT Driverless, improving the 3D detection accuracy from 95% to 99.93%, improving the detection range from 8m to 12m, reducing the latency from 2ms/object to 1.25ms/object [demo]
- Feb 2020: SpArch: Efficient Architecture for Sparse Matrix Multiplication appeared at the International Symposium on High-Performance Computer Architecture (HPCA) 2020. Sparse Matrix Multiplication (SpMM) is an important primitive for many applications (graphs, sparse neural networks, etc.). SpArch has a spatial merger array to merge the partial sums in parallel, and a Huffman Tree scheduler to determine the optimal order in which to merge the partial sums, reducing DRAM access. [paper][slides][website][2min talk][full talk]
- Feb 2020: GAN Compression: Learning Efficient Architectures for Conditional GANs and APQ: Joint Search for Network Architecture, Pruning and Quantization Policy are accepted by CVPR’20.
- Feb 2020: With our efficient model, the Once-for-All Network, our team is awarded first place in the Low Power Computer Vision Challenge (both classification and detection tracks).
- Jan 2020: Song received the NSF CAREER Award for “Efficient Algorithms and Hardware for Accelerated Machine Learning”.
- Dec 2019: Once-For-All Network (OFA) is accepted by ICLR’2020. Train only once, specialize for many hardware platforms, from CPU/GPU to hardware accelerators. OFA decouples model training from architecture search.
OFA consistently outperforms SOTA NAS methods (up to 4.0% ImageNet top1 accuracy improvement over MobileNet-V3) while reducing GPU hours and CO2 emissions by orders of magnitude. In particular, OFA achieves a new SOTA 80.0% ImageNet top1 accuracy under the mobile setting (<600M FLOPs). [Paper][Code][Poster][MIT News][Qualcomm News][VentureBeat]
- Dec 2019: Lite Transformer with Long Short Term Attention is accepted by ICLR’2020. We investigate the mobile setting for NLP tasks to facilitate the deployment of NLP models on edge devices. [Paper]
- Nov 2019: AutoML for Architecting Efficient and Specialized Neural Networks to appear at IEEE Micro.
- Oct 2019: TSM is featured by MIT News, Engadget, NVIDIA News, MIT Technology Review.
- Oct 2019: Our team is awarded first place in the Low Power Computer Vision Challenge, DSP track at ICCV’19, using the Once-for-All Network.
- Oct 2019: Our winning solution to the Visual Wake Words Challenge is highlighted by Google. The technique is ProxylessNAS. [demo][code]
- Oct 2019: Open source: the search code for ProxylessNAS is available on Github.
- Oct 2019: Training Kinetics in 15 Minutes: Large-scale Distributed Training on Videos is accepted by the NeurIPS workshop on Systems for ML. TSM, a compact model for video understanding, is hardware-friendly not only for inference but also for training. With TSM, we can scale up Kinetics training to 1536 GPUs and reduce the training time from 2 days to 15 minutes. TSM is highlighted in the opening remarks of AI Research Week hosted by the MIT-IBM Watson AI Lab. [paper]
- Oct 2019: Distributed Training across the World is accepted by NeurIPS workshop on Systems for ML.
- Oct 2019: Neural-Hardware Architecture Search is accepted by NeurIPS workshop on ML for Systems.
- Sep 2019: Point-Voxel CNN for Efficient 3D Deep Learning is accepted by NeurIPS’19 as a spotlight presentation.
- Sep 2019: Deep Leakage from Gradients is accepted by NeurIPS’19. [paper][poster][code][website]
- July 2019: TSM: Temporal Shift Module for Efficient Video Understanding is accepted by ICCV’19. Video understanding is more computationally intensive than image understanding, making it harder to deploy on edge devices. Frames in the temporal dimension are highly redundant. TSM has the computational complexity of 2D convolution yet achieves better temporal modeling ability than 3D convolution. TSM also enables low-latency, real-time video recognition (13ms latency on Jetson Nano and 70ms latency on Raspberry Pi 3). [paper][demo][code][poster][industry integration@NVIDIA][MIT News][Engadget][MIT Technology Review][NVIDIA News][NVIDIA Jetson Developer Forum]
- June 2019: HAN Lab is awarded first place in the Visual Wake Words Challenge@CVPR’19. The task is human detection on IoT devices with a tight computation budget: <250KB model size, <250KB peak memory usage, <60M MAC. The techniques are described in the ProxylessNAS paper. [code][Raspberry Pi and Pixel 3 demo]
- June 2019: Song is presenting “Design Automation for Efficient Deep Learning by Hardware-aware Neural Architecture Search and Compression” at ICML workshop on On-Device Machine Learning & Compact Deep Neural Network Representations, CVPR workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications, CVPR workshop on Efficient Deep Learning for Computer Vision, UCLA, TI and
- June 2019: Open source. AMC: AutoML for Model Compression and Acceleration on Mobile Devices is available on Github. AMC uses reinforcement learning to automatically find the optimal sparsity ratio for channel pruning.
- June 2019: Open source. HAQ: Hardware-aware Automated Quantization with Mixed Precision is available on Github.
- May 2019: Song Han received Facebook Research Award.
- April 2019: Defensive Quantization on MIT News: Improving Security as Artificial Intelligence Moves to Smartphones.
- April 2019: Our manuscript of Design Automation for Efficient Deep Learning Computing is available on arXiv (accepted by the Micro journal). [slides]
- March 2019: ProxylessNAS is covered by MIT News: Kicking Neural Network Design Automation into High Gear and IEEE Spectrum: Using AI to Make Better AI.
- March 2019: HAQ: Hardware-aware Automated Quantization with Mixed Precision is accepted by CVPR’19 as an oral presentation. HAQ leverages reinforcement learning to automatically determine the quantization policy (bit width per layer), and we take the hardware accelerator’s feedback in the design loop. Rather than relying on proxy signals such as FLOPs and model size, we employ a hardware simulator to generate direct feedback (both latency and energy) to the RL agent. Compared with conventional methods, our framework is fully automated and can specialize the quantization policy for different neural network architectures and hardware architectures.
So far, ProxylessNAS [ICLR’19] => AMC [ECCV’18] => HAQ [CVPR’19] form a pipeline of efficient AutoML.
- Feb 2019: Song presented “Bandwidth-Efficient Deep Learning with Algorithm and Hardware Co-Design” at ISSCC’19 in the forum “Intelligence at the Edge: How Can We Make Machine Learning More Energy Efficient?”
- Jan 2019: Song is appointed to the Robert J. Shillman (1974) Career Development Chair.
- Jan 2019: “Song Han: Democratizing artificial intelligence with deep compression” by MIT Industry Liaison Program. [article][video]
- Dec 2018: Congrats to Xiangning, who received 2nd place in the feedback phase of the NeurIPS’18 AutoML Challenge: AutoML for Lifelong Machine Learning.
- Dec 2018: Defensive Quantization: When Efficiency Meets Robustness is accepted by ICLR’19. Neural network quantization is becoming an industry standard to compress and efficiently deploy deep learning models. Is model compression a free lunch? No, if not treated carefully. We observe that the conventional quantization approaches are vulnerable to adversarial attacks. This paper aims to raise people’s awareness about the security of the quantized models, and we designed a novel quantization methodology to jointly optimize the efficiency and robustness of deep learning models. [paper][MIT News]
- Dec 2018: Learning to Design Circuits appeared at the NeurIPS workshop on Machine Learning for Systems (full version accepted by DAC’2020). Analog IC design relies on human experts to search for parameters that satisfy circuit specifications using their experience and intuition, which is highly labor-intensive and time-consuming. This paper proposes a learning-based approach to size the transistors and help engineers shorten the design cycle. [paper]
- Dec 2018: Our work on ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware is accepted by ICLR’19. Neural Architecture Search (NAS) is computation intensive. ProxylessNAS reduces GPU hours by 200x compared with NAS and GPU memory by 10x compared with DARTS, while directly searching on ImageNet. ProxylessNAS is hardware-aware. It can design specialized neural network architectures for different hardware, making inference fast. With >74.5% top-1 accuracy, the measured latency of ProxylessNAS is 1.8x faster than MobileNet-v2, the current industry standard for mobile vision. [paper][code][demo][poster][MIT news][IEEE Spectrum][industry integration: @AWS, @Facebook]
- Sep 2018: Song Han received Amazon Machine Learning Research Award.
- Sep 2018: Song Han received SONY Faculty Award.
- Sep 2018: Our work on AMC: AutoML for Model Compression and Acceleration on Mobile Devices is accepted by ECCV’18. This paper proposes a learning-based method to perform model compression, rather than relying on human heuristics and rule-based methods. AMC can automate the model compression process, achieve a better compression ratio, and also be more sample efficient. It takes less time and does better than rule-based heuristics. AMC compresses ResNet-50 by 5x without losing accuracy. AMC makes MobileNet-v1 2x faster with 0.4% loss of accuracy. [paper][website]
- June 2018: Song presents invited paper “Bandwidth Efficient Deep Learning” at Design Automation Conference (DAC’18). The paper talks about techniques to save memory bandwidth, networking bandwidth, and engineer bandwidth for efficient deep learning.
- Mar 26, 2018: Song presented Deep Gradient Compression at NVIDIA GPU Technology Conference.
- Feb 26, 2018: Song presented “Bandwidth Efficient Deep Learning: Challenges and Trade-offs” at FPGA’18 panel session.
- Jan 29, 2018: Deep Gradient Compression is accepted by ICLR’18. This technique can reduce the communication bandwidth by 500x and improve the scalability of large-scale distributed training. [slides]
Ph.D. Stanford University, advised by Prof. Bill Dally
B.S. Tsinghua University
Email: FirstnameLastname [at] mit [dot] edu
Email for PhD/intern applications: