Song Han is an assistant professor at MIT EECS. He received his Ph.D. in Electrical Engineering from Stanford University, advised by Prof. Bill Dally. Dr. Han's research focuses on efficient deep learning computing. He proposed "Deep Compression" and the "Efficient Inference Engine," which have had a broad impact on industry. His work received best paper awards at ICLR'16 and FPGA'17. He is the co-founder and chief scientist of DeePhi Tech (a leading provider of efficient deep learning solutions), which was acquired by Xilinx; his pruning, compression, and acceleration techniques have been integrated into products. His hobbies include biking, snowboarding, drum sets, and design.
My recent research focuses on efficient algorithms and hardware for computation-intensive AI applications. I am looking for PhD and UROP students interested in deep learning and computer architecture. Below are the research areas of HAN Lab:
H: High performance, High energy efficiency Hardware
A: AutoML, Architectures and Accelerators for AI
N: Novel algorithms for Neural Networks
Keywords: efficient AI, edge AI, auto AI; model compression, gradient compression, compact model design, sparsity, auto pruning, auto quantization, neural architecture search, efficient video recognition, efficient 3D recognition, specialized model, specialized hardware, hardware acceleration, FPGA, neural network and hardware co-design.
In the post-ImageNet era, computer vision and machine learning researchers are solving more complicated AI problems using larger data sets, driving up the demand for computation. However, Moore's Law is slowing down and Dennard scaling has stopped: the amount of computation per unit cost and power is no longer increasing at its historic rate. This mismatch between the supply of and demand for computation highlights the need to co-design efficient machine learning algorithms and domain-specific hardware architectures. The vast joint design space of algorithms and hardware is difficult for human engineers to explore; we are constrained not only by computational resources but also by human resources. Therefore, we need auto AI techniques. We have recently been working on hardware-centric auto AI: ProxylessNAS [ICLR'19], AMC [ECCV'18], and HAQ [CVPR'19].
I'm interested in application-driven, domain-specific computer architecture research: achieving higher efficiency by tailoring the hardware architecture to the characteristics of the application domain, and innovating on efficient algorithms that are hardware-friendly (TSM [ICCV'19] for efficient video recognition, PVCNN for efficient 3D point cloud recognition). My current research centers on co-designing efficient algorithms and hardware systems for machine learning: freeing AI from power-hungry hardware beasts, democratizing AI to cheap mobile devices, reducing the cost of running deep learning in data centers, and automating machine learning model design. I enjoy research at the intersection of machine learning algorithms and computer architecture.
- July 2019: TSM: Temporal Shift Module for Efficient Video Understanding is accepted by ICCV'19. Video understanding is more computation-intensive than image understanding and is expensive to deploy on edge devices, yet frames along the temporal dimension are highly redundant. TSM keeps the computational complexity of 2D convolution while achieving better temporal modeling ability than 3D convolution. Uni-directional TSM also enables low-latency, real-time video recognition. [paper][demo][code][industry integration].
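The core of TSM is a shift of a fraction of the channels along the temporal dimension, which mixes information across frames at zero FLOP cost. Below is a rough NumPy sketch of that shift (the function name is illustrative and the fold_div fraction is a common default; the released PyTorch code is the reference implementation):

```python
import numpy as np

def temporal_shift(x, fold_div=8):
    """Shift a fraction of channels along the temporal dimension.

    x: array of shape (N, T, C, H, W). A 1/fold_div fraction of the channels
    is shifted one step back in time, another fraction one step forward,
    and the remaining channels stay in place (zero-padded at the boundaries).
    """
    n, t, c, h, w = x.shape
    fold = c // fold_div
    out = np.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]                  # shift toward the past
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]  # shift toward the future
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]             # unshifted channels
    return out
```

A 2D convolution applied after this shift can then aggregate temporal context, which is where the "3D-like modeling at 2D cost" intuition comes from.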
- June 2019: Song is awarded MIT Technology Review 35 Innovators Under 35.
- June 2019: The HAN Lab team received first place in the Visual Wake-up Word Challenge @ CVPR'19. The task is to recognize whether there is a human in front of the camera, as a visual wake-up signal. The model runs on an always-on IoT device and has to fit a tight computation budget: <250KB model size, <250KB peak memory usage, and <60M MACs. The techniques are described in this paper.
- June 2019: The HAN Lab team received third place in the classification track of the LPIRC competition @ CVPR. The task is to perform image classification within a 30ms latency budget on a Pixel 2 phone while maximizing accuracy. The techniques are described in this paper.
- June 2019: Song is presenting "Design Automation for Efficient Deep Learning by Hardware-aware Neural Architecture Search and Compression" at the ICML workshop on On-Device Machine Learning & Compact Deep Neural Network Representations, the CVPR workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications, the CVPR workshop on Efficient Deep Learning for Computer Vision, UCLA, and TI.
- June 2019: Open source. AMC: AutoML for Model Compression and Acceleration on Mobile Devices is available on GitHub. AMC uses reinforcement learning to automatically find the optimal sparsity ratio for channel pruning.
- June 2019: Open source. HAQ: Hardware-aware Automated Quantization with Mixed Precision is available on GitHub.
- May 2019: Song Han received Facebook Research Award.
- April 2019: Defensive Quantization on MIT News: Improving Security as Artificial Intelligence Moves to Smartphones.
- April 2019: Our manuscript of Design Automation for Efficient Deep Learning Computing is available on arXiv.[slides]
- March 2019: ProxylessNAS on MIT News: Kicking Neural Network Design Automation into High Gear and IEEE Spectrum: Using AI to Make Better AI.
- March 2019: HAQ: Hardware-aware Automated Quantization with Mixed Precision is accepted by CVPR'19 as an oral presentation. HAQ leverages reinforcement learning to automatically determine the quantization policy (bit width per layer), and it takes the hardware accelerator's feedback into the design loop. Rather than relying on proxy signals such as FLOPs and model size, we employ a hardware simulator to generate direct feedback (both latency and energy) to the RL agent. Compared with conventional methods, our framework is fully automated and can specialize the quantization policy for different neural network architectures and hardware architectures.
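In HAQ, the RL agent's job is to pick a bit width per layer; the quantizer that the bit width feeds into can be as simple as a uniform (linear) quantizer. A minimal sketch of such a quantizer, assuming symmetric per-tensor scaling (not the paper's exact implementation):

```python
import numpy as np

def fake_quantize(w, bits):
    """Uniformly quantize a weight tensor to `bits` bits, then de-quantize.

    Uses symmetric per-tensor scaling: the largest-magnitude weight maps to
    the largest positive integer level. Returns the "fake-quantized" weights,
    i.e. real values restricted to 2**bits levels.
    """
    qmax = 2 ** (bits - 1) - 1                 # e.g. 127 for 8 bits
    max_abs = np.abs(w).max()
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)  # integer levels
    return q * scale                            # back to real values
```

Lowering `bits` shrinks the number of representable levels, which is the efficiency/accuracy trade-off the RL agent navigates per layer.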
So far, ProxylessNAS [ICLR'19] => AMC [ECCV'18] => HAQ [CVPR'19] form a pipeline for efficient AutoML.
- Feb 2019: Song presented "Bandwidth-Efficient Deep Learning with Algorithm and Hardware Co-Design" at ISSCC'19 in the forum "Intelligence at the Edge: How Can We Make Machine Learning More Energy Efficient?"
- Jan 2019: Song is appointed to the Robert J. Shillman (1974) Career Development Chair.
- Jan 2019: “Song Han: Democratizing artificial intelligence with deep compression” by MIT Industry Liaison Program. [article][video]
- Dec 2018: Congratulations to Xiangning, who received 2nd place in the feedback phase of the NeurIPS'18 AutoML Challenge: AutoML for Lifelong Machine Learning.
- Dec 2018: Our work on Defensive Quantization: When Efficiency Meets Robustness is accepted by ICLR'19. Neural network quantization is becoming an industry standard to compress and efficiently deploy deep learning models. Is model compression a free lunch? Not if it is not treated carefully. We observe that conventional quantization approaches are vulnerable to adversarial attacks. This paper aims to raise awareness of the security of quantized models, and we design a novel quantization methodology to jointly optimize the efficiency and robustness of deep learning models. [paper][MIT News]
- Dec 2018: Our work on Learning to Design Circuits appeared at the NeurIPS workshop on Machine Learning for Systems. Analog IC design relies on human experts searching for parameters that satisfy circuit specifications using experience and intuition, which is highly labor-intensive and time-consuming. This paper proposes a learning-based approach to size the transistors and help engineers shorten the design cycle. [paper]
- Dec 2018: Our work on ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware is accepted by ICLR'19. Neural Architecture Search (NAS) is computation-intensive. ProxylessNAS reduces GPU hours by 200x compared with conventional NAS and GPU memory by 10x compared with DARTS, while directly searching on ImageNet. ProxylessNAS is hardware-aware: it can design specialized neural network architectures for different hardware, making inference fast. With >74.5% top-1 accuracy, the measured latency of ProxylessNAS is 1.8x faster than MobileNet-v2, the current industry standard for mobile vision. [paper][code][demo][poster][MIT news][IEEE Spectrum]
- Sep 2018: Song Han received Amazon Machine Learning Research Award.
- Sep 2018: Song Han received SONY Faculty Award.
- Sep 2018: Our work on AMC: AutoML for Model Compression and Acceleration on Mobile Devices is accepted by ECCV'18. This paper proposes a learning-based method to perform model compression, rather than relying on human heuristics and rule-based policies. AMC automates the model compression process, achieves a better compression ratio, and is also more sample-efficient: it takes less time and performs better than rule-based heuristics. AMC compresses ResNet-50 by 5x without losing accuracy and makes MobileNet-v1 2x faster with only a 0.4% accuracy loss. [paper / bibTeX]
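AMC's contribution is the RL agent that chooses a compression (sparsity) ratio per layer; once a ratio is chosen, the pruning step itself can be a simple magnitude criterion. A minimal sketch of magnitude-based channel pruning under that assumption (function name and layout are illustrative, not AMC's actual code):

```python
import numpy as np

def prune_channels(weight, ratio):
    """Zero out the `ratio` fraction of output channels with the smallest L1 norm.

    weight: conv kernel of shape (out_channels, in_channels, kH, kW).
    Returns a copy with the weakest channels set to zero.
    """
    norms = np.abs(weight).reshape(weight.shape[0], -1).sum(axis=1)  # L1 per channel
    n_prune = int(weight.shape[0] * ratio)
    prune_idx = np.argsort(norms)[:n_prune]   # channels with smallest norms
    pruned = weight.copy()
    pruned[prune_idx] = 0.0
    return pruned
```

The RL agent would evaluate the accuracy of the pruned network and adjust each layer's `ratio` accordingly, which is what removes the human heuristics from the loop.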
- June 2018: Song presents the invited paper "Bandwidth Efficient Deep Learning" at the Design Automation Conference (DAC'18). The paper discusses techniques to save memory bandwidth, networking bandwidth, and engineer bandwidth for efficient deep learning.
- Mar 26, 2018: Song presented Deep Gradient Compression at NVIDIA GPU Technology Conference.
- Feb 26, 2018: Song presented “Bandwidth Efficient Deep Learning: Challenges and Trade-offs” at FPGA’18 panel session.
- Jan 29, 2018: Deep Gradient Compression is accepted by ICLR'18. This technique reduces the communication bandwidth by up to 500x and improves the scalability of large-scale distributed training. [slides].
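One core idea in Deep Gradient Compression is gradient sparsification: each worker communicates only the largest-magnitude gradient entries and locally accumulates the rest, so no gradient information is discarded. A simplified sketch of that step (the paper additionally uses momentum correction, gradient clipping, and warm-up, all omitted here):

```python
import numpy as np

def sparsify_gradient(grad, residual, sparsity=0.999):
    """Top-k gradient sparsification with local residual accumulation.

    grad:     this step's gradient tensor.
    residual: gradient mass accumulated locally from previous steps.
    Returns (sparse update to communicate, new residual to keep locally).
    """
    acc = grad + residual                        # fold in accumulated gradient
    k = max(1, int(acc.size * (1 - sparsity)))   # number of entries to send
    thresh = np.sort(np.abs(acc).ravel())[-k]    # magnitude threshold for top-k
    mask = np.abs(acc) >= thresh
    sent = np.where(mask, acc, 0.0)              # sparse update (communicated)
    new_residual = np.where(mask, 0.0, acc)      # the rest waits for later steps
    return sent, new_residual
```

At 99.9% sparsity only 0.1% of the entries are transmitted per step, which is where the large bandwidth reduction comes from.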
- Ph.D. Stanford University, Sep. 2012 to Sep. 2017
- B.S. Tsinghua University, Aug. 2008 to Jul. 2012
- Email: FirstnameLastname [at] mit [dot] edu
- PhD, UROP and summer intern applicants: please email han [dot] lab [dot] mit [at] gmail so that it won’t be filtered.