Song Han is an assistant professor in the Electrical Engineering and Computer Science Department of the Massachusetts Institute of Technology (MIT). Dr. Han received his Ph.D. in Electrical Engineering from Stanford University, advised by Prof. Bill Dally. His industry experience includes Google Brain and Facebook Applied Machine Learning.
Dr. Han’s research focuses on energy-efficient deep learning, at the intersection of machine learning and computer architecture. He proposed Deep Compression, which can compress deep neural networks by an order of magnitude without losing prediction accuracy. He designed EIE: Efficient Inference Engine, a hardware accelerator that performs inference directly on the compressed sparse model, which saves memory bandwidth and yields significant speedup and energy savings. His work has been featured by TheNextPlatform, TechEmergence, Embedded Vision and O’Reilly. His research in model compression and hardware acceleration received the Best Paper Award at ICLR’16 and the Best Paper Award at FPGA’17. Before joining Stanford, Song graduated from Tsinghua University.
I joined MIT EECS as an assistant professor (MIT news). I am looking for PhD students interested in deep learning and computer architecture. Below are the missions of HAN’s Lab:
H: High performance, High energy efficiency Hardware
A: AutoML, Architectures and Accelerators for AI
N: Novel algorithms for Neural Networks and deep learning
S: Small models, Scalable Systems, and Specialized Silicon
In the post-ImageNet era, computer vision and machine learning researchers are solving more complicated AI problems using larger data sets, driving the demand for more computation. However, we are in the post-Moore’s Law world, where the amount of computation per unit cost and power is no longer increasing at its historic rate. This mismatch between supply and demand for computation highlights the need for co-designing efficient machine learning algorithms and domain-specific hardware architectures.
I’m interested in application-driven, domain-specific computer architecture research. The end of Dennard scaling makes power the key constraint. I’m interested in achieving higher efficiency by tailoring the architecture to the characteristics of the application domain. My current research centers around co-designing efficient algorithms and hardware systems for machine learning, in order to free AI from power-hungry hardware beasts, democratize AI to cheap mobile devices, reduce the cost of running deep learning in data centers, and automate machine learning model design. I enjoy research at the intersection of machine learning algorithms and computer architecture.
- Dec 2018: Our work on ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware is available on arXiv. Neural Architecture Search (NAS) is computation intensive. ProxylessNAS reduces GPU hours by 200x compared with NAS and GPU memory by 10x compared with DARTS, while searching directly on ImageNet. ProxylessNAS is hardware-aware: it can design specialized neural network architectures for different hardware, making inference fast. With >74.5% top-1 accuracy, ProxylessNAS is 1.8x faster in measured latency than MobileNet-v2, the current industry standard for mobile vision. [paper][code][demo][website]
- Nov 2018: Our work on Efficient Video Understanding with Temporal Shift Module (TSM) is available on arXiv. Video understanding is more computation intensive than image understanding, and it is expensive to deploy. TSM has the computational complexity of 2D convolution yet achieves better temporal modeling ability than 3D convolution. Measured on a P100 GPU, TSM achieved 1.8% higher accuracy at 8x lower latency and 12x higher throughput compared with I3D. TSM ranks first on both the Something-Something V1 and V2 leaderboards as of Nov 2018. [paper][website][demo][slides]
- Sep 2018: Song Han received the Amazon Machine Learning Research Award.
- Sep 2018: Song Han received the SONY Faculty Award.
- Sep 2018: Our work on AMC: AutoML for Model Compression and Acceleration on Mobile Devices is accepted by ECCV’18. This paper proposes a learning-based method to perform model compression, rather than relying on human heuristics and rule-based methods. AMC can automate the model compression process, achieve a better compression ratio, and be more sample efficient. It takes less time and does better than rule-based heuristics. AMC compresses ResNet-50 by 5x without losing accuracy, and makes MobileNet-v1 2x faster with a 0.4% loss of accuracy. [paper / bibTeX]
- June 2018: Song presented the invited paper “Bandwidth Efficient Deep Learning” at the Design Automation Conference (DAC’18). The paper discusses techniques to save memory bandwidth, networking bandwidth, and engineer bandwidth for efficient deep learning.
- Mar 26, 2018: Song presented Deep Gradient Compression at the NVIDIA GPU Technology Conference.
- Feb 26, 2018: Song presented “Bandwidth Efficient Deep Learning: Challenges and Trade-offs” at the FPGA’18 panel session.
- Jan 29, 2018: Deep Gradient Compression is accepted by ICLR’18. This technique reduces the communication bandwidth by 500x and improves the scalability of large-scale distributed training. [slides]
- Ph.D. Stanford University, Sep. 2012 to Sep. 2017
- B.S. Tsinghua University, Aug. 2008 to Jul. 2012
- Area Chair, International Conference on Learning Representations (ICLR’19)
- Program committee, System for Machine Learning Conference (SysML’19)
- Program committee, High Performance Computer Architecture (HPCA’18)
- Email: FirstnameLastname [at] mit [dot] edu
- PhD and summer intern applicants: please email han [dot] lab [dot] mit [at] gmail so that it won’t be filtered.