Song Han

MIT EECS

Accelerated Deep Learning Computing

About


Song Han is an assistant professor at MIT EECS. He received his Ph.D. in Electrical Engineering from Stanford University, advised by Prof. Bill Dally. Dr. Han’s research focuses on efficient deep learning computing. He proposed “Deep Compression” and its hardware implementation, the “Efficient Inference Engine,” both of which have influenced the industry. His work received the best paper awards at ICLR’16 and FPGA’17. His pruning, compression, and acceleration techniques have been integrated into many AI chip products. His hobbies include biking, snowboarding, drum sets, and design.

Research Interests

My recent research focuses on efficient algorithms and hardware for computation-intensive AI applications. I am looking for Ph.D. and UROP students interested in deep learning and computer architecture.
H: High performance, High energy efficiency Hardware
A: AutoML, Architectures and Accelerators for AI
N: Novel algorithms for Neural Networks

Efficient AI on the edge, auto AI; model compression, gradient compression, compact model design, sparsity, auto pruning, auto quantization, neural architecture search, efficient video recognition, efficient 3D recognition, specialized models, specialized hardware, hardware acceleration, FPGA, neural network and hardware co-design.

Google Scholar, Github, LinkedIn, Group Website, YouTube, Twitter, Facebook

Awards

  • NSF Career Award
  • MIT Technology Review list of 35 Innovators Under 35
  • SONY Faculty Award
  • Amazon Machine Learning Research Award
  • Facebook Research Award
  • Best paper award, ICLR’16
  • Best paper award, FPGA’17

News Blog

  • Feb 2020: The HAN Lab Dawnlight team won first place in the NeurIPS’19 Low-Power Computer Vision Challenge (both the classification and detection tracks) using the Once-for-All Network.
  • Jan 2020: Song received the NSF CAREER Award for “Efficient Algorithms and Hardware for Accelerated Machine Learning”.
  • Dec 2019: Once-for-All Network (OFA) is accepted by ICLR’20. Train only once, specialize for many deployment scenarios. OFA decouples model training from architecture search and consistently achieves better performance than SOTA models (MobileNet-v3, EfficientNet) while reducing GPU hours and CO2 emissions by orders of magnitude compared to NAS. [Paper]
  • Dec 2019: Lite Transformer with Long-Short Range Attention is accepted by ICLR’20. We investigate the mobile setting for NLP tasks to facilitate deploying NLP models on edge devices. [Paper]
  • Nov 2019: SpArch: Efficient Architecture for Sparse Matrix Multiplication to appear at International Symposium on High-Performance Computer Architecture (HPCA) 2020. [Paper]
  • Nov 2019: AutoML for Architecting Efficient and Specialized Neural Networks to appear in IEEE Micro.
  • Oct 2019: TSM is featured by MIT News, Engadget, NVIDIA News, and MIT Technology Review.
  • Oct 2019: The HAN Lab team won first place in the Low-Power Computer Vision Challenge (DSP track) at ICCV’19 using the Once-for-All Network.
  • Oct 2019: Our solution to the Visual Wake Words Challenge is highlighted by Google. The technique is ProxylessNAS. [demo][code]
  • Oct 2019: Open source: the search code for ProxylessNAS is available on Github.
  • Oct 2019: Training Kinetics in 15 Minutes: Large-scale Distributed Training on Videos is accepted by the NeurIPS workshop on Systems for ML. TSM, a compact model for video understanding, is hardware-friendly not only for inference but also for training. With TSM, we can scale Kinetics training up to 1536 GPUs and reduce the training time from 2 days to 15 minutes. TSM was highlighted in the opening remarks at AI Research Week hosted by the MIT-IBM Watson AI Lab. [paper]
  • Oct 2019: Distributed Training across the World is accepted by NeurIPS workshop on Systems for ML.
  • Oct 2019: Neural-Hardware Architecture Search is accepted by NeurIPS workshop on ML for Systems.
  • Sep 2019: Point-Voxel CNN for Efficient 3D Deep Learning is accepted by NeurIPS’19 as spotlight presentation. [paper]
  • Sep 2019: Deep Leakage from Gradients is accepted by NeurIPS’19. [paper]
  • July 2019: TSM: Temporal Shift Module for Efficient Video Understanding is accepted by ICCV’19. Video understanding is more computationally intensive than image recognition, making it harder to deploy on edge devices, yet frames along the temporal dimension are highly redundant. TSM keeps the computational complexity of 2D convolutions while achieving better temporal modeling ability than 3D convolutions. TSM also enables low-latency, real-time video recognition (13ms latency on Jetson Nano and 70ms latency on Raspberry Pi 3). [paper][demo][code][poster][industry integration][MIT News][Engadget][MIT Technology Review][NVIDIA News][NVIDIA Jetson Developer Forum]
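
The core of TSM is a zero-FLOP shift of a fraction of the channels along the temporal dimension, so the ordinary 2D convolutions that follow can mix information across neighboring frames. A minimal NumPy sketch of that shift (an illustrative reimplementation of the idea, not the released code; the 1/8 shift fraction and zero-padding follow the paper’s description):

```python
import numpy as np

def temporal_shift(x, shift_div=8):
    """Shift a fraction of channels along the temporal axis.

    x: activations of shape (N, T, C, H, W). One 1/shift_div slice of
    channels is shifted one step toward the past, another 1/shift_div
    slice one step toward the future; vacated positions are zero-filled.
    The shift itself costs no multiplications, so the 2D convolutions
    that follow keep their original compute cost.
    """
    n, t, c, h, w = x.shape
    fold = c // shift_div
    out = np.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]                # pull future frames back
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]  # push past frames forward
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]           # remaining channels unchanged
    return out
```

In the paper’s residual variant, this shifted tensor feeds the convolution inside a residual branch, so the identity path keeps the original spatial activations intact.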
  • June 2019: The HAN Lab team won first place in the Visual Wake Words Challenge at CVPR’19. The task is human detection on IoT devices with a tight computation budget: <250KB model size, <250KB peak memory usage, <60M MACs. The techniques are described in the ProxylessNAS paper. [code][Raspberry Pi and Pixel 3 demo]
  • Jan 2019: “Song Han: Democratizing artificial intelligence with deep compression” by MIT Industry Liaison Program. [article][video]
  • Dec 2018: Our work on Defensive Quantization: When Efficiency Meets Robustness is accepted by ICLR’19. Neural network quantization is becoming an industry standard for compressing and efficiently deploying deep learning models. Is model compression a free lunch? Not if it is treated carelessly: we observe that conventional quantization approaches are vulnerable to adversarial attacks. This paper aims to raise awareness about the security of quantized models, and we design a novel quantization methodology that jointly optimizes the efficiency and robustness of deep learning models. [paper][MIT News]
  • Dec 2018: Our work on Learning to Design Circuits appeared at the NeurIPS workshop on Machine Learning for Systems. Analog IC design relies on human experts to search for parameters that satisfy circuit specifications using their experience and intuition, which is highly labor-intensive and time-consuming. This paper proposes a learning-based approach to size the transistors and help engineers shorten the design cycle. [paper]
  • Dec 2018: Our work on ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware is accepted by ICLR’19. Neural Architecture Search (NAS) is computationally intensive. ProxylessNAS reduces GPU hours by 200x compared to NAS and GPU memory by 10x compared to DARTS, while searching directly on ImageNet. ProxylessNAS is hardware-aware: it can design specialized neural network architectures for different hardware, making inference fast. With >74.5% top-1 accuracy, the measured latency of ProxylessNAS is 1.8x faster than MobileNet-v2, the current industry standard for mobile vision. [paper][code][demo][poster][MIT news][IEEE Spectrum]
  • Sep 2018: Song Han received Amazon Machine Learning Research Award.
  • Sep 2018: Song Han received SONY Faculty Award.
  • Sep 2018: Our work on AMC: AutoML for Model Compression and Acceleration on Mobile Devices is accepted by ECCV’18. This paper proposes a learning-based method for model compression, rather than relying on human heuristics and rule-based methods. AMC automates the model compression process, achieves a better compression ratio, and is more sample-efficient: it takes less time and performs better than rule-based heuristics. AMC compresses ResNet-50 by 5x without losing accuracy and makes MobileNet-v1 2x faster with only 0.4% accuracy loss. [paper / bibTeX]
  • June 2018: Song presented the invited paper “Bandwidth Efficient Deep Learning” at the Design Automation Conference (DAC’18). The paper discusses techniques to save memory bandwidth, networking bandwidth, and engineer bandwidth for efficient deep learning.
  • Feb 26, 2018: Song presented “Bandwidth Efficient Deep Learning: Challenges and Trade-offs” at FPGA’18 panel session.
  • Jan 29, 2018: Deep Gradient Compression is accepted by ICLR’18. This technique can reduce the communication bandwidth by 500x and improve the scalability of large-scale distributed training. [slides]
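
At its core, the technique sparsifies the gradient exchange: each worker sends only the largest-magnitude gradient entries and accumulates the rest locally until they grow large enough to matter. A minimal NumPy sketch of that top-k selection with local accumulation (the function name is illustrative; the full method also uses momentum correction, gradient clipping, and warm-up training, all omitted here):

```python
import numpy as np

def sparsify_gradient(grad, residual, sparsity=0.999):
    """One communication step of gradient sparsification.

    grad:     the freshly computed gradient for this iteration.
    residual: locally accumulated gradient entries not yet transmitted.
    Returns (sparse, new_residual), where `sparse` keeps only the
    top (1 - sparsity) fraction of entries by magnitude and
    `new_residual` holds everything else for later rounds.
    Invariant: sparse + new_residual == grad + residual.
    """
    acc = grad + residual                       # fold in untransmitted mass
    k = max(1, int(acc.size * (1 - sparsity)))  # entries to transmit
    threshold = np.partition(np.abs(acc).ravel(), -k)[-k]
    mask = np.abs(acc) >= threshold
    sparse = np.where(mask, acc, 0.0)           # sent over the network
    new_residual = np.where(mask, 0.0, acc)     # kept for future steps
    return sparse, new_residual
```

Because untransmitted entries are carried in the residual rather than dropped, no gradient information is lost, which is what lets such aggressive sparsity preserve accuracy.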

Education

  • Ph.D. Stanford University, Sep. 2012 to Sep. 2017
  • B.S. Tsinghua University, Aug. 2008 to Jul. 2012

Contact

  • Email: FirstnameLastname [at] mit [dot] edu