Song Han

Associate Professor, MIT EECS

About

Song Han is an associate professor at MIT EECS. He received his PhD degree from Stanford University. He proposed the “Deep Compression” technique, including pruning and quantization, which is widely used for efficient AI computing, and the “Efficient Inference Engine” (EIE), which first brought weight sparsity to modern AI chips and is among the top-5 cited papers in 50 years of ISCA. He pioneered TinyML research that brings deep learning to IoT devices, enabling learning on the edge (featured on the MIT home page). His team’s work on hardware-aware neural architecture search (the once-for-all network) enables users to design, optimize, shrink, and deploy AI models to resource-constrained hardware devices, winning first place in many low-power computer vision contests at flagship AI conferences. His team’s recent work on large language model quantization and acceleration (SmoothQuant, AWQ, StreamingLLM) has effectively improved the efficiency of LLM inference and has been adopted by NVIDIA TensorRT-LLM. Song received best paper awards at ICLR and FPGA, and faculty awards from Amazon, Facebook, NVIDIA, Samsung, and SONY. He was named one of MIT Technology Review’s “35 Innovators Under 35” for his contribution to the “deep compression” technique that “lets powerful artificial intelligence (AI) programs run more efficiently on low-power mobile devices.” He received the NSF CAREER Award for “efficient algorithms and hardware for accelerated machine learning”, the IEEE “AIs 10 to Watch: The Future of AI” award, and a Sloan Research Fellowship. Song’s research in efficient AI computing has been successfully commercialized and has influenced the industry: he was a cofounder of DeePhi (now part of AMD) and a cofounder of OmniML (now part of NVIDIA). Song developed the EfficientML.ai course to disseminate efficient ML research.

Recent work: accelerating LLMs and Generative AI [slides]

  • StreamingLLM: enable LLMs to generate infinite-length texts with a fixed memory budget by preserving the "attention sinks" in the KV-cache. github
  • EfficientViT: a new family of vision models for high-resolution dense prediction with global receptive field and multi-scale learning. EfficientViT-SAM accelerates SAM from 12 img/s to 538 img/s. github
  • AWQ & TinyChat: on-device LLM inference system that uses 4-bit quantization to alleviate the memory bottleneck of LLMs, running Llama-2-13B and VILA on MacBook and Jetson Orin. github
  • SmoothQuant: a training-free, accuracy-preserving, and general-purpose post-training quantization (PTQ) solution that enables 8-bit weight, 8-bit activation (W8A8) quantization for LLMs. github
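For readers curious how StreamingLLM keeps memory fixed during infinite-length generation, here is a minimal sketch of an attention-sink eviction policy in its spirit. It tracks token positions only; the names (`evict`, `n_sink`, `window`) are illustrative, not the released API.

```python
# Illustrative sketch (not the official StreamingLLM implementation):
# a rolling KV-cache policy that always keeps the first `n_sink` tokens
# ("attention sinks") plus a sliding window of the most recent tokens,
# so memory stays constant no matter how long generation runs.

def evict(cache, n_sink=4, window=8):
    """Return the token positions kept under a fixed memory budget."""
    if len(cache) <= n_sink + window:
        return cache
    return cache[:n_sink] + cache[-window:]

# Feed 100 token positions through the policy one at a time.
cache = []
for pos in range(100):
    cache = evict(cache + [pos])

print(cache)  # sinks 0..3 plus the last 8 positions
```

The cache size never exceeds `n_sink + window` entries, which is the "fixed memory budget" the bullet above refers to.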
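The core trick in SmoothQuant, migrating activation outliers into the weights via a per-channel scale, can be sketched in a few lines of NumPy. This is an illustrative toy, not the released implementation; the shapes and the migration strength `alpha` are chosen for demonstration.

```python
import numpy as np

# Illustrative sketch of SmoothQuant-style per-channel "smoothing":
# divide outlier activation channels by a scale s and multiply the
# matching weight rows by s. The matmul output is mathematically
# unchanged, while the activation range becomes far easier to
# quantize to 8 bits.

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 8))
X[:, 2] *= 100.0                  # one outlier channel, as seen in LLMs
W = rng.normal(size=(8, 4))

alpha = 0.5                        # migration strength hyperparameter
s = np.abs(X).max(axis=0) ** alpha / np.abs(W).max(axis=1) ** (1 - alpha)

X_s, W_s = X / s, W * s[:, None]   # smoothed activations and weights

# Equivalence: output is identical before and after smoothing.
assert np.allclose(X @ W, X_s @ W_s)
# The activation outlier is tamed, shrinking the dynamic range.
print(np.abs(X).max(), "->", np.abs(X_s).max())
```

Because `X @ W == (X / s) @ (W * s)` exactly, the smoothing is training-free and accuracy-preserving by construction; only the subsequent 8-bit rounding differs.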

Teaching

Research

The incredible potential of large models in Artificial Intelligence Generated Content (AIGC), including cutting-edge technologies like Large Language Models (LLMs) and Diffusion Models, has revolutionized a wide range of applications spanning natural language processing, content generation, creative arts, and more. However, their large model sizes and high memory and computational requirements present formidable challenges. We aim to tackle these hurdles head-on and make these advanced AI technologies more practical, democratizing access to these future-changing technologies for everyone.
Efficiency improvements in deep learning often start with refining algorithms, but these theoretical gains, such as reductions in FLOPs and model size, do not automatically translate into practical speed and energy savings. Specialized hardware and software systems are needed to bridge this gap. These systems create a fresh design dimension independent of the algorithm space, opening up opportunities for holistic optimization by co-designing the algorithm together with the software/hardware systems.

Research Interests

Industry Impact

Our efficient ML research has influenced and landed in many industry products, thanks to close collaboration with our sponsors: Intel OpenVINO, Intel Neural Compressor, Apple Neural Engine, NVIDIA Sparse Tensor Core, NVIDIA FasterTransformer, AMD-Xilinx Vitis AI, Qualcomm AI Model Efficiency Toolkit (AIMET), Amazon AutoGluon, Microsoft NNI, SONY Neural Architecture Search Library, SONY Model Compression Toolkit, ADI MAX78000/MAX78002 Model Training and Synthesis Tool, and Ford Trailer Backup Assist.

Open source projects with over 1K GitHub stars:

Honors and Awards

  • 2023 Sloan Research Fellowship
  • 2023 Top-5 cited paper in 50 years of ISCA (EIE)
  • 2022 Red Dot Award
  • 2021 Samsung Global Research Outreach (GRO) Award
  • 2021 NVIDIA Academic Partnership Award
  • 2020 NVIDIA Academic Partnership Award
  • 2020 IEEE "AIs 10 to Watch: The Future of AI" Award
  • 2020 NSF CAREER Award
  • 2020 SONY Faculty Award
  • 2019 MIT Technology Review list of 35 Innovators Under 35
  • 2019 Amazon Machine Learning Research Award
  • 2019 Facebook Research Award
  • 2018 SONY Faculty Award
  • 2018 Amazon Machine Learning Research Award
  • 2017 SONY Faculty Award
  • 2017 Best Paper Award, FPGA 2017
  • 2016 Best Paper Award, ICLR 2016

Competition Awards

  • First Place (1/150), ACM/IEEE TinyML Design Contest, Memory Occupation Track @ ICCAD 2022 (MCUNetV3)
  • First Place, 6th AI Driving Olympics, nuScenes Semantic Segmentation @ ICRA 2021 (SPVNAS)
  • First Place, SemanticKITTI leaderboard, 3D semantic segmentation @ ECCV 2020 (SPVNAS)
  • First Place, Low-Power Computer Vision Challenge, CPU Detection and FPGA tracks @ CVPR 2020 (OFA)
  • First Place, Low-Power Computer Vision Workshop, DSP track @ ICCV 2019 (OFA)
  • First Place, Visual Wake Words Challenge, TF-Lite track @ CVPR 2019 (ProxylessNAS)
  • First Place, Low-Power Image Recognition Challenge, classification and detection tracks, IEEE 2019 (OFA)

News

  • Feb 2024: AWQ has been accepted to MLSys 2024!
  • Feb 2024: We released a new version of the quantized GEMM/GEMV kernels in TinyChat, reaching 38 tokens/second inference speed on NVIDIA Jetson Orin!
  • Jan 2024: SwiftInfer, a TensorRT-based implementation, makes StreamingLLM more production-grade.
  • Dec 2023: Congrats to Ji Lin, who completed and defended his PhD thesis, “Efficient Deep Learning Computing: From TinyML to Large Language Model”. Ji joined OpenAI after graduation.
  • Dec 2023: StreamingLLM enables endless and efficient LLM generation on iPhone!
  • Dec 2023: AWQ is integrated into HuggingFace Transformers’ main branch.
  • Dec 2023: SmoothQuant is integrated into NVIDIA TensorRT-LLM.
  • Jul 2023: The TinyML and Efficient Deep Learning Computing course (6.5940) will return in the fall, with live sessions on YouTube!
  • Jul 2023: We released TinyChat, an efficient and lightweight chatbot interface based on AWQ. TinyChat enables efficient LLM inference on both cloud and edge GPUs. Llama-2-chat models are supported! Check out our implementation here.
  • Nov 2022: Congrats to the MCUNetV3 team on First Place (1/150) in the ACM/IEEE TinyML Design Contest, Memory Occupation Track @ ICCAD 2022.
  • Jun 2021: Congrats to the SPVNAS team on First Place in the 6th AI Driving Olympics, nuScenes Semantic Segmentation @ ICRA 2021.
  • Jul 2020: Congrats to the SPVNAS team on First Place on the SemanticKITTI leaderboard, 3D semantic segmentation @ ECCV 2020.
  • Jun 2020: Congrats to the OFA team on First Place in the Low-Power Computer Vision Challenge, CPU Detection and FPGA tracks @ CVPR 2020.
  • Oct 2019: Congrats to the OFA team on First Place at the Low-Power Computer Vision Workshop, DSP track @ ICCV 2019.
  • Jun 2019: Congrats to the OFA team on First Place in the Low-Power Image Recognition Challenge, classification and detection tracks, IEEE 2019.
  • Jun 2019: Congrats to the ProxylessNAS team on First Place in the Visual Wake Words Challenge, TF-Lite track @ CVPR 2019.
  • Nov 2023: Congrats to Zhijian Liu on the 2023 Rising Stars in Data Science.
  • Aug 2023: Congrats to Hanrui Wang on the 2023 Rising Stars in ML and Systems.
  • Aug 2023: Congrats to Zhijian Liu on the 2023 Rising Stars in ML and Systems.
  • May 2023: Congrats to Song Han on the 2023 Sloan Research Fellowship.
  • Jan 2023: Congrats to Hanrui Wang on the MARC 2023 Best Pitch Award.
  • Nov 2022: Congrats to Hanrui Wang on the Gold Medal of the ACM Student Research Competition.
  • Aug 2022: Congrats to Ligeng Zhu on the 2022 Qualcomm Innovation Fellowship.
  • Aug 2022: Congrats to Ji Lin on the 2022 Qualcomm Innovation Fellowship.
  • Aug 2022: Congrats to Zhijian Liu on the 2022 MIT Ho-Ching and Han-Ching Fund Award.
  • May 2022: Congrats to Song Han on the 2022 Red Dot Award.
  • May 2022: Congrats to Hanrui Wang on the 2022 ACM Student Research Competition Award, 1st Place.
  • May 2021: Congrats to Song Han on the 2021 Samsung Global Research Outreach (GRO) Award.
  • May 2021: Congrats to Song Han on the 2021 NVIDIA Academic Partnership Award.
  • May 2021: Congrats to Hanrui Wang on the 2021 Qualcomm Innovation Fellowship.
  • May 2021: Congrats to Han Cai on the 2021 Qualcomm Innovation Fellowship.
  • May 2021: Congrats to Zhijian Liu on the 2021 Qualcomm Innovation Fellowship.
  • May 2021: Congrats to Yujun Lin on the 2021 Qualcomm Innovation Fellowship.
  • May 2021: Congrats to Yujun Lin on the 2021 DAC Young Fellowship.
  • May 2020: Congrats to Song Han on the 2020 NVIDIA Academic Partnership Award.
  • May 2020: Congrats to Song Han on the 2020 IEEE "AIs 10 to Watch: The Future of AI" Award.
  • May 2020: Congrats to Song Han on the 2020 NSF CAREER Award.
  • May 2020: Congrats to Song Han on the 2020 SONY Faculty Award.
  • May 2020: Congrats to Ji Lin on being a 2020 NVIDIA Graduate Fellowship Finalist.
  • May 2020: Congrats to Hanrui Wang on being a 2020 NVIDIA Graduate Fellowship Finalist.
  • May 2020: Congrats to Hanrui Wang on the 2021 Analog Devices Outstanding Student Designer Award.
  • May 2020: Congrats to Hanrui Wang on the 2020 DAC Young Fellowship.
  • May 2019: Congrats to Song Han on the 2019 MIT Technology Review list of 35 Innovators Under 35.
  • May 2019: Congrats to Song Han on the 2019 Amazon Machine Learning Research Award.
  • May 2019: Congrats to Song Han on the 2019 Facebook Research Award.
  • Aug 2018: Congrats to Yujun Lin on the 2018 Robert J. Shillman Fellowship.
  • May 2018: Congrats to Song Han on the 2018 SONY Faculty Award.
  • May 2018: Congrats to Song Han on the 2018 Amazon Machine Learning Research Award.
  • May 2017: Congrats to Song Han on the 2017 SONY Faculty Award.
  • Jul 2023: Congrats to Hanrui Wang and the SpAtten team on the Best University Demo Award of DAC 2023 for “An Energy-Scalable Transformer Accelerator Supporting Adaptive Model Configuration and Word Elimination”, in collaboration with Anantha Chandrakasan’s team.
  • Jun 2023: Congrats to the EIE team on being among the top-5 cited papers in 50 years of ISCA.
  • May 2023: Congrats to Wei-Chen Wang and team on the 2023 NSF Athena AI Institute Best Poster Award (rank #1).
  • May 2022: Congrats to Hanrui Wang and team on the 2022 NSF AI Institute Best Poster Award (rank #1).
  • Oct 2021: Congrats to Wei-Chen Wang and team on the Best Paper Award of IEEE NVMSA 2021.
  • Dec 2020: Congrats to Hanrui Wang and team on the Young Fellow Best Presentation Award of DAC 2020.
  • Oct 2019: Congrats to Wei-Chen Wang and team on the Best Paper Award of ACM/IEEE CODES+ISSS 2019.
  • May 2017: Congrats to Song Han and team on the Best Paper Award of FPGA 2017.
  • May 2016: Congrats to Song Han and team on the Best Paper Award of ICLR 2016.
  • Mar 2024: A new blog post, “Patch Conv: Patch Convolution to Avoid Large GPU Memory Usage of Conv2D”, is published. In this blog, we introduce Patch Conv to reduce the memory footprint when generating high-resolution images. Patch Conv cuts memory usage by over 2.4× compared to the existing PyTorch implementation. Code: https://github.com/mit-han-lab/patch_conv
  • Mar 2024: A new blog post, “TinyChat: Visual Language Models & Edge AI 2.0”, is published. Explore the latest advancement in TinyChat and AWQ: the integration of Visual Language Models (VLMs) on the edge! These advancements allow LLMs to comprehend visual inputs, enabling image-understanding tasks such as caption generation, question answering, and more. With the latest release, TinyChat supports leading VLMs such as VILA, which can be easily quantized with AWQ, giving users a seamless experience for image-understanding tasks.
  • Sep 2023: A new blog post, “TinyChat: Large Language Model on the Edge”, is published. Running large language models (LLMs) on the edge is of great importance. In this blog, we introduce TinyChat, an efficient and lightweight system for LLM deployment on the edge. It runs Meta’s latest LLaMA-2 model at 30 tokens/second on NVIDIA Jetson Orin and can easily support different models and hardware.
  • Nov 2022: A new blog post, “On-Device Training Under 256KB Memory”, is published. In MCUNetV3, we enable on-device training under 256KB SRAM and 1MB Flash, using less than 1/1000 of the memory of PyTorch while matching the accuracy on the visual wake words application. This enables the model to adapt to newly collected sensor data, so users can enjoy customized services without uploading data to the cloud, thus protecting privacy.
  • Jul 2020: A new blog post, “Reducing the carbon footprint of AI using the Once-for-All network”, is published. “The aim is smaller, greener neural networks,” says Song Han, an assistant professor in the Department of Electrical Engineering and Computer Science. “Searching efficient neural network architectures has until now had a huge carbon footprint. But we reduced that footprint by orders of magnitude with these new methods.”
  • May 2020: A new blog post, “Efficiently Understanding Videos, Point Cloud and Natural Language on NVIDIA Jetson Xavier NX”, is published. Thanks to NVIDIA’s amazing deep learning ecosystem, we were able to deploy three applications on the Jetson Xavier NX soon after receiving the kit: efficient video understanding with the Temporal Shift Module (TSM, ICCV’19), efficient 3D deep learning with Point-Voxel CNN (PVCNN, NeurIPS’19), and efficient machine translation with the Hardware-Aware Transformer (HAT, ACL’20).
  • Oct 2023: Song Han presented “Efficient Vision Transformer” at the ICCV 2023 Workshop on Resource-Efficient Deep Learning for Computer Vision (RCV’23).
  • Oct 2023: Song Han presented “Quantization for Foundation Models” at the ICCV 2023 Workshop on Low-Bit Quantized Neural Networks.
  • Sep 2023: Song Han presented “TinyChat for On-device LLM” at the IAP MIT Workshop on the Future of AI and Cloud Computing Applications and Infrastructure.
  • Jun 2023: Song Han presented “Efficient Deep Learning Computing with Sparsity” at the CVPR Workshop on Efficient Computer Vision.
  • Nov 2021: Song Han presented “TinyML and Efficient Deep Learning for Automotive Applications” at the Hyundai Motor Group Developers Conference.
  • Nov 2021: Song Han presented the plenary “Putting AI on a Diet: TinyML and Efficient Deep Learning” at the TinyML Technical Forum Asia.
  • Oct 2021: Song Han presented “TinyML Techniques for Greener, Faster and Sustainable AI” at the IBM IEEE CAS/EDS AI Compute Symposium.
  • Oct 2021: Song Han presented “Challenges and Directions of Low-Power Computer Vision” at an International Conference on Computer Vision (ICCV) workshop panel.
  • Aug 2021: Song Han presented “AutoML for Tiny Machine Learning” at the AutoML Workshop at the Knowledge Discovery and Data Mining (KDD) Conference.
  • Aug 2021: Song Han presented “Frontiers of AI Accelerators: Technologies, Circuits and Applications” at the Hong Kong University of Science and Technology, AI Chip Center for Emerging Smart Systems.
  • Aug 2021: Song Han presented “Putting AI On A Diet: TinyML and Efficient Deep Learning” at the Semiconductor Research Corporation (SRC) AI Hardware E-Workshops.
  • Jun 2021: Song Han presented “NAAS: Neural-Accelerator Architecture Search” at the 4th International Workshop on AI-assisted Design for Architecture at ISCA.
  • Jun 2021: Song Han presented “Machine Learning for Analog and Digital Design” at the VLSI Symposia workshop on AI/Machine Learning for Circuit Design and Optimization.
  • Jun 2021: Song Han presented “Putting AI on a Diet: TinyML and Efficient Deep Learning” at the Efficient Deep Learning for Computer Vision Workshop at CVPR.
  • Jun 2021: Song Han presented “Putting AI on a Diet: TinyML and Efficient Deep Learning” at MLOps World – Machine Learning in Production.
  • Jun 2021: Song Han presented “Putting AI on a Diet: TinyML and Efficient Deep Learning” at Shanghai Jiaotong University.
  • May 2021: Song Han presented “Putting AI on a Diet: TinyML and Efficient Deep Learning” at Apple’s On-Device ML Workshop.
  • Apr 2021: Song Han presented “Putting AI on a Diet: TinyML and Efficient Deep Learning” at the MLSys’21 On-Device Intelligence Workshop.
  • Apr 2021: Song Han presented “Putting AI on a Diet: TinyML and Efficient Deep Learning” at the ISQED’21 Embedded Tutorials.
  • Jan 2021: Song Han presented “Efficient AI: Reducing the Carbon Footprint of AI in the Internet of Things (IoT)” at the MIT ILP Japan conference.
  • Nov 2020: Song Han presented “Putting AI on a Diet: TinyML and Efficient Deep Learning” at an MIT ILP webinar session on low-power/edge/efficient computing.
  • Apr 2020: Song Han presented “Once-for-All: Train One Network and Specialize it for Efficient Deployment” at a TinyML Webinar.
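The Patch Conv idea from the Mar 2024 blog post, splitting a convolution's input into patches with enough overlap to cover the kernel support so that only one patch's activations are live at a time, can be illustrated with a simple 1-D analogy. Everything here (`patch_conv1d`, the patch size, the 1-D setting) is illustrative, not the mit-han-lab/patch_conv API.

```python
import numpy as np

# 1-D analogy for the Patch Conv trick (illustrative only): split the
# input into patches, extend each patch with kernel_size - 1 samples of
# overlap ("halo"), convolve patch by patch, and concatenate. The result
# matches a single full convolution, but peak activation memory scales
# with the patch size instead of the full input size.

def patch_conv1d(x, kernel, patch=256):
    K = len(kernel)
    out = []
    for start in range(0, len(x) - K + 1, patch):
        seg = x[start:start + patch + K - 1]   # patch plus overlap halo
        out.append(np.convolve(seg, kernel, mode="valid"))
    return np.concatenate(out)

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
kernel = rng.normal(size=5)

full = np.convolve(x, kernel, mode="valid")    # reference: one big conv
assert np.allclose(patch_conv1d(x, kernel), full)
```

The same decomposition applies to Conv2D by tiling the spatial dimensions; the halo grows with the receptive field, which is why the memory savings are largest for high-resolution inputs.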

Contact

Email: FirstnameLastname [at] mit [dot] edu

Office: 38-344. I’m fortunate to occupy Prof. Paul Penfield and Prof. Paul E. Gray’s former office.

If you work on efficient LLMs, VLMs, or GenAI and are interested in joining my lab, please fill in the recruiting form. I do not reply to inquiry emails if the recruiting form is incomplete.
PhD applicants: select "ML+System" track in the MIT PhD application system.