Microsoft Research Introduces Not One, Not Two, But Four New AI Compilers

Parallelism, computational efficiency, memory access, hardware acceleration, and control flow are some of the challenges addressed by the new compilers.

Jesus Rodriguez
Towards AI


Created Using Midjourney

I recently started an AI-focused educational newsletter that already has over 160,000 subscribers. TheSequence is a no-BS (meaning no hype, no news, etc.) ML-oriented newsletter that takes 5 minutes to read. The goal is to keep you up to date with machine learning projects, research papers, and concepts. Please give it a try by subscribing below:

Compilers are seeing a renaissance in the era of generative AI. In this context, a compiler is responsible for translating a neural network architecture into executable code for a specific hardware topology. Both of those areas, model architectures and hardware, have seen an explosion in innovation, regularly rendering AI compilers obsolete.

The challenges in AI compilation are many, from hardware acceleration to computation and memory efficiency. Microsoft Research has been at the forefront of AI compiler research, and recently it unveiled a quartet of cutting-edge AI compilers, each tailored to address a specific challenge in the realm of deep neural networks (DNNs). The list includes the following compilers:

· Rammer: For parallelism

· Roller: For computation

· Welder: For memory

· Grinder: For control flow and hardware acceleration

Let’s dive into each one.

Rammer: Pioneering Parallel Hardware Utilization

Deep neural networks (DNNs) have become integral to intelligence tasks ranging from image classification to natural language processing. To harness their power, a plethora of computing devices is employed, including CPUs, GPUs, and specialized DNN accelerators. A critical factor influencing DNN computation efficiency is scheduling: the process that dictates the order in which computational tasks run on hardware. Conventional AI compilers represent DNN computation as a data flow graph whose nodes symbolize DNN operators, each scheduled to run on the accelerator independently. This methodology, however, introduces significant scheduling overhead and underutilizes hardware resources.

Enter Rammer, a DNN compiler that envisions the scheduling space as a two-dimensional plane. Here, computational tasks are akin to bricks with varied shapes and sizes. Rammer’s mission is to arrange these bricks snugly on the two-dimensional plane, much like constructing a seamless wall: no gaps are allowed, so that hardware utilization and execution speed are maximized. Rammer effectively acts as a compactor within this spatial domain, statically placing DNN program bricks on the different computing units of the accelerator and thereby mitigating runtime scheduling overhead. Additionally, Rammer introduces novel hardware-independent abstractions for computing tasks and hardware accelerators, broadening the scheduling space and enabling more efficient schedules.

Image Credit: Microsoft Research
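To make the brick metaphor concrete, here is a minimal, hypothetical sketch of the idea: each operator is broken into fine-grained tasks, and a greedy packer assigns every task to whichever execution unit frees up first, producing a static plan with no per-operator launches at runtime. The names (`Brick`, `schedule_bricks`) and cost numbers are illustrative assumptions, not Rammer’s actual API.

```python
# Illustrative sketch, not Rammer's implementation: pack fine-grained
# tasks ("bricks") onto parallel units ahead of time, instead of
# launching whole operators one by one at runtime.
import heapq
from dataclasses import dataclass

@dataclass
class Brick:
    op: str       # operator this brick belongs to
    cost: float   # assumed execution time on one computing unit

def schedule_bricks(bricks, num_units):
    """Greedily pack bricks onto units at compile time, minimizing gaps."""
    # Min-heap of (time the unit becomes free, unit id).
    units = [(0.0, u) for u in range(num_units)]
    heapq.heapify(units)
    placement = []
    # Placing the largest bricks first leaves fewer gaps in the "wall".
    for brick in sorted(bricks, key=lambda b: -b.cost):
        free_at, unit = heapq.heappop(units)
        placement.append((brick.op, unit, free_at, free_at + brick.cost))
        heapq.heappush(units, (free_at + brick.cost, unit))
    makespan = max(end for *_, end in placement)
    return placement, makespan

bricks = [Brick("matmul", 4.0)] * 3 + [Brick("relu", 1.0)] * 6
plan, makespan = schedule_bricks(bricks, num_units=4)
print(f"static plan computed ahead of time, makespan: {makespan}")
```

Because the whole plan is produced before execution, the accelerator never pays the scheduling cost while the model runs, which is the essence of Rammer’s approach.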

Roller: Enhancing Computational Efficiency

Accelerators boasting parallel computing units and intricate memory hierarchies necessitate a systematic approach to data transfer. Data must ascend through the memory layers, partitioned into smaller bricks at each step, before reaching the top-level processor for computation. The challenge lies in partitioning the data and filling the memory space with large bricks so as to optimize memory utilization and efficiency. The prevailing approach employs machine learning to find partitioning strategies, requiring numerous search steps that are each evaluated on the accelerator. This lengthy process can take days or weeks for a full AI model.

Roller expedites compilation while maintaining optimal computation efficiency. At its core, Roller embodies a concept akin to the operation of a road roller: it smoothly deposits high-dimensional tensor data onto a two-dimensional memory structure, much like skillfully tiling a floor, discerning the ideal tile sizes from the memory’s specific attributes. Simultaneously, Roller encapsulates the tensor shape to harmonize with the hardware nuances of the underlying accelerator. This strategic alignment significantly streamlines compilation by constraining the range of shape options, ultimately leading to highly efficient outcomes.

Image Credit: Microsoft Research
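The intuition can be sketched in a few lines: rather than searching a huge tile space by measuring candidates on the device, enumerate only tile shapes aligned to the hardware’s memory transaction size and rank them with an analytical cost model. The constants and the score function below are illustrative assumptions, not Roller’s real cost model.

```python
# Illustrative sketch of hardware-aligned tile selection. The constants
# are assumed, stand-in values for a hypothetical accelerator.
TRANSACTION_ELEMS = 32      # elements per memory transaction (assumed)
SHARED_MEM_ELEMS = 12288    # fast-memory capacity in elements (assumed)

def aligned_tiles(max_dim=256):
    """Yield (rows, cols) tiles whose columns span full transactions."""
    for rows in range(8, max_dim + 1, 8):
        for cols in range(TRANSACTION_ELEMS, max_dim + 1, TRANSACTION_ELEMS):
            if rows * cols <= SHARED_MEM_ELEMS:
                yield rows, cols

def score(rows, cols):
    """Analytical proxy: compute per element loaded (higher is better).
    For a matmul-like tile, compute grows with rows*cols while loads
    grow with rows+cols, so larger, squarer tiles reuse memory more."""
    return (rows * cols) / (rows + cols)

best = max(aligned_tiles(), key=lambda t: score(*t))
print(f"chosen tile: {best}")  # picked analytically, no device runs
```

Because every candidate already matches the memory transaction granularity, the search space collapses from thousands of measured trials to a handful of analytically scored shapes, which is why compilation drops from days to seconds.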

Welder: Streamlining Memory Access

As DNN models demand ever higher-fidelity data and the computing cores of modern hardware accelerators grow faster, memory bandwidth has surfaced as a bottleneck. To counter this, Welder, a deep learning compiler, holistically optimizes memory access efficiency across the end-to-end DNN model. Execution proceeds in multiple stages, with input data divided into blocks that traverse different operators and memory layers. Welder transforms this process into an efficient assembly line, welding together different operators and data blocks and thereby reducing memory access traffic at the lower-level memory layers.

Image Credit: Microsoft Research
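A toy sketch of the underlying idea, with NumPy standing in for the accelerator: rather than running each operator over the whole tensor and spilling the intermediate to slow memory, each data block flows through the entire operator chain while it is still in fast memory. The tile size and the operators here are illustrative choices, not Welder’s.

```python
# Illustrative sketch of operator "welding" at the data-block level.
import numpy as np

def unfused(x):
    y = np.maximum(x, 0)   # relu: full intermediate written to memory
    return y * 2.0         # scale: full intermediate read back in

def welded(x, tile=1024):
    out = np.empty_like(x)
    # Each block passes through the whole operator chain before the
    # next block starts, so the intermediate never leaves fast memory.
    for i in range(0, x.size, tile):
        block = x.flat[i:i + tile]
        out.flat[i:i + tile] = np.maximum(block, 0) * 2.0
    return out

x = np.random.randn(1 << 16).astype(np.float32)
assert np.allclose(unfused(x), welded(x))  # same result, less traffic
```

The results are identical; what changes is the traffic pattern: the fused version reads and writes the slow memory once per element instead of twice, which is precisely the saving Welder targets across whole models.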

Grinder: Mastering Control Flow Execution

In AI computation, complex control logic sometimes accompanies the movement of data blocks. Current AI compilers predominantly focus on the efficiency of data flow execution, neglecting efficient support for control flow. Grinder bridges this gap by integrating control flow into data flow, enabling both to execute efficiently on accelerators. It unifies the representation of AI models through uTask, a novel abstraction, and leverages heuristic strategies to optimize control flow execution across the hardware’s parallelism levels. Grinder can thus move control flow into device kernels, optimizing performance across control flow boundaries.

Image Credit: Microsoft Research
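A back-of-the-envelope sketch of the problem Grinder addresses: when a loop lives on the host, every iteration pays a kernel-launch round trip, whereas folding the loop into the device program pays that cost once. The overhead numbers below are assumptions chosen only to illustrate the scaling, not measurements.

```python
# Illustrative cost model, with assumed per-launch and per-step costs.
LAUNCH_OVERHEAD_US = 10.0   # assumed host-to-device launch cost
STEP_COST_US = 2.0          # assumed cost of one loop body on device

def host_side_loop(steps):
    # Control flow stays on the host: one kernel launch per iteration.
    return steps * (LAUNCH_OVERHEAD_US + STEP_COST_US)

def device_side_loop(steps):
    # Control flow moved into the device program: a single launch.
    return LAUNCH_OVERHEAD_US + steps * STEP_COST_US

for steps in (10, 100, 1000):
    print(steps, host_side_loop(steps), device_side_loop(steps))
```

Under these assumed numbers, the host-side loop spends most of its time on launch overhead as the iteration count grows, while the device-side version amortizes it to a constant, which is why moving control flow across the kernel boundary pays off.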

In summary, Microsoft Research’s quartet of AI compilers — Rammer, Roller, Welder, and Grinder — pave the way for enhanced DNN workload optimization, memory access efficiency, and control flow execution on hardware accelerators, marking a significant leap forward in AI compiler technology.
