Monday, May 20 • 12:10pm - 12:30pm
Deep Learning Inference Service at Microsoft

This paper introduces the Deep Learning Inference Service, an online production service at Microsoft for ultra-low-latency deep neural network model inference. We present the system architecture and take a deep dive into core concepts such as intelligent model placement, heterogeneous resource management, resource isolation, and efficient routing. We also report numbers on production scale and performance.
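
As a rough, hypothetical illustration of the "efficient routing" idea mentioned in the abstract (this is not the system described in the talk; the Replica class and pick_replica function are invented for illustration), the Python sketch below sends each request to the replica with the lowest estimated queueing delay across heterogeneous machines:

    from dataclasses import dataclass

    @dataclass
    class Replica:
        # Hypothetical model replica; all fields are illustrative, not from the talk.
        name: str
        throughput_qps: float      # estimated capacity of this replica (requests/sec)
        queued_requests: int = 0   # requests currently waiting on this replica

        def estimated_delay(self) -> float:
            # Rough queueing-delay estimate: pending work divided by capacity.
            return self.queued_requests / self.throughput_qps

    def pick_replica(replicas):
        # Least-estimated-delay routing: send the request where it should wait least.
        return min(replicas, key=lambda r: r.estimated_delay())

    if __name__ == "__main__":
        replicas = [
            Replica("cpu-box-1", throughput_qps=200, queued_requests=40),
            Replica("gpu-box-1", throughput_qps=1500, queued_requests=120),
        ]
        target = pick_replica(replicas)
        target.queued_requests += 1  # account for the request we just routed
        print("routed to", target.name)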

Speakers

Jason Li

Microsoft

Mingqin Li

Microsoft
Mingqin Li is a software engineering manager at Microsoft who leads Bing's deep learning platform, developing low-latency, large-scale, and highly efficient deep learning vector search services for scenarios such as web search, similar-image search, and question answering...

Jeffrey Zhu

Microsoft
Jeffrey Zhu is a program manager at Microsoft who drives the development of Bing's deep learning platform. This platform powers some of Bing's most innovative features, such as machine reading comprehension and visual search. It serves millions of deep learning model inferences per...

Yingnan Li

Microsoft

Yuxiong He

Microsoft

Elton Zheng

Microsoft

Adi Oltean

Microsoft

Maya Mosyak

Microsoft

Chris Barnes

Microsoft

Thomas Liu

Microsoft

Junhua Wang

Microsoft


Monday May 20, 2019 12:10pm - 12:30pm
Stevens Creek Room
