Paper Presentation
Monday, May 20
 

10:30am PDT

MLOp Lifecycle Scheme for Vision-based Inspection Process in Manufacturing
Recent advances in machine learning and the proliferation of edge computing have enabled the manufacturing industry to integrate machine learning into its operations to boost productivity. In addition to building high-performing machine learning models, stakeholders and infrastructures within the industry should be taken into account when building an operational lifecycle. In this paper, we propose a practical machine learning operation scheme for building a vision inspection process, motivated mainly by field experience in applying the system in large-scale corporate manufacturing plants. We evaluate our scheme on four defect inspection lines in production. The results show that deep neural network models outperform existing algorithms and that the scheme is easily extensible to other manufacturing processes.

Speakers
JL

Junsung Lim

Samsung Research
HL

Hoejoo Lee

Samsung Research
YW

Youngmin Won

Samsung Research
HY

Hunje Yeon

Samsung Research


Monday May 20, 2019 10:30am - 10:50am PDT
Stevens Creek Room

10:30am PDT

Opportunities and Challenges Of Machine Learning Accelerators In Production
The rise of deep learning has resulted in tremendous demand for compute power, with the FLOPS required for leading machine learning (ML) research doubling roughly every 3.5 months since 2012. This increase in demand for compute has coincided with the end of Moore’s Law.

As a result, major industry players such as NVIDIA, Intel, and Google have invested in ML accelerators that are purpose-built for deep learning workloads.

ML accelerators present many opportunities and challenges in production environments. This paper discusses some high-level observations from internal experience at Google.

Speakers

Monday May 20, 2019 10:30am - 10:50am PDT
Winchester Room

10:50am PDT

Accelerating Large Scale Deep Learning Inference through DeepCPU at Microsoft
The application of deep learning models has brought significant improvements to many Microsoft services and products. In this paper, we share our experience and methodology in developing and applying the DeepCPU library to serve deep learning models in production at large scale, with remarkable latency improvements and infrastructure cost reductions. We describe two ways to use the library, through customized optimization or framework integration, targeting different scenarios.

Speakers
MZ

Minjia Zhang

Microsoft AI and Research
SR

Samyam Rajbhandari

Microsoft AI and Research
WW

Wenhan Wang

Microsoft AI and Research
EZ

Elton Zheng

Microsoft
OR

Olatunji Ruwase

Microsoft AI and Research
JR

Jeff Rasley

Microsoft AI and Research
JL

Jason Li

Microsoft
JW

Junhua Wang

Microsoft
YH

Yuxiong He

Microsoft


Monday May 20, 2019 10:50am - 11:10am PDT
Winchester Room

11:50am PDT

Shooting the Moving Target: Machine Learning in Cybersecurity
We introduce a platform used to productionize machine learning models for detecting cyberthreats. To keep up with a diverse and ever-evolving threat landscape, it is of paramount importance to iterate seamlessly over the two pillars of machine learning: data and models. To satisfy this requirement, the platform is modular and extensible, and it automates the continuous improvement of the detection models. The platform has recorded more than 1,000 successful model deployments across over 30 production environments.

Speakers
AA

Ankit Arun

PatternEx


Monday May 20, 2019 11:50am - 12:10pm PDT
Stevens Creek Room

12:10pm PDT

Deep Learning Inference Service at Microsoft
This paper introduces the Deep Learning Inference Service, an online production service at Microsoft for ultra-low-latency deep neural network model inference. We present the system architecture and deep dive into core concepts such as intelligent model placement, heterogeneous resource management, resource isolation, and efficient routing. We also present production scale and performance numbers.

Speakers
JL

Jason Li

Microsoft
ML

Mingqin Li

Microsoft
Mingqin Li is a software engineering manager at Microsoft who leads Bing's deep learning platform. The team builds low-latency, large-scale, and highly efficient deep learning and vector search services for scenarios such as web search, similar image search, and question answering...
JZ

Jeffrey Zhu

Microsoft
Jeffrey Zhu is a program manager at Microsoft who drives the development of Bing's deep learning platform. This platform powers some of Bing's most innovative features, such as machine reading comprehension and visual search. It serves millions of deep learning model inferences per...
YL

Yingnan Li

Microsoft
YH

Yuxiong He

Microsoft
EZ

Elton Zheng

Microsoft
AO

Adi Oltean

Microsoft
MM

Maya Mosyak

Microsoft
CB

Chris Barnes

Microsoft
TL

Thomas Liu

Microsoft
JW

Junhua Wang

Microsoft


Monday May 20, 2019 12:10pm - 12:30pm PDT
Stevens Creek Room

1:50pm PDT

Towards Taming the Resource and Data Heterogeneity in Federated Learning
Machine learning model training often requires data from multiple parties. However, in some cases, data owners cannot or will not share their data due to legal or privacy constraints, yet would still like to benefit from training a model jointly with multiple parties. To this end, federated learning (FL) has emerged as an alternative way to do collaborative model training without sharing the training data. Such collaboration leads to more accurate and performant models than any single party could hope to learn in isolation from its partial share of the data.

In this paper, we study the impact of resource heterogeneity (e.g., CPU, memory, and network resources) and data heterogeneity (e.g., training dataset sizes) on the training time of FL. We then discuss the research problems, and their challenges, involved in taming such resource and data heterogeneity in FL systems.
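To make the effect of such heterogeneity concrete, the following minimal sketch (an illustration only, not the authors' experimental setup; the per-sample time constant and dataset sizes are made up) shows why a synchronous federated round is gated by its slowest client:

# Illustrative simulation: in synchronous federated averaging the server waits for
# the slowest client, so skewed local dataset sizes inflate the round time.
import random

def simulate_round(client_dataset_sizes, time_per_sample=0.001):
    # Local training time is assumed to grow linearly with local dataset size.
    local_times = [n * time_per_sample for n in client_dataset_sizes]
    return max(local_times)   # the round ends only when the straggler finishes

random.seed(0)
homogeneous = [1000] * 10                                        # equal data per client
heterogeneous = [random.randint(100, 5000) for _ in range(10)]   # skewed data per client

print("homogeneous round time:   %.2fs" % simulate_round(homogeneous))
print("heterogeneous round time: %.2fs" % simulate_round(heterogeneous))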

Speakers
ZC

Zheng Chai

George Mason University
HF

Hannan Fayyaz

York University
ZF

Zeshan Fayyaz

Ryerson University
AA

Ali Anwar

IBM Research–Almaden
YZ

Yi Zhou

IBM Research–Almaden
NB

Nathalie Baracaldo

IBM Research–Almaden
HL

Heiko Ludwig

IBM Research–Almaden
YC

Yue Cheng

George Mason University


Monday May 20, 2019 1:50pm - 2:10pm PDT
Stevens Creek Room

2:50pm PDT

MPP: Model Performance Predictor
Operations is a key challenge in deploying machine learning pipelines, involving monitoring and management of real-time prediction quality. Typically, metrics such as accuracy and RMSE are used to track the performance of models in deployment. However, these metrics cannot be calculated in production because labels are absent. We propose Model Performance Predictor, an ML algorithm for tracking the performance of models in deployment. We argue that an ensemble of such metrics can be used to create a score representing prediction quality in production. This score, in turn, facilitates the formulation and customization of ML alerts that an operations team can escalate to the data science team. Such a score automates monitoring and enables ML deployments at scale.
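As a rough illustration of the idea (a sketch under assumptions, not necessarily the authors' exact formulation), a performance predictor can be a secondary model trained on labeled held-out data to predict whether the primary model's prediction is correct; in production, where labels are absent, its mean predicted correctness serves as a proxy accuracy score that can drive alerts:

# Hedged sketch: a secondary "performance predictor" estimates production accuracy
# without labels. All data here is synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, n_features=20, random_state=0)
X_train, X_hold, y_train, y_hold = train_test_split(X, y, test_size=0.5, random_state=0)

primary = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Label each held-out example with 1 if the primary model got it right, else 0,
# and train the performance predictor on that signal.
correct = (primary.predict(X_hold) == y_hold).astype(int)
performance_predictor = RandomForestClassifier(random_state=0).fit(X_hold, correct)

# "Production" traffic arrives without labels; the predictor's mean score is the
# estimated accuracy that monitoring and alerting can track over time.
X_prod, _ = make_classification(n_samples=1000, n_features=20, random_state=1)
estimated_accuracy = performance_predictor.predict_proba(X_prod)[:, 1].mean()
print("estimated production accuracy: %.3f" % estimated_accuracy)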

Speakers

Monday May 20, 2019 2:50pm - 3:10pm PDT
Stevens Creek Room

4:00pm PDT

KnowledgeNet: Disaggregated and Distributed Training and Serving of Deep Neural Networks
Deep neural networks (DNNs) have a significant impact on numerous applications, such as reinforcement learning, object detection, video processing, and virtual/augmented reality. The ever-changing environment forces DNN models to evolve accordingly, and the transition from the cloud-only to the edge-cloud paradigm has made the deployment and training of these models challenging. Addressing these challenges requires new methods and systems for continuous training and distribution of models in heterogeneous environments. In this paper, we propose KnowledgeNet (KN), a new architectural technique for simple disaggregation and distribution of neural networks for both training and serving. Using KN, DNNs can be partitioned into multiple small blocks and deployed on a distributed set of computational nodes. KN also uses knowledge transfer to provide small-scale models with high accuracy in edge scenarios with limited resources. Preliminary results show that our method maintains state-of-the-art accuracy for a DNN model while the model is disaggregated among multiple workers, and that knowledge transfer compresses the model by 62% for deployment while maintaining the same accuracy.
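The knowledge-transfer step mentioned above is, in its generic form, teacher-student distillation. The sketch below shows that generic pattern in PyTorch; the layer sizes, temperature, and loss weighting are illustrative assumptions and do not describe KN's actual architecture:

# Generic teacher-student distillation sketch (PyTorch); hyperparameters are arbitrary.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 10))
student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))  # smaller model

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets from the teacher plus the usual hard-label cross entropy.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

x = torch.randn(8, 32)
labels = torch.randint(0, 10, (8,))
with torch.no_grad():
    teacher_logits = teacher(x)
loss = distillation_loss(student(x), teacher_logits, labels)
loss.backward()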

Speakers
SB

Saman Biookaghazadeh

Arizona State University
YC

Yitao Chen

Arizona State University
KZ

Kaiqi Zhao

Arizona State University
MZ

Ming Zhao

Arizona State University


Monday May 20, 2019 4:00pm - 4:20pm PDT
Stevens Creek Room

4:00pm PDT

Low-latency Job Scheduling with Preemption for the Development of Deep Learning
Efficient scheduling of trial-and-error (TE) jobs is a challenging problem in deep learning projects. Unfortunately, existing job schedulers do not offer well-balanced scheduling for a mixture of TE and best-effort (BE) jobs, or they handle such mixtures only in limited situations. To fill this niche, we present an algorithm that efficiently schedules both TE and BE jobs by selectively preempting those BE jobs that can later be resumed without much delay. In a simulation study with synthetic workloads, we reduced the 95th percentile of the slowdown rates for TE jobs under the standard FIFO strategy by 96.6%, while degrading the median of the BE slowdown rates by only 18.0% and the 95th percentile by only 23.9%.
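The core decision, selecting a BE victim that is cheap to resume, can be sketched as follows; the field names, threshold, and victim-selection rule are assumptions made for illustration and simplify the paper's actual algorithm:

# Illustrative victim selection for preemption; not the paper's full scheduler.
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    gpus: int
    resume_delay_s: float   # estimated time to checkpoint now and restore later

def pick_victim(be_jobs, te_job_gpus, max_resume_delay_s=60.0):
    """Return a BE job to preempt for a waiting TE job, or None if none is cheap enough."""
    candidates = [j for j in be_jobs
                  if j.gpus >= te_job_gpus and j.resume_delay_s <= max_resume_delay_s]
    # Prefer the victim that can be resumed with the least delay.
    return min(candidates, key=lambda j: j.resume_delay_s) if candidates else None

be_jobs = [Job("be-a", gpus=4, resume_delay_s=30.0),
           Job("be-b", gpus=4, resume_delay_s=600.0)]
print(pick_victim(be_jobs, te_job_gpus=2))   # picks be-a, which resumes quickly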

Speakers
HY

Hidehito Yabuuchi

The University of Tokyo
DT

Daisuke Taniwaki

Preferred Networks, Inc.
SO

Shingo Omura

Preferred Networks, Inc.


Monday May 20, 2019 4:00pm - 4:20pm PDT
Lawrence/San Tomas/Lafayette Rooms

4:20pm PDT

Continuous Training for Production ML in the TensorFlow Extended (TFX) Platform
Large organizations rely increasingly on continuous ML pipelines to keep machine-learned models up-to-date with respect to data. In this scenario, disruptions in the pipeline can increase model staleness and thus degrade the quality of the downstream services supported by these models. In this paper we describe the operation of continuous pipelines in the TensorFlow Extended (TFX) platform that we developed and deployed at Google. We present the main mechanisms in TFX that support this type of pipeline in production, along with the lessons learned from deploying the platform internally at Google.

Speakers
DB

Denis Baylor

Google Research
KH

Kevin Haas

Google Research
SL

Sammy Leong

Google Research
RL

Rose Liu

Google Research
CM

Clemens Mewald

Google Research
HM

Hui Miao

Google Research
NP

Neoklis Polyzotis

Google Research
MT

Mitchell Trott

Google Research
MZ

Martin Zinkevich

Google Research


Monday May 20, 2019 4:20pm - 4:40pm PDT
Stevens Creek Room

4:20pm PDT

tensorflow-tracing: A Performance Tuning Framework for Production
The growing popularity of deep neural networks (DNNs) in the mainstream has had a rapid, transformative effect on clusters and data centers.

DNN training jobs are becoming one of the largest tenants within clusters and often take hours to weeks to complete; even a slight performance improvement can save substantial runtime costs. Despite this, DNN-specific performance tuning tools have yet to keep up with the changing needs of production environments.

On one hand, existing application-agnostic, resource-level tools such as top, NVIDIA Nsight (for GPU utilization), and IPM (for MPI network monitoring) are too limited to predict or explain the behavior and performance of a job accurately. In DNN applications there exists a complex relationship among resources. Even though measuring coarse metrics such as bandwidth, latency, and GPU/CPU utilization can draw an overall picture of cluster performance, these metrics are not easily translatable to application-level metrics and do not provide actionable insights on how to handle performance bottlenecks.

On the other hand, the short list of application-aware tools, such as MLModelScope, TensorBoard, and tf.RunOptions, can provide actionable insights but are designed mainly for the needs of application developers and are not intended for production use. Such tools require substantial modifications to applications, as well as early planning as to what, when, and how data should be collected.

In this article, we introduce tensorflow-tracing to fill the gap between these two classes of performance tuning tools. To achieve this goal, tensorflow-tracing addresses the following technical challenges:

- Collecting application-level runtime metrics, such as the timing of each operation or the iteration time, normally has to be expressed explicitly in the training job's source code. To make it possible to trace ML jobs without any application modification, tensorflow-tracing monkeypatches the tensorflow library at the system level.
- Collecting some metrics is expensive and imposes significant overhead on the runtime. tensorflow-tracing therefore treats metrics differently: it collects low-overhead metrics automatically, while expensive ones are collected on demand through an admin interface.
- There is no easy way to exchange runtime metrics among users and admins. tensorflow-tracing facilitates this through a portable file format and supporting tools to explore these metrics offline.

tensorflow-tracing is publicly available under the Apache-2.0 license at https://github.com/xldrx/tensorflow-tracer. It supports native TensorFlow, Horovod, and IBM PowerAI applications.
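The first challenge above, collecting timings without touching application code, comes down to monkeypatching library entry points. The snippet below shows the general pattern on a stand-in class; it is not tensorflow-tracing's actual implementation:

# Generic monkeypatching sketch: wrap a library method once so that every call from
# unmodified application code records a timing metric.
import functools
import time

class Session:                       # stand-in for the library class being patched
    def run(self, fetches):
        time.sleep(0.01)             # pretend to execute one training step
        return fetches

def traced(method, log):
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()
        try:
            return method(self, *args, **kwargs)
        finally:
            log.append(time.perf_counter() - start)   # low-overhead timing metric
    return wrapper

step_times = []
Session.run = traced(Session.run, step_times)   # patch once, at the "system level"

Session().run("train_op")                       # unmodified application code
print("recorded step times:", step_times)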

Speakers
SH

Sayed Hadi Hashemi

University of Illinois at Urbana-Champaign
BR

Benjamin Rabe

University of Illinois at Urbana-Champaign
KC

Kuan-Yen Chou

University of Illinois at Urbana-Champaign
SL

Simeng Liu

University of Illinois at Urbana-Champaign
VK

Volodymyr Kindratenko

University of Illinois at Urbana-Champaign
RH

Roy H Campbell

University of Illinois at Urbana-Champaign


Monday May 20, 2019 4:20pm - 4:40pm PDT
Lawrence/San Tomas/Lafayette Rooms

5:00pm PDT

Disdat: Bundle Data Management for Machine Learning Pipelines
Modern machine learning pipelines can produce hundreds of data artifacts (such as features, models, and predictions) throughout their lifecycle. During that time, data scientists need to reproduce errors, update features, re-train on specific data, validate and inspect outputs, and share models and predictions. Doing so requires the ability to publish, discover, and version those artifacts.

This work introduces Disdat, a system to simplify ML pipelines by addressing these data management challenges. Disdat is built on two core data abstractions: bundles and contexts. A bundle is a versioned, typed, immutable collection of data. A context is a sharable set of bundles that can exist on local and cloud storage environments. Disdat provides a bundle management API that we use to extend an existing workflow system to produce and consume bundles. This bundle-based approach to data management has simplified both authoring and deployment of our ML pipelines.
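As a toy illustration of the two abstractions (the class and method names here are hypothetical and are not Disdat's API), a context can be thought of as a named store of immutable, versioned bundles:

# In-memory toy model of bundles and contexts; illustrative only.
import uuid
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Bundle:                        # versioned, typed, immutable collection of data
    name: str
    version: str
    data: tuple

@dataclass
class Context:                       # sharable set of bundles (local or cloud-backed)
    name: str
    bundles: dict = field(default_factory=dict)

    def publish(self, name, data):
        bundle = Bundle(name=name, version=uuid.uuid4().hex[:8], data=tuple(data))
        self.bundles.setdefault(name, []).append(bundle)
        return bundle

    def latest(self, name):
        return self.bundles[name][-1]

ctx = Context("fraud-model-dev")
ctx.publish("features", [0.1, 0.7, 0.3])
ctx.publish("features", [0.2, 0.6, 0.4])   # a new immutable version of the artifact
print(ctx.latest("features"))              # discover the most recent version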

Speakers
SR

Sean Rowan

Intuit, Inc.
JL

Jonathan Lunt

Intuit, Inc.
TM

Theodore M. Wong

23andMe, Inc.
KY

Ken Yocum

Intuit, Inc.


Monday May 20, 2019 5:00pm - 5:20pm PDT
Lawrence/San Tomas/Lafayette Rooms

5:00pm PDT

Katib: A Distributed General AutoML Platform Based on Kubernetes
Automatic machine learning (AutoML) is a powerful mechanism for designing and tuning models. We present Katib, a scalable, Kubernetes-native, general AutoML platform that supports a range of AutoML algorithms, including both hyperparameter tuning and neural architecture search. The system is divided into separate components encapsulated as micro-services. Each micro-service runs in a Kubernetes pod and communicates with the others via well-defined APIs, allowing flexible management and scalable deployment at minimal cost. Together with a powerful user interface, Katib provides a universal platform for researchers and enterprises to try, compare, and deploy their AutoML algorithms on any Kubernetes platform.
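Reduced to plain Python, the split between a suggestion service that proposes trials and workers that report metrics looks roughly like the sketch below; the class names and the random-search strategy are illustrative assumptions, not Katib's API:

# Toy suggestion/trial loop; in Katib these pieces run as separate micro-services.
import random

class SuggestionService:
    """Stands in for the micro-service that proposes hyperparameter trials."""
    def __init__(self, search_space):
        self.search_space = search_space

    def get_suggestion(self):
        return {k: random.choice(v) for k, v in self.search_space.items()}

def run_trial(params):
    """Stands in for a trial worker; returns the objective metric for one configuration."""
    return -(params["lr"] - 0.01) ** 2 - 0.001 * params["layers"]   # toy objective

random.seed(0)
service = SuggestionService({"lr": [0.001, 0.01, 0.1], "layers": [2, 4, 8]})
best = None
for _ in range(10):
    params = service.get_suggestion()
    metric = run_trial(params)
    if best is None or metric > best[1]:
        best = (params, metric)
print("best configuration:", best[0])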

Speakers
JZ

Jinan Zhou

Cisco Systems
AV

Andrey Velichkevich

Cisco Systems
KP

Kirill Prosvirov

Cisco Systems
AG

Anubhav Garg

Cisco Systems
YO

Yuji Oshima

NTT Software Innovation Center
DD

Debo Dutta

Cisco Systems


Monday May 20, 2019 5:00pm - 5:20pm PDT
Stevens Creek Room

5:20pm PDT

TonY: An Orchestrator for Distributed Machine Learning Jobs
Training machine learning (ML) models on large datasets requires considerable computing power. To speed up training, it is typical to distribute training across several machines, often with specialized hardware like GPUs or TPUs. Managing a distributed training job is complex and requires dealing with resource contention, distributed configurations, monitoring, and fault tolerance. In this paper, we describe TonY, an open-source orchestrator for distributed ML jobs built at LinkedIn to address these challenges.

Speakers
AH

Anthony Hsu

LinkedIn
KH

Keqiu Hu

LinkedIn
AS

Arun Suresh

LinkedIn
ZZ

Zhe Zhang

LinkedIn


Monday May 20, 2019 5:20pm - 5:40pm PDT
Lawrence/San Tomas/Lafayette Rooms

5:40pm PDT

Stratum: A Serverless Framework for Lifecycle Management of Machine Learning based Data Analytics Tasks
With the proliferation of machine learning (ML) libraries and frameworks, the programming languages they use, and the operations of data loading, transformation, preparation, and mining, ML model development is becoming a daunting task. Furthermore, with a plethora of cloud-based ML model development platforms, heterogeneity in hardware, an increased focus on exploiting edge computing resources for low-latency prediction serving, and often an incomplete understanding of the resources required to execute ML workflows efficiently, ML model deployment demands expertise to manage the lifecycle of ML workflows efficiently and at minimal cost. To address these challenges, we propose Stratum, an end-to-end serverless platform for data analytics. Stratum can deploy, schedule, and dynamically manage data ingestion tools, live streaming apps, batch analytics tools, ML-as-a-service (for inference jobs), and visualization tools across the cloud-fog-edge spectrum. This paper describes the Stratum architecture, highlighting the problems it resolves.

Speakers
AB

Anirban Bhattacharjee

Vanderbilt University
YB

Yogesh Barve

Vanderbilt University
SK

Shweta Khare

Vanderbilt University
SB

Shunxing Bao

Vanderbilt University
AG

Aniruddha Gokhale

Vanderbilt University
TD

Thomas Damiano

Lockheed Martin Advanced Technology Labs


Monday May 20, 2019 5:40pm - 6:00pm PDT
Stevens Creek Room

5:40pm PDT

Transfer Learning for Performance Modeling of Deep Neural Network Systems
Modern deep neural network (DNN) systems are highly configurable, with a large number of options that significantly affect their non-functional behavior, for example inference time and energy consumption. Performance models make it possible to understand and predict the effects of such configuration options on system behavior, but they are costly to build because of large configuration spaces. Performance models from one environment cannot be transferred directly to another; usually, models are rebuilt from scratch for each new environment, for example for different hardware. Recently, transfer learning methods have been applied to reuse knowledge from performance models trained in one environment in another. In this paper, we perform an empirical study to understand the effectiveness of different transfer learning strategies for building performance models of DNN systems. Our results show that transferring information on the most influential configuration options and their interactions is an effective way of reducing the cost of building performance models in new environments.
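The headline result, that the most influential options transfer across environments, suggests a simple strategy: learn which options matter in a cheap source environment and reuse only those when fitting a model in the target environment from few samples. The sketch below illustrates that strategy on synthetic data; it is not the authors' experimental setup:

# Hedged sketch: transfer the identity of influential configuration options from a
# source environment to cut the measurement cost in a target environment.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_options = 10
configs_src = rng.integers(0, 2, size=(500, n_options))   # source env: many measurements
latency_src = 5 * configs_src[:, 0] + 3 * configs_src[:, 2] + rng.normal(0, 0.1, 500)

src_model = RandomForestRegressor(random_state=0).fit(configs_src, latency_src)
influential = np.argsort(src_model.feature_importances_)[-2:]   # top-k options transfer

configs_tgt = rng.integers(0, 2, size=(30, n_options))    # target env: few measurements
latency_tgt = 8 * configs_tgt[:, 0] + 4 * configs_tgt[:, 2] + rng.normal(0, 0.1, 30)

tgt_model = LinearRegression().fit(configs_tgt[:, influential], latency_tgt)
print("influential options:", sorted(influential.tolist()))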

Speakers
MS

Md Shahriar Iqbal

University of South Carolina
LK

Lars Kotthoff

University of Wyoming
PJ

Pooyan Jamshidi

University of South Carolina


Monday May 20, 2019 5:40pm - 6:00pm PDT
Lawrence/San Tomas/Lafayette Rooms
 