Paper Presentation
Monday, May 20
 

10:30am PDT

MLOp Lifecycle Scheme for Vision-based Inspection Process in Manufacturing
Recent advances in machine learning and the proliferation of edge computing have enabled the manufacturing industry to integrate machine learning into its operations to boost productivity. In addition to building high-performing machine learning models, stakeholders and infrastructures within the industry should be taken into account when building an operational lifecycle. In this paper, we propose a practical machine learning operation scheme for building a vision inspection process, motivated mainly by field experience in applying the system in large-scale corporate manufacturing plants. We evaluate our scheme on four defect inspection lines in production. The results show that deep neural network models outperform existing algorithms and that the scheme is easily extensible to other manufacturing processes.

Speakers
JL

Junsung Lim

Samsung Research
HL

Hoejoo Lee

Samsung Research
YW

Youngmin Won

Samsung Research
HY

Hunje Yeon

Samsung Research


Monday May 20, 2019 10:30am - 10:50am PDT
Stevens Creek Room

10:30am PDT

Opportunities and Challenges Of Machine Learning Accelerators In Production
The rise of deep learning has resulted in tremendous demand for compute power, with the FLOPS required for leading machine learning (ML) research doubling roughly every 3.5 months since 2012. This increase in demand for compute has coincided with the end of Moore’s Law.

As a result, major industry players such as NVIDIA, Intel, and Google have invested in ML accelerators that are purpose-built for deep learning workloads.

ML accelerators present many opportunities and challenges in production environments. This paper discusses some high-level observations from internal experience at Google.

Speakers

Monday May 20, 2019 10:30am - 10:50am PDT
Winchester Room

10:50am PDT

Accelerating Large Scale Deep Learning Inference through DeepCPU at Microsoft
The application of deep learning models has brought significant improvements to many Microsoft services and products. In this paper, we share our experience and methodology in developing and applying the DeepCPU library to serve deep learning models in production at large scale, with remarkable latency improvements and infrastructure cost reductions. We describe two ways to use the library, through customized optimization or framework integration, targeting different scenarios.

Speakers
MZ

Minjia Zhang

Microsoft AI and Research
SR

Samyam Rajbhandari

Microsoft AI and Research
WW

Wenhan Wang

Microsoft AI and Research
EZ

Elton Zheng

Microsoft
OR

Olatunji Ruwase

Microsoft AI and Research
JR

Jeff Rasley

Microsoft AI and Research
JL

Jason Li

Microsoft
JW

Junhua Wang

Microsoft
YH

Yuxiong He

Microsoft


Monday May 20, 2019 10:50am - 11:10am PDT
Winchester Room

11:50am PDT

Shooting the Moving Target: Machine Learning in Cybersecurity
We introduce a platform used to productionize machine learning models for detecting cyberthreats. To keep up with a diverse and ever-evolving threat landscape, it is of paramount importance to iterate seamlessly over the two pillars of machine learning: data and models. To satisfy this requirement, the platform is modular and extensible, and it automates the continuous improvement of the detection models. The platform has recorded more than 1,000 successful model deployments across over 30 production environments.

Speakers
AA

Ankit Arun

PatternEx


Monday May 20, 2019 11:50am - 12:10pm PDT
Stevens Creek Room

12:10pm PDT

Deep Learning Inference Service at Microsoft
This paper introduces the Deep Learning Inference Service, an online production service at Microsoft for ultra-low-latency deep neural network model inference. We present the system architecture and deep dive into core concepts such as intelligent model placement, heterogeneous resource management, resource isolation, and efficient routing. We also present production scale and performance numbers.

Speakers
JL

Jason Li

Microsoft
ML

Mingqin Li

Microsoft
Mingqin Li is a software engineering manager at Microsoft who leads Bing's deep learning platform. The team builds low-latency, large-scale, and highly efficient deep learning and vector search services for scenarios such as web search, similar image search, and question answering...
JZ

Jeffrey Zhu

Microsoft
Jeffrey Zhu is a program manager at Microsoft who drives the development of Bing's deep learning platform. This platform powers some of Bing's most innovative features, such as machine reading comprehension and visual search. It serves millions of deep learning model inferences per...
YL

Yingnan Li

Microsoft
YH

Yuxiong He

Microsoft
EZ

Elton Zheng

Microsoft
AO

Adi Oltean

Microsoft
MM

Maya Mosyak

Microsoft
CB

Chris Barnes

Microsoft
TL

Thomas Liu

Microsoft
JW

Junhua Wang

Microsoft


Monday May 20, 2019 12:10pm - 12:30pm PDT
Stevens Creek Room

1:50pm PDT

Towards Taming the Resource and Data Heterogeneity in Federated Learning
Machine learning model training often requires data from multiple parties. However, in some cases, data owners cannot or will not share their data due to legal or privacy constraints, yet would still like to benefit from training a model jointly with multiple parties. To this end, federated learning (FL) has emerged as an alternative way to do collaborative model training without sharing the training data. Such collaboration leads to more accurate and performant models than any single party could hope to learn in isolation from its partial share of the data.

In this paper, we study the impact of resource heterogeneity (e.g., CPU, memory, and network resources) and data heterogeneity (e.g., training dataset sizes) on the training time of FL. We then discuss the research problems, and their challenges, involved in taming such resource and data heterogeneity in FL systems.
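To make the effect of such heterogeneity concrete, the following minimal sketch (an illustration only, not the authors' experimental setup; the per-sample time constant and dataset sizes are made up) shows why a synchronous federated round is gated by its slowest client:

# Illustrative simulation: in synchronous federated averaging the server waits for
# the slowest client, so skewed local dataset sizes inflate the round time.
import random

def simulate_round(client_dataset_sizes, time_per_sample=0.001):
    # Local training time is assumed to grow linearly with local dataset size.
    local_times = [n * time_per_sample for n in client_dataset_sizes]
    return max(local_times)   # the round ends only when the straggler finishes

random.seed(0)
homogeneous = [1000] * 10                                        # equal data per client
heterogeneous = [random.randint(100, 5000) for _ in range(10)]   # skewed data per client

print("homogeneous round time:   %.2fs" % simulate_round(homogeneous))
print("heterogeneous round time: %.2fs" % simulate_round(heterogeneous))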

Speakers
ZC

Zheng Chai

George Mason University
HF

Hannan Fayyaz

York University
ZF

Zeshan Fayyaz

Ryerson University
AA

Ali Anwar

IBM Research–Almaden
YZ

Yi Zhou

IBM Research–Almaden
NB

Nathalie Baracaldo

IBM Research–Almaden
HL

Heiko Ludwig

IBM Research–Almaden
YC

Yue Cheng

George Mason University


Monday May 20, 2019 1:50pm - 2:10pm PDT
Stevens Creek Room

2:50pm PDT

MPP: Model Performance Predictor
Operations is a key challenge in deploying machine learning pipelines, involving monitoring and management of real-time prediction quality. Typically, metrics such as accuracy and RMSE are used to track the performance of models in deployment. However, these metrics cannot be calculated in production because labels are absent. We propose Model Performance Predictor, an ML algorithm for tracking the performance of models in deployment. We argue that an ensemble of such metrics can be used to create a score representing prediction quality in production. This score, in turn, facilitates the formulation and customization of ML alerts that an operations team can escalate to the data science team. Such a score automates monitoring and enables ML deployments at scale.
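As a rough illustration of the idea (a sketch under assumptions, not necessarily the authors' exact formulation), a performance predictor can be a secondary model trained on labeled held-out data to predict whether the primary model's prediction is correct; in production, where labels are absent, its mean predicted correctness serves as a proxy accuracy score that can drive alerts:

# Hedged sketch: a secondary "performance predictor" estimates production accuracy
# without labels. All data here is synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, n_features=20, random_state=0)
X_train, X_hold, y_train, y_hold = train_test_split(X, y, test_size=0.5, random_state=0)

primary = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Label each held-out example with 1 if the primary model got it right, else 0,
# and train the performance predictor on that signal.
correct = (primary.predict(X_hold) == y_hold).astype(int)
performance_predictor = RandomForestClassifier(random_state=0).fit(X_hold, correct)

# "Production" traffic arrives without labels; the predictor's mean score is the
# estimated accuracy that monitoring and alerting can track over time.
X_prod, _ = make_classification(n_samples=1000, n_features=20, random_state=1)
estimated_accuracy = performance_predictor.predict_proba(X_prod)[:, 1].mean()
print("estimated production accuracy: %.3f" % estimated_accuracy)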

Speakers

Monday May 20, 2019 2:50pm - 3:10pm PDT
Stevens Creek Room

4:00pm PDT

KnowledgeNet: Disaggregated and Distributed Training and Serving of Deep Neural Networks
Deep neural networks (DNNs) have a significant impact on numerous applications, such as reinforcement learning, object detection, video processing, and virtual/augmented reality. The ever-changing environment forces DNN models to evolve accordingly, and the transition from the cloud-only to the edge-cloud paradigm has made the deployment and training of these models challenging. Addressing these challenges requires new methods and systems for continuous training and distribution of models in heterogeneous environments. In this paper, we propose KnowledgeNet (KN), a new architectural technique for simple disaggregation and distribution of neural networks for both training and serving. Using KN, DNNs can be partitioned into multiple small blocks and deployed on a distributed set of computational nodes. KN also uses knowledge transfer to provide small-scale models with high accuracy in edge scenarios with limited resources. Preliminary results show that our method maintains state-of-the-art accuracy for a DNN model while the model is disaggregated among multiple workers, and that knowledge transfer compresses the model by 62% for deployment while maintaining the same accuracy.
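The knowledge-transfer step mentioned above is, in its generic form, teacher-student distillation. The sketch below shows that generic pattern in PyTorch; the layer sizes, temperature, and loss weighting are illustrative assumptions and do not describe KN's actual architecture:

# Generic teacher-student distillation sketch (PyTorch); hyperparameters are arbitrary.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 10))
student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))  # smaller model

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets from the teacher plus the usual hard-label cross entropy.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

x = torch.randn(8, 32)
labels = torch.randint(0, 10, (8,))
with torch.no_grad():
    teacher_logits = teacher(x)
loss = distillation_loss(student(x), teacher_logits, labels)
loss.backward()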

Speakers
SB

Saman Biookaghazadeh

Arizona State University
YC

Yitao Chen

Arizona State University
KZ

Kaiqi Zhao

Arizona State University
MZ

Ming Zhao

Arizona State University


Monday May 20, 2019 4:00pm - 4:20pm PDT
Stevens Creek Room

4:00pm PDT

Low-latency Job Scheduling with Preemption for the Development of Deep Learning
Efficient scheduling of trial-and-error (TE) jobs is a challenging problem in deep learning projects. Unfortunately, existing job schedulers do not offer well-balanced scheduling for a mixture of TE and best-effort (BE) jobs, or they handle such mixtures only in limited situations. To fill this niche, we present an algorithm that efficiently schedules both TE and BE jobs by selectively preempting those BE jobs that can later be resumed without much delay. In a simulation study with synthetic workloads, we reduced the 95th percentile of the slowdown rates for TE jobs under the standard FIFO strategy by 96.6%, while degrading the median of the BE slowdown rates by only 18.0% and the 95th percentile by only 23.9%.
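The core decision, selecting a BE victim that is cheap to resume, can be sketched as follows; the field names, threshold, and victim-selection rule are assumptions made for illustration and simplify the paper's actual algorithm:

# Illustrative victim selection for preemption; not the paper's full scheduler.
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    gpus: int
    resume_delay_s: float   # estimated time to checkpoint now and restore later

def pick_victim(be_jobs, te_job_gpus, max_resume_delay_s=60.0):
    """Return a BE job to preempt for a waiting TE job, or None if none is cheap enough."""
    candidates = [j for j in be_jobs
                  if j.gpus >= te_job_gpus and j.resume_delay_s <= max_resume_delay_s]
    # Prefer the victim that can be resumed with the least delay.
    return min(candidates, key=lambda j: j.resume_delay_s) if candidates else None

be_jobs = [Job("be-a", gpus=4, resume_delay_s=30.0),
           Job("be-b", gpus=4, resume_delay_s=600.0)]
print(pick_victim(be_jobs, te_job_gpus=2))   # picks be-a, which resumes quickly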

Speakers
HY

Hidehito Yabuuchi

The University of Tokyo
DT

Daisuke Taniwaki

Preferred Networks, Inc.
SO

Shingo Omura

Preferred Networks, Inc.


Monday May 20, 2019 4:00pm - 4:20pm PDT
Lawrence/San Tomas/Lafayette Rooms

4:20pm PDT

Continuous Training for Production ML in the TensorFlow Extended (TFX) Platform
Large organizations rely increasingly on continuous ML pipelines to keep machine-learned models up-to-date with respect to data. In this scenario, disruptions in the pipeline can increase model staleness and thus degrade the quality of the downstream services supported by these models. In this paper we describe the operation of continuous pipelines in the TensorFlow Extended (TFX) platform that we developed and deployed at Google. We present the main mechanisms in TFX that support this type of pipeline in production, along with the lessons learned from deploying the platform internally at Google.

Speakers
DB

Denis Baylor

Google Research
KH

Kevin Haas

Google Research
SL

Sammy Leong

Google Research
RL

Rose Liu

Google Research
CM

Clemens Mewald

Google Research
HM

Hui Miao

Google Research
NP

Neoklis Polyzotis

Google Research
MT

Mitchell Trott

Google Research
MZ

Martin Zinkevich

Google Research


Monday May 20, 2019 4:20pm - 4:40pm PDT
Stevens Creek Room

4:20pm PDT

tensorflow-tracing: A Performance Tuning Framework for Production
The growing popularity of deep neural networks (DNNs) in the mainstream has had a rapid, transformative effect on clusters and data centers.

DNN training jobs are becoming one of the largest tenants within clusters and often take hours to weeks to complete; even a slight performance improvement can save substantial runtime costs. Despite this, DNN-specific performance tuning tools have yet to keep up with the changing needs of production environments.

On one hand, existing application-agnostic, resource-level tools such as top, NVIDIA Nsight (for GPU utilization), and IPM (for MPI network monitoring) are too limited to predict or explain the behavior and performance of a job accurately. In DNN applications there exists a complex relationship among resources. Even though measuring coarse metrics such as bandwidth, latency, and GPU/CPU utilization can draw an overall picture of cluster performance, these metrics are not easily translatable to application-level metrics and do not provide actionable insights on how to handle performance bottlenecks.

On the other hand, the short list of application-aware tools, such as MLModelScope, TensorBoard, and tf.RunOptions, can provide actionable insights but are designed mainly for the needs of application developers and are not intended for production use. Such tools require substantial modifications to applications, as well as early planning as to what, when, and how data should be collected.

In this article, we introduce tensorflow-tracing to fill the gap between these two classes of performance tuning tools. To achieve this goal, tensorflow-tracing addresses the following technical challenges:

- Collecting application-level runtime metrics, such as the timing of each operation or the iteration time, normally has to be expressed explicitly in the training job's source code. To make it possible to trace ML jobs without any application modification, tensorflow-tracing monkeypatches the tensorflow library at the system level.
- Collecting some metrics is expensive and imposes significant overhead on the runtime. tensorflow-tracing therefore treats metrics differently: it collects low-overhead metrics automatically, while expensive ones are collected on demand through an admin interface.
- There is no easy way to exchange runtime metrics among users and admins. tensorflow-tracing facilitates this through a portable file format and supporting tools to explore these metrics offline.

tensorflow-tracing is publicly available under the Apache-2.0 license at https://github.com/xldrx/tensorflow-tracer. It supports native TensorFlow, Horovod, and IBM PowerAI applications.
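The first challenge above, collecting timings without touching application code, comes down to monkeypatching library entry points. The snippet below shows the general pattern on a stand-in class; it is not tensorflow-tracing's actual implementation:

# Generic monkeypatching sketch: wrap a library method once so that every call from
# unmodified application code records a timing metric.
import functools
import time

class Session:                       # stand-in for the library class being patched
    def run(self, fetches):
        time.sleep(0.01)             # pretend to execute one training step
        return fetches

def traced(method, log):
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()
        try:
            return method(self, *args, **kwargs)
        finally:
            log.append(time.perf_counter() - start)   # low-overhead timing metric
    return wrapper

step_times = []
Session.run = traced(Session.run, step_times)   # patch once, at the "system level"

Session().run("train_op")                       # unmodified application code
print("recorded step times:", step_times)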

Speakers
SH

Sayed Hadi Hashemi

University of Illinois at Urbana-Champaign
BR

Benjamin Rabe

University of Illinois at Urbana-Champaign
KC

Kuan-Yen Chou

University of Illinois at Urbana-Champaign
SL

Simeng Liu

University of Illinois at Urbana-Champaign
VK

Volodymyr Kindratenko

University of Illinois at Urbana-Champaign
RH

Roy H Campbell

University of Illinois at Urbana-Champaign


Monday May 20, 2019 4:20pm - 4:40pm PDT
Lawrence/San Tomas/Lafayette Rooms

5:00pm PDT

Disdat: Bundle Data Management for Machine Learning Pipelines
Modern machine learning pipelines can produce hundreds of data artifacts (such as features, models, and predictions) throughout their lifecycle. During that time, data scientists need to reproduce errors, update features, re-train on specific data, validate and inspect outputs, and share models and predictions. Doing so requires the ability to publish, discover, and version those artifacts.

This work introduces Disdat, a system to simplify ML pipelines by addressing these data management challenges. Disdat is built on two core data abstractions: bundles and contexts. A bundle is a versioned, typed, immutable collection of data. A context is a sharable set of bundles that can exist on local and cloud storage environments. Disdat provides a bundle management API that we use to extend an existing workflow system to produce and consume bundles. This bundle-based approach to data management has simplified both authoring and deployment of our ML pipelines.
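As a toy illustration of the two abstractions (the class and method names here are hypothetical and are not Disdat's API), a context can be thought of as a named store of immutable, versioned bundles:

# In-memory toy model of bundles and contexts; illustrative only.
import uuid
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Bundle:                        # versioned, typed, immutable collection of data
    name: str
    version: str
    data: tuple

@dataclass
class Context:                       # sharable set of bundles (local or cloud-backed)
    name: str
    bundles: dict = field(default_factory=dict)

    def publish(self, name, data):
        bundle = Bundle(name=name, version=uuid.uuid4().hex[:8], data=tuple(data))
        self.bundles.setdefault(name, []).append(bundle)
        return bundle

    def latest(self, name):
        return self.bundles[name][-1]

ctx = Context("fraud-model-dev")
ctx.publish("features", [0.1, 0.7, 0.3])
ctx.publish("features", [0.2, 0.6, 0.4])   # a new immutable version of the artifact
print(ctx.latest("features"))              # discover the most recent version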

Speakers
SR

Sean Rowan

Intuit, Inc.
JL

Jonathan Lunt

Intuit, Inc.
TM

Theodore M. Wong

23andMe, Inc.
KY

Ken Yocum

Intuit, Inc.


Monday May 20, 2019 5:00pm - 5:20pm PDT
Lawrence/San Tomas/Lafayette Rooms

5:00pm PDT

Katib: A Distributed General AutoML Platform Based on Kubernetes
Automatic machine learning (AutoML) is a powerful mechanism for designing and tuning models. We present Katib, a scalable, Kubernetes-native, general AutoML platform that supports a range of AutoML algorithms, including both hyperparameter tuning and neural architecture search. The system is divided into separate components encapsulated as micro-services. Each micro-service runs in a Kubernetes pod and communicates with the others via well-defined APIs, allowing flexible management and scalable deployment at minimal cost. Together with a powerful user interface, Katib provides a universal platform for researchers and enterprises to try, compare, and deploy their AutoML algorithms on any Kubernetes platform.
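Reduced to plain Python, the split between a suggestion service that proposes trials and workers that report metrics looks roughly like the sketch below; the class names and the random-search strategy are illustrative assumptions, not Katib's API:

# Toy suggestion/trial loop; in Katib these pieces run as separate micro-services.
import random

class SuggestionService:
    """Stands in for the micro-service that proposes hyperparameter trials."""
    def __init__(self, search_space):
        self.search_space = search_space

    def get_suggestion(self):
        return {k: random.choice(v) for k, v in self.search_space.items()}

def run_trial(params):
    """Stands in for a trial worker; returns the objective metric for one configuration."""
    return -(params["lr"] - 0.01) ** 2 - 0.001 * params["layers"]   # toy objective

random.seed(0)
service = SuggestionService({"lr": [0.001, 0.01, 0.1], "layers": [2, 4, 8]})
best = None
for _ in range(10):
    params = service.get_suggestion()
    metric = run_trial(params)
    if best is None or metric > best[1]:
        best = (params, metric)
print("best configuration:", best[0])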

Speakers
JZ

Jinan Zhou

Cisco Systems
AV

Andrey Velichkevich

Cisco Systems
KP

Kirill Prosvirov

Cisco Systems
AG

Anubhav Garg

Cisco Systems
YO

Yuji Oshima

NTT Software Innovation Center
DD

Debo Dutta

Cisco Systems


Monday May 20, 2019 5:00pm - 5:20pm PDT
Stevens Creek Room

5:20pm PDT

TonY: An Orchestrator for Distributed Machine Learning Jobs
Training machine learning (ML) models on large datasets requires considerable computing power. To speed up training, it is typical to distribute training across several machines, often with specialized hardware like GPUs or TPUs. Managing a distributed training job is complex and requires dealing with resource contention, distributed configurations, monitoring, and fault tolerance. In this paper, we describe TonY, an open-source orchestrator for distributed ML jobs built at LinkedIn to address these challenges.

Speakers
AH

Anthony Hsu

LinkedIn
KH

Keqiu Hu

LinkedIn
AS

Arun Suresh

LinkedIn
ZZ

Zhe Zhang

LinkedIn


Monday May 20, 2019 5:20pm - 5:40pm PDT
Lawrence/San Tomas/Lafayette Rooms

5:40pm PDT

Stratum: A Serverless Framework for Lifecycle Management of Machine Learning based Data Analytics Tasks
With the proliferation of machine learning (ML) libraries and frameworks, the programming languages they use, and the operations of data loading, transformation, preparation, and mining, ML model development is becoming a daunting task. Furthermore, with a plethora of cloud-based ML model development platforms, heterogeneity in hardware, an increased focus on exploiting edge computing resources for low-latency prediction serving, and often an incomplete understanding of the resources required to execute ML workflows efficiently, ML model deployment demands expertise to manage the lifecycle of ML workflows efficiently and at minimal cost. To address these challenges, we propose Stratum, an end-to-end serverless platform for data analytics. Stratum can deploy, schedule, and dynamically manage data ingestion tools, live streaming apps, batch analytics tools, ML-as-a-service (for inference jobs), and visualization tools across the cloud-fog-edge spectrum. This paper describes the Stratum architecture, highlighting the problems it resolves.

Speakers
AB

Anirban Bhattacharjee

Vanderbilt University
YB

Yogesh Barve

Vanderbilt University
SK

Shweta Khare

Vanderbilt University
SB

Shunxing Bao

Vanderbilt University
AG

Aniruddha Gokhale

Vanderbilt University
TD

Thomas Damiano

Lockheed Martin Advanced Technology Labs


Monday May 20, 2019 5:40pm - 6:00pm PDT
Stevens Creek Room

5:40pm PDT

Transfer Learning for Performance Modeling of Deep Neural Network Systems
Modern deep neural network (DNN) systems are highly configurable, with a large number of options that significantly affect their non-functional behavior, for example inference time and energy consumption. Performance models make it possible to understand and predict the effects of such configuration options on system behavior, but they are costly to build because of large configuration spaces. Performance models from one environment cannot be transferred directly to another; usually, models are rebuilt from scratch for each new environment, for example for different hardware. Recently, transfer learning methods have been applied to reuse knowledge from performance models trained in one environment in another. In this paper, we perform an empirical study to understand the effectiveness of different transfer learning strategies for building performance models of DNN systems. Our results show that transferring information on the most influential configuration options and their interactions is an effective way of reducing the cost of building performance models in new environments.
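The headline result, that the most influential options transfer across environments, suggests a simple strategy: learn which options matter in a cheap source environment and reuse only those when fitting a model in the target environment from few samples. The sketch below illustrates that strategy on synthetic data; it is not the authors' experimental setup:

# Hedged sketch: transfer the identity of influential configuration options from a
# source environment to cut the measurement cost in a target environment.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_options = 10
configs_src = rng.integers(0, 2, size=(500, n_options))   # source env: many measurements
latency_src = 5 * configs_src[:, 0] + 3 * configs_src[:, 2] + rng.normal(0, 0.1, 500)

src_model = RandomForestRegressor(random_state=0).fit(configs_src, latency_src)
influential = np.argsort(src_model.feature_importances_)[-2:]   # top-k options transfer

configs_tgt = rng.integers(0, 2, size=(30, n_options))    # target env: few measurements
latency_tgt = 8 * configs_tgt[:, 0] + 4 * configs_tgt[:, 2] + rng.normal(0, 0.1, 30)

tgt_model = LinearRegression().fit(configs_tgt[:, influential], latency_tgt)
print("influential options:", sorted(influential.tolist()))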

Speakers
MS

Md Shahriar Iqbal

University of South Carolina
LK

Lars Kotthoff

University of Wyoming
PJ

Pooyan Jamshidi

University of South Carolina


Monday May 20, 2019 5:40pm - 6:00pm PDT
Lawrence/San Tomas/Lafayette Rooms
 