Industry Workshops

Netflix Industry Workshop: Video Encoding at Scale
Monday, September 18
12:30-14:00, Room 209AB

We strive to deliver the best video quality possible to our members, no matter what they watch (action, drama, or animation) and where they watch (on a TV at home or on a cellular connection with a phone). In this session we discuss the practical video encoding challenges we face and how we address them through research and engineering. We will cover automated video quality assessment, our encoding pipeline in the cloud for optimal video encoding and recent compression results on royalty-free video codecs.

Anne Aaron is Director of Video Algorithms at Netflix and leads the team responsible for video analysis, processing and encoding in the Netflix cloud-based media pipeline. Prior to Netflix, Anne held technical lead roles at Cisco, where she worked on the software deployed with millions of Flip Video cameras; at Dyyno, an early-stage startup that developed a real-time peer-to-peer video distribution system; and at Modulus Video, a broadcast video encoder company. During her Ph.D. studies at Stanford University, she was a member of the Image, Video and Multimedia Systems Laboratory, led by Prof. Bernd Girod. Her research was among the pioneering work in the sub-field of Distributed Video Coding. Anne is originally from Manila, Philippines. She holds B.S. degrees in Physics and Computer Engineering from Ateneo de Manila University and M.S. and Ph.D. degrees in Electrical Engineering from Stanford University.

Jan De Cock has been a Senior Research Scientist at Netflix since 2015. Before that, he worked as a post-doctoral researcher and assistant professor at Ghent University in Belgium. He has performed research on a range of topics in the field of video compression, including scalable video coding and transcoding.

Ioannis Katsavounidis received the Diploma (B.S./M.S.) degree from the Aristotle University of Thessaloniki, Greece, in 1991 and the M.S. and Ph.D. degrees from the University of Southern California, Los Angeles, in 1992 and 1998, respectively, all in Electrical Engineering. From 1996 to 2000, he worked in Italy as an engineer for the High-Energy Physics Department at Caltech. From 2000 to 2007, he worked at InterVideo, Inc., in Fremont, CA, as Director of Software for advanced technologies, in charge of MPEG2, MPEG4 and H.264 video codec development. Between 2007 and 2008, he served as CTO of Cidana, a mobile multimedia software company in Shanghai, China, covering all aspects of DTV standards and codecs. From 2008 to 2015, he was an associate professor with the Department of Electrical and Computer Engineering at the University of Thessaly in Volos, Greece, teaching undergraduate and graduate courses in signals, controls, image processing, video compression and information theory. He is currently a senior research scientist at Netflix, working on video quality and video codec optimization problems. His research interests include image and video quality, compression and processing, information theory and software-hardware optimization of multimedia applications.

Zhi Li is with the Video Algorithms group at Netflix. His current interests focus on improving the video streaming experience for consumers by understanding how humans perceive video quality and applying that knowledge to encoding/streaming system design and optimization. He has broad interests in applying mathematics to real-world engineering problems. Zhi received his B. Eng. and M. Eng. degrees in Electrical Engineering, both from the National University of Singapore, Singapore, in 2005 and 2007, respectively, and a Ph.D. degree from Stanford University, California, USA, in 2012. His doctoral work focused on source and channel coding methods in multimedia networking problems. He was a recipient of the Best Student Paper Award at IEEE ICME for his work on cryptographic watermarking, and a co-recipient of the Multimedia Communications Best Paper Award from the IEEE Communications Society for work on multimedia authentication.

Megha Manohara is a Senior Software Engineer on the Video Algorithms team at Netflix. She is passionate about the scalable productization of R&D on her team. Recently, she has helped bring dynamic-optimized streams to Netflix customers. Megha holds a Master's in Electrical Engineering from the University of California, Santa Barbara.

Wolfram Industry Workshop: Mathematica's Framework for Deep Neural Networks and Image Processing
Tuesday, September 19
12:30-14:00, Room 209AB

For three decades, Mathematica has defined the state of the art in technical computing and provided the principal computation environment for millions of innovators, educators, and students around the world. This tutorial addresses the new comprehensive framework for deep neural networks, which is the latest built-in addition to Mathematica. For those new to the underlying Wolfram Language, the tutorial begins with a brief overview of the concepts and scope of the software. A quick recapitulation of deep neural network (DNN) basics will lead to a comprehensive introduction to the framework. The elegant design of the DNN implementation accommodates the needs of novices and experts alike, and the tight integration into the Wolfram Language provides a smooth image processing workflow. Topics covered include neural network layer types, network topology, loss functions, training and regularisation methods, GPU implementation, data handling, and pre-trained networks. A collection of application examples in image processing will complement this 1.5-hour workshop.

Markus van Almsick is a German physicist who has worked for the University of Illinois at Urbana-Champaign, USA, the Max-Planck Institute for Theoretical Biophysics in Frankfurt, Germany, and the Technical University of Eindhoven, the Netherlands. His areas of expertise range from quantum gravity, molecular dynamics, and neural networks to numerical mathematics and image processing. He has been a freelance consultant for Wolfram Research Inc., the maker of Mathematica and Wolfram|Alpha, since 1988.

MathWorks Industry Workshop: Image Classification & Object Detection using Deep Learning with MATLAB
Tuesday, September 19
10:30-12:30, Room 208AB

We will use an object recognition/image classification example to teach how to apply deep learning to practical problems. You will learn how to: import and manage large datasets; train, evaluate and compare different deep learning models; extract discriminative information from images; and use transfer learning to fine-tune neural networks for new tasks. We will use the new MATLAB framework for deep learning and real-world examples, including data used for ADAS and autonomous driving.

Jianghao Wang is the data science industry manager at MathWorks. In her role, Jianghao supports deep learning research and teaching at universities. Before joining MathWorks in 2016, Jianghao obtained her Ph.D. in Earth Sciences from the University of Southern California and her B.S. in Applied Mathematics from Nankai University.

Alex Taylor is a senior software engineer at MathWorks. Alex has been a developer for the Image Processing Toolbox since 2006. In his role, Alex works on deep learning features in the Neural Network Toolbox and the Image Processing Toolbox. Alex completed his M.S. and B.S. in Electrical Engineering at Virginia Tech.

Google Industry Workshop
Wednesday, September 20
12:30-14:00, Room 308

A Technical Overview of the Emergent AV1 Video Codec from the Alliance for Open Media
Google embarked on the WebM Project in 2010 to develop open source, royalty-free video codecs designed specifically for media on the Web. The second generation codec released by the WebM project, VP9, is currently served by YouTube and enjoys billions of views per day. Recognizing the need for even greater compression efficiency to cope with the ever-increasing demand for video on the web, the WebM team started an ambitious project to develop a next edition royalty-free codec, AV1, in a consortium of major tech companies called the Alliance for Open Media. The goal of AV1 is to achieve a generational improvement in coding efficiency over VP9 at a practical hardware and software complexity, and the codec is scheduled to be finalized by the end of 2017. In this talk, we will provide a technical overview of the major prediction, transform and in-loop filtering tools under consideration in AV1. Preliminary results will be presented on standard test sets.

Debargha Mukherjee received his M.S./Ph.D. degrees in ECE from the University of California, Santa Barbara in 1999. Thereafter, through 2009, he was with HP Laboratories, conducting research on video/image coding and processing. Since 2010 he has been with Google Inc., where he is currently involved with open-source video codec development. Prior to that he was responsible for video quality control and 2D-3D conversion on YouTube. Debargha has authored/co-authored more than 80 papers on various signal processing topics, and holds more than 40 US patents, with several more pending. He has delivered many workshops and talks on Google's VPx line of codecs since 2012. He currently serves as an Associate Editor of the IEEE Trans. on Circuits and Systems for Video Technology and has previously served as Associate Editor of the IEEE Trans. on Image Processing; he is also a member of the IEEE Image, Video, and Multidimensional Signal Processing Technical Committee (IVMSP TC).

Recent Advances and Core Challenges for Image Compression with Deep Networks
Creating effective image compression algorithms using deep neural networks has emerged as an active and growing area of research. This talk will motivate the use of machine learning for lossy image compression and will briefly summarize recent research and results. We will focus on core challenges for learning compact image representations including backpropagation through quantized codes, controlling entropy while training, and spatially adaptive rate allocation. Existing methods will be organized and compared based on how they address these challenges, and we will discuss the resulting rate-distortion performance across several metrics.

David Minnen is a Senior Software Engineer in Machine Perception at Google Research where he focuses on deep learning for image and video compression. Previously, he developed user preference models for real-time frame selection for a "smart" camera application on Android, and he created a vision-based finger tracking and classification system for interactive gestural interfaces at Oblong Industries. David received his Ph.D. at the Georgia Institute of Technology in 2008 where his research on unsupervised time series analysis was funded by an NSF Graduate Research Fellowship.

Processing Pipelines for Immersive Video
YouTube added support for 360-degree (spherical) video in March 2015. Since then, we’ve continued to increase the immersivity of the video with additional support for VR video (360-degree + 3D), which we then complemented with support for spatial audio. In this talk, we will give an overview of the processing pipelines used to deliver this immersive content and the challenges we ran into making this content available and accessible to over 1 billion users. We will describe the quality analysis used to identify a better map projection, the details of a metadata specification that allows us to deliver video in an arbitrary projection, and other research topics.

Balu Adsumilli received his PhD from the University of California in 2005, working on watermark-based error resilience in video communications. From 2005 to 2011, he was Sr. Research Scientist at Citrix Online, and from 2011 to 2016, he was Sr. Manager Advanced Software at GoPro, in both roles developing algorithms for image/video enhancement, compression, and transmission. He is currently leading the Media Algorithms team at YouTube/Google. He is an active member of IEEE, ACM, SPIE, and VES, and has co-authored more than 80 papers and patents. His fields of research include image/video processing, machine vision, video compression, spherical capture, VR/AR, visual effects, and related areas.

Neil Birkbeck obtained his PhD from the University of Alberta in 2011, working on topics in computer vision, graphics and robotics, with a specific focus on image-based modeling and rendering. He went on to become a Research Scientist at Siemens Corporate Research, working on automatic detection and segmentation of anatomical structures in full body medical images. He is now a software engineer in the transcoding team at YouTube/Google, with an interest in video processing aspects of new technologies like 360/VR/Omnidirectional video and HDR video.

Damien Kelly is a Software Engineer in the Media Infrastructure team at YouTube. Prior to YouTube, Damien was an engineer at Green Parrot Pictures, Dublin, Ireland, developing video enhancement technologies. Damien has a research background in digital signal processing, including video enhancement, audio-visual tracking and acoustic source localization. In 2010, he received his PhD from Trinity College Dublin, Ireland, where he also spent time as a Research Fellow in the Signal Processing Media Applications Group.