19th Jun 23
Talk Ideas with Karthik Pattabiraman
Speaker: Karthik Pattabiraman
Date: 23 June 2023
Time: 2:00 pm
Place: Room G4.1
Presentation title: “Building Error Resilient Machine Learning Systems from Unreliable Components”
Karthik Pattabiraman is a Professor of Electrical and Computer Engineering at the University of British Columbia (UBC). He received his MS and PhD in computer science from the University of Illinois at Urbana-Champaign (UIUC) in 2004 and 2009, respectively, and spent a postdoctoral year at Microsoft Research (MSR), Redmond, before joining UBC in 2010. His research interests are in dependability, security, and software engineering. Karthik has won multiple awards, such as the inaugural Rising Star in Dependability Award (2020) from the IEEE and the IFIP, the Distinguished Alumnus Award from the UIUC CS department (2018), and multiple UBC-wide awards for excellence in research and mentoring. Together with his students and collaborators, he has published over 100 papers, many of which have received distinguished paper awards at venues such as DSN and ICSE. He is a Distinguished Contributor of the IEEE Computer Society, a Distinguished Member of the ACM, and the vice-chair of the IFIP Working Group on Dependable Computing and Fault Tolerance (WG 10.4). A more detailed biography may be found at: https://blogs.ubc.ca/karthik/about/full-bio/.
Machine Learning (ML) has increasingly been adopted in safety-critical systems such as autonomous vehicles (AVs) and industrial robotics. In these domains, reliability and safety are important considerations, so it is critical to ensure the resilience of ML systems to faults and errors. Hardware faults such as soft errors are becoming more frequent in commodity computer systems due to the effects of technology scaling and reduced supply voltages. These faults can cause ML systems to malfunction and lead to safety violations. Further, errors in the training data have been widely observed even in mature training datasets, and they can significantly degrade the accuracy of ML algorithms. Therefore, there is a compelling need to protect ML systems from both hardware faults and training data errors.
In this talk, I’ll present some of the work we’re doing in my group to ensure the dependability of ML systems in the presence of hardware faults and training data errors. For the former, we introduce Ranger, an automated transformation for Deep Neural Network (DNN)-based systems that can filter out the hardware faults that are likely to have the most impact on the DNN. For the latter, I’ll present the use of ensemble-based techniques, and show that they outperform most other techniques proposed in the ML community for dealing with training data errors. This is joint work with my students and colleagues at UBC, as well as with industry collaborators.
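The intuition behind filtering high-impact hardware faults can be sketched in a few lines: activation values observed during fault-free runs define a valid range, and at inference time any value outside that range (e.g., one corrupted by a soft error flipping a high-order bit) is clipped back into it. This is only an illustrative sketch of the range-restriction idea, not Ranger's actual implementation; the function names and the per-array (rather than per-layer) bounds are assumptions.

```python
import numpy as np

def profile_ranges(activations):
    """Record the min/max activation values seen in a fault-free
    profiling run. (Illustrative only; a real tool would derive
    bounds per DNN layer from many profiling inputs.)"""
    return activations.min(), activations.max()

def restrict_range(activations, low, high):
    """Clip activations to the profiled range so that a hardware
    fault producing an extreme value cannot propagate further."""
    return np.clip(activations, low, high)

# Fault-free profiling run: activations stay in a modest range.
clean = np.array([0.1, 0.8, 0.3, 0.5])
low, high = profile_ranges(clean)

# A soft error flips a high-order bit, producing a huge activation.
faulty = np.array([0.1, 0.8, 1.7e38, 0.5])

protected = restrict_range(faulty, low, high)
print(protected)  # the corrupted value is clipped back to `high` (0.8)
```

Because most benign activations already lie inside the profiled range, such a transformation leaves fault-free inference essentially unchanged while bounding the damage an extreme corrupted value can do downstream.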