Understanding the Bug Characteristics and Fix Strategies of Federated Learning Systems


Abstract

Federated learning (FL) is an emerging machine learning paradigm that aims to address the problem of isolated data islands. To preserve privacy, FL allows machine learning models and deep neural networks to be trained on decentralized data kept privately on individual devices. FL has been increasingly adopted in mission-critical fields such as finance and healthcare. However, bugs in FL systems are inevitable and may result in catastrophic consequences such as financial loss, inappropriate medical decisions, and violations of data privacy ordinances. While many recent studies have sought to understand the bugs in machine learning systems, no existing study characterizes the bugs arising from the unique nature of FL systems. To fill the gap, we collected 395 real bugs from six popular FL frameworks (TensorFlow Federated, PySyft, FATE, Flower, PaddleFL, and Fedlearner) on GitHub and Stack Overflow, manually analyzed their symptoms and impacts, bug-prone stages, root causes, and fix strategies, and report a series of findings and actionable implications. Finally, we provide suggestions and possible solutions for developers of FL systems based on these findings and implications.

Date
Dec 7, 2023 11:30 AM to 11:45 AM
Dr. Jialun, CAO (Bella)
Research Assistant Professor

Jialun’s research interests lie in SE4AI, AI4SE, trustworthy AI, and LLM4SE.