This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Comprehensive review of federated learning challenges: a data preparation viewpoint

2025 · 13 citations · Journal of Big Data · Open Access

Abstract

The accuracy, generalization, and reliability of machine learning models are greatly affected by the quality of the training data. High-quality data, characterized by completeness, consistency, accuracy, representativeness, and homogeneity, enables meaningful pattern learning and robust prediction. In federated learning (FL), the learning process is collaborative and conducted across decentralized and locally private data nodes. The heterogeneity of data across these nodes degrades model performance and may lead to overfitting, underfitting, and erroneous decision-making. Heterogeneity is caused by inconsistent labeling, missing values, and class imbalances across these nodes. Proper data preparation, including cleaning, normalization, and augmentation, is essential to mitigate these issues and ensure that these distributed datasets reflect the problem domain accurately. These challenges are exacerbated by the fact that the raw data is generated from diverse sources and, as a fundamental constraint, cannot be shared among learning nodes. Although data preparation has received great interest in recent years, little attention has been given to the data challenges posed when FL is used. Although some surveys mention FL challenges, they discuss them only superficially. These papers predominantly focus on a single aspect of data challenges, such as quality, homogeneity, or balance, and discuss FL only within the context of that specific challenge. No recent survey examines all data-related challenges in FL, including their interdependencies and interactions. To address these limitations, the main contribution of this paper is a comprehensive overview of data challenges in FL, encompassing data heterogeneity, skewness, representation, quality, bias, and fairness.
The paper begins by identifying the data challenges highlighted in the existing literature, with a particular focus on the interrelationships among these challenges, which are categorized into two main groups: non-independently and non-identically distributed (Non-IID) data issues and data quality issues. Subsequently, the paper reviews and compares recognized solution approaches to these data challenges and explores additional data preparation techniques that could serve as candidate solutions. The paper aims to define the work needed to optimize the effectiveness of these techniques for the distributed and isolated data found in FL.

Institutions

German University in Cairo (EG)

Topics

Privacy-Preserving Technologies in Data
IoT and Edge/Fog Computing
Artificial Intelligence in Healthcare and Education
