Spark: A Statistical Comparison and Evaluation of Classification Algorithms for Fault Prediction in Electrical Secondary Distribution Networks

David T. Makota; Naiman Shililiandumi; Hashim U. Iddi

doi:10.56279/jicts.v2i2.9028

Authors

David T. Makota Institute of Finance Management
Naiman Shililiandumi University of Dar es Salaam
Hashim U. Iddi University of Dar es Salaam


Abstract

Managing faults in electrical secondary distribution networks is challenging given their nature, size, and complexity. Predicting faults before they occur helps increase the safety and reliability of the power distribution system. Various statistical and machine learning techniques are used to predict different types of faults. This study applies classification algorithms available in the Apache Spark framework, through its Python interface PySpark, to predict electrical secondary distribution network faults. The study evaluates and compares ten algorithms: Decision Tree, Gradient-Boosted Tree, Binomial Logistic Regression, Multinomial Logistic Regression, Naïve Bayes, Multilayer Perceptron, Random Forest, Linear Support Vector Machine, One-versus-Rest, and Factorization Machines. The research uses the Friedman test, followed by the Nemenyi post hoc test, to determine whether the performance differences among the algorithms are statistically significant. The results show significant differences among the algorithms. Gradient-Boosted Tree performed best for binary classification and One-versus-Rest with Gradient-Boosted Tree performed best for multiclass classification, while Naïve Bayes performed worst. By identifying the most effective algorithms, this research provides a practical reference for selecting suitable models, which aids fault prediction, reduces system downtime, and helps optimize maintenance strategies. The results can also inform the selection of base models for ensemble methods, further improving prediction accuracy.
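The abstract names the Friedman test followed by the Nemenyi post hoc test but does not spell out the computation. Below is a minimal pure-Python sketch of that two-step procedure. It is not the authors' code.

It assumes that each row of `scores` holds one performance value (e.g. accuracy, higher is better) per algorithm on one dataset or cross-validation fold. The helper names (`rank_row`, `chi2_sf`, `friedman`, `nemenyi_cd`, `nemenyi_pairs`) and the toy scores are hypothetical.

The procedure works as follows:

1. Rank the k algorithms on each of the N rows, with 1 as best and ties averaged.
2. Compute the Friedman statistic χ²_F = 12N/(k(k+1)) · [Σ_j R_j² − k(k+1)²/4] from the average ranks R_j. It is compared against a chi-square distribution with k − 1 degrees of freedom.
3. If the omnibus test rejects equal performance, flag any pair whose average-rank gap exceeds the Nemenyi critical difference CD = q_α · √(k(k+1)/(6N)). The q_0.05 values are taken from Demšar (2006).

```python
import math
from typing import Dict, List, Sequence, Tuple

# Two-tailed Nemenyi critical values q_alpha for alpha = 0.05 and k = 2..10
# algorithms (Demšar, 2006). Comparing ten algorithms corresponds to k = 10.
Q_05: Dict[int, float] = {
    2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
    7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164,
}


def rank_row(scores: Sequence[float], higher_is_better: bool = True) -> List[float]:
    """Rank algorithms on one dataset/fold (1 = best); tied scores share the mean rank."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=higher_is_better)
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with the score at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 0-based positions i..j share one 1-based rank
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability via the series for the lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total  # regularized P(a, z)
    return max(0.0, 1.0 - lower)


def friedman(scores: List[List[float]], higher_is_better: bool = True) -> Tuple[List[float], float, float]:
    """Return (average ranks, Friedman chi-square, p-value); rows = folds, columns = algorithms."""
    n, k = len(scores), len(scores[0])
    ranks = [rank_row(row, higher_is_better) for row in scores]
    avg_ranks = [sum(r[j] for r in ranks) / n for j in range(k)]
    chi2 = 12.0 * n / (k * (k + 1)) * (sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4.0)
    return avg_ranks, chi2, chi2_sf(chi2, k - 1)


def nemenyi_cd(k: int, n: int, q: Dict[int, float] = Q_05) -> float:
    """Critical difference: two average ranks differ significantly if their gap exceeds it."""
    return q[k] * math.sqrt(k * (k + 1) / (6.0 * n))


def nemenyi_pairs(names: Sequence[str], avg_ranks: Sequence[float],
                  cd: float) -> List[Tuple[str, str, float, bool]]:
    """List every pair of algorithms with its average-rank gap and whether it exceeds the CD."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            gap = abs(avg_ranks[i] - avg_ranks[j])
            pairs.append((names[i], names[j], gap, gap > cd))
    return pairs


if __name__ == "__main__":
    # Made-up accuracies for three hypothetical algorithms on four folds (not the paper's data).
    names = ["alg_A", "alg_B", "alg_C"]
    scores = [
        [0.90, 0.85, 0.70],
        [0.92, 0.88, 0.75],
        [0.89, 0.90, 0.72],
        [0.91, 0.86, 0.71],
    ]
    avg_ranks, chi2, p = friedman(scores)
    print("average ranks:", avg_ranks)                # [1.25, 1.75, 3.0]
    print(f"Friedman chi2 = {chi2:.3f}, p = {p:.4f}")  # chi2 = 6.500, p = 0.0388
    if p < 0.05:  # run the post hoc test only when the omnibus test rejects H0
        cd = nemenyi_cd(len(names), len(scores))
        for a, b, gap, significant in nemenyi_pairs(names, avg_ranks, cd):
            print(f"{a} vs {b}: gap = {gap:.2f}, CD = {cd:.2f}, significant = {significant}")
```

With these illustrative scores, the average-rank gap between alg_A and alg_C (1.75) exceeds the critical difference (about 1.66), so that pair is flagged as significant. The other two pairs are not.

For real analyses, SciPy's `scipy.stats.friedmanchisquare` is a library alternative for the omnibus statistic. Its handling of tied ranks may make its values differ slightly from this sketch.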

Author Biographies

David T. Makota, Institute of Finance Management

Department of Computer Science

Naiman Shililiandumi, University of Dar es Salaam

Department of Electronics and Telecommunications Engineering

Hashim U. Iddi, University of Dar es Salaam

Department of Electronics and Telecommunications Engineering
