Automated Data Cleaning Techniques Using Machine Learning Algorithms in Big Data Pipelines

Derek McAuley

Authors

Derek McAuley, School of Computer Science, University of Nottingham, UK

Abstract

The integrity of insights derived from big data is crucial for informed decision-making. This paper explores automated data cleaning techniques that use machine learning algorithms to address common data quality issues such as missing values, duplicates, and inconsistencies. By analyzing supervised, unsupervised, and semi-supervised learning approaches, we demonstrate the efficacy of these techniques in enhancing data quality management within big data pipelines. Our findings indicate that machine learning not only automates data cleaning but also improves its precision and efficiency, making it a valuable tool for organizations that want to make full use of their data.
