Applying Principal Component Analysis and Autoencoders for Dimensionality Reduction in Data Stream

Authors

  • Mayur Prakashrao Gore, Principal Software Engineer, CGI Inc, Austin, Texas
  • Amol Ashokrao Shinde, Lead Software Engineer, Mastech Digital Technologies Inc, Pittsburgh, PA, United States
  • Amit Choudhury, Department of Information Technology, Dronacharya College of Engineering, Gurgaon

Keywords:

Dimensionality Reduction, Data Streams, Principal Component Analysis (PCA), Autoencoders, Real-Time Processing, High-Dimensional Data

Abstract

This research examines the role and efficiency of Principal Component Analysis (PCA) and autoencoders, used separately and in combination, for dimensionality reduction in large-scale data. Data obtained from sources such as IoT sensors and social media platforms is more complex than before, so effective dimensionality reduction is critical for real-time analysis. The methods are evaluated on synthetic and real datasets and compared on explained variance, reconstruction error, and processing time to identify the optimal configuration. The results show that PCA alone is fast on linear data, while autoencoders capture nonlinear dependencies at a slightly higher time cost. The combined PCA-autoencoder model preserves considerable variance with a reasonable reconstruction error, which makes it well suited to dynamic environments while incurring less computational expense than alternative PCA models. This work shows that suitable combinations of dimensionality reduction methods can improve real-time data stream analysis, especially in applications that demand high accuracy together with low latency.
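
The three evaluation criteria named in the abstract (explained variance, reconstruction error, processing time) can be illustrated with a minimal sketch for the PCA stage only. This is not the authors' implementation: the synthetic data, the choice of k = 3 components, and the helper names `pca_fit` and `reconstruction_mse` are assumptions made for illustration, and the autoencoder stage is not shown.

```python
import time
import numpy as np

def pca_fit(X, k):
    """Fit PCA via SVD (hypothetical helper, not from the paper).

    Returns the feature mean, the top-k principal directions, and the
    fraction of total variance explained by those k directions.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are principal directions, ordered by singular value.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2
    ratio = float(var[:k].sum() / var.sum())
    return mean, Vt[:k], ratio

def reconstruction_mse(X, mean, components):
    """Project onto the components, map back, and return the mean squared error."""
    Z = (X - mean) @ components.T      # reduced representation
    X_hat = Z @ components + mean      # reconstruction in the original space
    return float(np.mean((X - X_hat) ** 2))

# Assumed synthetic data: 20 features driven linearly by 3 latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 3))
mixing = rng.normal(size=(3, 20))
X = latent @ mixing + 0.01 * rng.normal(size=(1000, 20))

start = time.perf_counter()
mean, components, ratio = pca_fit(X, k=3)
elapsed = time.perf_counter() - start
mse = reconstruction_mse(X, mean, components)

print(f"explained variance ratio: {ratio:.4f}")
print(f"reconstruction MSE:       {mse:.6f}")
print(f"fit time (s):             {elapsed:.4f}")
```

In a combined setup of the kind the abstract describes, the reduced representation `Z` would presumably be the input to an autoencoder, and the same three metrics would be recorded for each configuration.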

Published

2024-11-03

Section

Articles