Laura Balzano receives NSF CAREER Award to improve machine learning for big data applications
Her research deciphering messy data sets will first tackle applications in genetics and computer vision.
Prof. Laura Balzano received an NSF CAREER award to support research that aims to improve the use of machine learning in big data problems involving elaborate physical, biological, and social phenomena. The project, called “Robust, Interpretable, and Efficient Unsupervised Learning with K-set Clustering,” is expected to have broad applicability in data science.
Modern machine learning techniques aim to design models and algorithms that allow computers to learn efficiently from vast amounts of previously unexplored data, says Balzano. Typically the data is broken down in one of two ways. Dimensionality reduction uses an algorithm to break down high-dimensional data into the low-dimensional structure that is most relevant to the problem being solved. Clustering, on the other hand, attempts to group pieces of data into meaningful clusters of information.
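To make the two ideas concrete, here is a minimal sketch that runs them one after the other on synthetic data. It is not Balzano's method: it uses textbook principal component analysis (via the singular value decomposition) for dimensionality reduction and plain k-means for clustering. The function names, the synthetic data, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np


def reduce_dimension(X, k):
    """Project the rows of X onto their top-k principal directions (PCA via SVD)."""
    Xc = X - X.mean(axis=0)                      # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                         # coordinates in the k-dim subspace


def cluster(X, n_clusters, n_iter=50):
    """Plain k-means (Lloyd's algorithm) with farthest-point initialization."""
    # Deterministic start: first point, then repeatedly the point farthest
    # from all centers chosen so far.
    centers = [X[0]]
    for _ in range(1, n_clusters):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)

    for _ in range(n_iter):
        # Assign each point to its nearest center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its assigned points.
        for j in range(n_clusters):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


# Synthetic example (assumed data): two well-separated groups in 20 dimensions.
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 0.5, size=(30, 20))
group_b = rng.normal(0.0, 0.5, size=(30, 20)) + 3.0
X = np.vstack([group_a, group_b])

Z = reduce_dimension(X, 2)   # 20 dimensions -> 2 dimensions
labels = cluster(Z, 2)       # group the reduced points into 2 clusters
```

Here the two steps are independent: the reduction does not know about the clusters, and the clustering only sees the reduced data. This sketch also assumes complete, clean data, which real data sets rarely provide.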
However, explains Balzano, “as increasingly higher-dimensional data are collected about progressively more elaborate physical, biological, and social phenomena, algorithms that aim at both dimensionality reduction and clustering are often highly applicable, yet hard to find.”
Balzano plans to develop techniques that combine the two key approaches used in machine learning to decipher data, while remaining applicable to data that is considered “messy.” Messy data is data that has missing elements, may be somewhat corrupted, or is filled with heterogeneous information – in other words, it describes most data sets in today’s world.
She will apply her techniques to specific applications in genetics and computer vision.
Balzano directs the Signal Processing Algorithm Design and Analysis (SPADA) lab, which studies algorithms for statistical signal processing and machine learning with applications in data analysis, computer vision, environmental monitoring, image processing, control systems, power grids, genetic expression data analysis, consumer preference modeling, and computer network analysis.