ECE and data science: a natural connection
Electrical and Computer Engineering (ECE) faculty and students at Michigan are part of the revolution in data science that is happening today. In fact, ECE researchers, especially those trained in information and signal processing, are uniquely positioned to develop better techniques for understanding data, and to teach others how to make use of those techniques.
Take a look at some of the major programs and activities in data science happening within ECE and across the University of Michigan.
Research in Data Science
Michigan ECE faculty build the sensors that are collecting massive amounts of data, design computing systems smart and powerful enough to process the data, and then devise innovative ways to make sense of the data.
At Michigan, ECE research in data science is concentrated in the areas of Signal & Image Processing and Machine Learning; Network, Communication, and Information Systems; and Computer Vision. The work of faculty at every rank, from assistant to senior professors, has been shaped by data science and machine learning. The faculty and projects mentioned below are representative, but not exhaustive, of Michigan ECE's research involving data science.
For example, Prof. Al Hero, the John H. Holland Distinguished University Professor of EECS and a leader in signal processing for several decades, now focuses on building foundational theory and methodology for data science and engineering. His algorithms are being applied to network data analysis, personalized health, multi-modality information fusion, data-driven physical simulation, materials science, dynamic social media, and database indexing and retrieval. [Watch his recent lecture: Locating the Nodes: From Sensor Arrays to Genomic Networks.]
One of the many problems his group is tackling is determining when there is enough of the right kind of data to support reliable conclusions. The group is working to establish fundamental theoretical limits that can help practitioners and data analysts acquire the appropriate amount of data for reliable extraction of information. His group has experience working with astronomical data, network data, biomedical diagnostics, and predictive health.
Hero is co-author of Foundations and Applications of Sensor Management with David Castañón (Boston University), Douglas Cochran (Arizona State University), and Keith Kastella (General Dynamics), and co-editor of Big Data Over Networks with Shuguang (Robert) Cui (Texas A&M University), Zhi-Quan Luo (Chinese University of Hong Kong), and José M.F. Moura (Carnegie Mellon University).
Jeff Fessler, William L. Root Collegiate Professor of Electrical Engineering and Computer Science, conducts research in medical imaging, tomography, nonparametric estimation, and inverse problems, with current and past projects in X-ray CT, MRI, PET, SPECT, radiation therapy, and image registration. His research, based on a new way of processing scanner data, has been used in Veo, a CT image reconstruction system from General Electric that was introduced at the University of Michigan hospital in 2012. More recently, he has been applying data science to achieve ultra-low-dose CT image reconstruction in collaboration with Prof. Yong Long (PhD EE:Systems ’11) of the University of Michigan-Shanghai Jiao Tong University Joint Institute (UM-SJTU).
[Watch his 2016 lecture: Signal Processing Methods for Improving Medical Imaging.]
Clay Scott studies patterns in large, complex data sets and makes quantitative predictions and inferences about those patterns. He is particularly interested in developing new algorithms and proving performance guarantees for new and existing ones. Recent projects include applying machine learning methodology to functional neuroimaging (fMRI) data across a variety of mental health disorders, and classifying nuclear particles to improve detection.
Jason Corso's research in computer vision includes building better tools that let imaging scientists and intelligence analysts fully tap information-rich images and video. His long-term goal is a comprehensive and robust methodology for automatically mining, quantifying, and generalizing information in large sets of projective and volumetric images and video. He is partnering with Kate Saenko (Boston University) and Walter Scheirer (University of Notre Dame) to centralize the data available to the intelligent systems community through a COmputer Vision Exchange for Data, Annotations and Tools, called COVE.
In another project, Corso is working with colleague Laura Balzano to develop a toolkit that lets even non-data scientists make use of the avalanche of data arriving every moment from researchers, hospitals, companies, consumers, and government agencies. The goal of this $1.6M DARPA project, called SPIDER (Subspace Primitives that are Interpretable and DivERse), is to create a system that intelligently selects which algorithms to apply to a specific data set, a form of automated machine learning.
Laura has several other projects that focus on big data problems, particularly the problem of messy data. She leads the Signal Processing Algorithm Design and Analysis (SPADA) lab, and has applied her research to environmental sensing, power systems, and computer network inference.
Both Laura Balzano and Clay Scott are members of the Center for Single Cell Genomic Data Analytics, funded by the Michigan Institute for Data Science (MIDAS).
Sandeep Pradhan, in collaboration with researchers at Michigan and the University of Cambridge, has developed a new "law of small numbers" that can be used in distributed information processing, where information is gathered by a network of sensors. The technique could help power the distributed information processing required for future networks of robots, autonomous cars, sensors, and data centers.
And finally, faculty in the Division of Computer Science and Engineering, the other half of the Department of Electrical Engineering and Computer Science at Michigan, have significant research programs in data science.
Education in Data Science
The programs offered in data science at Michigan are diverse and ever-expanding.
Enrollments in established ECE courses have tracked the explosion of interest in big data and data science. For example, Prof. Clay Scott has taught the graduate course in machine learning since 2007, and enrollment has grown during that time from 40 to more than 200 in recent terms. The course draws students from a wide variety of disciplines throughout the university.
In 2016, Scott introduced a new undergraduate course in information science focused on extracting information from data, ECE style (i.e., including the essentials of Shannon information theory).
Prof. Raj Nadakuditi recently introduced a graduate-level course in computational data science that is attracting students from more than 60 different majors across the university (watch a video about this new course).
“There’s a real excitement to be able to use tools that take just 1-2 courses to learn that can be applied to as many problems as your imagination can take you,” said Raj.
In this course, a biomedical engineer can take algorithms developed for cars and apply them, with some tweaking, to cells. As Raj often says, "ECE folks have always been able to put the pieces together to figure out how things work."
Prof. Laura Balzano has incorporated a major data science project into her undergraduate course in digital signal processing. She says that once students begin to understand that a technique used in one domain can be used in another, they start to get really creative. [More about the course]
Other educational opportunities in data science at Michigan
MIDAS launched a Graduate Data Science Certificate Program in 2015 that is open to students across the University. As of 2018, 84 students from 14 different schools across campus are enrolled in the program, including 14 from ECE. Prof. Laura Balzano has served as a mentor and advisor to the students.
In recent years, new undergraduate and graduate degree programs in data science have been established as collaborations among several departments, schools, and institutes across the University.
Finally, there is an online extension certificate in data science available to those who are not University of Michigan graduate students, as well as a variety of massive open online courses (MOOCs) covering foundational, core, and advanced data science, predictive data analytics, and more.
The students themselves, looking to connect better with their peers, started the Michigan Data Science Team. Last year, the team welcomed 367 new members, nearly a quarter of whom were new to data science. One of the group's most popular activities is its participation in data science challenges, competitions, and projects run by companies and organizations around the country.
At Michigan, faculty have access to massive data sets in areas as diverse as public health and personalized medicine, transportation and connected vehicles, brain sciences, environmental and earth science, astronomy, materials science, genomics and proteomics, computational social science, business analytics, learning analytics, computational finance, information forensics, and national defense.
Alfred O. Hero co-directs MIDAS along with Brian Athey, Michael A. Savageau Collegiate Professor and Chair of the Department of Computational Medicine and Bioinformatics.
Today, MIDAS includes more than 200 faculty members across 60 departments. It holds an annual symposium as well as targeted events such as Women in Big Data at Michigan; it is a focal point for educational opportunities in data science across the University; and it provides funding to researchers as well as research hubs that facilitate collaboration in areas including transportation, health, education, social science, and music.
MIDAS is organized under U-M’s Advanced Research Computing (ARC), which houses advanced computing resources to enable data-intensive and computational research. Co-directed by Eric Michielssen, Louise Ganiard Johnson Professor of Engineering and Associate Vice President for Advanced Research Computing, ARC is also home to a new institutional initiative in precision health.
Precision Health is a research, education, and service initiative that uses big data, computational science, genetics, biology, and social factors to better understand and prevent disease, promote wellness, and develop better treatment options that allow patients to improve their health and wellness. Co-directors include Michielssen; Michael Boehnke, Richard G. Cornell Distinguished University Professor of Biostatistics and Director of the Center for Statistical Genetics and Genome Science Training Program; and Sachin Kheterpal, associate professor of anesthesiology and associate dean for research information technology at the Medical School.
For the first time, Precision Health is making high-quality databases of medical data available to researchers. Previously, these databases had been available to only a handful of researchers in the Medical School, mainly because the technology to curate all the data did not exist. Thanks to the resources of Michigan's top-5-ranked hospital system, the database includes more than four million electronic patient records. A second key database includes genomic data on 60,000 patients.
Michielssen says his dream is to correlate people's genomic information with what is in their electronic health records: "If you can figure out that a certain gene can predict a certain cancer, then you can start to proactively treat patients."
Like MIDAS, Precision Health sponsors faculty research.
Future of Data Science
Data science, especially with the advent of machine learning, has matured enough to support two distinct approaches within academia. Some faculty will continue to pursue fundamental research to develop better and more powerful data science tools, while others will use existing tools to define and solve new problems.
For example, in 2014 Mingyan Liu, the Peter and Evelyn Fuss Chair of Electrical and Computer Engineering, started QuadMetrics, a data-driven cyber risk assessment company. Her earlier research focused on determining fundamental performance limits and energy efficiency, as well as enabling structural health monitoring, in wireless, mobile, ad hoc, and sensor networks. Only recently did her research turn to the modeling and mining of large-scale Internet measurement data and the design of incentive mechanisms for cybersecurity. By applying more or less standard machine learning techniques to a rather unconventional problem, Liu was able to build the world's first enterprise cybersecurity ratings system. The company was acquired two years after its founding.
This commercial success is evidence that while the research community will continue to push the frontier in developing data science tools, there are many opportunities where even standard tools can enable the conceptualization of new problem and solution spaces.
Today, it is more the rule than the exception that faculty and students in electrical and computer engineering are doing fundamental and applied research that enhances everyone's ability to make important discoveries from the massive amounts of data available today. These efforts have the potential to improve life for all.