Equipping Student Engineers with Data Science Skills
By Thomas Popham, University of Warwick
Data science is emerging as a highly desirable skill for engineers. Most engineering degrees currently offer data science topics as an option during the final year, usually in a narrow field of study. In 2018, we introduced data science as a core topic for all undergraduate engineering streams (including Civil, Mechanical, and Electronic and Systems) at the University of Warwick. From our industrial experience we know that data analytics is affecting virtually every area of engineering development and operation.
Data science and machine learning will soon be essential skills for all engineers, whether they are applying machine learning algorithms, providing data to feed these algorithms, or making decisions based on the results. That is why we introduced data science as a thread through the Warwick Engineering degree, starting from the introduction of programming and simple statistical models during the first year, moving to a core data analytics module in the second year, and then offering more stream-specific modules in years 3 and 4.
Year 1: Systems Modelling, Simulation, and Computation
All first-year engineering undergraduates take ES197: Systems Modelling, Simulation, and Computation. In this module, students learn how to use both physical and (simple) data-driven approaches for modeling engineering systems. This module also serves as an introduction to programming.
To familiarize themselves with programming and with MATLAB®, students complete lessons from the online MATLAB Fundamentals course. From an educator’s perspective, this approach works well because students learn at their own pace, completing the programming exercises and getting immediate feedback on each one.
After applying the MATLAB skills they’ve acquired to assignments on curve fitting and deriving simple models and relationships from data, the students tackle modeling and simulation problems using examples from electrical, thermal, and translational systems. Unlike computer science students, who view programming as a necessary skill, many engineering students may not initially appreciate its relevance. By introducing programming in the context of modeling and simulation, we aim to show students that coding is a skill that will be useful to them throughout their careers.
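To illustrate the kind of curve-fitting exercise described above, here is a minimal sketch. It is not taken from the module's assignments: the data (a noisy straight line) and all variable names are invented for the example.

```matlab
% Hypothetical data: a noisy linear relationship y = 2x + 1
% (all values are invented for illustration)
rng(0);                                  % fix the seed so the run is repeatable
x = linspace(0, 10, 50)';
y = 2*x + 1 + 0.5*randn(size(x));

% Fit a first-order polynomial (a straight line) by least squares
p = polyfit(x, y, 1);                    % p(1) is the slope, p(2) the intercept
yFit = polyval(p, x);                    % evaluate the fitted line at each x

% Compare the fitted line with the raw data
plot(x, y, 'o', x, yFit, '-');
xlabel('x'); ylabel('y');
legend('Data', 'Linear fit', 'Location', 'northwest');
```

Raising the third argument of polyfit fits a higher-order polynomial to the same data, which is one simple way to explore the trade-off between model complexity and fit.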
In later assignments, students incorporate noise or other random effects into the model. For example, we have them create a simple model in MATLAB in which particles shoot up into the air and fall back down while being acted upon by random forces. The simulation produces an interesting 3D visualization (Figure 1). The entire project gives students confidence in their abilities to create their own models programmatically.
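A simulation of this kind might be sketched as follows. This is an assumed reconstruction rather than the actual assignment: the parameter values, the Gaussian random acceleration, and the explicit Euler integration scheme are all choices made for the example.

```matlab
% Assumed parameters (all values invented for illustration)
rng(1);                          % repeatable random numbers
nParticles = 20;
nSteps     = 200;
dt         = 0.01;               % time step, s
g          = [0 0 -9.81];        % gravitational acceleration, m/s^2
sigma      = 5;                  % std. dev. of the random acceleration, m/s^2

pos    = zeros(nParticles, 3);                            % start at the origin
vel    = [randn(nParticles, 2), 8 + randn(nParticles, 1)]; % mostly upward launch, m/s
landed = false(nParticles, 1);                            % true once a particle hits the ground
traj   = zeros(nParticles, 3, nSteps);                    % stored positions for plotting

for k = 1:nSteps
    % Gravity plus a random disturbance (implicit expansion, R2016b or later)
    acc = g + sigma*randn(nParticles, 3);

    % Explicit Euler update, applied only to particles still in flight
    active = ~landed;
    vel(active, :) = vel(active, :) + acc(active, :)*dt;
    pos(active, :) = pos(active, :) + vel(active, :)*dt;

    % Particles that drop below ground level stay on the ground from now on
    landed = landed | pos(:, 3) < 0;
    pos(landed, 3) = 0;

    traj(:, :, k) = pos;
end

% One line per particle: rows are time steps, columns are particles
plot3(squeeze(traj(:, 1, :))', squeeze(traj(:, 2, :))', squeeze(traj(:, 3, :))');
grid on; xlabel('x (m)'); ylabel('y (m)'); zlabel('z (m)');
```

Setting sigma to zero recovers the purely deterministic ballistic model, which makes the effect of the random forces easy to see by comparison.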
Year 2: Engineering Mathematics and Data Analytics
The second-year module ES2C7 Engineering Mathematics and Data Analytics focuses on solving regression, classification, and clustering problems. When I worked in industry, I saw that solving data science problems was relatively straightforward once the data was clean and in the proper format, but that is rarely the case with real-world data. With this in mind, I teach the students how to identify and remove outliers, handle missing values, and organize data in tables.
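The cleaning steps named above could look something like the following sketch. The readings and variable names are hypothetical, and dropping rows with missing values is only one of several reasonable strategies (filling them with fillmissing is another).

```matlab
% Hypothetical sensor readings (values invented for illustration)
temperature = [20.1; 20.4; NaN; 20.3; 95.0; 20.2; 20.5; NaN; 20.0; 20.3];
pressure    = [1.01; 1.02; 1.00; 1.03; 1.01; 1.02; 1.00; 1.01; 1.02; 1.01];

% Organize the data in a table, one variable per column
data = table(temperature, pressure);

% Handle missing values: here, rows containing NaN are simply dropped
data = rmmissing(data);

% Identify outliers: by default, isoutlier flags values more than three
% scaled median absolute deviations away from the median
isOut = isoutlier(data.temperature);

% Remove the flagged rows
cleanData = data(~isOut, :);
disp(cleanData)
```

With these made-up values, the two NaN rows are dropped first and the 95.0 reading is then flagged as the only outlier.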
MATLAB Live scripts are particularly useful during lectures because I can include formatted text and images to remind me of what I want to cover and because the output of the code appears along with the code that produced it. The Classification Learner and Regression Learner apps in Statistics and Machine Learning Toolbox™, meanwhile, make it possible to teach the broad principles of regression and classification without delving into implementation details (Figure 2).
Once students see what the apps do and how they can be used, I show the class how the underlying algorithms work in MATLAB.
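The article does not say which algorithms are walked through in this way. As an assumed example, since clustering is part of the module, here is a minimal from-scratch sketch of k-means on invented two-dimensional data. The fixed choice of starting centroids and the fixed iteration count are simplifications made for the example.

```matlab
% Hypothetical 2-D data: two well-separated groups (values invented)
rng(2);
X = [randn(30, 2); randn(30, 2) + 6];
k = 2;

% Simplified initialisation: take the first and last points as centroids
centroids = X([1, end], :);

for iter = 1:20
    % Squared Euclidean distance from every point to every centroid
    % (X - centroids(j, :) relies on implicit expansion, R2016b or later)
    d = zeros(size(X, 1), k);
    for j = 1:k
        d(:, j) = sum((X - centroids(j, :)).^2, 2);
    end

    % Assignment step: each point joins its nearest centroid
    [~, labels] = min(d, [], 2);

    % Update step: each centroid moves to the mean of its assigned points
    for j = 1:k
        centroids(j, :) = mean(X(labels == j, :), 1);
    end
end

disp(centroids)
```

A fuller implementation would stop when the assignments no longer change and would guard against empty clusters; the kmeans function in Statistics and Machine Learning Toolbox handles both.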
After completing lab assignments on regression, classification, and clustering, the students work on a group project in which I ask them to imagine working for an engineering consultancy tasked with assessing the quality of manufactured steel components. The students must predict which components are most likely to fail using two data sets, one that is fairly clean and one that is messy and complicated.
Working with noisy data in a variety of file formats, including Excel®, CSV, and plain text, the students remove outliers, perform joins, and prepare the data to be used in training a model. Most groups use the Regression Learner app or implement linear regression in a MATLAB script; some try both approaches. To complete the project and demonstrate the skills they developed throughout the module, each group creates a video that presents their findings and the methods they employed.
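As a rough sketch of the join-then-fit workflow described above, the following example joins two small tables on a shared key and fits a straight line by ordinary least squares. The table contents and the column names ComponentID, CarbonPct, and Hardness are hypothetical and are not taken from the project data sets.

```matlab
% Hypothetical tables sharing a component ID (all values invented)
measurements = table([1; 2; 3; 4; 5], [0.8; 1.1; 1.9; 2.4; 3.1], ...
    'VariableNames', {'ComponentID', 'CarbonPct'});
tests = table([5; 3; 1; 2; 4], [410; 295; 180; 215; 340], ...
    'VariableNames', {'ComponentID', 'Hardness'});

% Join the two tables on their common key variable
joined = innerjoin(measurements, tests, 'Keys', 'ComponentID');

% Ordinary least-squares fit of Hardness = b(1) + b(2)*CarbonPct,
% solved with the backslash operator
X = [ones(height(joined), 1), joined.CarbonPct];
b = X \ joined.Hardness;

% Predictions from the fitted model at the observed inputs
predicted = X*b;
fprintf('Intercept: %.1f, slope: %.1f\n', b(1), b(2));
```

In a real project the tables would come from readtable on the Excel, CSV, or text files, and the fit would be checked on data held back from training.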
Year 3 and Beyond
For students interested in exploring data science and machine learning further, Warwick offers a third-year module on intelligent system design that covers computer vision and more advanced machine learning techniques. In this module, I introduce students to the sense-perceive-act framework used in many autonomous control system applications. The quadcopter model in Simulink® (Figure 3) is very useful for showing this basic framework while introducing students to topics to be covered later in the module, such as Kalman filtering and optical flow.
Later, students develop a gesture recognition app with MATLAB that combines computer vision and machine learning. For this project, students develop a model capable of interpreting webcam images of their own hands and classifying them as one of several predetermined hand gestures. The project is particularly engaging for the students because they are working with their own data and need to think about factors such as lighting and how many different images are needed to train an accurate classifier.
Students who learn how to apply data science techniques in the context of real-world problems early in their studies are well prepared not only for advanced coursework in subsequent years but also for careers as practicing engineers. We have already received very positive feedback from our students on this approach: they have found that they are able to apply these techniques during undergraduate internships and to talk about these skills in interviews.
With device connectivity enabling companies to base their design decisions on data rather than on intuition or previous experience, engineers with a background in data analytics are very much in demand. While few of our graduates will enter the workforce ready to demonstrate new tools to senior engineers, we are confident that they will be able to apply machine learning and data analytics whenever the situation requires it.