Chapter 2

Prepare the Data

The data you’ll be working with is heterogeneous. It comes from multiple sources (sensors, databases, audio files, and so on), in different formats, from different domains, and with different time intervals. And it is noisy.

Effective preparation of all this data is critical. To be of use in an AI system, the data must be filtered, cleaned, and labeled.

Easy to say; difficult and time-consuming to do. Several practical questions come up right away:

  • What if the dataset is too large to load into memory?
  • How do you preprocess the data so that the network will give accurate results?
  • What is the quickest way to label all the data?
  • What if there isn’t enough data to train a network?

Let’s look at how MATLAB helps you handle these challenges.