Large Files and Big Data

Access and process collections of files and large data sets

Large data sets can take the form of large files that do not fit into available memory or files that take a long time to process. A large data set can also be a collection of numerous small files. There is no single approach to working with large data sets, so MATLAB® includes a number of tools for accessing and processing them.

Begin by creating a datastore that can access small portions of the data at a time. You can use the datastore to manage incremental import of the data. To analyze the data using common MATLAB functions, such as mean and histogram, create a tall array on top of the datastore. For more complex problems, you can write a MapReduce algorithm that defines the chunking and reduction of the data.
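The workflow described above can be sketched as follows. This is a minimal illustration, not a recommended pipeline: the temporary CSV file, the variable name Value, and the ReadSize of 100 rows are assumptions chosen only to keep the example small and self-contained. In practice the datastore would point at your own files.

```matlab
% Create a small CSV file to stand in for a large data set (illustrative only).
T = table((1:1000)', 'VariableNames', {'Value'});
file = [tempname '.csv'];
writetable(T, file);

% Create a datastore that reads the file in portions of 100 rows at a time.
ds = tabularTextDatastore(file);
ds.ReadSize = 100;

% Incremental import: accumulate a running sum and row count chunk by chunk.
total = 0;
n = 0;
while hasdata(ds)
    chunk = read(ds);                 % table holding the next portion of rows
    total = total + sum(chunk.Value);
    n = n + height(chunk);
end
incrementalMean = total / n;

% Tall array on top of the same datastore: use mean directly.
reset(ds);                            % rewind the datastore to the start
mapreducer(0);                        % run tall calculations in the local session
tt = tall(ds);
tallMean = gather(mean(tt.Value));    % gather triggers the deferred evaluation

delete(file);                         % remove the temporary file
```

Both approaches compute the same mean. The loop makes the chunked reads explicit, while the tall array hides them behind ordinary function calls and defers the computation until gather is called.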