Big Data with MATLAB
Big data refers to the dramatic increase in the amount and rate of data being created and made available for analysis.
A primary driver of this trend is the ever-increasing digitization of information. The number and types of acquisition devices and other data generation mechanisms are growing all the time.
Big data sources include streaming data from instrumentation sensors, satellite and medical imagery, video from security cameras, as well as data derived from financial markets and retail operations. Big data sets from these sources can contain gigabytes or terabytes of data, and may grow on the order of megabytes or gigabytes per day.
Big data represents an opportunity for analysts and data scientists to gain greater insight and to make more informed decisions, but it also presents a number of challenges. Big data sets may not fit into available memory, may take too long to process, or may stream too quickly to store. Standard algorithms are usually not designed to process big data sets in reasonable amounts of time or memory. There is no single approach to big data. Therefore, MATLAB provides a number of tools to tackle these challenges.
The memmapfile function in MATLAB lets you map a file, or a portion of a file, to a MATLAB variable in memory. This allows you to efficiently access big data sets on disk that are too large to hold in memory or that take too long to load.
The matfile function lets you access MATLAB variables directly from MAT-files on disk, using MATLAB indexing commands, without loading the full variables into memory. This allows you to do block processing on big data sets that are otherwise too large to fit in memory.
Many built-in MATLAB math functions, such as fft, inv, and eig, are multithreaded. By running in parallel, these functions take full advantage of the multiple cores of your computer, providing high-performance computation on big data sets.
Parallel Computing Toolbox provides a parallel for-loop (parfor) that runs your MATLAB code and algorithms in parallel on multicore computers. If you use MATLAB Distributed Computing Server, you can execute in parallel on clusters of machines that can scale up to thousands of computers.
The blockproc function in Image Processing Toolbox lets you work with very large images by processing them efficiently one block at a time. Computations run in parallel on multiple cores and GPUs when used with Parallel Computing Toolbox.
See also: HDF5 files, large data import (in Database Toolbox)
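The memmapfile and matfile approaches described above can be illustrated with a minimal sketch. The file names, array sizes, and block sizes here are made up for the example, and the data is deliberately tiny so the script runs quickly; a real data set would be far larger. The MAT-file is saved with the -v7.3 option on the assumption that partial loading is wanted, since matfile reads only the requested portion of a variable for that format.

```matlab
% --- Memory-mapped access with memmapfile --------------------------------
% Write a small binary file of doubles (hypothetical name in tempdir).
binName = fullfile(tempdir, 'bigdata_demo.bin');
fid = fopen(binName, 'w');
fwrite(fid, 1:1000, 'double');
fclose(fid);

% Map the file to a variable; values are read from disk on demand.
m = memmapfile(binName, 'Format', 'double');
n = numel(m.Data);
blockSize = 100;                       % illustrative block size
total = 0;
for k = 1:blockSize:n
    idx = k:min(k + blockSize - 1, n); % indices of the current block
    total = total + sum(m.Data(idx));  % only this block is touched
end
clear m                                % release the mapping

% --- Block processing of a MAT-file variable with matfile ----------------
% Save a matrix in v7.3 format so that parts of it can be read from disk.
matName = fullfile(tempdir, 'bigdata_demo.mat');
A = reshape(1:2000, 200, 10);
save(matName, 'A', '-v7.3');

mf = matfile(matName);                 % no data is loaded yet
[nRows, nCols] = size(mf, 'A');        % query size without loading A
rowBlock = 50;                         % illustrative block size
colSum = zeros(1, nCols);
for r = 1:rowBlock:nRows
    rows = r:min(r + rowBlock - 1, nRows);
    colSum = colSum + sum(mf.A(rows, :), 1); % read one block of rows
end

fprintf('Mapped-file total: %d\n', total);
disp(colSum);
```

In both halves the loop only ever holds one block in memory at a time, which is the point of the two functions; the block sizes would normally be tuned to the available memory.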
Large Data Sets in MATLAB 47:41 (Webinar)