Import Data

If you have an external data set and want to analyze it in MuPAD®, import the data into the MuPAD session. To import an ASCII data file, use the import::readdata function.

Suppose you want to analyze world population growth and compare it to US population growth between 1970 and 2000. The text file "WorldPopulation" contains the required data. The import::readdata function reads the contents of the file line by line and returns a nested list with one inner list per line:

data := import::readdata("WorldPopulation")

You can convert the resulting nested list to other data structures. For example, represent the imported data as a sample. A sample is a collection of statistical data in the form of a matrix. To convert the nested list of imported data to a sample, use the stats::sample function:

s := stats::sample(data)
year  world(thousands)  US(thousands)  AnnualRateWorld  AnnualRateUS
1970           3711962         205052             2.07          1.26
1971           3789539         207661             1.99          1.07
1972           3865804         209896             1.94          0.95
1973           3941551         211909             1.87          0.91
1974           4016056         213854             1.79          0.99
1975           4088612         215973             1.73          0.95
1976           4159763         218035             1.71          1.01
1977           4231510         220239             1.68           1.1
1978           4303134         222585             1.71          1.18
1979           4377497         225055              1.7          0.98
1980           4452548         227726              1.7          0.96
1981           4528882         229966             1.75          0.91
1982           4608682         232188             1.75          0.87
1983           4690278         234307              1.7          0.89
1984           4770468         236348              1.7          0.91
1985           4852052         238466             1.71          0.89
1986           4935874         240651             1.73          0.91
1987           5022023         242804             1.71          0.94
1988           5108860         245021             1.69          1.12
1989           5195713         247342             1.68          1.33
1990           5283687         250132             1.57          1.33
1991           5367185         253493             1.56           1.3
1992           5451672         256894              1.5          1.21
1993           5534138         260255             1.46          1.18
1994           5615311         263436             1.44          1.16
1995           5696677         266557              1.4           1.2
1996           5776857         269667             1.35          1.17
1997           5855087         272912             1.31          1.15
1998           5932091         276115             1.28          1.02
1999           6008255         279295             1.25          1.01
2000           6083550         282172             1.24          0.94

The first row of the sample contains text, which the statistical functions cannot process. Before you start analyzing the data, delete the first row:

s := stats::sample::delRow(s, 1)
1970  3711962  205052  2.07  1.26
1971  3789539  207661  1.99  1.07
1972  3865804  209896  1.94  0.95
1973  3941551  211909  1.87  0.91
1974  4016056  213854  1.79  0.99
1975  4088612  215973  1.73  0.95
1976  4159763  218035  1.71  1.01
1977  4231510  220239  1.68   1.1
1978  4303134  222585  1.71  1.18
1979  4377497  225055   1.7  0.98
1980  4452548  227726   1.7  0.96
1981  4528882  229966  1.75  0.91
1982  4608682  232188  1.75  0.87
1983  4690278  234307   1.7  0.89
1984  4770468  236348   1.7  0.91
1985  4852052  238466  1.71  0.89
1986  4935874  240651  1.73  0.91
1987  5022023  242804  1.71  0.94
1988  5108860  245021  1.69  1.12
1989  5195713  247342  1.68  1.33
1990  5283687  250132  1.57  1.33
1991  5367185  253493  1.56   1.3
1992  5451672  256894   1.5  1.21
1993  5534138  260255  1.46  1.18
1994  5615311  263436  1.44  1.16
1995  5696677  266557   1.4   1.2
1996  5776857  269667  1.35  1.17
1997  5855087  272912  1.31  1.15
1998  5932091  276115  1.28  1.02
1999  6008255  279295  1.25  1.01
2000  6083550  282172  1.24  0.94
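
The convert-and-trim steps above can also be tried without a data file. The following is a minimal sketch, assuming a small hand-typed nested list in the shape that import::readdata returns. The names demo and t and the header strings are hypothetical; the two data rows are copied from the table above.

```mupad
// Hypothetical stand-in for imported data: one header row, then two data rows
demo := [["year", "world"], [1970, 3711962], [1971, 3789539]]:

// Convert the nested list to a sample; the header row is still present
t := stats::sample(demo):

// Delete row 1 (the header) so that only numeric rows remain
t := stats::sample::delRow(t, 1)
```

Separate names are used here so that the sample s built from the file stays unchanged for the analysis that follows.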

The MuPAD statistical functions accept the resulting sample because it contains only numeric data, so you can now analyze it. For example, compute the correlation between the total world population and the US population, which are stored in the second and third columns of the sample. Use the float function to approximate the result:

float(stats::correlation(s, 2, 3))
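
For reference, and assuming that stats::correlation computes the standard linear (Pearson) correlation coefficient, which this topic does not state explicitly, the quantity for n paired values can be written as follows. The symbols x_i, y_i, and n are introduced here only for illustration; x_i and y_i stand for the entries of the two selected columns.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r \le 1
```

Under this definition, a value of r near 1 or -1 means that the points (x_i, y_i) lie close to a straight line, and a value near 0 means that no linear trend is present.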

The correlation coefficient is close to 1. Therefore, the world population data and the US population data are linearly related.

Now, compute the correlation coefficient for the population growth rates stored in the fourth and fifth columns of the sample. In this case, you can omit the float function: MuPAD returns a floating-point result because the input data contains floating-point numbers:

stats::correlation(s, 4, 5)

The correlation coefficient indicates that the data for the world population growth rates and the data for the US population growth rates are not linearly related.
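
To get a feel for what the two outcomes above mean, it can help to try the function on tiny hand-made data. The following is a sketch only: it assumes that stats::correlation also accepts two plain lists of equal length, a call form not shown in this topic, and the numbers are hypothetical.

```mupad
// y = 2*x exactly: a perfect linear relation, so the
// linear correlation coefficient equals 1 by definition
stats::correlation([1, 2, 3, 4], [2, 4, 6, 8]);

// y = (x - 5/2)^2 - 5/4: y is fully determined by x, but the relation
// is not linear, and the linear correlation coefficient equals 0
stats::correlation([1, 2, 3, 4], [1, -1, -1, 1])
```

The second call illustrates that a coefficient far from 1 rules out a linear relationship only; the two quantities can still depend on each other in a nonlinear way.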
