File Exchange

DataFrame

version 1.1.0.0 (211 KB) by Nicholas
An object oriented table-like data structure, similar to Pandas DataFrame

6 Downloads

Updated 17 Aug 2015

MATLAB implementation of the DataFrame/Pandas concept. I wanted to be able to customize the MATLAB Table however I liked, so that I could extend its functionality. I also find that MATLAB's approach of having many free-standing functions makes it hard to remember which functions apply to which data structures. Instead, I have wrapped the MATLAB Table in a DataFrame class, which tries to stay out of the way as much as possible so that the Table can be leveraged directly, while still providing the flexibility to extend it. Additionally, I have attached to the DataFrame, as methods, all of the existing functions that can operate on Tables, and added a few new ones. I'll likely add more methods over time to fill in where I see the Table lacking.
See the GitHub page for a basic walkthrough of its use. Please interact through GitHub for contributions/issues. I'll likely not check this page very often, if at all.
Use Cases:
One reason for doing this is that I'd like to show in the future how you could inherit from the generic DataFrame type, then perform type checking. So, if you require that an AddressBook table always be initialized with a name and address column, you could add that to a specialized subclass of DataFrame.
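
As a rough illustration of the kind of check such a subclass could run, here is a minimal sketch that validates a plain MATLAB table rather than a DataFrame. The function name validateAddressBook, the error identifier, and the lowercase column names 'name' and 'address' are assumptions made for this example; none of them come from the DataFrame repository, and how a subclass constructor would hook into DataFrame is not shown.

```matlab
% validateAddressBook.m
% Hypothetical helper, not part of the DataFrame repository.
% Errors if the table lacks any of the columns an AddressBook needs.
function validateAddressBook(tbl)
    % Assumed required column names for this illustration.
    required = {'name', 'address'};

    % Logical mask: which required names appear among the table's variables.
    present = ismember(required, tbl.Properties.VariableNames);

    if ~all(present)
        % Report every missing column in a single message.
        error('AddressBook:missingColumns', ...
              'Missing required column(s): %s', ...
              strjoin(required(~present), ', '));
    end
end
```

Calling validateAddressBook(table({'Ann'}, {'1 Main St'}, 'VariableNames', {'name', 'address'})) returns quietly, while a table with only a name column raises the AddressBook:missingColumns error. A specialized subclass could presumably run a check like this in its constructor, after the underlying table has been built.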

Note: There are some initial tests and examples in the GitHub repository.

Cite As

Nicholas (2020). DataFrame (https://www.github.com/rothnic/DataFrame), GitHub. Retrieved .

Comments and Ratings (4)

Hi, very nice, exactly looking for this.
I'm trying to add new rows to a data frame iteratively in a loop. Any ideas on how to do this? Perhaps creating a temp dataframe with the new data and then merging both somehow?

Thanks for help!

neiho

The DataFrame display is shifted out of alignment when a row mixes numbers and strings, for example:

df = DataFrame({'Simith'}, 0, 0, 0, ...
'VariableNames', {'test1', 'test2','test3','test4',},...
'RowNames',{'ROW'});

df =

test1 test2 test3 test4
________ _____ _____ _____

ROW 'Simith' 0 0 0

Oren Rosen

Ok, one note now that I've had some time to play with this.

I love the functionality of dataset arrays, tables, etc., and, like you, have based some of my own classes around them.

However, my biggest pet peeve is that the subsref command can be quite slow.

Compare execution time for this:

data = randn(1000,1);
out = nan(1000,1);

tic; for n = 1:1000, out(n) = 2*data(n); end; toc

To this:

tb = table(data,'VariableNames',{'Col1'});

tic; for n = 1:1000, out(n) = 2*tb.Col1(n); end; toc

This gets compounded within this dataframe construction:

df = DataFrame(data,'VariableNames',{'Col1'});

tic; for n = 1:1000, out(n) = 2*df.Col1(n); end; toc

The source, I believe, is the extra overhead in your subsref that is needed to distinguish between method calls that are native to table and those that are custom to DataFrame.

I know the above example can be vectorized, but it's indicative of more complex examples from real life that cannot.

Would love to hear your thoughts on such things. A comprehensive solution would need to start with the Table class itself.

Still a great submission, thanks for sharing.

Oren Rosen

Great submission, thanks for sharing!

Updates

1.1.0.0

Updated project name

MATLAB Release Compatibility
Created with R2013b
Compatible with any release
Platform Compatibility
Windows macOS Linux