How does 'unique' work under the hood (in general terms)
Show older comments
I am just curious as to how this function works by default, under the hood.
I know that a couple of ways of implementing unique manually is to copy over the elements from one array into a hash table or a binary search tree.
Given that the default implementation of 'unique' also sorts the elements, I would guess that it maybe copies over the elements into a binary search tree.
As a result, if I need to both sort and remove duplicate elements/rows from an array I would only need to call 'unique' instead of sort(unique(...)) or sortrows(unique(...), ...)?
3 Comments
Without opening up the function, there are only two ways to tell:
- Reading the documentation
- Executing several examples
What you describe seems to be correct, so I don't really understand what your question is.
Two notes:
- The unique function didn't always sort the output. I don't have local installs old enough right now. The online R2017a doc mentions sorting as the default.
- Unique doesn't accept all data types. I created a wrapper for unique that does accept all datatypes (or at least, most of them). It uses ComputeNonCryptHash to hash all elements and then calls unique on that cellstr. It's not nearly as polished as the things I would normally publish, but I attached it just in case you need something like this.
help unique_array
Erik Johannes Loo
on 14 Feb 2022
Rik
on 14 Feb 2022
Those who know can't tell, those who can tell don't know. You could try to dig in the source code, but you can't do anything with that knowledge.
I would be very hesitant to apply a big O notation on my function. I rolled my own hashing algorithm. You should be deeply, DEEPLY, mistrustful of me. It works for what I do with it: confirm my test functions return the same output. I only use this unique_array function if I'm there to catch any strange errors.
I suspect the unique function will run different code when it is called with the 'stable' flag. Mathworks employs smart people, who're probably aware of this trade-off.
Accepted Answer
More Answers (1)
Walter Roberson
on 14 Feb 2022
1 vote
unique() does a sort() and then checks where adjacent elements are different.
You can read the code; it is a .m file.
Categories
Find more on Shifting and Sorting Matrices in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!