Scatter-plot millions of points with zoom

I'm trying to plot tens of millions of 2-D points using the scatter function, but it's extremely slow.
It takes anywhere from several seconds to minutes to display the plot.
Once displayed, zooming and panning are tolerable but still have some latency. The other big problem is that when I'm done inspecting the plot and close the figure window, MATLAB freezes completely and takes several minutes to become responsive again.
Q1: Is there any way I can dump the data more quickly? I don't understand why MATLAB seizes up once I decide to close the plot...
I've looked into plot (big) and similar submissions, but they seem to either down-sample the data before display (so the higher-resolution data is lost), only work with line plots, or only work with constant-frequency time-series data.
Down-sampling the entire plot doesn't work because I need to be able to zoom/pan into smaller areas and display all (or at least more) of the points again.
My data is not exactly constant frequency, but the x-axis data is positive and increasing, and I could live with displaying it at constant frequency for analysis purposes.
I'm using R2018a on a PC with an i7 processor and 32 GB of RAM.
I've tried using tall arrays instead, which definitely helped with the "seizing up" when I close the figure, but it still takes a while to display the figure. A bigger problem is that I can't seem to use a third tall array for the colour coding:
Error using tall/scatter>parseinput (line 87)
Argument 4 to scatter must be a non-tall array.
In short, I'm looking for a solution that does one of the following:
1) allows me to plot an N-by-3 tall array, with columns 1 and 2 as my X/Y data and column 3 as a scalar that maps each point to a colour in a colour map.
2) works like plot (big), downsampling the data to the pixel size, but re-samples to the zoomed extents when I zoom in, and displays points rather than lines.

4 Comments

Scatter plots don't really work well with vast amounts of data points, as you are seeing.
Personally, I would do what I have occasionally done with image data: interpolate to a certain level, then interpolate further when I zoom in.
In your case, this would mean down-sampling your data like the examples you saw, but then writing your own zoom callback that re-samples the data appropriately as you zoom in, and stops resampling altogether once you are zoomed in far enough. The idea is that you can plot the detail when you are zoomed in, as long as you don't also have graphics objects for the vast number of points you aren't looking at.
It's not trivial and may take quite a bit of work, but scatter plotting tens of millions of points simply isn't viable in general.
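A minimal sketch of the zoom-dependent resampling described above, assuming the data fits in memory as ordinary (non-tall) column vectors `x`, `y` and `c`. Rather than a zoom callback, it listens for changes to the axes `XLim`/`YLim` properties, so zooming, panning and setting limits in code all trigger a redraw. Each redraw keeps at most one point per cell of a grid laid over the current view, so the number of plotted points stays bounded; once you zoom in far enough that few points share a cell, essentially every point in view is drawn. The demo data, the grid size `pixelsPerSide` and the helper `redrawVisible` are illustrative assumptions, not part of any MATLAB or File Exchange API. Local functions in scripts need R2016b or later.

```matlab
% Hypothetical demo data: replace x, y and c with your own column vectors.
N = 1e6;                          % demo size; the question has tens of millions
x = cumsum(rand(N, 1));           % positive, strictly increasing x (as in the question)
y = randn(N, 1);
c = randi([0 3], N, 1);           % discrete colour value per point

pixelsPerSide = 800;              % assumed grid resolution, roughly the axes size in pixels

fig = figure;
ax = axes('Parent', fig);
% Placeholder series; its data is replaced by redrawVisible below.
h = scatter(ax, x(1), y(1), 5, c(1), 'filled');
% Fix limits and colour range explicitly (after scatter, which resets the axes)
% so swapping in a subset of points never rescales the axes or the colours.
ax.XLim = [min(x) max(x)];
ax.YLim = [min(y) max(y)];
ax.CLim = [min(c) max(c)];        % assumes c is not constant
colorbar(ax);

% Re-decimate whenever the limits change (zoom, pan, or limits set in code).
redraw = @(~, ~) redrawVisible(ax, h, x, y, c, pixelsPerSide);
addlistener(ax, 'XLim', 'PostSet', redraw);
addlistener(ax, 'YLim', 'PostSet', redraw);
redraw([], []);                   % initial decimated view of the full data

function redrawVisible(ax, h, x, y, c, nPix)
% Show at most one point per cell of an nPix-by-nPix grid spanning the current view.
xl = ax.XLim;
yl = ax.YLim;
inView = find(x >= xl(1) & x <= xl(2) & y >= yl(1) & y <= yl(2));
% 0-based grid column/row of each visible point (upper-edge points go in the last cell).
col = min(floor((x(inView) - xl(1)) / diff(xl) * nPix), nPix - 1);
row = min(floor((y(inView) - yl(1)) / diff(yl) * nPix), nPix - 1);
% unique returns the index of the first point in each occupied cell.
[~, first] = unique(row * nPix + col);
keep = inView(first);
% Update all three data properties in one call so their lengths stay consistent.
set(h, 'XData', x(keep), 'YData', y(keep), 'CData', c(keep));
end
```

`pixelsPerSide` trades detail against redraw time. Each redraw still scans all N points, and a rectangle zoom changes both limits and so redraws twice; with tens of millions of points, expect some lag per zoom step, and likely more while panning, which updates the limits repeatedly.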
Thanks for your comments, Adam.
I've continued today with using scatter() on my "tall" arrays, compromising by losing my colour map, and that's been working for me for the time being. As I mentioned before, it still takes a few seconds to populate the plot and navigate, but it's quite tolerable. And closing the figure window no longer renders MATLAB useless for several minutes (which was the major problem I had using scatter directly on my Nx3 array).
With that in mind, I thought that maybe the colour map was causing the intolerable part of the latency I was experiencing before...
So, I split my array into a few arrays based on my third "colourmap" column (since in my case they were only a few discrete values colorizing the points but those few colours added very useful info for my analysis).
Then I used scatter to plot each array as an individual series to get my colours back...
Not exactly elegant but it's making my plotting usable now so for my purposes it works! (even using my original array, not the tall array)
For example, my old, very slow code:
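markerSize = 5;   % marker area in points^2 (named markerSize so the built-in size() is not shadowed)
figure(2)
scatter(HugeArray(:,1), HugeArray(:,2), markerSize, HugeArray(:,3), 'filled');   % colour from column 3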
Slightly better, using a tall array but losing the colormap:
markerSize = 5;
tallArray = tall(HugeArray);
figure(2)
scatter(tallArray(:,1), tallArray(:,2), markerSize, 'filled');   % no colour argument: a tall colour array is rejected
Much better performance while keeping the (discrete) colours:
% Subsets of HugeArray for each discrete "mode" value in column 3
HugeArray_mode0 = HugeArray(HugeArray(:,3)==0, :);
HugeArray_mode1 = HugeArray(HugeArray(:,3)==1, :);
HugeArray_mode2 = HugeArray(HugeArray(:,3)==2, :);
HugeArray_mode3 = HugeArray(HugeArray(:,3)==3, :);
markerSize = 5;
figure(2)
hold on;
% One series per mode; each series takes the next colour in the axes colour order
scatter(HugeArray_mode0(:,1), HugeArray_mode0(:,2), markerSize, 'filled');
scatter(HugeArray_mode1(:,1), HugeArray_mode1(:,2), markerSize, 'filled');
scatter(HugeArray_mode2(:,1), HugeArray_mode2(:,2), markerSize, 'filled');
scatter(HugeArray_mode3(:,1), HugeArray_mode3(:,2), markerSize, 'filled');
hold off;
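The per-mode split above can be generalized so it does not hard-code the values 0 to 3. A sketch, assuming column 3 of `HugeArray` holds only a small number of discrete values; the random demo data is a placeholder for your own array. Each series is also named so a legend shows which colour belongs to which mode.

```matlab
% Hypothetical demo data standing in for HugeArray (columns: x, y, discrete mode).
N = 1e6;
HugeArray = [cumsum(rand(N, 1)), randn(N, 1), randi([0 3], N, 1)];

markerSize = 5;                          % not "size", so the built-in size() stays usable
modes = unique(HugeArray(:, 3));         % the discrete values actually present
fig = figure;
ax = axes('Parent', fig);
hold(ax, 'on');
hSeries = gobjects(numel(modes), 1);
for k = 1:numel(modes)
    rows = HugeArray(:, 3) == modes(k);  % logical mask for this mode
    % One series per mode; hold on gives each the next colour in the colour order.
    hSeries(k) = scatter(ax, HugeArray(rows, 1), HugeArray(rows, 2), markerSize, ...
        'filled', 'DisplayName', sprintf('mode %g', modes(k)));
end
hold(ax, 'off');
legend(ax, 'show');
```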
My actual code still uses the tall arrays, but I tested it with 'normal' arrays and it's still much, much faster than before. (Unfortunately, as the major latency was in closing the figure window, I'm not sure how to benchmark it other than pulling out a stopwatch...)
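A rough alternative to the stopwatch: time both the plot and the figure close with `tic`/`toc`, calling `drawnow` so that pending graphics work is flushed before the timer is read. This is a sketch with placeholder data; some rendering work may still happen outside the timed region, so treat the numbers as approximate. Swapping the `scatter` line for the per-mode version gives a like-for-like comparison.

```matlab
% Hypothetical demo data; substitute your own HugeArray.
N = 1e6;
HugeArray = [cumsum(rand(N, 1)), randn(N, 1), randi([0 3], N, 1)];
markerSize = 5;

% Time figure creation plus rendering of one colormapped scatter...
tic;
fig = figure;
scatter(HugeArray(:, 1), HugeArray(:, 2), markerSize, HugeArray(:, 3), 'filled');
drawnow;                 % wait for the queued graphics updates to be processed
tDraw = toc;

% ...then the cost of closing the figure again.
tic;
close(fig);
drawnow;                 % flush the close so its cost falls inside the timed region
tClose = toc;

fprintf('draw: %.2f s, close: %.2f s\n', tDraw, tClose);
```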
Hopefully someone else finds this tip useful!
Can't you reduce the number of points you display?
idx = 1:1000:length(x);   % every 1000th point (idx rather than i, which is the imaginary unit)
scatter3(x(idx), y(idx), z(idx))
I have found with image data that, when I look in the profiler at what takes time during my image updates, applying the colourmap takes a relatively long time, so it may well be even more expensive for scatter-plotted data.


Answers (0)


Asked on 15 May 2019.
Commented on 17 May 2019.
