Searching for a line of pixels in an image

I have been set a task, part of which is to read data from an image of a graph.
I have been able to detect the red line fairly simply by searching for where the RGB values match the red on the graph (the image reads as a 3D array of uint8 RGB values), but I also need to find the black axis lines as a reference for calculating the values. This is trickier because the numbers and labels are also black. I planned to search for a row and a column where most of the values are (0,0,0) and are consecutive. What is the best way to achieve this?

 Accepted Answer

DGM on 15 Mar 2023
Edited: DGM on 12 Nov 2023
There are File Exchange (FEX) submissions that are meant for this, but I have my opinions.
This is my advice and an example of the basic approach I use for plot transcription. See the links therein for caveats.
Applied to the given image:
% using the following FEX tools:
% https://www.mathworks.com/matlabcentral/fileexchange/72225-load-svg-into-your-matlab-code
% filename of manually-fit svg file
fname = 'redtemp.svg';
% data range from original image axis labels
xrange = [0 2000];
yrange = [0 2000];
% spline discretization parameter [0 1]
coarseness = 0.001;
% get plot box geometry
str = fileread(fname);
str = regexp(str,'((?<=<rect)(.*?)(?=\/>))','match');
pbx = regexp(str,'((?<=x=")(.*?)(?="))','match');
pby = regexp(str,'((?<=y=")(.*?)(?="))','match');
pbw = regexp(str,'((?<=width=")(.*?)(?="))','match');
pbh = regexp(str,'((?<=height=")(.*?)(?="))','match');
pbrect = [str2double(pbx{1}{1}) str2double(pby{1}{1}) ...
str2double(pbw{1}{1}) str2double(pbh{1}{1})];
% get coordinates representing the curve
S = loadsvg(fname,coarseness,false);
x = S{1}(:,1); % assuming the first path is the correct one
y = S{1}(:,2);
% if there are multiple paths you want to extract,
% you'll need to do the rescaling, etc. for each element of S
% rescale to fit data range
x = xrange(1) + diff(xrange)*(x-pbrect(1))/pbrect(3);
y = yrange(1) + diff(yrange)*(pbrect(4) - (y-pbrect(2)))/pbrect(4);
% get rid of nonunique points
[x,idx,~] = unique(x);
y = y(idx);
% plot
plot(x,y); grid on; hold on
xlim(xrange)
ylim(yrange)
The SVG file is attached. You can tweak the spline as you see fit. As I mention in the other threads, the advantage of using external tools is largely the improved controls (both view controls and spline editing). Visual fitting is no loss when we accept that the source is grossly imprecise to begin with.
This thread includes a rough tutorial on how the SVG file is created (see the comments).

3 Comments

If you really wanted to do it by image processing, this is one example.
inpict = imread('redtemp.jpg');
% box extents in data coordinates
xrange = [0 2000];
yrange = [0 2000];
% find the red line
hsvpict = rgb2hsv(inpict);
lim = [0.9 0.07; 0.3 1; 0.4 1];
mask = (hsvpict(:,:,1)>=lim(1,1) | hsvpict(:,:,1)<=lim(1,2)) ...
& (hsvpict(:,:,2)>=lim(2,1) & hsvpict(:,:,2)<=lim(2,2)) ...
& (hsvpict(:,:,3)>=lim(3,1) & hsvpict(:,:,3)<=lim(3,2));
% close holes and bridge the gap in the vertical portion
mask = imclose(mask,ones(20,1));
% reduce the blob to a central line
trace = bwskel(mask);
imshow(trace)
% find plot box
lim = [0 1; 0 1; 0 0.75];
mask = (hsvpict(:,:,1)>=lim(1,1) & hsvpict(:,:,1)<=lim(1,2)) ...
& (hsvpict(:,:,2)>=lim(2,1) & hsvpict(:,:,2)<=lim(2,2)) ...
& (hsvpict(:,:,3)>=lim(3,1) & hsvpict(:,:,3)<=lim(3,2));
% get rid of small speckles, and select only long straight lines
mask = bwareaopen(mask,100);
mask = imopen(mask,ones(50,1)) | imopen(mask,ones(1,50));
mask = bwskel(mask);
imshow(mask)
% box extents in image coordinates
[x1 x2] = bounds(find(any(mask,1)));
[y1 y2] = bounds(find(any(mask,2)));
% convert the trace to xy data
[y0 x0] = find(trace,1); % find initial point
B = bwtraceboundary(trace,[y0 x0],'E'); % [y x]
x = B(:,2);
y = B(:,1);
% rescale the trace to data coordinates
x = xrange(1) + diff(xrange)*(x-x1)/(x2-x1);
y = yrange(1) + diff(yrange)*((y2-y1) - (y-y1))/(y2-y1);
% get rid of nonunique points
[x,idx,~] = unique(x);
y = y(idx);
% plot
plot(x,y); grid on; hold on
xlim(xrange)
ylim(yrange)
Of course, this is fragile and generally can't be expected to work directly with other images. The fact that the source is a JPG is a big obstacle, and doing the segmentation in HSV only reduces the available degrees of freedom for these particular colors. It wouldn't take much for the trace to become more difficult to separate from the plot box or the background.
The recovered trace retains all the artifacts of the source, and there's no simple way to get rid of them without altering the shape.
In short, doing this by direct processing of a single raster image takes more time, and a lot of trial and error. The results are poorly-controlled and typically full of artifacts, and the process can't be simply applied to other images without investing a bunch more time in trial and error. For example, if this were the next image to be processed, it would fail and require more tedious adjustment to the script, even though it's literally just a transcoded copy of the first image.
Worse yet, because the results are poorly-controlled, you'll have to spend time to make sure that each result actually makes sense. There's nothing stopping this from misinterpreting the plot box size and giving you something that's subtly (or grossly) scaled wrong.
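As a rough illustration of the kind of checking this implies, here is a minimal sketch of two sanity tests on a detected plot box. Everything in it is an assumption for illustration: the pixel values are synthetic stand-ins for the `x1`, `x2`, `y1`, `y2`, `x`, `y` produced by the script above, and the thresholds `minfrac` and `tol` are arbitrary. Passing these checks does not prove the scaling is right; it only catches the grosser failures.

```matlab
% Synthetic stand-ins for the detection results (hypothetical values)
imsize = [400 600];                     % [rows cols] of the source image
x1 = 60;  x2 = 560;                     % detected box extents, pixels
y1 = 30;  y2 = 350;
x = (60:560).';                         % trace in pixel coordinates
y = round(350 - 0.5*(x - 60));

% check 1: the box should cover a plausible share of the image
minfrac = 0.5;                          % assumed lower bound on box size
boxok = (x2 - x1) >= minfrac*imsize(2) && (y2 - y1) >= minfrac*imsize(1);

% check 2: every trace point should lie inside the box, within a tolerance
tol = 2;                                % assumed tolerance, pixels
inside = all(x >= x1 - tol & x <= x2 + tol & ...
             y >= y1 - tol & y <= y2 + tol);

if ~(boxok && inside)
    warning('Plot box or trace looks implausible; inspect the masks by eye.')
end
```

A box that is subtly wrong by a few pixels will still pass both checks, which is why looking at the result is still necessary.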
Consider the following admittedly extreme example:
Let's ignore for now the fact that the image is rotated and deformed, and that the traces are not red.
Note that the extents of the tick labels are not what's described by the plot box. Even if you could automatically find the plot box width, you don't know what it means in data coordinates; all you know is that it's something around 285 or so. We're only assuming that the plot box begins and ends with a labeled tick mark. If it's a possibility that it won't, then what we really need to find is the location of the first and last labeled tick mark on each axis. That's a lot harder to do programmatically.
Also note that the tick marks at the origin will be caught by the box finding routine, causing both the height and width of the plot box to be wrong; consequently, the data scaling will be wrong. It doesn't have to be a tick mark that throws things off. It could be an unexpected tick label exponent or just some JPG crust.
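The run-length idea from the original question can reduce this particular sensitivity: instead of taking the extents of everything that survives the filtering, look for the columns and rows that contain one long unbroken run of dark pixels and use their positions. The following is only a sketch on a synthetic mask; the mask layout and the `minrun` threshold are assumptions, and a real JPG would have axis lines several pixels thick with soft gray edges, so expect groups of adjacent rows or columns rather than a single index.

```matlab
% Synthetic dark-pixel mask: two axis lines, tick marks, and a label blob
mask = false(200,300);
mask(20:180,40) = true;             % vertical axis line at column 40
mask(180,40:280) = true;            % horizontal axis line at row 180
mask(180:185,40:40:280) = true;     % ticks hanging below the x axis
mask(60,35:40) = true;              % tick sticking out left of the y axis
mask(188:195,150:170) = true;       % stand-in for a black axis label

minrun = 100;                       % assumed minimum run length, pixels

% longest run of consecutive true pixels per column, then per row
maps = {mask, mask.'};
longest = cell(1,2);
for n = 1:2
    m = maps{n};
    run = zeros(1,size(m,2));
    best = run;
    for k = 1:size(m,1)
        run = (run + 1).*m(k,:);    % extend runs where true, reset elsewhere
        best = max(best,run);
    end
    longest{n} = best;
end
axcol = find(longest{1} >= minrun); % column(s) holding the vertical axis
axrow = find(longest{2} >= minrun); % row(s) holding the horizontal axis

% for comparison: plain extents are pulled outward by the stray tick
[xa, xb] = bounds(find(any(mask,1)));
```

Here the run-length test picks column 40 and row 180, while the plain extents start at column 35 because of the tick. It does nothing for the harder problem of knowing which data value each line corresponds to.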
The trace extraction would obviously be a horrible nightmare to attempt programmatically in this case. On the other hand, this could be transcribed by hand in a few minutes with a simple polyline tool, and you could be certain that the data was interpreted as you intended it to be interpreted.
Thanks for the help. May I ask a few very basic questions? Can you explain what a plot box is, what a spline is, and what the purpose of the loadsvg function is?
DGM on 17 Mar 2023
Edited: DGM on 17 Mar 2023
The plot box is the rectangular region described by the extent of the x,y axis rulers. It's the green area in this image.
The location and extent of the plot box (or at least the box defined by the extreme axis ticks) is important because it essentially provides calibration information. Knowing where x=0 and x=2000 fall in pixels tells us what a given pixel's position is in data coordinates (i.e. seconds).
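To make the calibration concrete, here is a minimal sketch of the pixel-to-data mapping. It is the same linear rescaling used in the scripts above, written as two anonymous functions; the pixel positions of the box edges are made-up values for illustration, and the names `px2x` and `px2y` are hypothetical.

```matlab
% Assumed pixel positions of the plot box edges (hypothetical values)
x1 = 60;  x2 = 560;      % columns where x = 0 and x = 2000
y1 = 30;  y2 = 350;      % rows where y = 2000 (top) and y = 0 (bottom)
xrange = [0 2000];
yrange = [0 2000];

% linear maps from pixel position to data coordinates;
% the y map is flipped because image rows increase downward
px2x = @(px) xrange(1) + diff(xrange)*(px - x1)/(x2 - x1);
px2y = @(py) yrange(1) + diff(yrange)*(y2 - py)/(y2 - y1);

xmid = px2x(310);        % column halfway across the box maps to 1000
ytop = px2y(30);         % top edge of the box maps to 2000
ybot = px2y(350);        % bottom edge of the box maps to 0
```

Any error in the four edge positions goes straight into every converted value, which is why the box location matters more than it might seem.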
When I say "spline", I'm talking somewhat loosely. In the context of what would be done natively within MATLAB if building a curve from interactively selected points, you'd be creating some uncontrolled interpolating spline or smoothing spline. If you're using an actual graphics tool to do the work in a practical manner, you'd be creating a Bézier spline, where the control points are actually adjustable. In Inkscape, it would look like:
Node type and symmetry are configurable per-point. Again, note that the plot box is defined by the light orange rectangle.
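For a sense of what such a spline is numerically, the sketch below samples a single cubic Bézier segment from its four control points using the Bernstein basis. The control points are invented for illustration and are not taken from the attached SVG. Turning a stored path into xy points requires some discretization of this kind, but how the SVG loader does it internally is not shown here.

```matlab
% Control points of one cubic Bezier segment, [x y] in image coordinates
% (hypothetical values)
P = [100 300; 150 100; 300 100; 400 250];

t = linspace(0,1,101).';            % parameter samples along the segment

% cubic Bernstein basis, one column per control point
B = [(1-t).^3, 3*t.*(1-t).^2, 3*t.^2.*(1-t), t.^3];

xy = B*P;                           % 101x2 points on the curve
```

The curve passes through the first and last control points, while the two middle ones only pull on its shape; those are the handles you drag in the editor. Plotting `xy(:,1)` against `xy(:,2)` together with `P` shows the relationship.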
As the original graph image is traced in Inkscape (or any competent vector editing program), the result is a vector graphics file. In this case, it's an SVG file. Correspondingly, loadsvg() is used to read the file and the properties of the vector objects therein. In this case, I'm only using it to read the spline (the trace) as xy data in image coordinates. Again, the light orange rectangle in the image above tells the script where the plot box is.
Can you use GRABIT or other MATLAB transcription tools? Yes. Most should work for a simple graph like this.
As I mentioned in the linked answer, I feel that anything that does the task in MATLAB is going to be a compromise. The view controls are universally cumbersome, laggy, and buggy. Depending on the specific tool, you constantly have to disable your cursor tool so that you can move or zoom without dumping errors and warnings to console. I also don't like that GRABIT has no real way to assert that the image is already grid-aligned. You're forced to carefully try to pick the box extents by hand without misaligning a single click, otherwise it will perform a projective transformation to the detriment of accuracy. I also don't like that there's no good way to say that the x and y axes begin in the same location -- which is something incredibly basic. There's also no way to adjust a point -- or even see where it's located after you click.
In the end, you just get a set of questionably selected points -- a simple rough polyline.
I prefer to just transcribe the curve with a spline in a vector image editor with actual view controls made for human use. The spline can be fit visually by you to whatever degree you feel best fits your particular JPG pixel salad. For a tiny low-res graph like this, the benefits of fitting a spline are questionable, but it's what I'm sticking with. Even if a rough polyline suffices, I'd rather have usable view controls.
Like I said, I have my opinions. Since the in-MATLAB transcription tools are something that I don't tolerate myself, they're not something that I recommend beyond mere mention. It's ultimately up to you to decide what you are comfortable using. I admit that the solution I gave might not be comfortable to someone who isn't used to doing vector graphics.

More Answers (1)

Here are some File Exchange submissions and you can see how they did it.

Release: R2022a
Asked: 15 Mar 2023
Edited: DGM on 12 Nov 2023