pdist gives NaN but there are no missing values in the input array
Hi Everyone,
I am using the pdist2 function to calculate pairwise (Jaccard) distances between two observations. Sometimes, seemingly at random, the value returned is NaN instead of a distance. The input matrix, however, does not have any missing values. It seems that the correct similarity for these pairs should be 0, that is, they have nothing in common (I compute the similarity measure as 1 - pdist2). The same code works fine on later versions of the data, but going back in time (when observations should be less similar) the NaNs start appearing.

The script I am using to calculate these measures is below. I have also attached one of the input CSV files that results in NaNs, as well as the matrix that everything is supposed to go into: the first two columns are the pairs of observations, the third column is a successfully calculated similarity measure for a different time period, and the fourth column is supposed to be filled with the new similarity values.
Any help would be highly(!!) appreciated.
%read data (jaccard_dyadic must already be in the workspace)
clearvars -except jaccard_dyadic
A = readmatrix('190529ML_lawyer_4');
%variables used to create the matrix
args = A(:,1);
lawgs = A(:,2);
%binary matrix: arg_lawg(a,l) = 1 if arg a has had law l
arg_lawg = zeros(max(args), max(lawgs));
for i = 1:size(A,1)
    arg_lawg(args(i), lawgs(i)) = 1;
end
%loop calculating the Jaccard similarity (1 - Jaccard distance) for each pair,
%filling column 4 of the larger matrix up to the first row whose first ID is 0
%(works on other iterations of the data)
nPairs = find(jaccard_dyadic(:,1)==0, 1, 'first') - 1;
for i = 1:nPairs
    jaccard_dyadic(i,4) = 1 - pdist2(arg_lawg(jaccard_dyadic(i,1),:), arg_lawg(jaccard_dyadic(i,2),:), 'jaccard');
end
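For reference, the loop that builds arg_lawg can also be written without a loop using linear indexing. This is a minimal sketch, assuming args and lawgs are positive integer IDs as in the script; the toy matrix A below is made up for illustration and is not the attached data. Note that any ID that never appears in column 1 (here arg 3) ends up as an all-zero row.

```matlab
% Hypothetical stand-in for the readmatrix output:
% column 1 = arg ID, column 2 = law ID (arg 3 never appears)
A = [1 2; 1 3; 2 3; 4 1; 1 2];   % the pair (1,2) is duplicated on purpose
args  = A(:,1);
lawgs = A(:,2);

% Loop version, as in the question
arg_lawg_loop = zeros(max(args), max(lawgs));
for i = 1:size(A,1)
    arg_lawg_loop(args(i), lawgs(i)) = 1;
end

% Vectorized version: convert (row, col) pairs to linear indices and set them to 1.
% A duplicated pair just sets the same element to 1 again.
arg_lawg = zeros(max(args), max(lawgs));
arg_lawg(sub2ind(size(arg_lawg), args, lawgs)) = 1;

% Both versions should agree; row 3 is all zeros because arg 3 has no laws
sameResult = isequal(arg_lawg, arg_lawg_loop);
row3Empty  = ~any(arg_lawg(3,:));
```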
Walter Roberson
on 2 Jun 2019
>> find(isnan(jaccard_dyadic))
ans =
300007
300009
300011
300019
300022
300023
300027
300035
300041
300052
300063
300064
300068
300072
300074
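The linear indices returned by find can be converted to row/column subscripts with ind2sub to see which pairs, and which column, the NaNs sit in. A minimal sketch; the small matrix J below is a hypothetical stand-in for jaccard_dyadic, not the attached data.

```matlab
% Hypothetical stand-in for jaccard_dyadic: [id1 id2 oldSimilarity newSimilarity]
J = [1 2 0.50 0.25;
     1 3 0.00 NaN;
     2 3 1.00 NaN;
     2 4 0.10 0.40];

idx = find(isnan(J));                  % linear (column-major) indices, as in the output above
[rows, cols] = ind2sub(size(J), idx);  % convert them to row/column subscripts

nanPairs   = J(rows, 1:2);             % the ID pairs whose similarity came out as NaN
nanColumns = unique(cols);             % which columns contain NaN
```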
Accepted Answer
Walter Roberson
on 3 Jun 2019
The documentation describes the metric as:
'jaccard': One minus the Jaccard coefficient, which is the percentage of nonzero coordinates that differ.
Now, if neither row has any nonzero coordinates, then the number of differences is 0 and the number of nonzero coordinates is also 0, so you are computing 0/0, which gives NaN.
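The sketch below reproduces the 0/0 case and then decides explicitly what such pairs should get. It assumes the NaNs come from pairs in which both rows of the binary matrix are all zero (for example, IDs with no laws in an earlier period). The function jaccardDist and the toy arg_lawg and pairs are hypothetical; jaccardDist is hand-written from the definition quoted above rather than calling pdist2, so it runs without the Statistics and Machine Learning Toolbox. Setting those pairs to similarity 0 is one possible choice, not the only correct one.

```matlab
% Hypothetical binary matrix: row k = arg k, column = law; row 3 is all zero
arg_lawg = [1 1 0 0;
            0 0 1 1;
            0 0 0 0];
pairs = [1 2;    % no overlap, but both rows have nonzeros
         1 3;    % only one row is empty
         3 3];   % both rows empty

% Jaccard distance per the quoted definition: among coordinates where at least
% one row is nonzero, the fraction that differ (0/0 -> NaN when there are none)
jaccardDist = @(x, y) sum((x ~= y) & (x ~= 0 | y ~= 0)) / sum(x ~= 0 | y ~= 0);

sim = zeros(size(pairs,1), 1);
for i = 1:size(pairs,1)
    sim(i) = 1 - jaccardDist(arg_lawg(pairs(i,1),:), arg_lawg(pairs(i,2),:));
end
simRaw = sim;    % keep a copy: simRaw(3) is NaN, the other two are 0

% Only pairs where BOTH rows are all zero produce 0/0
emptyRows = find(~any(arg_lawg, 2));
bothEmpty = ismember(pairs(:,1), emptyRows) & ismember(pairs(:,2), emptyRows);

% Assumption: treat "empty vs empty" pairs as having similarity 0
sim(bothEmpty) = 0;
```

Applied to the script in the question, the same idea would amount to replacing the NaNs in column 4 of jaccard_dyadic with whatever value you decide such pairs should have.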