pdist gives NaN but there are no missing values in the input array

Question

John Kirk on 2 Jun 2019

https://www.mathworks.com/matlabcentral/answers/465190-pdist-gives-nan-but-there-are-no-missing-values-in-input-array

Commented: John Kirk on 3 Jun 2019

Hi Everyone,

I am using the pdist function (pdist2 in the script below) to calculate pairwise Jaccard distances between pairs of observations. Sometimes, seemingly at random, the value that comes out is NaN instead of a distance. The input matrix, however, has no missing values. It seems the correct answer in these places should be 0 (that is, the two observations have nothing in common; I compute the similarity measure as 1-pdist). The same code works fine on later versions of the data, but going back in time (when observations should be less similar) the NaNs start appearing. The script I am using to calculate these measures is below. I have also attached one of the input csv files that produces NaNs, as well as the matrix into which everything is supposed to go: the first two columns are the pairs of observations, the third column is a successfully calculated similarity measure for a different time period, and the fourth column is supposed to be filled with the new similarity values.

Any help would be highly(!!) appreciated.

    %read data
    %jaccard_dyadic (the attached pair matrix) must already be in the workspace
    clearvars -except jaccard_dyadic
    A = readmatrix('190529ML_lawyer_4');
    
    %variables used to create matrix
    args = A(:,1);
    lawgs = A(:,2);
    %incidence matrix: arg_lawg(a,l) = 1 if arg a has had law l
    arg_lawg = zeros(max(args), max(lawgs));
    for i = 1:size(A,1)
        arg_lawg(args(i),lawgs(i)) = 1;
    end
    %loop calculating the Jaccard similarity measure (works on other iterations of the data), filling in the larger matrix
    %rows are processed up to the first row whose first column is 0
    for i = 1:(find(jaccard_dyadic(:,1)==0, 1, 'first')-1)
        jaccard_dyadic(i,4) = 1-pdist2(arg_lawg(jaccard_dyadic(i,1),:),arg_lawg(jaccard_dyadic(i,2),:),'jaccard');
    end
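To see where the NaNs can come from, here is a minimal Python sketch of the distance the script relies on, assuming MATLAB's documented `'jaccard'` definition for `pdist`/`pdist2`: among the coordinates where at least one of the two rows is nonzero, the fraction that differ. The function name `jaccard_distance` is hypothetical. If both rows are all zeros (an arg with no laws at all in that period, paired with another such arg), there is no coordinate to count and the ratio is 0/0, which comes out as NaN; `1 - NaN` is still NaN. If only one row is all zeros, the distance is 1 and the similarity is 0, so under this definition the NaNs point specifically at pairs where neither arg has any law.

```python
import math


def jaccard_distance(x, y):
    """Jaccard distance between two equal-length numeric vectors.

    Assumed to mirror MATLAB's 'jaccard' option of pdist/pdist2: among the
    coordinates where at least one vector is nonzero, the fraction that
    differ. Returns NaN when no such coordinate exists (the 0/0 case).
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    # Keep only coordinates where at least one of the two rows is nonzero
    nonzero = [(a, b) for a, b in zip(x, y) if a != 0 or b != 0]
    if not nonzero:
        # Both rows are all zeros: 0/0, reported as NaN
        return math.nan
    differ = sum(1 for a, b in nonzero if a != b)
    return differ / len(nonzero)


print(1 - jaccard_distance([1, 1, 0, 1], [1, 0, 0, 0]))  # about 0.333: 1 shared law out of 3
print(1 - jaccard_distance([1, 0, 0, 0], [0, 1, 0, 0]))  # 0.0: disjoint rows
print(1 - jaccard_distance([0, 1, 0, 0], [0, 0, 0, 0]))  # 0.0: only one row empty
print(1 - jaccard_distance([0, 0, 0, 0], [0, 0, 0, 0]))  # nan: both rows empty
```

This would also fit the pattern described in the question: earlier periods plausibly have more args with empty rows in `arg_lawg`, and therefore more pairs where both rows are empty.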

2 Comments

Walter Roberson on 2 Jun 2019

    >> find(isnan(jaccard_dyadic))

    ans =

          300007
          300009
          300011
          300019
          300022
          300023
          300027
          300035
          300041
          300052
          300063
          300064
          300068
          300072
          300074

John Kirk on 3 Jun 2019

Thanks for the response Walter. Have you been able to tell why these NaNs pop up to begin with? From my understanding, these should have been zeros, but I am now wondering whether I messed up somewhere.
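Whether two args with no laws should count as similarity 0 is a modeling choice, since the Jaccard coefficient of two empty sets is undefined. If 0 is the desired value, one option is to handle that case explicitly rather than let pdist2 return NaN. Below is a minimal Python sketch, assuming the incidence data is a list of 0/1 rows standing in for `arg_lawg` and that the pairs use 1-based row indices like the first two columns of `jaccard_dyadic`; `pair_similarities` and `empty_value` are hypothetical names. For 0/1 rows, 1 minus the Jaccard distance equals the number of shared nonzero entries divided by the number of entries nonzero in either row, which is what it computes.

```python
import math


def pair_similarities(incidence, pairs, empty_value=0.0):
    """Jaccard similarity (1 - Jaccard distance) for each (i, j) pair.

    incidence: list of equal-length 0/1 rows (hypothetical stand-in for arg_lawg).
    pairs: 1-based (i, j) row indices, as in the first two columns of jaccard_dyadic.
    empty_value: value recorded when both rows are all zeros, the case that
        the Jaccard coefficient leaves undefined (NaN from pdist2).
    """
    result = []
    for i, j in pairs:
        x, y = incidence[i - 1], incidence[j - 1]  # convert to 0-based indices
        union = sum(1 for a, b in zip(x, y) if a or b)
        if union == 0:
            # Neither row has any nonzero entry: use the chosen fallback
            result.append(empty_value)
            continue
        shared = sum(1 for a, b in zip(x, y) if a and b)
        result.append(shared / union)
    return result


incidence = [[1, 1, 0], [1, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 1]]
pairs = [(1, 2), (1, 3), (3, 4), (2, 5)]
print(pair_similarities(incidence, pairs))  # [0.5, 0.0, 0.0, 0.0]
print(pair_similarities(incidence, [(3, 4)], empty_value=math.nan))  # [nan]
```

In the MATLAB script, a comparable after-the-fact patch would be `jaccard_dyadic(isnan(jaccard_dyadic(:,4)),4) = 0;`, although that also masks NaNs arising from any other cause.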
