How can I write code that compares each pair of documents one by one and collects all the information I want?

flashpode on 29 Dec 2021
Commented: flashpode on 31 Dec 2021
Hi, I have code that compares two text documents and then plots a figure with geoscatter. I then read off the values of the variables I want and save the figure. Because there is a large number of documents, I would like to know whether there is a way to write code that imports the documents from each folder I choose, runs my code, and saves both the numbers I want and the image. I know it sounds a little complicated, so I am posting the code and explaining where each thing I mentioned comes from.
First, I have two folders, AIS 1 and AIS 2. Each contains one subfolder per day, and each day folder holds 24 text documents with names like the following:
2021033020AIS
2021033021AIS
2021033022AIS
2021033023AIS
As you can see, only the last two digits change, running from 00 to 23. The data has to be read as strings in order to be compared.
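The naming pattern above could be generated programmatically along these lines. This is only a minimal sketch: the variable names dayStr, hh and names are made up for illustration, and the date is simply the one from the example names.

```matlab
% Sketch: build the 24 hourly base names for one day.
% dayStr, hh and names are hypothetical variable names.
dayStr = "20210330";                  % date taken from the example names above
hh     = compose("%02d", (0:23).');   % zero-padded hours "00" to "23"
names  = dayStr + hh + "AIS";         % 24-by-1 string array of base names
disp(names([1 end]))                  % show the first and last name
```

Any file extension (for example ".txt") would still need to be appended to these base names.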
My first problem is importing one document from each folder so that both documents have the same name. Once they are imported, this code runs:
%% PART ONE
% Discard lines that are too short to be valid messages
AIS1(strlength(AIS1) < 15) = [];
AIS2(strlength(AIS2) < 15) = [];
% AIS1 = unique(AIS1);
% AIS2 = unique(AIS2);
N=size(AIS1,1); % Important: this must come after the lines above, otherwise the code errored
msg_AIS1 = regexp(AIS1, '.*(?=\d{4}$)', 'match', 'once'); % the whole message except the last 4 digits
msg_AIS2 = regexp(AIS2, '.*(?=\d{4}$)', 'match', 'once');
t1 = regexp(AIS1, '\d{4}$', 'match', 'once'); % extract the last 4 digits
t2 = regexp(AIS2, '\d{4}$', 'match', 'once');
Time_AIS1 = duration(strcat('00:',extractBefore(t1,3),':',extractAfter(t1,2))); % convert to hh:mm:ss format
Time_AIS1 = Time_AIS1+hours(cumsum([0;diff(Time_AIS1)<0])); % add one hour each time mm:ss wraps around
Time_AIS2 = duration(strcat('00:',extractBefore(t2,3),':',extractAfter(t2,2)));
Time_AIS2 = Time_AIS2+hours(cumsum([0;diff(Time_AIS2)<0]));
% Drop entries with a missing message or a missing time
mask1 = ismissing(msg_AIS1) | ismissing(Time_AIS1);
mask2 = ismissing(msg_AIS2) | ismissing(Time_AIS2);
origi_AIS1 = (1:length(msg_AIS1));
origi_AIS2 = (1:length(msg_AIS2));
msg_AIS1(mask1) = []; Time_AIS1(mask1) = []; origi_AIS1(mask1) = [];
msg_AIS2(mask2) = []; Time_AIS2(mask2) = []; origi_AIS2(mask2) = [];
[H1, M1, S1] = hms(Time_AIS1); % split the time into components; M1 is used to build ranges
[H2, M2, S2] = hms(Time_AIS2);
msg_match = cell(N, 1);
Time_msg_match = cell(1, N); % preallocate so it exists even if nothing matches
complete_match_AIS = [];
%% ENTER THE COMPARISON LOOP
for K = 1:1:N
    complete_match_AIS = []; % reset so results from a previous iteration are not reused
    all_match_AIS = find(msg_AIS1(K) == msg_AIS2); % find identical messages
    if isempty(all_match_AIS) % fprintf can be used to write data to a text file
        % fprintf('No matches for line #%d -> "%s"\n', origi_AIS1(K), msg_AIS1(K)); % '%s' for a string
        continue;
    end
    % fprintf('potential match #%d -> "%s", checking times\n', origi_AIS1(K), msg_AIS1(K));
    % disp(K), disp(all_match_AIS)
    if H1(K)== H2(all_match_AIS) % note: true only if every candidate has the same hour as message K
        % build the matching range in minutes
        complete_match_AIS = all_match_AIS(M1(K) == M2(all_match_AIS) | M1(K) == M2(all_match_AIS) - 1 | M1(K) == M2(all_match_AIS) + 1); % range of +-1 minute around each message
        msg_match{K} = msg_AIS2(complete_match_AIS); % complete_match_AIS holds indices into msg_AIS2
        Time_msg_match{K} = complete_match_AIS;
    end
    if isempty(complete_match_AIS)
        % fprintf('line #%d -> "%s" text matches but time does not\n', origi_AIS1(K), msg_AIS1(K));
    else
        % fprintf('line #%d -> "%s" time matches as well. The results are:\n', origi_AIS1(K), msg_AIS1(K));
        msg_AIS2(complete_match_AIS) % IMPORTANT
    end
end
%% AFTER THE LOOP, REMOVE EMPTY CELLS AND CONVERT THE DATA TYPE OF THE VARIABLES WE NEED
% find the empty cells (create the mask variables)
emptyCells = cellfun(@isempty,msg_match);
emptyCells2 = cellfun(@isempty,Time_msg_match);
% remove the empty cells
msg_match(emptyCells) = [];
Time_msg_match(emptyCells2) = [];
% The message positions are stored in a cell array. They have to be
% converted to a format that can be used for indexing.
Time_msg_match = Time_msg_match';
% Pull the strings out of the cells (cat) to concatenate them
Matching_msg = cellstr(cat(1, msg_match{:}));
Matching_msg = string(Matching_msg);
% Pull the doubles out of the cells of Time_msg_match
numCells = numel(Time_msg_match);
Time_msg_match2 = zeros(numCells+10000, 1);
vector2Index = 1;
for k = 1 : numCells
    len = length(Time_msg_match{k});
    if len == 1
        Time_msg_match2(vector2Index) = Time_msg_match{k};
        vector2Index = vector2Index + 1;
    else
        fprintf('Row %d has %d elements in it.\n', k, length(Time_msg_match{k}));
        for k2 = 1 : len
            thisVector = Time_msg_match{k};
            Time_msg_match2(vector2Index) = thisVector(k2);
            vector2Index = vector2Index + 1;
        end
    end
end
if vector2Index < numCells
    Time_msg_match2 = Time_msg_match2(1 : vector2Index - 1);
end
fprintf('Original Time_msg_match had %d rows.\n', numCells) % shows the row count before and after flattening the cells
fprintf('Afterwards Time_msg_match had %d rows.\n', numel(Time_msg_match2))
% remove the remaining zeros
Time_msg_match2(Time_msg_match2 == 0) =[];
% Create index variables for AIS1 and AIS2 that hold the line number of
% each message, so that setdiff can be used to get the positions of the
% messages and look at the time range of interest.
P = length(msg_AIS2);
AIS2_MSG = (1:P); % the variable P has to be created first
AIS2_MSG = AIS2_MSG';
AIS1_MSG = (1:N);
AIS1_MSG = AIS1_MSG';
% NOW TAKE THE ACTUAL MESSAGES THAT ARE NOT IN THE COMPARISON
% IMPORTANT: first take the non-repeated messages and then their times.
NoMatchAIS1 = setdiff(AIS1_MSG,Time_msg_match2);
NoMatchAIS2 = setdiff(AIS2_MSG,Time_msg_match2);
% WE CAN SEE THAT WHETHER MESSAGES FAIL TO MATCH DOES NOT DEPEND ON THE
% TIME OF DAY. SO WHAT DOES IT DEPEND ON?
% Build time variables for the non-repeated messages of both AIS
Time_NoMatchAIS1 = Time_AIS1(NoMatchAIS1);
Time_NoMatchAIS2 = Time_AIS2(NoMatchAIS2);
% CREATE THE VARIABLES HOLDING THE NON-REPEATED MESSAGES OF EACH AIS
msg_NoMatchAIS1 = msg_AIS1(NoMatchAIS1);
msg_NoMatchAIS2 = msg_AIS2(NoMatchAIS2);
disp 'Comparison finished'
L = size(msg_NoMatchAIS1,1);
J = size(msg_NoMatchAIS2,1);
% To visualize the ships each hour for both AIS
lat1 = [];
lon1 = [];
for i=1:1:L
    seq1 = msg_NoMatchAIS1(i);
    linia=convertStringsToChars(seq1);
    if linia(13)=='A' && linia(15)=='1'
        sequencia = ais_to_bit(linia(15:44));
        s_longitud=sequencia(62:89);
        longitud = bin2dec(num2str(s_longitud))/600000; % in degrees
        lon1 = [lon1, longitud];
        s_latitud=sequencia(90:116);
        latitud = bin2dec(num2str(s_latitud))/600000; % in degrees
        lat1 = [lat1, latitud];
    end
end
lat2 = [];
lon2 = [];
for j=1:1:J
    seq2 = msg_NoMatchAIS2(j);
    linia=convertStringsToChars(seq2);
    if linia(13)=='A' && linia(15)=='1'
        sequencia = ais_to_bit(linia(15:44));
        s_longitud = sequencia(62:89);
        longitud = bin2dec(num2str(s_longitud))/600000; % in degrees
        lon2 = [lon2, longitud];
        s_latitud = sequencia(90:116);
        latitud = bin2dec(num2str(s_latitud))/600000; % in degrees
        lat2 = [lat2, latitud];
    end
end
figure(1)
geoscatter(lat1, lon1)
hold on
geoscatter(lat2,lon2,'filled')
legend('AIS1','AIS2')
hold off
X = sprintf('AIS1 messages %.f, no match %.f; AIS2 messages %.f, no match %.f \n',K,L,P,J);
disp(X)
As you can see, it compares the two documents and then makes a figure. How can I save this figure and, at the end, save the values of the variables K, L, P and J?
I am losing a lot of time because I do not know how to do this, so if anybody knows, please let me know. Thank you in advance.
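One possible way to save the figure and the four numbers is sketched below. It is not a definitive solution: the output folder, the file names and the table column names are all assumptions made for illustration, and K, L, P and J are given placeholder values so that the sketch runs on its own.

```matlab
% Sketch: save one figure and one row of counts per compared pair.
% outDir, baseName and the column names are hypothetical.
K = 100; L = 10; P = 90; J = 5;      % placeholders; use the values from the code above
baseName = "2021033020AIS";          % hypothetical name of the pair being compared
outDir = fullfile(tempdir(), 'AIS_results');
if ~isfolder(outDir)
    mkdir(outDir);
end

% Save a figure as a PNG file (an empty stand-in figure is used here;
% in the real script, pass the handle of the geoscatter figure instead).
fig = figure('Visible', 'off');
saveas(fig, fullfile(outDir, char(baseName + ".png")));
close(fig);

% Store the counts in a one-row table and write it to a CSV file.
results = table(baseName, K, L, P, J, 'VariableNames', ...
    {'File', 'AIS1_total', 'AIS1_noMatch', 'AIS2_total', 'AIS2_noMatch'});
writetable(results, fullfile(outDir, 'summary.csv'));
```

When many pairs are processed in a loop, new rows can be appended with results = [results; newRow] and the table written once after the loop.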

Accepted Answer

Voss on 30 Dec 2021
Based on a comment you made on another related question (https://www.mathworks.com/matlabcentral/answers/1453769-ismember-and-ways-to-implement-it#comment_1737439), I gather that the variables AIS1 and AIS2 are string arrays with each string element corresponding to one line of their respective input .txt files. (I can't say for certain because I don't have access to those variables, just the text files.) Based on that working assumption, I put together a function that will "import" these files, that is, read the file and return a string array for use in your code above.
The function (import_AIS) is defined at the bottom of the script below. In calling it, you would replace fn1 and fn2 with the full paths to the two files you want to compare. Then use what it returns as AIS1 and AIS2 in your code above. Wrap the whole thing in a loop if you want to go through 2 directories and compare each pair of same-named files.
% Specify the two files (using two you attached previously for demonstration):
fn1 = fullfile(pwd(),'2021030102AIS.txt');
fn2 = fullfile(pwd(),'2021030102AIS (2).txt');
% call import_AIS with the two file names:
AIS1 = import_AIS(fn1);
AIS2 = import_AIS(fn2);
% to demonstrate what the function returned:
display(size(AIS1)); display(AIS1([1 end])); display(size(AIS2)); display(AIS2([1 end]));
     1       18973
  1×2 string array
    "!AIVDM,1,1,,A,H33mw2Q>uV0luHTpN3800000000,2*080000"    "!AIVDM,1,1,,A,13F:b60P0909tF8GbLMbH?wl0@Ra,0*0F5959"
     1       13091
  1×2 string array
    "!AIVDM,1,1,,A,H33mw2Q>uV0luHTpN3800000000,2*080000"    "!AIVDM,1,1,,B,33=Orb1000P:0:tGa?779Qon0000,0*225959"
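function AIS = import_AIS(fn)
    % Read the text file fn and return its lines as a 1-by-N string array.
    fid = fopen(fn);
    if fid == -1
        % The file could not be opened: return an empty string
        AIS = string();
        return
    end
    data = fread(fid,'*char');
    fclose(fid);
    % Split the file contents into one string per line
    AIS = string(strsplit(data.',newline()));
    % Drop the empty last element left by a trailing newline
    if strlength(AIS(end)) == 0
        AIS(end) = [];
    end
end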
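The loop over two directories mentioned above might look roughly like this. Treat it as a sketch under assumptions: the folder paths and the '*AIS.txt' pattern are hypothetical, and import_AIS is the function defined in this answer. If the folders do not exist, the list of common names is empty and the loop simply does not run.

```matlab
% Sketch: pair same-named files from two day folders and import each pair.
% dir1, dir2 and the '*AIS.txt' pattern are hypothetical; adjust them.
dir1 = fullfile(pwd(), 'AIS 1', '20210330');
dir2 = fullfile(pwd(), 'AIS 2', '20210330');

% List the hourly files in each folder.
list1 = dir(fullfile(dir1, '*AIS.txt'));
list2 = dir(fullfile(dir2, '*AIS.txt'));

% Keep only the file names that exist in both folders.
common = intersect(string({list1.name}), string({list2.name}));

for k = 1:numel(common)
    fn1 = fullfile(dir1, char(common(k)));
    fn2 = fullfile(dir2, char(common(k)));
    AIS1 = import_AIS(fn1);   % import_AIS is the function from this answer
    AIS2 = import_AIS(fn2);
    AIS1 = AIS1(:);           % column vectors, which the question's code
    AIS2 = AIS2(:);           % appears to expect (it uses size(AIS1,1))
    % ... run the comparison code from the question here ...
end
```

Using intersect means that an hour missing from either folder is skipped rather than causing an error. To cover several days, the same loop can be wrapped in an outer loop over the day folders.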