import csv file and use the data in Matlab

I have a problem with one huge CSV file which I want to just have the import information as some variable for make more calculation. here I do not know how to get ride of quotations mark and other symbol after importing the csv file to matlab
T = readtable('C:\Users\mabah\Downloads\ip83_to_ip37_clean.csv')
I import csv file like this and the table is like :
'1578' '81.866260614' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0x4000' 'CS0' '0' '4857'
'1579' '81.875650007' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0xc000' 'CS0' '0' '4865'
'1580' '81.885960309' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0x4000' 'CS0' '0' '4866'
'1581' '81.896168537' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0x4000' 'CS0' '0' '4876'
'1582' '81.906376207' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0xc000' 'CS0' '0' '4885'
..............
..................

5 Comments

Use the detectImportOptions function to preprocess the file and create an import options object for readtable. Then use it when reading the file.
It should be able to detect the numeric values from the non; the IP addresses will have to be read as string or perhaps categorical variables; they aren't valid numeric representations.
It would be most helpful if you would attach the first 20 records or so of the file; be certain to include any header info if there is such.
thank you for your help. I stock with syntax . I do not need '....' I want just the data to make more calculation on it.
here is the first 20 rows :
'1578' '81.866260614' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0x4000' 'CS0' '0' '4857'
'1579' '81.875650007' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0xc000' 'CS0' '0' '4865'
'1580' '81.885960309' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0x4000' 'CS0' '0' '4866'
'1581' '81.896168537' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0x4000' 'CS0' '0' '4876'
'1582' '81.906376207' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0xc000' 'CS0' '0' '4885'
'1583' '81.916706046' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0xc000' 'CS0' '0' '4893'
'1584' '81.927024970' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0x4000' 'CS0' '0' '4897'
'1585' '81.937320618' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0xc000' 'CS0' '0' '4903'
'1586' '81.947794823' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0x4000' 'CS0' '0' '4912'
'1587' '81.957728564' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0xc000' 'CS0' '0' '4920'
'1588' '81.967817205' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0xc000' 'CS0' '0' '4928'
'1589' '81.978317914' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0x4000' 'CS0' '0' '4932'
'1590' '81.988481605' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0xc000' 'CS0' '0' '4938'
'1591' '81.998799386' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0x4000' 'CS0' '0' '4939'
'1592' '82.012383323' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0x4000' 'CS0' '0' '4942'
'1593' '82.020065820' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0xc000' 'CS0' '0' '4946'
'1594' '82.029564626' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0x4000' 'CS0' '0' '4956'
'1595' '82.039907112' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0xc000' 'CS0' '0' '4959'
'1596' '82.050066959' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0xc000' 'CS0' '0' '4964'
'1597' '82.060212786' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0x4000' 'CS0' '0' '4965'
The readtable function is typically smart enough to import and column of numbers as a numeric type. My best guess, without having seen the input file, is that you have some entries (even in the numeric columns) that are text. Therefore, MATLAB makes the whole column text.
We could take a look, if you upload the file, or a representative sample. (If you upload a sample, see if you get the same issue on it.)
I uplaod the file.
what I would really like to do is to use the diffrent column for the further calculation, especially on the IPFlags (column 12) and also time . regarthing the IPFlag, if the 0xc000 is IP Flag count it as 0 and if the IP Flag is 0x4000 count it as one and after that convert all the result to ASCII code.

Sign in to comment.

 Accepted Answer

dpb
dpb on 1 Dec 2019
Edited: dpb on 1 Dec 2019
The problem is the file has explicit "" for string variables built into it...that overrides what readtable would otherwise do for numeric columns--it presumes since you wrote the file as character data explicitly, that's what you meant.
Either fix the file format to write numeric data as numeric or force the interpretation of the numeric columns as numeric via the import options object as suggested earlier...
opt=detectImportOptions('ip83_to_ip37_clean.csv'); % get default import object
opt.VariableTypes([1:2 6:11 14:15])={'double'}; % tell it which variables are numeric
opt.VariableTypes([3:5 12:13])={'categorical'}; % treat protocol data as categorical
t=readtable('ip83_to_ip37_clean.csv',opt);
results in
>> t(1:10,:)
ans =
10×15 table
No_ Time Source Destination IPProtocol IPTTL Length SrcPort DestPort TCPFlags IPVersion IPFlags IPDSCPValue IPFragOffset IPID
____ ______ _____________ _____________ __________ _____ ______ _______ ________ ________ _________ _______ ___________ ____________ ____
1578 81.866 192.168.83.83 192.168.83.37 UDP 64 60 41761 138 NaN 4 0x4000 CS0 0 4857
1579 81.876 192.168.83.83 192.168.83.37 UDP 64 60 41761 138 NaN 4 0xc000 CS0 0 4865
1580 81.886 192.168.83.83 192.168.83.37 UDP 64 60 41761 138 NaN 4 0x4000 CS0 0 4866
1581 81.896 192.168.83.83 192.168.83.37 UDP 64 60 41761 138 NaN 4 0x4000 CS0 0 4876
1582 81.906 192.168.83.83 192.168.83.37 UDP 64 60 41761 138 NaN 4 0xc000 CS0 0 4885
1583 81.917 192.168.83.83 192.168.83.37 UDP 64 60 41761 138 NaN 4 0xc000 CS0 0 4893
1584 81.927 192.168.83.83 192.168.83.37 UDP 64 60 41761 138 NaN 4 0x4000 CS0 0 4897
1585 81.937 192.168.83.83 192.168.83.37 UDP 64 60 41761 138 NaN 4 0xc000 CS0 0 4903
1586 81.948 192.168.83.83 192.168.83.37 UDP 64 60 41761 138 NaN 4 0x4000 CS0 0 4912
1587 81.958 192.168.83.83 192.168.83.37 UDP 64 60 41761 138 NaN 4 0xc000 CS0 0 4920
>>
As for the IPFlags,
>> categories(t.IPFlags)
ans =
2×1 cell array
{'0x4000'}
{'0xc000'}
>>
shows only the two values exist so I'd probably create a logical variable as you indicated...
>> t.IPFlag=(t.IPFlags=='0x4000');

More Answers (0)

Categories

Tags

Asked:

on 1 Dec 2019

Edited:

dpb
on 1 Dec 2019

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!