Code covered by the BSD License  

File Size: 8.99 KB, File ID: #14367

Spardat2SSD

by Skynet

 

21 Mar 2007 (Updated 16 Apr 2007)

Convert a data file from spardat to SSD format.

File Information
Description

spardat2ssd(FILEIN,FILEOUT,DATATYPE,DISPINTERVAL) converts the contents of the input file FILEIN and writes the result to the output file FILEOUT. FILEIN is the name of a data file in Spardat format, such as that used by SVM-Light. FILEOUT is the name of the data file to be written in Simple Sparse Dataset (SSD) format, such as that used by Auton Lab.

The argument DATATYPE is optional. Its value can be either 'Categorical' or 'Real', with the default being 'Categorical'. 'Categorical' applies to a data file whose attribute values are all 1. 'Real' applies to a data file with real-valued attributes, e.g. -4, 3, 3.14, etc. If set to 'Categorical' (default), the output data file has two columns: the first is the row number (starting from 0), and the second is the column number (with the class being column 1). If set to 'Real', the output data file also has a third column, which holds the real-valued attribute value.
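As a rough illustration of the two output shapes described above, the sketch below parses one SVM-Light style line and builds two-column and three-column rows. It is not taken from the submission's source. The variable names are hypothetical, and it assumes that feature indices are used as column numbers unchanged; how the submission actually numbers attribute columns and places the class in column 1 is not modeled here.

```matlab
% Illustrative sketch only, not the submission's source code.
txt    = '1 3:0.5 7:2';   % one input line: class label, then index:value pairs
rowNum = 0;               % row numbers start from 0, per the description

label = sscanf(txt, '%f', 1);                  % leading class label
tok   = regexp(txt, '(\d+):(\S+)', 'tokens');  % each token is {index, value}
idx   = cellfun(@(t) str2double(t{1}), tok);   % feature indices
val   = cellfun(@(t) str2double(t{2}), tok);   % feature values

% ASSUMPTION: feature indices are written as column numbers unchanged.
catRows  = [repmat(rowNum, numel(idx), 1), idx(:)];  % Categorical shape: row, column
realRows = [catRows, val(:)];                        % Real shape: row, column, value

disp(catRows);
disp(realRows);
```

In a genuinely categorical file every value would be 1, so dropping the value column loses nothing; the real-valued line is used here only so that both shapes can be shown from the same input.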

The argument DISPINTERVAL is also optional. It controls how often the conversion status is displayed. Its default value is 100, which means that the status is displayed after every 100 lines processed from the input file. If set to 0, the status is never displayed. For any other value, the status is also displayed once the input file has been fully processed.
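The display rule can be sketched as a simple modulo check. This is a minimal sketch under stated assumptions, not the submission's code: the variable names are hypothetical, the line count is made up, and it assumes that an interval of 0 also suppresses the final status.

```matlab
% Illustrative sketch only, not the submission's source code.
dispInterval = 100;   % mirrors the documented default of DISPINTERVAL
nLines       = 250;   % made-up number of input lines
shown        = [];    % line counts at which a status would be displayed

for n = 1:nLines
    % Periodic status: every dispInterval lines, never when the interval is 0.
    if dispInterval > 0 && mod(n, dispInterval) == 0
        shown(end+1) = n;
    end
end

% Final status once the whole file is processed (ASSUMED to be skipped for 0).
if dispInterval > 0 && (isempty(shown) || shown(end) ~= nLines)
    shown(end+1) = nLines;
end

disp(shown);
```

With these values a status would be shown after lines 100, 200, and 250.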

EXAMPLES:

spardat2ssd('spardat_categorical.sample.data','ssd_categorical.sample.csv','Categorical',2)

spardat2ssd('spardat_real.sample.data','ssd_real.sample.csv','Real',0)

spardat2ssd('spardat.data','ssd.data')

spardat2ssd('spardat.data','ssd.data','Real')

REMARKS:

The input file must contain only two-class cases. The class value must appear in the first column of the input file. Positive classes must be represented as 1, and negative classes as either -1 or 0.
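The class label rule above can be checked before conversion with a few lines such as the following. This is a hypothetical pre-check with made-up labels, not part of the submission; how spardat2ssd itself reacts to other label values is not stated.

```matlab
% Hypothetical pre-check, not the submission's source code.
labels = [1 -1 0 1];                 % made-up first-column values of four cases

% Only 1 (positive) and -1 or 0 (negative) are allowed by the remark.
if ~all(ismember(labels, [1 -1 0]))
    error('Class labels must be 1, -1 or 0.');
end

isPositive = (labels == 1);          % true for positive cases, false for negative
disp(isPositive);
```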

Lines beginning with the # character in the input file are ignored as comments. Additionally, anything after the # character in any line of the input file is also ignored as a comment.
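The comment rule can be sketched as follows: cut each line at its first # character and skip whatever ends up empty. This is a minimal sketch with made-up input lines and hypothetical variable names, not the submission's code.

```matlab
% Illustrative sketch only, not the submission's source code.
lines = {'# header comment', '1 3:1 7:1 # trailing note', '-1 2:1'};
kept  = {};                               % lines that still carry data

for k = 1:numel(lines)
    txt = lines{k};
    hashPos = find(txt == '#', 1);        % position of the first '#', if any
    if ~isempty(hashPos)
        txt = txt(1:hashPos-1);           % drop the comment part
    end
    txt = strtrim(txt);                   % remove surrounding whitespace
    if ~isempty(txt)
        kept{end+1} = txt;                % a comment-only line is skipped
    end
end

disp(kept);
```

A line that holds only a class label is kept by this sketch, since a case without feature values is still a valid case.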

At the time of writing, Auton Lab's software products do not seem to support SSD files that contain real-valued attribute values. This output might therefore have no practical use.

Cases that do not have any stated feature values are processed correctly.

Very limited testing of the source code has been done. Moreover, there is a lot of room to optimize it, especially for conciseness.

[Please subscribe to this file if you use it, so you can be notified of updates.]

MATLAB release: MATLAB 7.3 (R2006b)
Updates
23 Mar 2007

Added support for real-valued attribute values.

26 Mar 2007

(1) Set default value of argument DATATYPE to Categorical.
(2) Added option to never display status.
(3) Replaced the colon operator (:) with the squeeze function.

26 Mar 2007

Fixed minor errors in examples in online description.

16 Apr 2007

(1) Added a remark that empty cases are processed correctly.
(2) Removed a prohibitive validity check.

Tag Activity for this File
Tag Applied By Date/Time
data import Skynet 22 Oct 2008 09:05:27
data export Skynet 22 Oct 2008 09:05:27
conversion Skynet 22 Oct 2008 09:05:27
sparse Skynet 22 Oct 2008 09:05:27
spardat Skynet 22 Oct 2008 09:05:27
ssd Skynet 22 Oct 2008 09:05:27
svmlight Skynet 22 Oct 2008 09:05:27
auton lab Skynet 22 Oct 2008 09:05:27
