function sD = som_normalize(sD,method,comps)
%SOM_NORMALIZE (Re)normalize data or add new normalizations.
%
% sS = som_normalize(sS,[method],[comps])
%
% sS = som_normalize(sD)
% sS = som_normalize(sS,sNorm)
% D = som_normalize(D,'var')
% sS = som_normalize(sS,'histC',[1:3 10])
%
% Input and output arguments ([]'s are optional):
% sS The data to which the normalization is applied.
% The modified and updated data is returned.
% (struct) data or map struct
% (matrix) data matrix (a matrix is also returned)
% [method] The normalization method(s) to add/use. If missing,
% or an empty variable ('') is given, the
% normalizations in sS are used.
% (string) identifier for a normalization method to be added:
% 'var', 'range', 'log', 'logistic', 'histD' or 'histC'.
% (struct) Normalization struct, or an array of such.
% Alternatively, a map/data struct can be given
% in which case its '.comp_norm' field is used
% (see below).
% (cell array) Of normalization structs. Typically, the
% '.comp_norm' field of a map/data struct. The
% length of the array must be equal to data dimension.
% (cellstr array) norm and denorm operations in a cellstr array
% which are evaluated with EVAL command with variable
% name 'x' reserved for the variable.
% [comps] (vector) the components to which the normalization is
% applied; default is [1:dim], i.e. all components
%
% For more help, try 'type som_normalize' or check out online documentation.
% See also SOM_DENORMALIZE, SOM_NORM_VARIABLE, SOM_INFO.
%%%%%%%%%%%%% DETAILED DESCRIPTION %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% som_normalize
%
% PURPOSE
%
% Add/apply/redo normalization on data structs/sets.
%
% SYNTAX
%
% sS = som_normalize(sS)
% sS = som_normalize(sS,method)
% D = som_normalize(D,sNorm)
% sS = som_normalize(sS,csNorm)
% sS = som_normalize(...,comps)
%
% DESCRIPTION
%
% This function is used to (initialize and) add, redo and apply
% normalizations on data/map structs/sets. If a data/map struct is given,
% the specified normalizations are added to the '.comp_norm' field of the
% struct after ensuring that all normalizations specified therein have
% status 'done'. SOM_NORMALIZE delegates the normalization operations
% themselves to SOM_NORM_VARIABLE and handles only the bookkeeping that
% is specific to data structs/sets.
%
% The different normalization methods are listed below. For more
% detailed descriptions, see SOM_NORM_VARIABLE.
%
%   method      description
%   'var'       Variance is normalized to one (linear operation).
%   'range'     Values are scaled to the range [0,1] (linear operation).
%   'log'       Natural logarithm is applied to the values:
%                 xnew = log(x-m+1)
%               where m = min(x).
%   'logistic'  Logistic or softmax transformation which scales all
%               possible values between [0,1].
%   'histD'     Histogram equalization, values scaled between [0,1].
%   'histC'     Approximate histogram equalization with partially
%               linear operations. Values scaled between [0,1].
%   'eval'      Freeform operations (see the cellstr array option of
%               the method argument).
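%
% As a small worked illustration of the 'log' formula above (plain
% MATLAB, independent of the toolbox; the vector x is made up for the
% example):
%
%    x    = [1 10 100];
%    m    = min(x);          % m = 1
%    xnew = log(x - m + 1);  % = log([1 10 100]), approx. [0 2.3026 4.6052]
%
% The shift by -m+1 moves the smallest value to 1, so the logarithm is
% defined for every value and the smallest value maps to 0.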
%
% To make it possible to undo a normalization and to apply exactly the
% same normalization to other data sets, the normalization information
% is saved into a normalization struct, which has the fields:
%
% .type ; struct type, ='som_norm'
% .method ; normalization method, a string
% .params ; normalization parameters
% .status ; string: 'uninit', 'undone' or 'done'
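%
% A minimal sketch of creating such a struct by hand, using the same
% SOM_SET call that this function uses internally. That a freshly
% created struct has status 'uninit' is an assumption based on the
% status values listed above; sD is assumed to be an existing data
% struct:
%
%    sN = som_set('som_norm','method','range');
%    sD = som_normalize(sD,sN);   % initialized if needed, then applied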
%
% Normalizations are always one-variable operations. In the data and map
% structs the normalization information for each component is saved in the
% '.comp_norm' field, which is a cell array of length dim. Each cell
% contains normalizations for one vector component in a struct array of
% normalization structs. Each component may have a different number of
% normalizations, of different kinds. Typically, all normalizations are
% either 'undone' or 'done', but in special situations this may not be the
% case. The easiest way to check out the status of the normalizations is to
% use function SOM_INFO, e.g. som_info(sS,3)
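%
% The statuses can also be read directly from the struct. A sketch,
% assuming sS is a data or map struct with at least one component:
%
%    for j = 1:length(sS.comp_norm{1})
%      fprintf('%s: %s\n', sS.comp_norm{1}(j).method, ...
%              sS.comp_norm{1}(j).status);
%    end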
%
% REQUIRED INPUT ARGUMENTS
%
% sS The data to which the normalization is applied.
% (struct) Data or map struct. Before adding any new
% normalizations, it is ensured that the
% normalizations for the specified components in the
% '.comp_norm' field have status 'done'.
% (matrix) data matrix
%
% OPTIONAL INPUT ARGUMENTS
%
% method The normalization(s) to add/use. If missing,
% or an empty variable ('' or []) is given, the
% normalizations in the data struct are used.
% (string) Identifier for a normalization method to be added:
% 'var', 'range', 'log', 'logistic', 'histD' or 'histC'. The
% same method is applied to all specified components
% (given in comps). The normalizations are first
% initialized (for each component separately, of
% course) and then applied.
% (struct) Normalization struct, or an array of structs, which
% is applied to all specified components. If the
% '.status' field of the struct(s) is 'uninit',
% the normalization(s) is initialized first.
% Alternatively, the struct may be map or data struct
% in which case its '.comp_norm' field is used
% (see the cell array option below).
% (cell array) In practice, the '.comp_norm' field of
% a data/map struct. The length of the array
% must be equal to the dimension of the given
% data set (sS). Each cell contains the
% normalization(s) for one component. Only the
% normalizations listed in comps argument are
% applied though.
% (cellstr array) norm and denorm operations in a cellstr array
% which are evaluated with EVAL command with variable
% name 'x' reserved for the variable.
%
% comps (vector) The components to which the normalization(s) is
% applied. Default is to apply to all components.
%
% OUTPUT ARGUMENTS
%
% sS Modified and/or updated data.
% (struct) If a struct was given as input argument, the
% same struct is returned with normalized data and
% updated '.comp_norm' fields.
% (matrix) If a matrix was given as input argument, the
% normalized data matrix is returned.
%
% EXAMPLES
%
% To add (initialize and apply) a normalization to a data struct:
%
% sS = som_normalize(sS,'var');
%
% This applies the 'var' method to all components. To add a method to
% only a few selected components, use the comps argument:
%
% sS = som_normalize(sS,'log',[1 3:5]);
%
% To ensure that all normalization operations have indeed been done:
%
% sS = som_normalize(sS);
%
% The same for only a few components:
%
% sS = som_normalize(sS,'',[1 3:5]);
%
% To apply the normalizations of a data struct sS to a new data set D:
%
% D = som_normalize(D,sS);
% or
% D = som_normalize(D,sS.comp_norm);
%
% To normalize a data set:
%
% D = som_normalize(D,'histD');
%
% Note that in this case the normalization information is lost.
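%
% One way to keep the information, sketched here under the assumption
% that D and Dnew are data matrices with the same number of columns, is
% to wrap the data in a data struct first (SOM_DATA_STRUCT is another
% toolbox function, not used in this file):
%
%    sD   = som_data_struct(D);
%    sD   = som_normalize(sD,'histD');
%    Dnew = som_normalize(Dnew,sD);   % same normalization for new data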
%
% To check out the status of normalization in a struct use SOM_INFO:
%
% som_info(sS,3)
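%
% To undo the normalizations and later redo them (a sketch; it assumes
% that SOM_DENORMALIZE, listed under SEE ALSO, is called with its
% single-argument syntax and leaves the normalizations in '.comp_norm'
% with status 'undone'):
%
%    sS = som_denormalize(sS);   % data back in original units
%    sS = som_normalize(sS);     % redo all 'undone' normalizations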
%
%
% SEE ALSO
%
% som_denormalize Undo normalizations of a data struct/set.
% som_norm_variable Normalization operations for a set of scalar values.
% som_info User-friendly information of SOM Toolbox structs.
% Copyright (c) 1998-2000 by the SOM toolbox programming team.
% http://www.cis.hut.fi/projects/somtoolbox/
% Version 2.0beta juuso 151199 150500
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% check arguments
narginchk(1, 3); % check that the number of input arguments is valid
% sD
struct_mode = isstruct(sD);
if struct_mode,
  switch sD.type
   case 'som_map', D = sD.codebook;
   case 'som_data', D = sD.data;
   otherwise, error('Illegal struct.')
  end
else
  D = sD;
end
[dlen dim] = size(D);
% comps
if nargin<3 || (ischar(comps) && strcmp(comps,'all')),
  comps = [1:dim];
end
if isempty(comps), return; end
if size(comps,1)>1, comps = comps'; end % make it a row vector
% method
csNorm = cell(dim,1);
if nargin<2 || isempty(method),
  if ~struct_mode,
    warning('No normalization method given. Data left unchanged.');
    return;
  end
  method = '';
else
  % check out the given method
  % (and if necessary, copy it for each specified component)
  if ischar(method),
    switch method,
     case {'var','range','log','histD','histC','logistic'},
      sN = som_set('som_norm','method',method);
     otherwise,
      error(['Unrecognized method: ' method]);
    end
    for i=comps, csNorm{i} = sN; end
  elseif isstruct(method),
    switch method(1).type,
     case {'som_map','som_data'}, csNorm = method(1).comp_norm;
     case {'som_norm'}, for i=comps, csNorm{i} = method; end
     otherwise,
      error('Invalid struct given as normalization method.')
    end
  elseif iscellstr(method),
    [dummy,sN] = som_norm_variable(1,method,'init');
    for i=comps, csNorm{i} = sN; end
  elseif iscell(method),
    csNorm = method;
  else
    error('Illegal method argument.')
  end
  % check the size of csNorm is the same as data dimension
  if length(csNorm) ~= dim,
    error('Given number of normalizations does not match data dimension.')
  end
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% initialize
% make sure all the current normalizations for current
% components have been done
if struct_mode,
  alldone = 1;
  for i = comps,
    for j=1:length(sD.comp_norm{i}),
      sN = sD.comp_norm{i}(j);
      if ~strcmp(sN.status,'done'),
        alldone = 0;
        [x,sN] = som_norm_variable(D(:,i), sN, 'do');
        D(:,i) = x;
        sD.comp_norm{i}(j) = sN;
      end
    end
  end
  if isempty(method),
    if alldone,
      warning('No ''undone'' normalizations found. Data left unchanged.');
    else
      fprintf(1,'Normalizations have been redone.\n');
    end
  end
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% action
% add the new normalizations to the old ones
for i = comps,
  if ~isempty(csNorm{i}),
    [x,sN] = som_norm_variable(D(:,i), csNorm{i}, 'do');
    D(:,i) = x;
    if struct_mode,
      if isempty(sD.comp_norm{i}), sD.comp_norm{i} = sN;
      else sD.comp_norm{i} = [sD.comp_norm{i}, sN]; end
    end
  end
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% output
if struct_mode,
  switch sD.type
   case 'som_map', sD.codebook = D;
   case 'som_data', sD.data = D;
   otherwise, error('Illegal struct.')
  end
else
  sD = D;
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%