The Information-based Similarity (IBS) method was developed to effectively categorize symbolic sequences according to their information content. The method has been fully described and validated (4), with applications to heart rate time series (1), literary authorship disputes (2), and genetic sequences (3).
This toolbox provides a set of MATLAB functions for quantifying the distance (or dissimilarity) among a set of symbolic sequences, and for displaying the results graphically, for example as a dendrogram. The symbolic sequences can be binary sequences mapped from a time series, written texts in any language, or genetic sequences.
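To make the idea concrete, the following is a minimal sketch of a rank-and-frequency distance in the spirit of the method described in the references below. It is written in plain Python and is not the toolbox's code. All function names (`to_binary`, `word_counts`, `ranks`, `ibs_distance`) are hypothetical, and the simplifications are assumptions: only words that occur in both sequences are compared, ranks are computed within that shared set, and each rank difference is weighted by the word's Shannon entropy contribution.

```python
from collections import Counter
from math import log


def to_binary(series):
    """Map a time series to symbols: 1 if a value rises above its predecessor, else 0."""
    return [1 if b > a else 0 for a, b in zip(series, series[1:])]


def word_counts(symbols, m):
    """Count overlapping words of m consecutive symbols."""
    return Counter(tuple(symbols[i:i + m]) for i in range(len(symbols) - m + 1))


def ranks(counts, words):
    """Rank the given words by descending count (ties broken by word order)."""
    ordered = sorted(words, key=lambda w: (-counts[w], w))
    return {w: r for r, w in enumerate(ordered, start=1)}


def ibs_distance(seq_a, seq_b, m):
    """Entropy-weighted mean rank difference over shared words, scaled to [0, 1].

    Simplified illustration only; the published method and the toolbox
    may differ in normalization and in how unshared words are handled.
    """
    ca, cb = word_counts(seq_a, m), word_counts(seq_b, m)
    shared = sorted(set(ca) & set(cb))
    n = len(shared)
    if n < 2:
        return 0.0  # nothing to rank against
    ra, rb = ranks(ca, shared), ranks(cb, shared)
    ta, tb = sum(ca.values()), sum(cb.values())

    def entropy_term(count, total):
        p = count / total
        return -p * log(p)

    # Weight each word by its Shannon entropy contribution in both sequences.
    weights = {w: entropy_term(ca[w], ta) + entropy_term(cb[w], tb) for w in shared}
    z = sum(weights.values())
    return sum(abs(ra[w] - rb[w]) * weights[w] for w in shared) / (z * (n - 1))
```

The same `ibs_distance` function accepts any sequence of hashable symbols, so a string of characters or a list of nucleotides can be passed directly; only time series need the `to_binary` step first. A matrix of pairwise distances computed this way is the kind of input a hierarchical clustering routine would use to draw a dendrogram.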
1. Yang AC, Hseu SS, Yien HW, Goldberger AL, Peng CK. Linguistic analysis of human heartbeats using frequency and rank order statistics. Phys. Rev. Lett. 90, 108103 (2003).
2. Yang AC, Peng CK, Yien HW, Goldberger AL. Information categorization approach to literary authorship disputes. Physica A, 329, 473 (2003).
3. Yang AC, Goldberger AL, Peng CK. Genomic classification using an information-based similarity index: application to the SARS coronavirus. J Comput Biol. 12(8):1103-16 (2005).
4. Peng CK, Yang AC, Goldberger AL. Statistical physics approach to categorize biologic signals: from heart rate dynamics to DNA sequences. Chaos 17: 015115 (2007).