Finding the Similar Entries: A Quantitative Approach based on CPU Runtime Behavior

Entry to the MATLAB Contest, Spring 2009

C Jethro Lam

Version 1.2.0.0 (297 KB)

1.3K Downloads


8 Apr 2009


In this work, we are interested in the following questions:

1. How do we measure the similarity between two codes? (existence of similarity)

2. How do we identify entries that are similar to each other? (similarity with others)

3. How do the entries by one author evolve over time? (similarity with self)

In order to define 'similarity', one must first define a measure of 'difference'. Intuitive methods include comparing the number of characters, comparing the number of nodes, or inspecting the function and variable names. However, these methods can be defeated by simple code obfuscation.

In this work, we introduce a measure of code similarity that is relatively immune to code obfuscation. The proposed approach is based on the algorithmic performance of the code. A piece of code consists of many operational statements (a=b+c), branching statements (if then else), memory allocation statements (zeros(100,1)), etc., that appear in an order characteristic of the author's coding style. When the code is executed, each statement takes up a certain amount of CPU runtime. If we measure and record the variation of CPU runtime across the lines of the code, we obtain a signature of the code that is unique to each author, provided that the code is sufficiently complicated. By correlating the signatures, we obtain a quantitative measure of the similarity between codes.
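The correlation step can be illustrated with a minimal sketch. This is not the submitted MATLAB code: it is a Python illustration under stated assumptions. It assumes each signature is a list of per-line CPU times (in MATLAB these could come from the profiler), that signatures of different lengths are resampled to a common length by linear interpolation, and that the similarity score is the Pearson correlation coefficient. The function names (resample, pearson, similarity) and the sample numbers are hypothetical.

```python
import math


def resample(signature, n):
    """Linearly interpolate a per-line runtime signature onto n points.

    Assumption: resampling is only one possible way to compare codes
    with different line counts; the original work may do it differently.
    """
    if len(signature) == 1 or n == 1:
        return [float(signature[0])] * n
    out = []
    for i in range(n):
        pos = i * (len(signature) - 1) / (n - 1)  # fractional line index
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(signature) - 1)
        frac = pos - lo
        out.append(signature[lo] * (1 - frac) + signature[hi] * frac)
    return out


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        return 0.0  # a flat signature carries no shape information
    return cov / (dev_x * dev_y)


def similarity(sig_a, sig_b, n=64):
    """Similarity score in [-1, 1] between two runtime signatures."""
    return pearson(resample(sig_a, n), resample(sig_b, n))


# Hypothetical per-line CPU times in seconds (made-up numbers).
entry_a = [0.01, 0.30, 0.02, 0.02, 0.90, 0.05]
entry_b = [0.02, 0.61, 0.04, 0.03, 1.75, 0.11]  # roughly entry_a on a slower machine
entry_c = [0.50, 0.02, 0.02, 0.40, 0.01, 0.03]  # a different runtime profile

score_ab = similarity(entry_a, entry_b)
score_ac = similarity(entry_a, entry_c)
```

Because the Pearson coefficient ignores uniform scaling and offsets, two runs of the same code on machines of different speed should still score close to 1. Renaming variables or reformatting whitespace leaves the runtime profile, and therefore the score, essentially unchanged.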

Cite As

C Jethro Lam (2026). Finding the Similar Entries: A Quantitative Approach based on CPU Runtime Behavior (https://www.mathworks.com/matlabcentral/fileexchange/23594-finding-the-similar-entries-a-quantitative-approach-based-on-cpu-runtime-behavior), MATLAB Central File Exchange. Retrieved June 5, 2026.

Acknowledgements

Inspired by: bsxfun, MATLAB Contest - Data Visualization, MATLAB Contest Statistics

General Information

Version 1.2.0.0 (297 KB)
No License

MATLAB Release Compatibility

Compatible with any release

Platform Compatibility

Windows
macOS
Linux


Version	Published	Release Notes
1.2.0.0	8 Apr 2009	I did not change the m-files that I submitted. I only added acknowledgements to the front info page.
1.0.0.0	8 Apr 2009	
