# NMC Tcl Extension

n-Depth Mean Compare is an algorithm and Tcl extension for gauging the similarity of two blocks of binary data of potentially differing lengths. Given a depth (some integer greater than 0) and two blocks of data, a floating point number is returned indicating the similarity of the two blocks. The closer to zero, the more similar the two blocks are deemed to be. If a value of 0 is returned, the blocks are deemed identical.

# Building and Installing

Running `make` will build and test the extension. You can install it with `make install`. The default prefix is `/usr` and the default lib path is `${PREFIX}/lib`. You can override the prefix like so:

    make install PREFIX=$HOME

# Walkthrough of Comparisons

Let's walk through some comparisons using nmc.

When using the Tcl extension, an n-depth of 0 means 'use the largest valid value for n'.

```
% # Load the package and import the nmc command
% package require nmc
0.2
% namespace import nmc::nmc
% # Simple equality test of identical strings
% nmc 0 asdf asdf
0.000000000000000000
% # Zero means identical.
% # If we compare slightly different strings of the same length:
% nmc 0 asdd asdf
0.894427190999915855
% # We get a value close to zero. How about a string that is a substring of the other?
% nmc 0 asdff asdf
0.447213595499957927
% # Also note that comparisons are commutative, like you'd expect:
% nmc 0 asdf asdff
0.447213595499957927
% # 'b' is closer to 'a' than 'z' is:
% nmc 0 asda asdb
0.447213595499957927
% nmc 0 asda asdz
11.18033988749894902
% # 'asdf' and 'asdfasdf' are more dissimilar than 'asdfasdf' and 'asdfasdfasdf'
% nmc 0 asdf asdfasdf
8.124038404635960830
% nmc 0 asdfasdf asdfasdfasdf
1.424000624219588395
% # The Tcl extension also accepts a percentage as its n argument,
% # which is interpreted as a percentage of the maximum valid n.
% # Get a couple of large, equal-sized random strings
% set fh [open /dev/urandom r]
% set rand1 [read $fh [expr {0x1000}]]; set rand2 [read $fh [expr {0x1000}]]; close $fh
% # Depending on use case, we may wish to see purely random data as similar.
% # Percent-n is useful for this. Comparison using max-valid n:
% nmc 0 $rand1 $rand2
69.20999364325710700
% # And using 1% of the max-valid n:
% nmc 1% $rand1 $rand2
7.700022818067388641
```
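
To see how the depth argument changes the result for one pair of blocks, it can help to run the same comparison at several depths. The following is only a sketch: `depth_sweep` is a hypothetical helper, not part of the extension, and it takes the comparison command as a prefix so that it does not assume anything about nmc beyond the `n blockA blockB` argument order shown above.

```tcl
# Hypothetical helper (not part of the nmc package).
# cmd is a command prefix that is invoked as: {*}$cmd n blockA blockB
# Returns a dict mapping each depth to the value the command returned.
proc depth_sweep {cmd a b depths} {
    set result [dict create]
    foreach n $depths {
        dict set result $n [{*}$cmd $n $a $b]
    }
    return $result
}

# With the extension loaded, a call might look like this (assumption:
# the listed percentages are all accepted as n):
#   depth_sweep nmc::nmc $rand1 $rand2 {0 1% 10% 50%}
```

The result is an ordinary dict, so `dict for {n value} $result {...}` can be used to print or compare the values.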

# Shortcomings

The extension requires both blocks to be loaded fully into memory, so very large files cannot be compared. A sloppy workaround is to break the files into chunks and compare the chunks. Eventually, an nmc for file streams (C-based) and channels (Tcl-based) will be implemented.
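
The chunking workaround might be sketched roughly as follows. This is an illustration under assumptions, not part of the extension: `chunked_compare` and its `chunkSize` default are hypothetical, the comparison command is passed in as a prefix, and the sketch stops as soon as either file runs out, so trailing data in the longer file is ignored. How per-chunk values should be combined into one score is left open here. It needs Tcl 8.6 for `try`.

```tcl
# Hypothetical helper (not part of the nmc package).
# cmd is a command prefix invoked as: {*}$cmd chunkA chunkB
# Returns the list of per-chunk comparison values, in file order.
proc chunked_compare {cmd pathA pathB {chunkSize 65536}} {
    set fa [open $pathA rb]
    try {
        set fb [open $pathB rb]
        try {
            set scores {}
            while {1} {
                # Only one chunk of each file is held in memory at a time.
                set a [read $fa $chunkSize]
                set b [read $fb $chunkSize]
                # Stop when either file is exhausted; leftover data in the
                # longer file is not compared in this sketch.
                if {$a eq "" || $b eq ""} break
                lappend scores [{*}$cmd $a $b]
            }
            return $scores
        } finally {
            close $fb
        }
    } finally {
        close $fa
    }
}

# With the extension loaded, a call might look like:
#   chunked_compare {nmc::nmc 0} big1.bin big2.bin
```

Note that comparing chunk by chunk is sensitive to alignment: inserting a single byte near the start of one file shifts every later chunk, which whole-block comparison would not suffer from in the same way.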

Floating-point math is used, so results are not guaranteed to be identical across architectures. NMC is meant for making comparisons on a single machine, not between different devices: an NMC value computed on one machine is not useful to compare against an NMC value computed for the same data on another machine.

# Licensing

All code included is released to the public domain, so long as the original author is credited.