====== NMC Tcl Extension ======

n-Depth Mean Compare is an [[n-Depth_Mean_Compare|algorithm]] and Tcl extension for gauging the similarity of two blocks of binary data of potentially differing lengths. Given a depth (an integer greater than 0) and two blocks of data, it returns a floating point number indicating how similar the two blocks are. The closer the value is to zero, the more similar the blocks are deemed to be. A value of 0 means the blocks are deemed identical.

====== Download ======

  * {{ :nmc:nmc-tcl-0.2.src.tar.xz |}}

====== Building and Installing ======

Running "make" will build and test the extension. You can install with "make install". The default prefix is /usr and the default lib path is ${PREFIX}/lib. You can override either of these like so:

  make install PREFIX=$HOME

====== Walkthrough of Comparisons ======

Let's walk through some comparisons using nmc. When using the Tcl extension, an n-depth of 0 means 'use the largest valid value for n'.

  % # Load the package and import the nmc command
  % package require nmc 0.2
  % namespace import nmc::nmc
  % # Simple equality test of identical strings
  % nmc 0 asdf asdf
  0.000000000000000000
  % # Zero means identical.
  % # If we compare slightly different strings of the same length:
  % nmc 0 asdd asdf
  0.894427190999915855
  % # We get a value close to zero. How about a string that is a substring of the other?
  % nmc 0 asdff asdf
  0.447213595499957927
  % # Also note that comparisons are commutative, like you'd expect:
  % nmc 0 asdf asdff
  0.447213595499957927
  % # 'b' is closer to 'a' than 'z' is:
  % nmc 0 asda asdb
  0.447213595499957927
  % nmc 0 asda asdz
  11.18033988749894902
  % # 'asdf' and 'asdfasdf' are more dissimilar than 'asdfasdf' and 'asdfasdfasdf'
  % nmc 0 asdf asdfasdf
  8.124038404635960830
  % nmc 0 asdfasdf asdfasdfasdf
  1.424000624219588395
  % # The Tcl extension can also take a percentage as its n argument,
  % # which is then interpreted as a percentage of the maximum valid n.
  % # Get a couple of large, equal-sized random strings
  % set fh [open /dev/urandom r]
  % set rand1 [read $fh [expr 0x1000]]; set rand2 [read $fh [expr 0x1000]]; close $fh
  % # Depending on use case, we may wish to see purely random data as similar.
  % # Percent-n is useful for this. Comparison using max-valid n:
  % nmc 0 $rand1 $rand2
  69.20999364325710700
  % # The same comparison using 1% of the max-valid n:
  % nmc 1% $rand1 $rand2
  7.700022818067388641

====== Shortcomings ======

The extension requires both blocks to be loaded into memory in full, so really large files cannot be compared. A crude workaround is to break the files into chunks and compare the chunks. Eventually, an nmc for file streams (C-based) and channels (Tcl-based) will be implemented.

Floating point math is used, so results are not guaranteed to be identical between architectures. NMC is meant for making comparisons on a single machine, not between different devices: an NMC value computed on one machine is not useful to compare against an NMC value of the same data computed on another machine.

====== Licensing ======

All code included is released to the public domain, so long as the original author is credited.
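
====== Example: Chunked Comparison ======

The chunking workaround mentioned above for large files could be sketched roughly as follows. This is only an illustration, not part of the nmc package: the proc name ''chunkedCompare'', its parameters, and the default chunk size are all hypothetical. It assumes the comparison command is passed in as a command prefix that takes a depth and two blocks, and it simply stops when either channel runs out, ignoring any unmatched tail. How the per-chunk values should be combined into one score is left open.

```tcl
# Hypothetical helper, NOT part of the nmc package.
# compareCmd: command prefix called as: compareCmd depth blockA blockB
# chanA/chanB: channels already opened for reading by the caller
# chunkSize:  number of characters read per chunk (assumed default)
proc chunkedCompare {compareCmd depth chanA chanB {chunkSize 65536}} {
    set scores {}
    while {1} {
        set a [read $chanA $chunkSize]
        set b [read $chanB $chunkSize]
        # Stop once either channel is exhausted; an unmatched tail is ignored.
        if {$a eq "" || $b eq ""} {
            break
        }
        # Only one chunk per channel is held in memory at a time.
        lappend scores [{*}$compareCmd $depth $a $b]
    }
    return $scores
}
```

With the extension loaded, a call might look like ''chunkedCompare nmc::nmc 0 $fa $fb'', where ''$fa'' and ''$fb'' are channels opened in binary mode. The result is a list with one NMC value per chunk pair.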