nmc-tcl: created 2017/05/13 19:13 by zashi (current revision)

====== NMC Tcl Extension ======

n-Depth Mean Compare is an [[n-Depth_Mean_Compare|algorithm]] and Tcl extension for gauging the
similarity of two blocks of binary data of potentially differing lengths.
Given a depth (some integer greater than 0) and two blocks of data, a
floating point number is returned indicating the similarity of the two
blocks. The closer to zero, the more similar the two blocks are deemed to
be. If a value of 0 is returned, the blocks are deemed identical.

====== Download ======

  * {{ :nmc:nmc-tcl-0.2.src.tar.xz |}}

====== Building and Installing ======

Running "make" will build and test the extension. You can install
with "make install". The default prefix is /usr and the default lib
path is ${PREFIX}/lib. You can override either of these like so:

  make install PREFIX=$HOME

====== Walkthrough of Comparisons ======

Let's walk through some comparisons using nmc.

When using the Tcl extension, an n-depth of 0 means 'use the largest valid
value for n'.

  % # Load the package and import the nmc command
  % package require nmc
  0.2
  % namespace import nmc::nmc
  % # Simple equality test of identical strings
  % nmc 0 asdf asdf
  0.000000000000000000
  % # zero means identical.
  % # If we compare slightly different strings of the same length:
  % nmc 0 asdd asdf
  0.894427190999915855
  % # We get a value close to zero. How about a string that is a substring of the other?
  % nmc 0 asdff asdf
  0.447213595499957927
  % # Also note that comparisons are commutative, like you'd expect:
  % nmc 0 asdf asdff
  0.447213595499957927
  % # 'b' is closer to 'a' than 'z' is:
  % nmc 0 asda asdb
  0.447213595499957927
  % nmc 0 asda asdz
  11.18033988749894902
  % # 'asdf' and 'asdfasdf' are more dissimilar than 'asdfasdf' and 'asdfasdfasdf'
  % nmc 0 asdf asdfasdf
  8.124038404635960830
  % nmc 0 asdfasdf asdfasdfasdf
  1.424000624219588395
  % # The Tcl extension can take a percent as its n argument, which is then taken as a percentage of the maximum valid n.
  % # Get a couple of large, equal-sized random strings
  % set fh [open /dev/urandom r]
  % set rand1 [read $fh [expr 0x1000]]; set rand2 [read $fh [expr 0x1000]]; close $fh
  % # Depending on use case, we may wish to see purely random data as similar.
  % # Percent-n is useful for this. Comparison using max-valid n:
  % nmc 0 $rand1 $rand2
  69.20999364325710700
  % nmc 1% $rand1 $rand2
  7.700022818067388641

====== Shortcomings ======

The extension requires both blocks to be loaded into memory, so
very large files cannot be compared directly. A crude workaround is to break
the files into chunks and compare them chunk by chunk. Eventually, an nmc for file
streams (C-based) and channels (Tcl-based) will be implemented.

Floating point math is used, so you cannot expect
100% consistent results between architectures. NMC is meant for making comparisons
on a single machine, not between different devices: an NMC value computed on one machine
is not useful to compare against an NMC value for the same data computed on
another machine.

====== Licensing ======

All code included is released to the public domain, so long as the original
author is credited.
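
====== Example: Chunked Comparison ======

The chunking workaround mentioned under the shortcomings can be sketched roughly as follows. This is only an illustration, not part of the extension: the proc name chunkedCompare, its parameters, and the 64 KiB default chunk size are all hypothetical. It assumes only that the comparison command is called as "nmc n block1 block2", as in the walkthrough. It returns one score per chunk pair and leaves open how those scores should be combined into a single value.

<code tcl>
package require Tcl 8.6

# Hypothetical helper (not provided by the nmc extension): compare two
# files chunk by chunk and return the list of per-chunk scores.
# cmd is the comparison command prefix; it defaults to nmc::nmc and is
# invoked as: {*}$cmd $n $blockA $blockB
proc chunkedCompare {n pathA pathB {chunkSize 65536} {cmd nmc::nmc}} {
    set scores {}
    # "rb" opens the files in binary mode so bytes are read unmodified.
    set fa [open $pathA rb]
    try {
        set fb [open $pathB rb]
        try {
            while {1} {
                set a [read $fa $chunkSize]
                set b [read $fb $chunkSize]
                # Stop once either file is exhausted. Any unmatched tail
                # of the longer file is ignored in this sketch.
                if {[string length $a] == 0 || [string length $b] == 0} {
                    break
                }
                lappend scores [{*}$cmd $n $a $b]
            }
        } finally {
            close $fb
        }
    } finally {
        close $fa
    }
    return $scores
}
</code>

After "package require nmc", a call such as "chunkedCompare 0 a.bin b.bin" (file names made up) would yield one NMC value per 64 KiB chunk pair. Only two chunks are held in memory at a time, at the cost of losing any similarity that spans chunk boundaries.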