I THINK ∴ I'M DANGEROUS

nmc-tcl [2017/05/13 19:13] (current)
zashi created
 +====== NMC Tcl Extension ======
 +
 +n-Depth Mean Compare is an [[n-Depth_Mean_Compare|algorithm]] and Tcl extension for gauging the
 +similarity of two blocks of binary data of potentially differing lengths.
 +Given a depth (an integer greater than 0) and two blocks of data, it returns a
 +floating-point number indicating how similar the two blocks are. The closer
 +the value is to zero, the more similar the blocks are deemed to be; a value
 +of 0 means the blocks are deemed identical.
 +
 +====== Download ======
 +
 +  * {{ :nmc:nmc-tcl-0.2.src.tar.xz |}}
 +
 +
 +====== Building and Installing ======
 +
 +Running "make" builds and tests the extension. Install it
 +with "make install". The default prefix is /usr and the default library
 +path is ${PREFIX}/lib. Either can be overridden on the command line, like so:
 +
 +  make install PREFIX=$HOME
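
When the extension is installed under a non-default prefix such as $HOME, tclsh may not search that library directory on its own. The following is a minimal sketch, assuming the install placed the package somewhere under ${PREFIX}/lib with PREFIX=$HOME; the exact layout is an assumption, so check where "make install" actually put pkgIndex.tcl.

```tcl
# Assumption: "make install PREFIX=$HOME" placed the package under $HOME/lib.
set libdir [file join $::env(HOME) lib]

# Add the directory to Tcl's package search path if it is not already there.
if {$libdir ni $::auto_path} {
    lappend ::auto_path $libdir
}

# Try to load the package and report the outcome instead of aborting.
if {[catch {package require nmc} result]} {
    puts stderr "could not load nmc: $result"
} else {
    puts "loaded nmc $result"
}
```

Setting the TCLLIBPATH environment variable to the same directory before starting tclsh should have an equivalent effect without editing any script.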
 +
 +====== Walkthrough of Comparisons ======
 +
 +Let's walk through some comparisons using nmc.
 +
 +When using the Tcl extension, an n-depth of 0 means 'use the largest valid
 +value for n'.
 +
 +<code tcl>
 +% # Load the package and import the nmc command
 +% package require nmc
 +0.2
 +% namespace import nmc::nmc
 +% # Simple equality test of identical strings
 +% nmc 0 asdf asdf
 +0.000000000000000000
 +% # zero means identical.
 +% # If we compare slightly different strings of the same length:
 +% nmc 0 asdd asdf
 +0.894427190999915855
 +% # We get a value close to zero. How about a string that is a substring of the other?
 +% nmc 0 asdff asdf
 +0.447213595499957927
 +% # Also note that comparisons are commutative, like you'd expect:
 +% nmc 0 asdf asdff
 +0.447213595499957927
 +% # 'b' is closer to 'a' than 'z' is:
 +% nmc 0 asda asdb
 +0.447213595499957927
 +% nmc 0 asda asdz
 +11.18033988749894902
 +% # 'asdf' and 'asdfasdf' are more dissimilar than 'asdfasdf' and 'asdfasdfasdf'
 +% nmc 0 asdf asdfasdf
 +8.124038404635960830
 +% nmc 0 asdfasdf asdfasdfasdf
 +1.424000624219588395
 +% # The Tcl extension also accepts a percentage as its n argument, meaning that percentage of the maximum valid n.
 +% # Get a couple of large, equal-sized random strings:
 +% set fh [open /dev/urandom r]
 +% set rand1 [read $fh [expr {0x1000}]]; set rand2 [read $fh [expr {0x1000}]]; close $fh
 +% # Depending on the use case, we may wish to treat purely random data as similar.
 +% # Percent-n is useful for this. Comparison using the maximum valid n:
 +% nmc 0 $rand1 $rand2
 +69.20999364325710700
 +% # Comparison using 1% of the maximum valid n:
 +% nmc 1% $rand1 $rand2
 +7.700022818067388641
 +</code>
 +
 +====== Shortcomings ======
 +
 +The extension requires both blocks to be loaded into memory in full, so
 +very large files cannot be compared directly. A sloppy workaround is to break
 +the files into chunks and compare the chunks. Eventually, an nmc for file
 +streams (C-based) and channels (Tcl-based) will be implemented.
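
The chunking workaround could be sketched roughly as follows. This is an illustration, not part of the extension: the proc name compare_chunked, its parameters, and the default chunk size are all hypothetical, and the comparison command is passed in (defaulting to nmc::nmc) so the sketch does not hard-code how the package is loaded. It needs Tcl 8.6 for try/finally.

```tcl
# Compare two files chunk by chunk and return the list of per-chunk scores.
# "cmd" is the comparison command prefix; the default, nmc::nmc, assumes
# that "package require nmc" has already been run by the caller.
proc compare_chunked {fileA fileB {chunkSize 65536} {n 0} {cmd nmc::nmc}} {
    # Open both files in binary mode so no encoding or EOL translation occurs.
    set fa [open $fileA rb]
    set fb [open $fileB rb]
    set scores {}
    try {
        while {1} {
            set a [read $fa $chunkSize]
            set b [read $fb $chunkSize]
            # Stop once either file is exhausted; trailing data in the
            # longer file is not compared in this sketch.
            if {$a eq "" || $b eq ""} break
            lappend scores [{*}$cmd $n $a $b]
        }
    } finally {
        close $fa
        close $fb
    }
    return $scores
}
```

A call such as "compare_chunked big1.bin big2.bin" (hypothetical file names) would return one score per pair of chunks. These per-chunk scores are not equivalent to a single score over the whole files, and data that is merely shifted between the two files will land in different chunks, which is part of why the workaround is sloppy.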
 +
 +Floating-point math is used, so results are not guaranteed to be
 +identical across architectures. NMC is intended for making comparisons
 +on a single machine, not between different devices: an NMC value computed on
 +one machine should not be compared against an NMC value for the same data
 +computed on another machine.
 +
 +====== Licensing ======
 +
 +All code included is released to the public domain, so long as the original
 +author is credited.