— |
nmc-tcl [2017/05/13 19:13] (current) zashi created |
||
---|---|---|---|
Line 1: | Line 1: | ||
+ | ====== NMC Tcl Extension ====== | ||
+ | |||
+ | n-Depth Mean Compare is an [[n-Depth_Mean_Compare|algorithm]] and Tcl extension for gauging the | ||
+ | similarity of two blocks of binary data of potentially differing lengths. | ||
+ | Given a depth (an integer greater than 0) and two blocks of data, it | ||
+ | returns a floating-point number indicating how similar the two blocks | ||
+ | are. The closer the value is to zero, the more similar the blocks are | ||
+ | deemed to be; a value of exactly 0 means the blocks are deemed identical. | ||
+ | |||
+ | ====== Download ====== | ||
+ | |||
+ | * {{ :nmc:nmc-tcl-0.2.src.tar.xz |}} | ||
+ | |||
+ | |||
+ | ====== Building and Installing ====== | ||
+ | |||
+ | Running "make" will build and test the extension. You can install | ||
+ | with "make install". The default prefix is /usr and the default lib | ||
+ | path is ${PREFIX}/lib. You can override either of these like so: | ||
+ | |||
+ | make install PREFIX=$HOME | ||
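If the extension is installed under a non-default prefix, Tcl may not search that library directory on its own. The following is a minimal sketch, assuming that `make install PREFIX=$HOME` places the package (with its `pkgIndex.tcl`) in a subdirectory of `$HOME/lib`; that layout is an assumption, and the helper name `add_lib_dir` is hypothetical.

```tcl
# Hypothetical helper: add a library directory to Tcl's package search
# path, unless it is already listed.
proc add_lib_dir {dir} {
    global auto_path
    if {$dir ni $auto_path} {
        lappend auto_path $dir
    }
    return $auto_path
}

# Assumption: the package was installed with PREFIX=$HOME, so its files
# live somewhere under $HOME/lib.
if {[info exists ::env(HOME)]} {
    add_lib_dir [file join $::env(HOME) lib]
}

# Loading fails if the package is not actually installed there, so
# report the error instead of aborting.
if {[catch {package require nmc} result]} {
    puts stderr "nmc not found: $result"
} else {
    puts "loaded nmc $result"
}
```

Setting the `TCLLIBPATH` environment variable to the same directory should have a similar effect without touching the script.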
+ | |||
+ | ====== Walkthrough of Comparisons ====== | ||
+ | |||
+ | Let's walk through some comparisons using nmc. | ||
+ | |||
+ | When using the Tcl extension, an n-depth of 0 means 'use the largest valid | ||
+ | value for n'. | ||
+ | |||
+ | <code tcl> | ||
+ | % # Load the package and import the nmc command | ||
+ | % package require nmc | ||
+ | 0.2 | ||
+ | % namespace import nmc::nmc | ||
+ | % # Simple equality test of identical strings | ||
+ | % nmc 0 asdf asdf | ||
+ | 0.000000000000000000 | ||
+ | % # zero means identical. | ||
+ | % # If we compare slightly different strings of the same length: | ||
+ | % nmc 0 asdd asdf | ||
+ | 0.894427190999915855 | ||
+ | % # We get a value close to zero. How about a string that is a substring of the other? | ||
+ | % nmc 0 asdff asdf | ||
+ | 0.447213595499957927 | ||
+ | % # Also note that comparisons are commutative, like you'd expect: | ||
+ | % nmc 0 asdf asdff | ||
+ | 0.447213595499957927 | ||
+ | % # 'b' is closer to 'a' than 'z' is: | ||
+ | % nmc 0 asda asdb | ||
+ | 0.447213595499957927 | ||
+ | % nmc 0 asda asdz | ||
+ | 11.18033988749894902 | ||
+ | % # 'asdf' and 'asdfasdf' are more dissimilar than 'asdfasdf' and 'asdfasdfasdf' | ||
+ | % nmc 0 asdf asdfasdf | ||
+ | 8.124038404635960830 | ||
+ | % nmc 0 asdfasdf asdfasdfasdf | ||
+ | 1.424000624219588395 | ||
+ | % # The Tcl extension also accepts a percentage as its n argument, interpreted as a percentage of the maximum valid n. | ||
+ | % # Get a couple of large, equal-sized random strings | ||
+ | % set fh [open /dev/urandom rb] | ||
+ | % set rand1 [read $fh [expr {0x1000}]]; set rand2 [read $fh [expr {0x1000}]]; close $fh | ||
+ | % # Depending on use case, we may wish to see purely random data as similar. | ||
+ | % # Percent-n is useful for this. Comparison using max-valid n: | ||
+ | % nmc 0 $rand1 $rand2 | ||
+ | 69.20999364325710700 | ||
+ | % nmc 1% $rand1 $rand2 | ||
+ | 7.700022818067388641 | ||
+ | </code> | ||
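Because lower scores mean greater similarity, a common use is picking the closest match from a set of candidates. The sketch below is one way to do that; the proc name `closest` and its argument layout are hypothetical, and the comparison is passed in as a command prefix so the helper does not depend on the extension being loaded.

```tcl
# Hypothetical helper: return the candidate whose score against $target
# is lowest, together with that score. $cmp is a command prefix that is
# called with two blocks and must return a number (lower = more similar).
proc closest {cmp target candidates} {
    set best {}
    set bestScore {}
    foreach cand $candidates {
        set score [{*}$cmp $target $cand]
        # The first candidate always becomes the initial best match.
        if {$bestScore eq {} || $score < $bestScore} {
            set best $cand
            set bestScore $score
        }
    }
    return [list $best $bestScore]
}

# With the extension loaded, the comparison would be nmc at maximum depth:
#   closest {nmc::nmc 0} asdf {asdd asdff asdfasdf}
```

Going by the scores in the walkthrough above, that call would be expected to pick `asdff`, since its score (about 0.447) is lower than those of `asdd` and `asdfasdf`.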
+ | |||
+ | ====== Shortcomings ====== | ||
+ | |||
+ | The extension requires both blocks to be loaded into memory, so very | ||
+ | large files cannot be compared directly. A crude workaround is to break | ||
+ | the files into chunks and compare them chunk by chunk. Eventually, an nmc for file | ||
+ | streams (C-based) and channels (Tcl-based) will be implemented. | ||
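The chunking workaround could be sketched roughly as follows. This is not part of the extension: the proc name `compare_chunks`, the default chunk size, and the choice to return one score per chunk pair are all assumptions, and it needs Tcl 8.6 for `try`. The comparison is passed in as a command prefix. Reading stops as soon as either file runs out, so trailing data in the longer file is ignored.

```tcl
# Hypothetical helper: compare two files chunk by chunk and return the
# list of per-chunk scores. $cmp is a command prefix taking two blocks.
proc compare_chunks {cmp pathA pathB {chunkSize 65536}} {
    # Open both files in binary mode so bytes are read unmodified.
    set fa [open $pathA rb]
    set fb [open $pathB rb]
    set scores {}
    try {
        while {1} {
            set a [read $fa $chunkSize]
            set b [read $fb $chunkSize]
            # Stop when either file is exhausted; any remainder of the
            # longer file is not compared.
            if {$a eq "" || $b eq ""} break
            lappend scores [{*}$cmp $a $b]
        }
    } finally {
        close $fa
        close $fb
    }
    return $scores
}

# With the extension loaded, usage might look like:
#   compare_chunks {nmc::nmc 0} first.bin second.bin
```

How to combine the per-chunk scores (mean, maximum, or something else) is left to the caller, since a chunked result is not equivalent to a single comparison over the whole files.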
+ | |||
+ | Floating-point math is used, so results are not guaranteed to be | ||
+ | identical between architectures. NMC is meant for making comparisons | ||
+ | on a single machine, not between different devices: an NMC value computed on one machine | ||
+ | should not be compared against an NMC value computed for the same data on | ||
+ | another machine. | ||
+ | |||
+ | ====== Licensing ====== | ||
+ | |||
+ | All code included is released to the public domain, so long as the original | ||
+ | author is credited. | ||