summaryrefslogtreecommitdiffstats
path: root/differ.txi
diff options
context:
space:
mode:
authorBryan Newbold <bnewbold@robocracy.org>2017-02-20 00:05:29 -0800
committerBryan Newbold <bnewbold@robocracy.org>2017-02-20 00:05:29 -0800
commit8466d8cfa486fb30d1755c4261b781135083787b (patch)
treec8c12c67246f543c3cc4f64d1c07e003cb1d45ae /differ.txi
parent87b82b5822ca54228cfa6df29be3ad9d4bc47d16 (diff)
downloadslib-8466d8cfa486fb30d1755c4261b781135083787b.tar.gz
slib-8466d8cfa486fb30d1755c4261b781135083787b.zip
Import Upstream version 3a1upstream/3a1
Diffstat (limited to 'differ.txi')
-rw-r--r--differ.txi105
1 files changed, 105 insertions, 0 deletions
diff --git a/differ.txi b/differ.txi
new file mode 100644
index 0000000..c46fc2a
--- /dev/null
+++ b/differ.txi
@@ -0,0 +1,105 @@
+@noindent
+@code{diff:edit-length} implements the algorithm:
+
+@ifinfo
+@example
+S. Wu, E. Myers, U. Manber, and W. Miller,
+ "An O(NP) Sequence Comparison Algorithm,"
+ Information Processing Letters 35, 6 (1990), 317-323.
+ @url{http://www.cs.arizona.edu/people/gene/vita.html}
+@end example
+@end ifinfo
+@ifset html
+S. Wu, <A HREF="http://www.cs.arizona.edu/people/gene/vita.html">
+E. Myers,</A> U. Manber, and W. Miller,
+<A HREF="http://www.cs.arizona.edu/people/gene/PAPERS/np_diff.ps">
+"An O(NP) Sequence Comparison Algorithm,"</A>
+Information Processing Letters 35, 6 (1990), 317-323.
+@end ifset
+
+@noindent
+The values returned by @code{diff:edit-length} can be used to gauge
+the degree of match between two sequences.
+
+@noindent
+Surprisingly, "An O(NP) Sequence Comparison Algorithm" does not
+derive the edit sequence; only the sequence length. Developing this
+linear-space sub-quadratic-time algorithm for computing the edit
+sequence required hundreds of hours of work. I have submitted a
+paper describing the algorithm to the Journal of Computational
+Biology.
+
+@noindent
+If the items being sequenced are text lines, then the computed
+edit-list is equivalent to the output of the @dfn{diff} utility
+@cindex diff
+program. If the items being sequenced are words, then it is like the
+lesser known @dfn{spiff} program.
+@cindex spiff
+
+
+@defun diff:longest-common-subsequence array1 array2 =? p-lim
+
+
+@defunx diff:longest-common-subsequence array1 array2 =?
+@var{array1} and @var{array2} are one-dimensional arrays. The procedure @var{=?} is used
+to compare sequence tokens for equality.
+
+The non-negative integer @var{p-lim}, if provided, is maximum number of
+deletions of the shorter sequence to allow. @code{diff:longest-common-subsequence} will return @code{#f}
+if more deletions would be necessary.
+
+@code{diff:longest-common-subsequence} returns a one-dimensional array of length @code{(quotient (- (+
+len1 len2) (diff:edit-length @var{array1} @var{array2})) 2)} holding the longest sequence
+common to both @var{array}s.
+@end defun
+
+@defun diff:edits array1 array2 =? p-lim
+
+
+@defunx diff:edits array1 array2 =?
+@var{array1} and @var{array2} are one-dimensional arrays. The procedure @var{=?} is used
+to compare sequence tokens for equality.
+
+The non-negative integer @var{p-lim}, if provided, is maximum number of
+deletions of the shorter sequence to allow. @code{diff:edits} will return @code{#f}
+if more deletions would be necessary.
+
+@code{diff:edits} returns a vector of length @code{(diff:edit-length @var{array1} @var{array2})} composed
+of a shortest sequence of edits transformaing @var{array1} to @var{array2}.
+
+Each edit is an integer:
+@table @asis
+@item @var{k} > 0
+Inserts @code{(array-ref @var{array1} (+ -1 @var{j}))} into the sequence.
+@item @var{k} < 0
+Deletes @code{(array-ref @var{array2} (- -1 @var{k}))} from the sequence.
+@end table
+@end defun
+
+@defun diff:edit-length array1 array2 =? p-lim
+
+
+@defunx diff:edit-length array1 array2 =?
+@var{array1} and @var{array2} are one-dimensional arrays. The procedure @var{=?} is used
+to compare sequence tokens for equality.
+
+The non-negative integer @var{p-lim}, if provided, is maximum number of
+deletions of the shorter sequence to allow. @code{diff:edit-length} will return @code{#f}
+if more deletions would be necessary.
+
+@code{diff:edit-length} returns the length of the shortest sequence of edits transformaing
+@var{array1} to @var{array2}.
+@end defun
+@example
+(diff:longest-common-subsequence "fghiejcklm" "fgehijkpqrlm" eqv?)
+@result{} "fghijklm"
+
+(diff:edit-length "fghiejcklm" "fgehijkpqrlm" eqv?)
+@result{} 6
+
+(diff:edits "fghiejcklm" "fgehijkpqrlm" eqv?)
+@result{} #As32(3 -5 -7 8 9 10)
+ ; e c h p q r
+@end example
+