diff options
Diffstat (limited to 'differ.txi')
-rw-r--r-- | differ.txi | 105 |
1 files changed, 105 insertions, 0 deletions
diff --git a/differ.txi b/differ.txi new file mode 100644 index 0000000..c46fc2a --- /dev/null +++ b/differ.txi @@ -0,0 +1,105 @@ +@noindent +@code{diff:edit-length} implements the algorithm: + +@ifinfo +@example +S. Wu, E. Myers, U. Manber, and W. Miller, + "An O(NP) Sequence Comparison Algorithm," + Information Processing Letters 35, 6 (1990), 317-323. + @url{http://www.cs.arizona.edu/people/gene/vita.html} +@end example +@end ifinfo +@ifset html +S. Wu, <A HREF="http://www.cs.arizona.edu/people/gene/vita.html"> +E. Myers,</A> U. Manber, and W. Miller, +<A HREF="http://www.cs.arizona.edu/people/gene/PAPERS/np_diff.ps"> +"An O(NP) Sequence Comparison Algorithm,"</A> +Information Processing Letters 35, 6 (1990), 317-323. +@end ifset + +@noindent +The values returned by @code{diff:edit-length} can be used to gauge +the degree of match between two sequences. + +@noindent +Surprisingly, "An O(NP) Sequence Comparison Algorithm" does not +derive the edit sequence; only the sequence length. Developing this +linear-space sub-quadratic-time algorithm for computing the edit +sequence required hundreds of hours of work. I have submitted a +paper describing the algorithm to the Journal of Computational +Biology. + +@noindent +If the items being sequenced are text lines, then the computed +edit-list is equivalent to the output of the @dfn{diff} utility +@cindex diff +program. If the items being sequenced are words, then it is like the +lesser known @dfn{spiff} program. +@cindex spiff + + +@defun diff:longest-common-subsequence array1 array2 =? p-lim + + +@defunx diff:longest-common-subsequence array1 array2 =? +@var{array1} and @var{array2} are one-dimensional arrays. The procedure @var{=?} is used +to compare sequence tokens for equality. + +The non-negative integer @var{p-lim}, if provided, is maximum number of +deletions of the shorter sequence to allow. @code{diff:longest-common-subsequence} will return @code{#f} +if more deletions would be necessary. + +@code{diff:longest-common-subsequence} returns a one-dimensional array of length @code{(quotient (- (+ +len1 len2) (diff:edit-length @var{array1} @var{array2})) 2)} holding the longest sequence +common to both @var{array}s. +@end defun + +@defun diff:edits array1 array2 =? p-lim + + +@defunx diff:edits array1 array2 =? +@var{array1} and @var{array2} are one-dimensional arrays. The procedure @var{=?} is used +to compare sequence tokens for equality. + +The non-negative integer @var{p-lim}, if provided, is maximum number of +deletions of the shorter sequence to allow. @code{diff:edits} will return @code{#f} +if more deletions would be necessary. + +@code{diff:edits} returns a vector of length @code{(diff:edit-length @var{array1} @var{array2})} composed +of a shortest sequence of edits transformaing @var{array1} to @var{array2}. + +Each edit is an integer: +@table @asis +@item @var{k} > 0 +Inserts @code{(array-ref @var{array1} (+ -1 @var{j}))} into the sequence. +@item @var{k} < 0 +Deletes @code{(array-ref @var{array2} (- -1 @var{k}))} from the sequence. +@end table +@end defun + +@defun diff:edit-length array1 array2 =? p-lim + + +@defunx diff:edit-length array1 array2 =? +@var{array1} and @var{array2} are one-dimensional arrays. The procedure @var{=?} is used +to compare sequence tokens for equality. + +The non-negative integer @var{p-lim}, if provided, is maximum number of +deletions of the shorter sequence to allow. @code{diff:edit-length} will return @code{#f} +if more deletions would be necessary. + +@code{diff:edit-length} returns the length of the shortest sequence of edits transformaing +@var{array1} to @var{array2}. +@end defun +@example +(diff:longest-common-subsequence "fghiejcklm" "fgehijkpqrlm" eqv?) +@result{} "fghijklm" + +(diff:edit-length "fghiejcklm" "fgehijkpqrlm" eqv?) +@result{} 6 + +(diff:edits "fghiejcklm" "fgehijkpqrlm" eqv?) +@result{} #As32(3 -5 -7 8 9 10) + ; e c h p q r +@end example + |