diff options
Diffstat (limited to 'differ.txi')
| -rw-r--r-- | differ.txi | 95 | 
1 files changed, 95 insertions, 0 deletions
| diff --git a/differ.txi b/differ.txi new file mode 100644 index 0000000..f7b1f75 --- /dev/null +++ b/differ.txi @@ -0,0 +1,95 @@ +@noindent +This package implements the algorithm: + +@ifinfo +@example +S. Wu, E. Myers, U. Manber, and W. Miller, +   "An O(NP) Sequence Comparison Algorithm," +   Information Processing Letters 35, 6 (1990), 317-323. +   @url{http://www.cs.arizona.edu/people/gene/vita.html} +@end example +@end ifinfo +@ifset html +S. Wu, <A HREF="http://www.cs.arizona.edu/people/gene/vita.html"> +E. Myers,</A> U. Manber, and W. Miller, +<A HREF="http://www.cs.arizona.edu/people/gene/PAPERS/np_diff.ps"> +"An O(NP) Sequence Comparison Algorithm,"</A> +Information Processing Letters 35, 6 (1990), 317-323. +@end ifset + +@noindent +If the items being sequenced are text lines, then the computed +edit-list is equivalent to the output of the @dfn{diff} utility +@cindex diff +program.  If the items being sequenced are words, then it is like the +lesser known @dfn{spiff} program. +@cindex spiff + +@noindent +The values returned by @code{diff:edit-length} can be used to gauge +the degree of match between two sequences. + +@noindent +I believe that this algorithm is currently the fastest for these +tasks, but genome sequencing applications fuel extensive research in +this area. + + +@defun diff:longest-common-subsequence array1 array2 =? + + +@defunx diff:longest-common-subsequence array1 array2 +@var{array1} and @var{array2} are one-dimensional arrays.  The procedure @var{=?} is used +to compare sequence tokens for equality.  @var{=?} defaults to @code{eqv?}. +@code{diff:longest-common-subsequence} returns a one-dimensional array of length @code{(quotient (- (+ +len1 len2) (fp:edit-length @var{array1} @var{array2})) 2)} holding the longest sequence +common to both @var{array}s. +@end defun + +@defun diff:edits array1 array2 =? + + +@defunx diff:edits array1 array2 +@var{array1} and @var{array2} are one-dimensional arrays.  The procedure @var{=?} is used +to compare sequence tokens for equality.  @var{=?} defaults to @code{eqv?}. +@code{diff:edits} returns a list of length @code{(fp:edit-length @var{array1} @var{array2})} composed of +a shortest sequence of edits transformaing @var{array1} to @var{array2}. + +Each edit is a list of an integer and a symbol: +@table @asis +@item (@var{j} insert) +Inserts @code{(array-ref @var{array1} @var{j})} into the sequence. +@item (@var{k} delete) +Deletes @code{(array-ref @var{array2} @var{k})} from the sequence. +@end table +@end defun + +@defun diff:edit-length array1 array2 =? + + +@defunx diff:edit-length array1 array2 +@var{array1} and @var{array2} are one-dimensional arrays.  The procedure @var{=?} is used +to compare sequence tokens for equality.  @var{=?} defaults to @code{eqv?}. +@code{diff:edit-length} returns the length of the shortest sequence of edits transformaing +@var{array1} to @var{array2}. +@end defun +@example +(diff:longest-common-subsequence '#(f g h i e j c k l m) +                                 '#(f g e h i j k p q r l m)) +                                 @result{} #(f g h i j k l m) + +(diff:edit-length '#(f g h i e j c k l m) +                  '#(f g e h i j k p q r l m)) +@result{} 6 + +(pretty-print (diff:edits '#(f g h i e j c k l m) +                          '#(f g e h i j k p q r l m))) +@print{} +((3 insert)                           ; e + (4 delete)                           ; c + (6 delete)                           ; h + (7 insert)                           ; p + (8 insert)                           ; q + (9 insert))                          ; r +@end example + | 
