Diffstat (limited to 'differ.txi')
-rw-r--r-- differ.txi 95
1 files changed, 95 insertions, 0 deletions
diff --git a/differ.txi b/differ.txi
new file mode 100644
index 0000000..f7b1f75
--- /dev/null
+++ b/differ.txi
@@ -0,0 +1,95 @@
+@noindent
+This package implements the algorithm:
+
+@ifinfo
+@example
+S. Wu, E. Myers, U. Manber, and W. Miller,
+ "An O(NP) Sequence Comparison Algorithm,"
+ Information Processing Letters 35, 6 (1990), 317-323.
+ @url{http://www.cs.arizona.edu/people/gene/vita.html}
+@end example
+@end ifinfo
+@ifset html
+S. Wu, <A HREF="http://www.cs.arizona.edu/people/gene/vita.html">
+E. Myers,</A> U. Manber, and W. Miller,
+<A HREF="http://www.cs.arizona.edu/people/gene/PAPERS/np_diff.ps">
+"An O(NP) Sequence Comparison Algorithm,"</A>
+Information Processing Letters 35, 6 (1990), 317-323.
+@end ifset
+
+@noindent
+If the items being compared are text lines, then the computed
+edit-list is equivalent to the output of the @dfn{diff} utility
+@cindex diff
+program. If the items being compared are words, then it is like the
+output of the lesser-known @dfn{spiff} program.
+@cindex spiff
+
+@noindent
+The values returned by @code{diff:edit-length} can be used to gauge
+the degree of match between two sequences.
+
+@noindent
+I believe that this algorithm is currently the fastest for these
+tasks, but genome sequencing applications fuel extensive research in
+this area.
+
+
+@defun diff:longest-common-subsequence array1 array2 =?
+
+
+@defunx diff:longest-common-subsequence array1 array2
+@var{array1} and @var{array2} are one-dimensional arrays. The procedure @var{=?} is used
+to compare sequence tokens for equality. @var{=?} defaults to @code{eqv?}.
+@code{diff:longest-common-subsequence} returns a one-dimensional array of length @code{(quotient (- (+
+len1 len2) (diff:edit-length @var{array1} @var{array2})) 2)} holding a longest sequence common to both
+@var{array}s, where @code{len1} and @code{len2} are the lengths of @var{array1} and @var{array2}.
+@end defun
+
+@defun diff:edits array1 array2 =?
+
+
+@defunx diff:edits array1 array2
+@var{array1} and @var{array2} are one-dimensional arrays. The procedure @var{=?} is used
+to compare sequence tokens for equality. @var{=?} defaults to @code{eqv?}.
+@code{diff:edits} returns a list of length @code{(diff:edit-length @var{array1} @var{array2})} composed of
+a shortest sequence of edits transforming @var{array1} to @var{array2}.
+
+Each edit is a list of an integer and a symbol:
+@table @asis
+@item (@var{j} insert)
+Inserts @code{(array-ref @var{array2} @var{j})} into the sequence.
+@item (@var{k} delete)
+Deletes @code{(array-ref @var{array1} @var{k})} from the sequence.
+@end table
+@end defun
+
+@defun diff:edit-length array1 array2 =?
+
+
+@defunx diff:edit-length array1 array2
+@var{array1} and @var{array2} are one-dimensional arrays. The procedure @var{=?} is used
+to compare sequence tokens for equality. @var{=?} defaults to @code{eqv?}.
+@code{diff:edit-length} returns the length of the shortest sequence of edits transforming
+@var{array1} to @var{array2}.
+@end defun
+@example
+(diff:longest-common-subsequence '#(f g h i e j c k l m)
+ '#(f g e h i j k p q r l m))
+ @result{} #(f g h i j k l m)
+
+(diff:edit-length '#(f g h i e j c k l m)
+ '#(f g e h i j k p q r l m))
+@result{} 6
+
+(pretty-print (diff:edits '#(f g h i e j c k l m)
+ '#(f g e h i j k p q r l m)))
+@print{}
+((3 insert) ; e
+ (4 delete) ; c
+ (6 delete) ; h
+ (7 insert) ; p
+ (8 insert) ; q
+ (9 insert)) ; r
+@end example
+
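Note on the documented length relation: the formula given for diff:longest-common-subsequence ties the edit length to the length of the longest common subsequence. This can be checked independently of SLIB. The following is a minimal sketch in Python, not SLIB code. It assumes the O(NP) pseudocode as published by Wu et al., and counts only insertions and deletions. The names edit_length, snake, advance, and fp are illustrative and are not part of the package.

```python
def edit_length(a, b):
    """Number of insertions plus deletions needed to turn a into b."""
    # The O(NP) method assumes the first sequence is the shorter one;
    # the edit length is symmetric, so swapping is safe.
    if len(a) > len(b):
        a, b = b, a
    m, n = len(a), len(b)
    delta = n - m
    offset = m + 1                  # shifts diagonal k into a list index
    fp = [-1] * (m + n + 3)         # furthest y reached on each diagonal

    def snake(k, y):
        # Follow matching items along diagonal k, where x = y - k.
        x = y - k
        while x < m and y < n and a[x] == b[y]:
            x += 1
            y += 1
        return y

    def advance(k):
        # Extend diagonal k from the better of its two neighbours.
        i = k + offset
        fp[i] = snake(k, max(fp[i - 1] + 1, fp[i + 1]))

    p = 0                           # deletions from the shorter sequence
    while True:
        for k in range(-p, delta):              # diagonals below delta
            advance(k)
        for k in range(delta + p, delta, -1):   # diagonals above delta
            advance(k)
        advance(delta)
        if fp[delta + offset] == n:
            return delta + 2 * p
        p += 1


# Same sequences as the patch's example, written as strings of characters.
a = "fghiejcklm"
b = "fgehijkpqrlm"
d = edit_length(a, b)
print(d)                             # 6
print((len(a) + len(b) - d) // 2)    # 8, the length of f g h i j k l m
```

Both printed values agree with the results shown in the patch's example: an edit length of 6 and a common subsequence of 8 items.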