summaryrefslogtreecommitdiffstats
path: root/differ.txi
diff options
context:
space:
mode:
Diffstat (limited to 'differ.txi')
-rw-r--r--differ.txi90
1 files changed, 50 insertions, 40 deletions
diff --git a/differ.txi b/differ.txi
index f7b1f75..c46fc2a 100644
--- a/differ.txi
+++ b/differ.txi
@@ -1,5 +1,5 @@
@noindent
-This package implements the algorithm:
+@code{diff:edit-length} implements the algorithm:
@ifinfo
@example
@@ -18,6 +18,18 @@ Information Processing Letters 35, 6 (1990), 317-323.
@end ifset
@noindent
+The values returned by @code{diff:edit-length} can be used to gauge
+the degree of match between two sequences.
+
+@noindent
+Surprisingly, "An O(NP) Sequence Comparison Algorithm" does not
+derive the edit sequence; only the sequence length. Developing this
+linear-space sub-quadratic-time algorithm for computing the edit
+sequence required hundreds of hours of work. I have submitted a
+paper describing the algorithm to the Journal of Computational
+Biology.
+
+@noindent
If the items being sequenced are text lines, then the computed
edit-list is equivalent to the output of the @dfn{diff} utility
@cindex diff
@@ -25,71 +37,69 @@ program. If the items being sequenced are words, then it is like the
lesser known @dfn{spiff} program.
@cindex spiff
-@noindent
-The values returned by @code{diff:edit-length} can be used to gauge
-the degree of match between two sequences.
-@noindent
-I believe that this algorithm is currently the fastest for these
-tasks, but genome sequencing applications fuel extensive research in
-this area.
+@defun diff:longest-common-subsequence array1 array2 =? p-lim
-@defun diff:longest-common-subsequence array1 array2 =?
+@defunx diff:longest-common-subsequence array1 array2 =?
+@var{array1} and @var{array2} are one-dimensional arrays. The procedure @var{=?} is used
+to compare sequence tokens for equality.
+The non-negative integer @var{p-lim}, if provided, is maximum number of
+deletions of the shorter sequence to allow. @code{diff:longest-common-subsequence} will return @code{#f}
+if more deletions would be necessary.
-@defunx diff:longest-common-subsequence array1 array2
-@var{array1} and @var{array2} are one-dimensional arrays. The procedure @var{=?} is used
-to compare sequence tokens for equality. @var{=?} defaults to @code{eqv?}.
@code{diff:longest-common-subsequence} returns a one-dimensional array of length @code{(quotient (- (+
-len1 len2) (fp:edit-length @var{array1} @var{array2})) 2)} holding the longest sequence
+len1 len2) (diff:edit-length @var{array1} @var{array2})) 2)} holding the longest sequence
common to both @var{array}s.
@end defun
-@defun diff:edits array1 array2 =?
+@defun diff:edits array1 array2 =? p-lim
-@defunx diff:edits array1 array2
+@defunx diff:edits array1 array2 =?
@var{array1} and @var{array2} are one-dimensional arrays. The procedure @var{=?} is used
-to compare sequence tokens for equality. @var{=?} defaults to @code{eqv?}.
-@code{diff:edits} returns a list of length @code{(fp:edit-length @var{array1} @var{array2})} composed of
-a shortest sequence of edits transformaing @var{array1} to @var{array2}.
+to compare sequence tokens for equality.
-Each edit is a list of an integer and a symbol:
+The non-negative integer @var{p-lim}, if provided, is maximum number of
+deletions of the shorter sequence to allow. @code{diff:edits} will return @code{#f}
+if more deletions would be necessary.
+
+@code{diff:edits} returns a vector of length @code{(diff:edit-length @var{array1} @var{array2})} composed
+of a shortest sequence of edits transformaing @var{array1} to @var{array2}.
+
+Each edit is an integer:
@table @asis
-@item (@var{j} insert)
-Inserts @code{(array-ref @var{array1} @var{j})} into the sequence.
-@item (@var{k} delete)
-Deletes @code{(array-ref @var{array2} @var{k})} from the sequence.
+@item @var{k} > 0
+Inserts @code{(array-ref @var{array1} (+ -1 @var{j}))} into the sequence.
+@item @var{k} < 0
+Deletes @code{(array-ref @var{array2} (- -1 @var{k}))} from the sequence.
@end table
@end defun
-@defun diff:edit-length array1 array2 =?
+@defun diff:edit-length array1 array2 =? p-lim
-@defunx diff:edit-length array1 array2
+@defunx diff:edit-length array1 array2 =?
@var{array1} and @var{array2} are one-dimensional arrays. The procedure @var{=?} is used
-to compare sequence tokens for equality. @var{=?} defaults to @code{eqv?}.
+to compare sequence tokens for equality.
+
+The non-negative integer @var{p-lim}, if provided, is maximum number of
+deletions of the shorter sequence to allow. @code{diff:edit-length} will return @code{#f}
+if more deletions would be necessary.
+
@code{diff:edit-length} returns the length of the shortest sequence of edits transformaing
@var{array1} to @var{array2}.
@end defun
@example
-(diff:longest-common-subsequence '#(f g h i e j c k l m)
- '#(f g e h i j k p q r l m))
- @result{} #(f g h i j k l m)
+(diff:longest-common-subsequence "fghiejcklm" "fgehijkpqrlm" eqv?)
+@result{} "fghijklm"
-(diff:edit-length '#(f g h i e j c k l m)
- '#(f g e h i j k p q r l m))
+(diff:edit-length "fghiejcklm" "fgehijkpqrlm" eqv?)
@result{} 6
-(pretty-print (diff:edits '#(f g h i e j c k l m)
- '#(f g e h i j k p q r l m)))
-@print{}
-((3 insert) ; e
- (4 delete) ; c
- (6 delete) ; h
- (7 insert) ; p
- (8 insert) ; q
- (9 insert)) ; r
+(diff:edits "fghiejcklm" "fgehijkpqrlm" eqv?)
+@result{} #As32(3 -5 -7 8 9 10)
+ ; e c h p q r
@end example