diff options
Diffstat (limited to 'differ.txi')
-rw-r--r-- | differ.txi | 90 |
1 files changed, 50 insertions, 40 deletions
@@ -1,5 +1,5 @@ @noindent -This package implements the algorithm: +@code{diff:edit-length} implements the algorithm: @ifinfo @example @@ -18,6 +18,18 @@ Information Processing Letters 35, 6 (1990), 317-323. @end ifset @noindent +The values returned by @code{diff:edit-length} can be used to gauge +the degree of match between two sequences. + +@noindent +Surprisingly, "An O(NP) Sequence Comparison Algorithm" does not +derive the edit sequence; only the sequence length. Developing this +linear-space sub-quadratic-time algorithm for computing the edit +sequence required hundreds of hours of work. I have submitted a +paper describing the algorithm to the Journal of Computational +Biology. + +@noindent If the items being sequenced are text lines, then the computed edit-list is equivalent to the output of the @dfn{diff} utility @cindex diff @@ -25,71 +37,69 @@ program. If the items being sequenced are words, then it is like the lesser known @dfn{spiff} program. @cindex spiff -@noindent -The values returned by @code{diff:edit-length} can be used to gauge -the degree of match between two sequences. -@noindent -I believe that this algorithm is currently the fastest for these -tasks, but genome sequencing applications fuel extensive research in -this area. +@defun diff:longest-common-subsequence array1 array2 =? p-lim -@defun diff:longest-common-subsequence array1 array2 =? +@defunx diff:longest-common-subsequence array1 array2 =? +@var{array1} and @var{array2} are one-dimensional arrays. The procedure @var{=?} is used +to compare sequence tokens for equality. +The non-negative integer @var{p-lim}, if provided, is maximum number of +deletions of the shorter sequence to allow. @code{diff:longest-common-subsequence} will return @code{#f} +if more deletions would be necessary. -@defunx diff:longest-common-subsequence array1 array2 -@var{array1} and @var{array2} are one-dimensional arrays. The procedure @var{=?} is used -to compare sequence tokens for equality. @var{=?} defaults to @code{eqv?}. @code{diff:longest-common-subsequence} returns a one-dimensional array of length @code{(quotient (- (+ -len1 len2) (fp:edit-length @var{array1} @var{array2})) 2)} holding the longest sequence +len1 len2) (diff:edit-length @var{array1} @var{array2})) 2)} holding the longest sequence common to both @var{array}s. @end defun -@defun diff:edits array1 array2 =? +@defun diff:edits array1 array2 =? p-lim -@defunx diff:edits array1 array2 +@defunx diff:edits array1 array2 =? @var{array1} and @var{array2} are one-dimensional arrays. The procedure @var{=?} is used -to compare sequence tokens for equality. @var{=?} defaults to @code{eqv?}. -@code{diff:edits} returns a list of length @code{(fp:edit-length @var{array1} @var{array2})} composed of -a shortest sequence of edits transformaing @var{array1} to @var{array2}. +to compare sequence tokens for equality. -Each edit is a list of an integer and a symbol: +The non-negative integer @var{p-lim}, if provided, is maximum number of +deletions of the shorter sequence to allow. @code{diff:edits} will return @code{#f} +if more deletions would be necessary. + +@code{diff:edits} returns a vector of length @code{(diff:edit-length @var{array1} @var{array2})} composed +of a shortest sequence of edits transformaing @var{array1} to @var{array2}. + +Each edit is an integer: @table @asis -@item (@var{j} insert) -Inserts @code{(array-ref @var{array1} @var{j})} into the sequence. -@item (@var{k} delete) -Deletes @code{(array-ref @var{array2} @var{k})} from the sequence. +@item @var{k} > 0 +Inserts @code{(array-ref @var{array1} (+ -1 @var{j}))} into the sequence. +@item @var{k} < 0 +Deletes @code{(array-ref @var{array2} (- -1 @var{k}))} from the sequence. @end table @end defun -@defun diff:edit-length array1 array2 =? +@defun diff:edit-length array1 array2 =? p-lim -@defunx diff:edit-length array1 array2 +@defunx diff:edit-length array1 array2 =? @var{array1} and @var{array2} are one-dimensional arrays. The procedure @var{=?} is used -to compare sequence tokens for equality. @var{=?} defaults to @code{eqv?}. +to compare sequence tokens for equality. + +The non-negative integer @var{p-lim}, if provided, is maximum number of +deletions of the shorter sequence to allow. @code{diff:edit-length} will return @code{#f} +if more deletions would be necessary. + @code{diff:edit-length} returns the length of the shortest sequence of edits transformaing @var{array1} to @var{array2}. @end defun @example -(diff:longest-common-subsequence '#(f g h i e j c k l m) - '#(f g e h i j k p q r l m)) - @result{} #(f g h i j k l m) +(diff:longest-common-subsequence "fghiejcklm" "fgehijkpqrlm" eqv?) +@result{} "fghijklm" -(diff:edit-length '#(f g h i e j c k l m) - '#(f g e h i j k p q r l m)) +(diff:edit-length "fghiejcklm" "fgehijkpqrlm" eqv?) @result{} 6 -(pretty-print (diff:edits '#(f g h i e j c k l m) - '#(f g e h i j k p q r l m))) -@print{} -((3 insert) ; e - (4 delete) ; c - (6 delete) ; h - (7 insert) ; p - (8 insert) ; q - (9 insert)) ; r +(diff:edits "fghiejcklm" "fgehijkpqrlm" eqv?) +@result{} #As32(3 -5 -7 8 9 10) + ; e c h p q r @end example |