summaryrefslogtreecommitdiffstats
path: root/differ.txi
blob: f7b1f75dfdf86d51c4d4569d14956e358d3c4ed7 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
@noindent
This package implements the algorithm:

@ifinfo
@example
S. Wu, E. Myers, U. Manber, and W. Miller,
   "An O(NP) Sequence Comparison Algorithm,"
   Information Processing Letters 35, 6 (1990), 317-323.
   @url{http://www.cs.arizona.edu/people/gene/vita.html}
@end example
@end ifinfo
@ifset html
S. Wu, <A HREF="http://www.cs.arizona.edu/people/gene/vita.html">
E. Myers,</A> U. Manber, and W. Miller,
<A HREF="http://www.cs.arizona.edu/people/gene/PAPERS/np_diff.ps">
"An O(NP) Sequence Comparison Algorithm,"</A>
Information Processing Letters 35, 6 (1990), 317-323.
@end ifset

@noindent
If the items being sequenced are text lines, then the computed
edit-list is equivalent to the output of the @dfn{diff} utility
@cindex diff
program.  If the items being sequenced are words, then it is like the
lesser known @dfn{spiff} program.
@cindex spiff

@noindent
The values returned by @code{diff:edit-length} can be used to gauge
the degree of match between two sequences.

@noindent
I believe that this algorithm is currently the fastest for these
tasks, but genome sequencing applications fuel extensive research in
this area.


@defun diff:longest-common-subsequence array1 array2 =?


@defunx diff:longest-common-subsequence array1 array2
@var{array1} and @var{array2} are one-dimensional arrays.  The procedure @var{=?} is used
to compare sequence tokens for equality.  @var{=?} defaults to @code{eqv?}.
@code{diff:longest-common-subsequence} returns a one-dimensional array of length @code{(quotient (- (+
len1 len2) (fp:edit-length @var{array1} @var{array2})) 2)} holding the longest sequence
common to both @var{array}s.
@end defun

@defun diff:edits array1 array2 =?


@defunx diff:edits array1 array2
@var{array1} and @var{array2} are one-dimensional arrays.  The procedure @var{=?} is used
to compare sequence tokens for equality.  @var{=?} defaults to @code{eqv?}.
@code{diff:edits} returns a list of length @code{(fp:edit-length @var{array1} @var{array2})} composed of
a shortest sequence of edits transformaing @var{array1} to @var{array2}.

Each edit is a list of an integer and a symbol:
@table @asis
@item (@var{j} insert)
Inserts @code{(array-ref @var{array1} @var{j})} into the sequence.
@item (@var{k} delete)
Deletes @code{(array-ref @var{array2} @var{k})} from the sequence.
@end table
@end defun

@defun diff:edit-length array1 array2 =?


@defunx diff:edit-length array1 array2
@var{array1} and @var{array2} are one-dimensional arrays.  The procedure @var{=?} is used
to compare sequence tokens for equality.  @var{=?} defaults to @code{eqv?}.
@code{diff:edit-length} returns the length of the shortest sequence of edits transformaing
@var{array1} to @var{array2}.
@end defun
@example
(diff:longest-common-subsequence '#(f g h i e j c k l m)
                                 '#(f g e h i j k p q r l m))
                                 @result{} #(f g h i j k l m)

(diff:edit-length '#(f g h i e j c k l m)
                  '#(f g e h i j k p q r l m))
@result{} 6

(pretty-print (diff:edits '#(f g h i e j c k l m)
                          '#(f g e h i j k p q r l m)))
@print{}
((3 insert)                           ; e
 (4 delete)                           ; c
 (6 delete)                           ; h
 (7 insert)                           ; p
 (8 insert)                           ; q
 (9 insert))                          ; r
@end example