summaryrefslogtreecommitdiffstats
path: root/ps01_grep
diff options
context:
space:
mode:
authorbnewbold <bnewbold@eta.mit.edu>2009-02-11 14:24:16 -0500
committerbnewbold <bnewbold@eta.mit.edu>2009-02-11 14:24:16 -0500
commit2b7db5b23df55e3a1ac0639494bea750d0797c9d (patch)
tree700983f54296f16984629f25a395e751531f515f /ps01_grep
parent40f5708a4d382f358a854b336a9e8db9ffb3fccc (diff)
download6.945-2b7db5b23df55e3a1ac0639494bea750d0797c9d.tar.gz
6.945-2b7db5b23df55e3a1ac0639494bea750d0797c9d.zip
first pset work
Diffstat (limited to 'ps01_grep')
-rw-r--r--ps01_grep/posix_standard.html1109
-rw-r--r--ps01_grep/ps01_bnewbold.txt345
-rw-r--r--ps01_grep/ps01_work.scm331
3 files changed, 1785 insertions, 0 deletions
diff --git a/ps01_grep/posix_standard.html b/ps01_grep/posix_standard.html
new file mode 100644
index 0000000..6c1aeb2
--- /dev/null
+++ b/ps01_grep/posix_standard.html
@@ -0,0 +1,1109 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html>
+<head>
+<meta name="generator" content="HTML Tidy, see www.w3.org">
+<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
+<link type="text/css" rel="stylesheet" href="style.css">
+<!-- Generated by The Open Group's rhtm tool v1.2.1 -->
+<!-- Copyright (c) 2001-2004 IEEE and The Open Group, All Rights Reserved -->
+<title>Regular Expressions</title>
+</head>
+<body bgcolor="white">
+
+<basefont size="3"> <!--header start-->
+<center><font size="2">The Open Group Base Specifications Issue 6<br>
+IEEE Std 1003.1, 2004 Edition<br>
+Copyright &copy; 2001-2004 The IEEE and The Open Group, All Rights reserved.</font></center>
+
+<!--header end-->
+<hr size="2" noshade>
+<h2><a name="tag_09"> 9. </a>Regular Expressions</h2>
+
+<p>Regular Expressions (REs) provide a mechanism to select specific strings from a set of character strings.</p>
+
+<p>Regular expressions are a context-independent syntax that can represent a wide variety of character sets and character set
+orderings, where these character sets are interpreted according to the current locale. While many regular expressions can be
+interpreted differently depending on the current locale, many features, such as character class expressions, provide for contextual
+invariance across locales.</p>
+
+<p>The Basic Regular Expression (BRE) notation and construction rules in <a href="#tag_09_03">Basic Regular Expressions</a> shall
+apply to most utilities supporting regular expressions. Some utilities, instead, support the Extended Regular Expressions (ERE)
+described in <a href="#tag_09_04">Extended Regular Expressions</a>; any exceptions for both cases are noted in the descriptions of
+the specific utilities using regular expressions. Both BREs and EREs are supported by the Regular Expression Matching interface in
+the System Interfaces volume of IEEE&nbsp;Std&nbsp;1003.1-2001 under <a href="../functions/regcomp.html"><i>regcomp</i>()</a>, <a
+href="../functions/regexec.html"><i>regexec</i>()</a>, and related functions.</p>
+
+<h3><a name="tag_09_01"> 9.1 </a>Regular Expression Definitions</h3>
+
+<p>For the purposes of this section, the following definitions shall apply:</p>
+
+<h4><a name="tag_09_01_01"></a>entire regular expression</h4>
+
+<p>The concatenated set of one or more BREs or EREs that make up the pattern specified for string selection.</p>
+
+<h4><a name="tag_09_01_02"></a>matched</h4>
+
+<p>A sequence of zero or more characters shall be said to be matched by a BRE or ERE when the characters in the sequence correspond
+to a sequence of characters defined by the pattern.</p>
+
+<p>Matching shall be based on the bit pattern used for encoding the character, not on the graphic representation of the character.
+This means that if a character set contains two or more encodings for a graphic symbol, or if the strings searched contain text
+encoded in more than one codeset, no attempt is made to search for any other representation of the encoded symbol. If that is
+required, the user can specify equivalence classes containing all variations of the desired graphic symbol.</p>
+
+<p>The search for a matching sequence starts at the beginning of a string and stops when the first sequence matching the expression
+is found, where &quot;first&quot; is defined to mean &quot;begins earliest in the string&quot;. If the pattern permits a variable number of
+matching characters and thus there is more than one such sequence starting at that point, the longest such sequence is matched. For
+example, the BRE <tt>"bb*"</tt> matches the second to fourth characters of the string <tt>"abbbc"</tt>, and the ERE
+<tt>"(wee|week)(knights|night)"</tt> matches all ten characters of the string <tt>"weeknights"</tt>.</p>
+
+<p>Consistent with the whole match being the longest of the leftmost matches, each subpattern, from left to right, shall match the
+longest possible string. For this purpose, a null string shall be considered to be longer than no match at all. For example,
+matching the BRE <tt>"\(.*\).*"</tt> against <tt>"abcdef"</tt>, the subexpression <tt>"(\1)"</tt> is <tt>"abcdef"</tt>, and
+matching the BRE <tt>"\(a*\)*"</tt> against <tt>"bc"</tt>, the subexpression <tt>"(\1)"</tt> is the null string.</p>
+
+<p>When a multi-character collating element in a bracket expression (see <a href="#tag_09_03_05">RE Bracket Expression</a>) is
+involved, the longest sequence shall be measured in characters consumed from the string to be matched; that is, the collating
+element counts not as one element, but as the number of characters it matches.</p>
+
+<h4><a name="tag_09_01_03"></a>BRE (ERE) matching a single character</h4>
+
+<p>A BRE or ERE that shall match either a single character or a single collating element.</p>
+
+<p>Only a BRE or ERE of this type that includes a bracket expression (see <a href="#tag_09_03_05">RE Bracket Expression</a>) can
+match a collating element.</p>
+
+<h4><a name="tag_09_01_04"></a>BRE (ERE) matching multiple characters</h4>
+
+<p>A BRE or ERE that shall match a concatenation of single characters or collating elements.</p>
+
+<p>Such a BRE or ERE is made up from a BRE (ERE) matching a single character and BRE (ERE) special characters.</p>
+
+<h4><a name="tag_09_01_05"></a>invalid</h4>
+
+<p>This section uses the term &quot;invalid&quot; for certain constructs or conditions. Invalid REs shall cause the utility or function
+using the RE to generate an error condition. When invalid is not used, violations of the specified syntax or semantics for REs
+produce undefined results: this may entail an error, enabling an extended syntax for that RE, or using the construct in error as
+literal characters to be matched. For example, the BRE construct <tt>"\{1,2,3\}"</tt> does not comply with the grammar. A
+conforming application cannot rely on it producing an error nor matching the literal characters <tt>"\{1,2,3\}"</tt>.</p>
+
+<h3><a name="tag_09_02"> 9.2 </a>Regular Expression General Requirements</h3>
+
+<p>The requirements in this section shall apply to both basic and extended regular expressions.</p>
+
+<p>The use of regular expressions is generally associated with text processing. REs (BREs and EREs) operate on text strings; that
+is, zero or more characters followed by an end-of-string delimiter (typically NUL). Some utilities employing regular expressions
+limit the processing to lines; that is, zero or more characters followed by a &lt;newline&gt;. In the regular expression processing
+described in IEEE&nbsp;Std&nbsp;1003.1-2001, the &lt;newline&gt; is regarded as an ordinary character and both a period and a
+non-matching list can match one. The Shell and Utilities volume of IEEE&nbsp;Std&nbsp;1003.1-2001 specifies within the individual
+descriptions of those standard utilities employing regular expressions whether they permit matching of &lt;newline&gt;s; if not
+stated otherwise, the use of literal &lt;newline&gt;s or any escape sequence equivalent produces undefined results. Those utilities
+(like <a href="../utilities/grep.html"><i>grep</i></a>) that do not allow &lt;newline&gt;s to match are responsible for eliminating
+any &lt;newline&gt; from strings before matching against the RE. The <a href="../functions/regcomp.html"><i>regcomp</i>()</a>
+function in the System Interfaces volume of IEEE&nbsp;Std&nbsp;1003.1-2001, however, can provide support for such processing
+without violating the rules of this section.</p>
+
+<p>The interfaces specified in IEEE&nbsp;Std&nbsp;1003.1-2001 do not permit the inclusion of a NUL character in an RE or in the
+string to be matched. If during the operation of a standard utility a NUL is included in the text designated to be matched, that
+NUL may designate the end of the text string for the purposes of matching.</p>
+
+<p>When a standard utility or function that uses regular expressions specifies that pattern matching shall be performed without
+regard to the case (uppercase or lowercase) of either data or patterns, then when each character in the string is matched against
+the pattern, not only the character, but also its case counterpart (if any), shall be matched. This definition of case-insensitive
+processing is intended to allow matching of multi-character collating elements as well as characters, as each character in the
+string is matched using both its cases. For example, in a locale where <tt>"Ch"</tt> is a multi-character collating element and
+where a matching list expression matches such elements, the RE <tt>"[[.Ch.]]"</tt> when matched against the string <tt>"char"</tt>
+is in reality matched against <tt>"ch"</tt>, <tt>"Ch"</tt>, <tt>"cH"</tt>, and <tt>"CH"</tt>.</p>
+
+<p>The implementation shall support any regular expression that does not exceed 256 bytes in length.</p>
+
+<h3><a name="tag_09_03"> 9.3 </a>Basic Regular Expressions</h3>
+
+<h4><a name="tag_09_03_01"> 9.3.1 </a>BREs Matching a Single Character or Collating Element</h4>
+
+<p>A BRE ordinary character, a special character preceded by a backslash, or a period shall match a single character. A bracket
+expression shall match a single character or a single collating element.</p>
+
+<h4><a name="tag_09_03_02"> 9.3.2 </a>BRE Ordinary Characters</h4>
+
+<p>An ordinary character is a BRE that matches itself: any character in the supported character set, except for the BRE special
+characters listed in <a href="#tag_09_03_03">BRE Special Characters</a>.</p>
+
+<p>The interpretation of an ordinary character preceded by a backslash ( <tt>'\'</tt> ) is undefined, except for:</p>
+
+<ul>
+<li>
+<p>The characters <tt>')'</tt>, <tt>'('</tt>, <tt>'{'</tt>, and <tt>'}'</tt></p>
+</li>
+
+<li>
+<p>The digits 1 to 9 inclusive (see <a href="#tag_09_03_06">BREs Matching Multiple Characters</a>)</p>
+</li>
+
+<li>
+<p>A character inside a bracket expression</p>
+</li>
+</ul>
+
+<h4><a name="tag_09_03_03"> 9.3.3 </a>BRE Special Characters</h4>
+
+<p>A BRE special character has special properties in certain contexts. Outside those contexts, or when preceded by a backslash,
+such a character is a BRE that matches the special character itself. The BRE special characters and the contexts in which they have
+their special meaning are as follows:</p>
+
+<dl compact>
+<dt><tt>.[\</tt></dt>
+
+<dd>The period, left-bracket, and backslash shall be special except when used in a bracket expression (see <a href=
+"#tag_09_03_05">RE Bracket Expression</a>). An expression containing a <tt>'['</tt> that is not preceded by a backslash and is not
+part of a bracket expression produces undefined results.</dd>
+
+<dt><tt>*</tt></dt>
+
+<dd>The asterisk shall be special except when used:
+
+<ul>
+<li>
+<p>In a bracket expression</p>
+</li>
+
+<li>
+<p>As the first character of an entire BRE (after an initial <tt>'^'</tt>, if any)</p>
+</li>
+
+<li>
+<p>As the first character of a subexpression (after an initial <tt>'^'</tt>, if any); see <a href="#tag_09_03_06">BREs Matching
+Multiple Characters</a></p>
+</li>
+</ul>
+</dd>
+
+<dt><tt>^</tt></dt>
+
+<dd>The circumflex shall be special when used as:
+
+<ul>
+<li>
+<p>An anchor (see <a href="#tag_09_03_08">BRE Expression Anchoring</a>)</p>
+</li>
+
+<li>
+<p>The first character of a bracket expression (see <a href="#tag_09_03_05">RE Bracket Expression</a>)</p>
+</li>
+</ul>
+</dd>
+
+<dt><tt>$</tt></dt>
+
+<dd>The dollar sign shall be special when used as an anchor.</dd>
+</dl>
+
+<h4><a name="tag_09_03_04"> 9.3.4 </a>Periods in BREs</h4>
+
+<p>A period ( <tt>'.'</tt> ), when used outside a bracket expression, is a BRE that shall match any character in the supported
+character set except NUL.</p>
+
+<h4><a name="tag_09_03_05"> 9.3.5 </a>RE Bracket Expression</h4>
+
+<p>A bracket expression (an expression enclosed in square brackets, <tt>"[]"</tt> ) is an RE that shall match a single collating
+element contained in the non-empty set of collating elements represented by the bracket expression.</p>
+
+<p>The following rules and definitions apply to bracket expressions:</p>
+
+<ol>
+<li>
+<p>A bracket expression is either a matching list expression or a non-matching list expression. It consists of one or more
+expressions: collating elements, collating symbols, equivalence classes, character classes, or range expressions. The right-bracket
+( <tt>']'</tt> ) shall lose its special meaning and represent itself in a bracket expression if it occurs first in the list (after
+an initial circumflex ( <tt>'^'</tt> ), if any). Otherwise, it shall terminate the bracket expression, unless it appears in a
+collating symbol (such as <tt>"[.].]"</tt> ) or is the ending right-bracket for a collating symbol, equivalence class, or character
+class. The special characters <tt>'.'</tt>, <tt>'*'</tt>, <tt>'['</tt>, and <tt>'\'</tt> (period, asterisk, left-bracket, and
+backslash, respectively) shall lose their special meaning within a bracket expression.</p>
+
+<p>The character sequences <tt>"[."</tt>, <tt>"[="</tt>, and <tt>"[:"</tt> (left-bracket followed by a period, equals-sign, or
+colon) shall be special inside a bracket expression and are used to delimit collating symbols, equivalence class expressions, and
+character class expressions. These symbols shall be followed by a valid expression and the matching terminating sequence
+<tt>".]"</tt>, <tt>"=]"</tt>, or <tt>":]"</tt>, as described in the following items.</p>
+</li>
+
+<li>
+<p>A matching list expression specifies a list that shall match any single-character collating element in any of the expressions
+represented in the list. The first character in the list shall not be the circumflex; for example, <tt>"[abc]"</tt> is an RE that
+matches any of the characters <tt>'a'</tt>, <tt>'b'</tt>, or <tt>'c'</tt>. It is unspecified whether a matching list expression
+matches a multi-character collating element that is matched by one of the expressions.</p>
+</li>
+
+<li>
+<p>A non-matching list expression begins with a circumflex ( <tt>'^'</tt> ), and specifies a list that shall match any
+single-character collating element except for the expressions represented in the list after the leading circumflex. For example,
+<tt>"[^abc]"</tt> is an RE that matches any character except the characters <tt>'a'</tt>, <tt>'b'</tt>, or <tt>'c'</tt>. It is
+unspecified whether a non-matching list expression matches a multi-character collating element that is not matched by any of the
+expressions. The circumflex shall have this special meaning only when it occurs first in the list, immediately following the
+left-bracket.</p>
+</li>
+
+<li>
+<p>A collating symbol is a collating element enclosed within bracket-period ( <tt>"[."</tt> and <tt>".]"</tt> ) delimiters.
+Collating elements are defined as described in <a href="xbd_chap07.html#tag_07_03_02_04"><i>Collation Order</i></a>. Conforming
+applications shall represent multi-character collating elements as collating symbols when it is necessary to distinguish them from
+a list of the individual characters that make up the multi-character collating element. For example, if the string <tt>"ch"</tt> is
+a collating element defined using the line:</p>
+
+<blockquote>
+<pre>
+<tt>collating-element &lt;ch-digraph&gt; from "&lt;c&gt;&lt;h&gt;"
+</tt>
+</pre>
+</blockquote>
+
+<p>in the locale definition, the expression <tt>"[[.ch.]]"</tt> shall be treated as an RE containing the collating symbol
+<tt>'ch'</tt>, while <tt>"[ch]"</tt> shall be treated as an RE matching <tt>'c'</tt> or <tt>'h'</tt>. Collating symbols are
+recognized only inside bracket expressions. If the string is not a collating element in the current locale, the expression is
+invalid.</p>
+</li>
+
+<li>
+<p>An equivalence class expression shall represent the set of collating elements belonging to an equivalence class, as described in
+<a href="xbd_chap07.html#tag_07_03_02_04"><i>Collation Order</i></a>. Only primary equivalence classes shall be recognized. The
+class shall be expressed by enclosing any one of the collating elements in the equivalence class within bracket-equal (
+<tt>"[="</tt> and <tt>"=]"</tt> ) delimiters. For example, if <tt>'a'</tt>, <tt>'&agrave;'</tt>, and <tt>'&acirc;'</tt> belong to
+the same equivalence class, then <tt>"[[=a=]b]"</tt>, <tt>"[[=&agrave;=]b]"</tt>, and <tt>"[[=&acirc;=]b]"</tt> are each
+equivalent to <tt>"[a&agrave;&acirc;b]"</tt>. If the collating element does not belong to an equivalence class, the equivalence
+class expression shall be treated as a collating symbol.</p>
+</li>
+
+<li>
+<p>A character class expression shall represent the union of two sets:</p>
+
+<ol type="a">
+<li>
+<p>The set of single-character collating elements whose characters belong to the character class, as defined in the <i>LC_CTYPE</i>
+category in the current locale.</p>
+</li>
+
+<li>
+<p>An unspecified set of multi-character collating elements.</p>
+</li>
+</ol>
+
+<p>All character classes specified in the current locale shall be recognized. A character class expression is expressed as a
+character class name enclosed within bracket-colon ( <tt>"[:"</tt> and <tt>":]"</tt> ) delimiters.</p>
+
+<p>The following character class expressions shall be supported in all locales:</p>
+
+<blockquote>
+<pre>
+<tt>[:alnum:] [:cntrl:] [:lower:] [:space:]
+[:alpha:] [:digit:] [:print:] [:upper:]
+[:blank:] [:graph:] [:punct:] [:xdigit:]
+</tt>
+</pre>
+</blockquote>
+
+<p>In addition, character class expressions of the form:</p>
+
+<blockquote>
+<pre>
+<tt>[:</tt><i>name</i><tt>:]
+</tt>
+</pre>
+</blockquote>
+
+<p>are recognized in those locales where the <i>name</i> keyword has been given a <b>charclass</b> definition in the
+<i>LC_CTYPE</i> category.</p>
+</li>
+
+<li>
+<p>In the POSIX locale, a range expression represents the set of collating elements that fall between two elements in the collation
+sequence, inclusive. In other locales, a range expression has unspecified behavior: strictly conforming applications shall not rely
+on whether the range expression is valid, or on the set of collating elements matched. A range expression shall be expressed as the
+starting point and the ending point separated by a hyphen ( <tt>'-'</tt> ).</p>
+
+<p>In the following, all examples assume the POSIX locale.</p>
+
+<p>The starting range point and the ending range point shall be a collating element or collating symbol. An equivalence class
+expression used as a starting or ending point of a range expression produces unspecified results. An equivalence class can be used
+portably within a bracket expression, but only outside the range. If the represented set of collating elements is empty, it is
+unspecified whether the expression matches nothing, or is treated as invalid.</p>
+
+<p>The interpretation of range expressions where the ending range point is also the starting range point of a subsequent range
+expression (for example, <tt>"[a-m-o]"</tt> ) is undefined.</p>
+
+<p>The hyphen character shall be treated as itself if it occurs first (after an initial <tt>'^'</tt>, if any) or last in the list,
+or as an ending range point in a range expression. As examples, the expressions <tt>"[-ac]"</tt> and <tt>"[ac-]"</tt> are
+equivalent and match any of the characters <tt>'a'</tt>, <tt>'c'</tt>, or <tt>'-'</tt> ; <tt>"[^-ac]"</tt> and <tt>"[^ac-]"</tt>
+are equivalent and match any characters except <tt>'a'</tt>, <tt>'c'</tt>, or <tt>'-'</tt> ; the expression <tt>"[%--]"</tt>
+matches any of the characters between <tt>'%'</tt> and <tt>'-'</tt> inclusive; the expression <tt>"[--@]"</tt> matches any of the
+characters between <tt>'-'</tt> and <tt>'@'</tt> inclusive; and the expression <tt>"[a--@]"</tt> is either invalid or equivalent to
+<tt>'@'</tt>, because the letter <tt>'a'</tt> follows the symbol <tt>'-'</tt> in the POSIX locale. To use a hyphen as the starting
+range point, it shall either come first in the bracket expression or be specified as a collating symbol; for example,
+<tt>"[][.-.]-0]"</tt>, which matches either a right bracket or any character or collating element that collates between hyphen and
+0, inclusive.</p>
+
+<p>If a bracket expression specifies both <tt>'-'</tt> and <tt>']'</tt>, the <tt>']'</tt> shall be placed first (after the
+<tt>'^'</tt>, if any) and the <tt>'-'</tt> last within the bracket expression.</p>
+</li>
+</ol>
+
+<h4><a name="tag_09_03_06"> 9.3.6 </a>BREs Matching Multiple Characters</h4>
+
+<p>The following rules can be used to construct BREs matching multiple characters from BREs matching a single character:</p>
+
+<ol>
+<li>
+<p>The concatenation of BREs shall match the concatenation of the strings matched by each component of the BRE.</p>
+</li>
+
+<li>
+<p>A subexpression can be defined within a BRE by enclosing it between the character pairs <tt>"\("</tt> and <tt>"\)"</tt>. Such a
+subexpression shall match whatever it would have matched without the <tt>"\("</tt> and <tt>"\)"</tt>, except that anchoring within
+subexpressions is optional behavior; see <a href="#tag_09_03_08">BRE Expression Anchoring</a>. Subexpressions can be arbitrarily
+nested.</p>
+</li>
+
+<li>
+<p>The back-reference expression <tt>'\n'</tt> shall match the same (possibly empty) string of characters as was matched by a
+subexpression enclosed between <tt>"\("</tt> and <tt>"\)"</tt> preceding the <tt>'\n'</tt>. The character <tt>'n'</tt> shall be a
+digit from 1 through 9, specifying the <i>n</i>th subexpression (the one that begins with the <i>n</i>th <tt>"\("</tt> from the
+beginning of the pattern and ends with the corresponding paired <tt>"\)"</tt> ). The expression is invalid if less than <i>n</i>
+subexpressions precede the <tt>'\n'</tt>. For example, the expression <tt>"\(.*\)\1$"</tt> matches a line consisting of two
+adjacent appearances of the same string, and the expression <tt>"\(a\)*\1"</tt> fails to match <tt>'a'</tt>. When the referenced
+subexpression matched more than one string, the back-referenced expression shall refer to the last matched string. If the
+subexpression referenced by the back-reference matches more than one string because of an asterisk ( <tt>'*'</tt> ) or an interval
+expression (see item (5)), the back-reference shall match the last (rightmost) of these strings.</p>
+</li>
+
+<li>
+<p>When a BRE matching a single character, a subexpression, or a back-reference is followed by the special character asterisk (
+<tt>'*'</tt> ), together with that asterisk it shall match what zero or more consecutive occurrences of the BRE would match. For
+example, <tt>"[ab]*"</tt> and <tt>"[ab][ab]"</tt> are equivalent when matching the string <tt>"ab"</tt>.</p>
+</li>
+
+<li>
+<p>When a BRE matching a single character, a subexpression, or a back-reference is followed by an interval expression of the format
+<tt>"\{m\}"</tt>, <tt>"\{m,\}"</tt>, or <tt>"\{m,n\}"</tt>, together with that interval expression it shall match what repeated
+consecutive occurrences of the BRE would match. The values of <i>m</i> and <i>n</i> are decimal integers in the range 0 &lt;=
+<i>m</i>&lt;= <i>n</i>&lt;= {RE_DUP_MAX}, where <i>m</i> specifies the exact or minimum number of occurrences and <i>n</i>
+specifies the maximum number of occurrences. The expression <tt>"\{m\}"</tt> shall match exactly <i>m</i> occurrences of the
+preceding BRE, <tt>"\{m,\}"</tt> shall match at least <i>m</i> occurrences, and <tt>"\{m,n\}"</tt> shall match any number of
+occurrences between <i>m</i> and <i>n</i>, inclusive.</p>
+
+<p>For example, in the string <tt>"abababccccccd"</tt> the BRE <tt>"c\{3\}"</tt> is matched by characters seven to nine, the BRE
+<tt>"\(ab\)\{4,\}"</tt> is not matched at all, and the BRE <tt>"c\{1,3\}d"</tt> is matched by characters ten to thirteen.</p>
+</li>
+</ol>
+
+<p>The behavior of multiple adjacent duplication symbols ( <tt>'*'</tt> and intervals) produces undefined results.</p>
+
+<p>A subexpression repeated by an asterisk ( <tt>'*'</tt> ) or an interval expression shall not match a null expression unless this
+is the only match for the repetition or it is necessary to satisfy the exact or minimum number of occurrences for the interval
+expression.</p>
+
+<h4><a name="tag_09_03_07"> 9.3.7 </a>BRE Precedence</h4>
+
+<p>The order of precedence shall be as shown in the following table:</p>
+
+<center>
+<table border="1" cellpadding="3" align="center">
+<tr valign="top">
+<th colspan="2" align="center">
+<p class="tent"><b>BRE Precedence (from high to low)</b></p>
+</th>
+</tr>
+
+<tr valign="top">
+<td align="left">
+<p class="tent">Collation-related bracket symbols</p>
+</td>
+<td align="left">
+<p class="tent">[==] [::] [..]</p>
+</td>
+</tr>
+
+<tr valign="top">
+<td align="left">
+<p class="tent">Escaped characters</p>
+</td>
+<td align="left">
+<p class="tent">\&lt;special character&gt;</p>
+</td>
+</tr>
+
+<tr valign="top">
+<td align="left">
+<p class="tent">Bracket expression</p>
+</td>
+<td align="left">
+<p class="tent">[]</p>
+</td>
+</tr>
+
+<tr valign="top">
+<td align="left">
+<p class="tent">Subexpressions/back-references</p>
+</td>
+<td align="left">
+<p class="tent">\(\) \n</p>
+</td>
+</tr>
+
+<tr valign="top">
+<td align="left">
+<p class="tent">Single-character-BRE duplication</p>
+</td>
+<td align="left">
+<p class="tent">* \{m,n\}</p>
+</td>
+</tr>
+
+<tr valign="top">
+<td align="left">
+<p class="tent">Concatenation</p>
+</td>
+<td align="left">
+<p class="tent">&nbsp;</p>
+</td>
+</tr>
+
+<tr valign="top">
+<td align="left">
+<p class="tent">Anchoring</p>
+</td>
+<td align="left">
+<p class="tent">^ $</p>
+</td>
+</tr>
+</table>
+</center>
+
+<h4><a name="tag_09_03_08"> 9.3.8 </a>BRE Expression Anchoring</h4>
+
+<p>A BRE can be limited to matching strings that begin or end a line; this is called &quot;anchoring&quot;. The circumflex and dollar sign
+special characters shall be considered BRE anchors in the following contexts:</p>
+
+<ol>
+<li>
+<p>A circumflex ( <tt>'^'</tt> ) shall be an anchor when used as the first character of an entire BRE. The implementation may treat
+the circumflex as an anchor when used as the first character of a subexpression. The circumflex shall anchor the expression (or
+optionally subexpression) to the beginning of a string; only sequences starting at the first character of a string shall be matched
+by the BRE. For example, the BRE <tt>"^ab"</tt> matches <tt>"ab"</tt> in the string <tt>"abcdef"</tt>, but fails to match in the
+string <tt>"cdefab"</tt>. The BRE <tt>"\(^ab\)"</tt> may match the former string. A portable BRE shall escape a leading circumflex
+in a subexpression to match a literal circumflex.</p>
+</li>
+
+<li>
+<p>A dollar sign ( <tt>'$'</tt> ) shall be an anchor when used as the last character of an entire BRE. The implementation may treat
+a dollar sign as an anchor when used as the last character of a subexpression. The dollar sign shall anchor the expression (or
+optionally subexpression) to the end of the string being matched; the dollar sign can be said to match the end-of-string following
+the last character.</p>
+</li>
+
+<li>
+<p>A BRE anchored by both <tt>'^'</tt> and <tt>'$'</tt> shall match only an entire string. For example, the BRE <tt>"^abcdef$"</tt>
+matches strings consisting only of <tt>"abcdef"</tt>.</p>
+</li>
+</ol>
+
+<h3><a name="tag_09_04"> 9.4 </a>Extended Regular Expressions</h3>
+
+<p>The extended regular expression (ERE) notation and construction rules shall apply to utilities defined as using extended regular
+expressions; any exceptions to the following rules are noted in the descriptions of the specific utilities using EREs.</p>
+
+<h4><a name="tag_09_04_01"> 9.4.1 </a>EREs Matching a Single Character or Collating Element</h4>
+
+<p>An ERE ordinary character, a special character preceded by a backslash, or a period shall match a single character. A bracket
+expression shall match a single character or a single collating element. An ERE matching a single character enclosed in parentheses
+shall match the same as the ERE without parentheses would have matched.</p>
+
+<h4><a name="tag_09_04_02"> 9.4.2 </a>ERE Ordinary Characters</h4>
+
+<p>An ordinary character is an ERE that matches itself. An ordinary character is any character in the supported character set,
+except for the ERE special characters listed in <a href="#tag_09_04_03">ERE Special Characters</a>. The interpretation of an
+ordinary character preceded by a backslash ( <tt>'\'</tt> ) is undefined.</p>
+
+<h4><a name="tag_09_04_03"> 9.4.3 </a>ERE Special Characters</h4>
+
+<p>An ERE special character has special properties in certain contexts. Outside those contexts, or when preceded by a backslash,
+such a character shall be an ERE that matches the special character itself. The extended regular expression special characters and
+the contexts in which they shall have their special meaning are as follows:</p>
+
+<dl compact>
+<dt><tt>.[\(</tt></dt>
+
+<dd>The period, left-bracket, backslash, and left-parenthesis shall be special except when used in a bracket expression (see <a
+href="#tag_09_03_05">RE Bracket Expression</a>). Outside a bracket expression, a left-parenthesis immediately followed by a
+right-parenthesis produces undefined results.</dd>
+
+<dt><tt>)</tt></dt>
+
+<dd>The right-parenthesis shall be special when matched with a preceding left-parenthesis, both outside a bracket expression.</dd>
+
+<dt><tt>*+?{</tt></dt>
+
+<dd>The asterisk, plus-sign, question-mark, and left-brace shall be special except when used in a bracket expression (see <a href=
+"#tag_09_03_05">RE Bracket Expression</a>). Any of the following uses produce undefined results:
+
+<ul>
+<li>
+<p>If these characters appear first in an ERE, or immediately following a vertical-line, circumflex, or left-parenthesis</p>
+</li>
+
+<li>
+<p>If a left-brace is not part of a valid interval expression (see <a href="#tag_09_04_06">EREs Matching Multiple Characters</a>
+)</p>
+</li>
+</ul>
+</dd>
+
+<dt><tt>|</tt></dt>
+
+<dd>The vertical-line is special except when used in a bracket expression (see <a href="#tag_09_03_05">RE Bracket Expression</a>).
+A vertical-line appearing first or last in an ERE, or immediately following a vertical-line or a left-parenthesis, or immediately
+preceding a right-parenthesis, produces undefined results.</dd>
+
+<dt><tt>^</tt></dt>
+
+<dd>The circumflex shall be special when used as:
+
+<ul>
+<li>
+<p>An anchor (see <a href="#tag_09_04_09">ERE Expression Anchoring</a>)</p>
+</li>
+
+<li>
+<p>The first character of a bracket expression (see <a href="#tag_09_03_05">RE Bracket Expression</a>)</p>
+</li>
+</ul>
+</dd>
+
+<dt><tt>$</tt></dt>
+
+<dd>The dollar sign shall be special when used as an anchor.</dd>
+</dl>
+
+<h4><a name="tag_09_04_04"> 9.4.4 </a>Periods in EREs</h4>
+
+<p>A period ( <tt>'.'</tt> ), when used outside a bracket expression, is an ERE that shall match any character in the supported
+character set except NUL.</p>
+
+<h4><a name="tag_09_04_05"> 9.4.5 </a>ERE Bracket Expression</h4>
+
+<p>The rules for ERE Bracket Expressions are the same as for Basic Regular Expressions; see <a href="#tag_09_03_05">RE Bracket
+Expression</a>.</p>
+
+<h4><a name="tag_09_04_06"> 9.4.6 </a>EREs Matching Multiple Characters</h4>
+
+<p>The following rules shall be used to construct EREs matching multiple characters from EREs matching a single character:</p>
+
+<ol>
+<li>
+<p>A concatenation of EREs shall match the concatenation of the character sequences matched by each component of the ERE. A
+concatenation of EREs enclosed in parentheses shall match whatever the concatenation without the parentheses matches. For example,
+both the ERE <tt>"cd"</tt> and the ERE <tt>"(cd)"</tt> are matched by the third and fourth character of the string
+<tt>"abcdefabcdef"</tt>.</p>
+</li>
+
+<li>
+<p>When an ERE matching a single character or an ERE enclosed in parentheses is followed by the special character plus-sign (
+<tt>'+'</tt> ), together with that plus-sign it shall match what one or more consecutive occurrences of the ERE would match. For
+example, the ERE <tt>"b+(bc)"</tt> matches the fourth to seventh characters in the string <tt>"acabbbcde"</tt>. And,
+<tt>"[ab]+"</tt> and <tt>"[ab][ab]*"</tt> are equivalent.</p>
+</li>
+
+<li>
+<p>When an ERE matching a single character or an ERE enclosed in parentheses is followed by the special character asterisk (
+<tt>'*'</tt> ), together with that asterisk it shall match what zero or more consecutive occurrences of the ERE would match. For
+example, the ERE <tt>"b*c"</tt> matches the first character in the string <tt>"cabbbcde"</tt>, and the ERE <tt>"b*cd"</tt> matches
+the third to seventh characters in the string <tt>"cabbbcdebbbbbbcdbc"</tt>. And, <tt>"[ab]*"</tt> and <tt>"[ab][ab]"</tt> are
+equivalent when matching the string <tt>"ab"</tt>.</p>
+</li>
+
+<li>
+<p>When an ERE matching a single character or an ERE enclosed in parentheses is followed by the special character question-mark (
+<tt>'?'</tt> ), together with that question-mark it shall match what zero or one consecutive occurrences of the ERE would match.
+For example, the ERE <tt>"b?c"</tt> matches the second character in the string <tt>"acabbbcde"</tt>.</p>
+</li>
+
+<li>
+<p>When an ERE matching a single character or an ERE enclosed in parentheses is followed by an interval expression of the format
+<tt>"{m}"</tt>, <tt>"{m,}"</tt>, or <tt>"{m,n}"</tt>, together with that interval expression it shall match what repeated
+consecutive occurrences of the ERE would match. The values of <i>m</i> and <i>n</i> are decimal integers in the range 0 &lt;=
+<i>m</i>&lt;= <i>n</i>&lt;= {RE_DUP_MAX}, where <i>m</i> specifies the exact or minimum number of occurrences and <i>n</i>
+specifies the maximum number of occurrences. The expression <tt>"{m}"</tt> matches exactly <i>m</i> occurrences of the preceding
+ERE, <tt>"{m,}"</tt> matches at least <i>m</i> occurrences, and <tt>"{m,n}"</tt> matches any number of occurrences between <i>m</i>
+and <i>n</i>, inclusive.</p>
+
+<p>For example, in the string <tt>"abababccccccd"</tt> the ERE <tt>"c{3}"</tt> is matched by characters seven to nine and the ERE
+<tt>"(ab){2,}"</tt> is matched by characters one to six.</p>
+</li>
+</ol>
+
+<p>The behavior of multiple adjacent duplication symbols ( <tt>'+'</tt>, <tt>'*'</tt>, <tt>'?'</tt>, and intervals) produces
+undefined results.</p>
+
+<p>An ERE matching a single character repeated by an <tt>'*'</tt>, <tt>'?'</tt>, or an interval expression shall not match a null
+expression unless this is the only match for the repetition or it is necessary to satisfy the exact or minimum number of
+occurrences for the interval expression.</p>
+
+<h4><a name="tag_09_04_07"> 9.4.7 </a>ERE Alternation</h4>
+
+<p>Two EREs separated by the special character vertical-line ( <tt>'|'</tt> ) shall match a string that is matched by either. For
+example, the ERE <tt>"a((bc)|d)"</tt> matches the string <tt>"abc"</tt> and the string <tt>"ad"</tt>. Single characters, or
+expressions matching single characters, separated by the vertical bar and enclosed in parentheses, shall be treated as an ERE
+matching a single character.</p>
+
+<h4><a name="tag_09_04_08"> 9.4.8 </a>ERE Precedence</h4>
+
+<p>The order of precedence shall be as shown in the following table:</p>
+
+<center>
+<table border="1" cellpadding="3" align="center">
+<tr valign="top">
+<th colspan="2" align="center">
+<p class="tent"><b>ERE Precedence (from high to low)</b></p>
+</th>
+</tr>
+
+<tr valign="top">
+<td align="left">
+<p class="tent">Collation-related bracket symbols</p>
+</td>
+<td align="left">
+<p class="tent">[==] [::] [..]</p>
+</td>
+</tr>
+
+<tr valign="top">
+<td align="left">
+<p class="tent">Escaped characters</p>
+</td>
+<td align="left">
+<p class="tent">\&lt;special character&gt;</p>
+</td>
+</tr>
+
+<tr valign="top">
+<td align="left">
+<p class="tent">Bracket expression</p>
+</td>
+<td align="left">
+<p class="tent">[]</p>
+</td>
+</tr>
+
+<tr valign="top">
+<td align="left">
+<p class="tent">Grouping</p>
+</td>
+<td align="left">
+<p class="tent">()</p>
+</td>
+</tr>
+
+<tr valign="top">
+<td align="left">
+<p class="tent">Single-character-ERE duplication</p>
+</td>
+<td align="left">
+<p class="tent">* + ? {m,n}</p>
+</td>
+</tr>
+
+<tr valign="top">
+<td align="left">
+<p class="tent">Concatenation</p>
+</td>
+<td align="left">
+<p class="tent">&nbsp;</p>
+</td>
+</tr>
+
+<tr valign="top">
+<td align="left">
+<p class="tent">Anchoring</p>
+</td>
+<td align="left">
+<p class="tent">^ $</p>
+</td>
+</tr>
+
+<tr valign="top">
+<td align="left">
+<p class="tent">Alternation</p>
+</td>
+<td align="left">
+<p class="tent">|</p>
+</td>
+</tr>
+</table>
+</center>
+
+<p>For example, the ERE <tt>"abba|cde"</tt> matches either the string <tt>"abba"</tt> or the string <tt>"cde"</tt> (rather than the
+string <tt>"abbade"</tt> or <tt>"abbcde"</tt>, because concatenation has a higher order of precedence than alternation).</p>
+
+<h4><a name="tag_09_04_09"> 9.4.9 </a>ERE Expression Anchoring</h4>
+
+<p>An ERE can be limited to matching strings that begin or end a line; this is called &quot;anchoring&quot;. The circumflex and dollar sign
+special characters shall be considered ERE anchors when used anywhere outside a bracket expression. This shall have the following
+effects:</p>
+
+<ol>
+<li>
+<p>A circumflex ( <tt>'^'</tt> ) outside a bracket expression shall anchor the expression or subexpression it begins to the
+beginning of a string; such an expression or subexpression can match only a sequence starting at the first character of a string.
+For example, the EREs <tt>"^ab"</tt> and <tt>"(^ab)"</tt> match <tt>"ab"</tt> in the string <tt>"abcdef"</tt>, but fail to match
+in the string <tt>"cdefab"</tt>, and the ERE <tt>"a^b"</tt> is valid, but can never match because the <tt>'a'</tt> prevents the
+expression <tt>"^b"</tt> from matching starting at the first character.</p>
+</li>
+
+<li>
+<p>A dollar sign ( <tt>'$'</tt> ) outside a bracket expression shall anchor the expression or subexpression it ends to the end of a
+string; such an expression or subexpression can match only a sequence ending at the last character of a string. For example, the
+EREs <tt>"ef$"</tt> and <tt>"(ef$)"</tt> match <tt>"ef"</tt> in the string <tt>"abcdef"</tt>, but fail to match in the string
+<tt>"cdefab"</tt>, and the ERE <tt>"e$f"</tt> is valid, but can never match because the <tt>'f'</tt> prevents the expression
+<tt>"e$"</tt> from matching ending at the last character.</p>
+</li>
+</ol>
+
+<h3><a name="tag_09_05"> 9.5 </a>Regular Expression Grammar</h3>
+
+<p>Grammars describing the syntax of both basic and extended regular expressions are presented in this section. The grammar takes
+precedence over the text. See the Shell and Utilities volume of IEEE&nbsp;Std&nbsp;1003.1-2001, <a href=
+"../utilities/xcu_chap01.html#tag_01_10">Section 1.10, Grammar Conventions</a>.</p>
+
+<h4><a name="tag_09_05_01"> 9.5.1 </a>BRE/ERE Grammar Lexical Conventions</h4>
+
+<p>The lexical conventions for regular expressions are as described in this section.</p>
+
+<p>Except as noted, the longest possible token or delimiter beginning at a given point is recognized.</p>
+
+<p>The following tokens are processed (in addition to those string constants shown in the grammar):</p>
+
+<dl compact>
+<dt><b>COLL_ELEM_SINGLE</b></dt>
+
+<dd>
+Any single-character collating element, unless it is a <b>META_CHAR</b>.</dd>
+
+<dt><b>COLL_ELEM_MULTI</b></dt>
+
+<dd>
+Any multi-character collating element.</dd>
+
+<dt><b>BACKREF</b></dt>
+
+<dd>Applicable only to basic regular expressions. The character string consisting of <tt>'\'</tt> followed by a single-digit
+numeral, <tt>'1'</tt> to <tt>'9'</tt>.</dd>
+
+<dt><b>DUP_COUNT</b></dt>
+
+<dd>Represents a numeric constant. It shall be an integer in the range 0 &lt;= <b>DUP_COUNT</b> &lt;= {RE_DUP_MAX}. This token is
+only recognized when the context of the grammar requires it. At all other times, digits not preceded by <tt>'\'</tt> are treated as
+<b>ORD_CHAR</b>.</dd>
+
+<dt><b>META_CHAR</b></dt>
+
+<dd>One of the characters:
+
+<dl compact>
+<dt><tt>^</tt></dt>
+
+<dd>When found first in a bracket expression</dd>
+
+<dt><tt>-</tt></dt>
+
+<dd>When found anywhere but first (after an initial <tt>'^'</tt>, if any) or last in a bracket expression, or as the ending range
+point in a range expression</dd>
+
+<dt><tt>]</tt></dt>
+
+<dd>When found anywhere but first (after an initial <tt>'^'</tt>, if any) in a bracket expression</dd>
+</dl>
+</dd>
+
+<dt><b>L_ANCHOR</b></dt>
+
+<dd>Applicable only to basic regular expressions. The character <tt>'^'</tt> when it appears as the first character of a basic
+regular expression and when not <b>QUOTED_CHAR</b>. The <tt>'^'</tt> may be recognized as an anchor elsewhere; see <a href=
+"#tag_09_03_08">BRE Expression Anchoring</a>.</dd>
+
+<dt><b>ORD_CHAR</b></dt>
+
+<dd>A character, other than one of the special characters in <b>SPEC_CHAR</b>.</dd>
+
+<dt><b>QUOTED_CHAR</b></dt>
+
+<dd>In a BRE, one of the character sequences:
+
+<blockquote>
+<pre>
+<tt>\^ \. \* \[ \$ \\
+</tt>
+</pre>
+</blockquote>
+
+<p>In an ERE, one of the character sequences:</p>
+
+<blockquote>
+<pre>
+<tt>\^ \. \[ \$ \( \) \|
+\* \+ \? \{ \\
+</tt>
+</pre>
+</blockquote>
+</dd>
+
+<dt><b>R_ANCHOR</b></dt>
+
+<dd>(Applicable only to basic regular expressions.) The character <tt>'$'</tt> when it appears as the last character of a basic
+regular expression and when not <b>QUOTED_CHAR</b>. The <tt>'$'</tt> may be recognized as an anchor elsewhere; see <a href=
+"#tag_09_03_08">BRE Expression Anchoring</a>.</dd>
+
+<dt><b>SPEC_CHAR</b></dt>
+
+<dd>For basic regular expressions, one of the following special characters:
+
+<dl compact>
+<dt><tt>.</tt></dt>
+
+<dd>Anywhere outside bracket expressions</dd>
+
+<dt><tt>\</tt></dt>
+
+<dd>Anywhere outside bracket expressions</dd>
+
+<dt><tt>[</tt></dt>
+
+<dd>Anywhere outside bracket expressions</dd>
+
+<dt><tt>^</tt></dt>
+
+<dd>When used as an anchor (see <a href="#tag_09_03_08">BRE Expression Anchoring</a>) or when first in a bracket expression</dd>
+
+<dt><tt>$</tt></dt>
+
+<dd>When used as an anchor</dd>
+
+<dt><tt>*</tt></dt>
+
+<dd>Anywhere except first in an entire RE, anywhere in a bracket expression, directly following <tt>"\("</tt>, directly following
+an anchoring <tt>'^'</tt></dd>
+</dl>
+
+<p>For extended regular expressions, shall be one of the following special characters found anywhere outside bracket
+expressions:</p>
+
+<blockquote>
+<pre>
+<tt>^ . [ $ ( ) |
+* + ? { \
+</tt>
+</pre>
+</blockquote>
+
+<p>(The close-parenthesis shall be considered special in this context only if matched with a preceding open-parenthesis.)</p>
+</dd>
+</dl>
+
+<h4><a name="tag_09_05_02"> 9.5.2 </a>RE and Bracket Expression Grammar</h4>
+
+<p>This section presents the grammar for basic regular expressions, including the bracket expression grammar that is common to both
+BREs and EREs.</p>
+
+<pre>
+<tt>%token ORD_CHAR QUOTED_CHAR DUP_COUNT
+<br>
+%token BACKREF L_ANCHOR R_ANCHOR
+<br>
+%token Back_open_paren Back_close_paren
+/* '\(' '\)' */
+<br>
+%token Back_open_brace Back_close_brace
+/* '\{' '\}' */
+<br>
+/* The following tokens are for the Bracket Expression
+ grammar common to both REs and EREs. */
+<br>
+%token COLL_ELEM_SINGLE COLL_ELEM_MULTI META_CHAR
+<br>
+%token Open_equal Equal_close Open_dot Dot_close Open_colon Colon_close
+/* '[=' '=]' '[.' '.]' '[:' ':]' */
+<br>
+%token class_name
+/* class_name is a keyword to the LC_CTYPE locale category */
+/* (representing a character class) in the current locale */
+/* and is only recognized between [: and :] */
+<br>
+%start basic_reg_exp
+%%
+<br>
+/* --------------------------------------------
+ Basic Regular Expression
+ --------------------------------------------
+*/
+basic_reg_exp : RE_expression
+ | L_ANCHOR
+ | R_ANCHOR
+ | L_ANCHOR R_ANCHOR
+ | L_ANCHOR RE_expression
+ | RE_expression R_ANCHOR
+ | L_ANCHOR RE_expression R_ANCHOR
+ ;
+RE_expression : simple_RE
+ | RE_expression simple_RE
+ ;
+simple_RE : nondupl_RE
+ | nondupl_RE RE_dupl_symbol
+ ;
+nondupl_RE : one_char_or_coll_elem_RE
+ | Back_open_paren RE_expression Back_close_paren
+ | BACKREF
+ ;
+one_char_or_coll_elem_RE : ORD_CHAR
+ | QUOTED_CHAR
+ | '.'
+ | bracket_expression
+ ;
+RE_dupl_symbol : '*'
+ | Back_open_brace DUP_COUNT Back_close_brace
+ | Back_open_brace DUP_COUNT ',' Back_close_brace
+ | Back_open_brace DUP_COUNT ',' DUP_COUNT Back_close_brace
+ ;
+<br>
+/* --------------------------------------------
+ Bracket Expression
+ -------------------------------------------
+*/
+bracket_expression : '[' matching_list ']'
+ | '[' nonmatching_list ']'
+ ;
+matching_list : bracket_list
+ ;
+nonmatching_list : '^' bracket_list
+ ;
+bracket_list : follow_list
+ | follow_list '-'
+ ;
+follow_list : expression_term
+ | follow_list expression_term
+ ;
+expression_term : single_expression
+ | range_expression
+ ;
+single_expression : end_range
+ | character_class
+ | equivalence_class
+ ;
+range_expression : start_range end_range
+ | start_range '-'
+ ;
+start_range : end_range '-'
+ ;
+end_range : COLL_ELEM_SINGLE
+ | collating_symbol
+ ;
+collating_symbol : Open_dot COLL_ELEM_SINGLE Dot_close
+ | Open_dot COLL_ELEM_MULTI Dot_close
+ | Open_dot META_CHAR Dot_close
+ ;
+equivalence_class : Open_equal COLL_ELEM_SINGLE Equal_close
+ | Open_equal COLL_ELEM_MULTI Equal_close
+ ;
+character_class : Open_colon class_name Colon_close
+ ;
+</tt>
+</pre>
+
+<p>The BRE grammar does not permit <b>L_ANCHOR</b> or <b>R_ANCHOR</b> inside <tt>"\("</tt> and <tt>"\)"</tt> (which implies that
+<tt>'^'</tt> and <tt>'$'</tt> are ordinary characters). This reflects the semantic limits on the application, as noted in <a href=
+"#tag_09_03_08">BRE Expression Anchoring</a>. Implementations are permitted to extend the language to interpret <tt>'^'</tt> and
+<tt>'$'</tt> as anchors in these locations, and as such, conforming applications cannot use unescaped <tt>'^'</tt> and <tt>'$'</tt>
+in positions inside <tt>"\("</tt> and <tt>"\)"</tt> that might be interpreted as anchors.</p>
+
+<h4><a name="tag_09_05_03"> 9.5.3 </a>ERE Grammar</h4>
+
+<p>This section presents the grammar for extended regular expressions, excluding the bracket expression grammar. <basefont size=
+"2"></p>
+
+<dl>
+<dt><b>Note:</b></dt>
+
+<dd>The bracket expression grammar and the associated <b>%token</b> lines are identical between BREs and EREs. It has been omitted
+from the ERE section to avoid unnecessary editorial duplication.</dd>
+</dl>
+
+<basefont size="3">
+
+<pre>
+<tt>%token ORD_CHAR QUOTED_CHAR DUP_COUNT
+%start extended_reg_exp
+%%
+<br>
+/* --------------------------------------------
+ Extended Regular Expression
+ --------------------------------------------
+*/
+extended_reg_exp : ERE_branch
+ | extended_reg_exp '|' ERE_branch
+ ;
+ERE_branch : ERE_expression
+ | ERE_branch ERE_expression
+ ;
+ERE_expression : one_char_or_coll_elem_ERE
+ | '^'
+ | '$'
+ | '(' extended_reg_exp ')'
+ | ERE_expression ERE_dupl_symbol
+ ;
+one_char_or_coll_elem_ERE : ORD_CHAR
+ | QUOTED_CHAR
+ | '.'
+ | bracket_expression
+ ;
+ERE_dupl_symbol : '*'
+ | '+'
+ | '?'
+ | '{' DUP_COUNT '}'
+ | '{' DUP_COUNT ',' '}'
+ | '{' DUP_COUNT ',' DUP_COUNT '}'
+ ;
+</tt>
+</pre>
+
+<p>The ERE grammar does not permit several constructs that previous sections specify as having undefined results:</p>
+
+<ul>
+<li>
+<p><b>ORD_CHAR</b> preceded by <tt>'\'</tt></p>
+</li>
+
+<li>
+<p>One or more <i>ERE_dupl_symbol</i>s appearing first in an ERE, or immediately following <tt>'|'</tt>, <tt>'^'</tt>, or
+<tt>'('</tt></p>
+</li>
+
+<li>
+<p><tt>'{'</tt> not part of a valid <i>ERE_dupl_symbol</i></p>
+</li>
+
+<li>
+<p><tt>'|'</tt> appearing first or last in an ERE, or immediately following <tt>'|'</tt> or <tt>'('</tt>, or immediately preceding
+<tt>')'</tt></p>
+</li>
+</ul>
+
+<p>Implementations are permitted to extend the language to allow these. Conforming applications cannot use such constructs.</p>
+
+<hr size="2" noshade>
+<center><font size="2"><!--footer start-->
+UNIX &reg; is a registered Trademark of The Open Group.<br>
+POSIX &reg; is a registered Trademark of The IEEE.<br>
+[ <a href="../mindex.html">Main Index</a> | <a href="../basedefs/contents.html">XBD</a> | <a href=
+"../utilities/contents.html">XCU</a> | <a href="../functions/contents.html">XSH</a> | <a href="../xrat/contents.html">XRAT</a>
+]</font></center>
+
+<!--footer end-->
+<hr size="2" noshade>
+</body>
+</html>
+
diff --git a/ps01_grep/ps01_bnewbold.txt b/ps01_grep/ps01_bnewbold.txt
new file mode 100644
index 0000000..52979e7
--- /dev/null
+++ b/ps01_grep/ps01_bnewbold.txt
@@ -0,0 +1,345 @@
+;;; 6.945 Problem Set #1
+;;; 02/10/2009
+;;; Bryan Newbold <bnewbold@mit.edu>
+
+
+Problem 1.1: Warmup
+---------------
+
+(define (r:+ pattern)
+ (r:repeat 1 #f pattern))
+
+(define (r:* pattern)
+ (r:repeat 0 #f pattern))
+
+(pp (r:grep (r:seq (r:quote "dog")
+ (r:+ (r:quote "cat")))
+ "tests.txt"))
+
+;("[09]. catdogcat" "[11]. dogdogcatdogdog" "[13]. acatdogdogcats")
+
+(pp (r:grep (r:seq (r:quote "cat")
+ (r:+ (r:char-from (string->char-set "ogd")))
+ (r:quote "cat"))
+ "tests.txt"))
+
+;("[09]. catdogcat" "[13]. acatdogdogcats")
+
+(pp (r:grep (r:seq " "
+ (r:+ (r:quote "cat"))
+ (r:eol))
+ "tests.txt"))
+; #f
+
+(pp (r:grep (r:* (r:quote "something that isn't there"))
+ "tests.txt"))
+; matches all
+
+(pp (r:grep (r:repeat 2 3 (r:* (r:char-from (string->char-set "tca"))))
+ "tests.txt"))
+; matches all: nothing repeated several times
+
+(pp (r:grep (r:seq (r:quote "dog")
+ (r:+ (r:quote "cat"))
+ (r:quote "dog"))
+ "tests.txt"))
+
+Problem 1.2: The Proposals
+---------------
+
+a) This would be the same as asking Louis Reasoner to explain why his
+suggestion is wrong; the problem lies within r:repeat, so re-calling r:repeat
+does not sidestep the problem.
+
+b) Bonnie's method is computationally more efficient because checking up to the
+minimum number of repetitions is only executed once; after that successive
+checks are made for additional repetitions. From a straight-forward
+perspective, Bonnie's approach also uses significantly fewer characters of
+final regex code: this could potentially (but unlikely) become important if
+large expressions were searched for with a large range of possible repetitions.
+Most importantly, Bonnie's method is more readable and brings the
+implementation more inline with the underlying philosophy of regular
+expressions (in my opinion).
+
+c) Ben's solution is the one that an experienced regex-er would think of first
+and is the closest conceptual analog to r:repeat in the actual specification.
+The code is compact and should be the most readable and explicit, though we'll
+see what kind of syntaxual issues crop up...
+
+d)
+
+(define (r:repeat min max expr)
+ (if (not (exact-nonnegative-integer? min))
+ (error "Min must be non-negative integer:" min))
+ (if max
+ (begin
+ (if (not (exact-nonnegative-integer? max))
+ (error "Max must be non-negative integer (or null):" max))
+ (if (not (<= min max))
+ (error "Min not less than max:" min max))))
+ (cond
+ ((not max) (r:seq expr (string-append "\\{" (number->string min) ",\\}")))
+ ((= max min) (r:seq expr (string-append "\\{" (number->string min) "\\}")))
+ (else (r:seq expr (string-append "\\{"
+ (number->string min)
+ ","
+ (number->string max)
+ "\\}")))))
+#| tests:
+(pp (r:grep (r:repeat 2 3 (r:quote "cat"))
+ "tests.txt"))
+(pp (r:grep (r:repeat 2 #f (r:quote "cat"))
+ "tests.txt"))
+(pp (r:grep (r:repeat 1 1 (r:quote "cat"))
+ "tests.txt"))
+(pp (r:grep (r:repeat 4 3 (r:quote "cat"))
+ "tests.txt"))
+(r:repeat 2 3 (r:quote "cat"))
+|#
+
+
+Problem 1.3: Optimization
+--------------
+I simply removed the default parentheses around all expressions and implemented
+a new r:subexp wrapper which puts parentheses around expressions. Then I had
+r:alt and r:repeat wrap their expressions in r:subexp. This results in /some/
+redundant nesting but is a huge reduction.
+
+(define (r:subexp . exprs)
+ (string-append "\\(" (apply string-append exprs) "\\)"))
+
+(define (r:seq . exprs)
+ (apply string-append exprs))
+
+(r:seq " " (r:subexp "sdf") "dddd")
+
+(define (r:repeat min max expr)
+ (if (not (exact-nonnegative-integer? min))
+ (error "Min must be non-negative integer:" min))
+ (if max
+ (begin
+ (if (not (exact-nonnegative-integer? max))
+ (error "Max must be non-negative integer (or null):" max))
+ (if (not (<= min max))
+ (error "Min not less than max:" min max))))
+ (cond
+ ((not max) (r:seq (r:subexp expr)
+ (string-append "\\{" (number->string min) ",\\}")))
+ ((= max min) (r:seq (r:subexp expr)
+ (string-append "\\{" (number->string min) "\\}")))
+ (else (r:seq (r:subexp expr)
+ (string-append "\\{"
+ (number->string min)
+ ","
+ (number->string max)
+ "\\}")))))
+
+
+Problem 1.4: Back-references
+--------------
+I only implemented the most crude form of back-expressions. A better
+implementation would include a method of naming subexpressions and refering
+back to them by name: the implementation would do the appropriate counting to
+find the correct backreference number for a given named subexpression.
+
+(define (r:backref n)
+ (string-append "\\" (number->string n)))
+
+Problem 1.5: Ugh!
+-------------
+a) Differences between BREs and EREs (I referenced the table at
+http://www.greenend.org.uk/rjk/2002/06/regexp.html):
+
+* Some characters do not need escaping backslashes: mutiple matching brackets
+ ('\{' becomes '{'), subexpressions ('\(' becomes '(')
+* The alternation operator '|' is added
+* The '+' operator is added meaning one or more matches
+* The '?' operator is added meaning one or zero matches
+
+b) The basic method I chose is to have each r: procedure return a list
+containing two lambda procedures: the first will return the regular grep
+version of the expression, while the second will return the extended grep
+version. The r:grep procedure selects the first lambda and r:egrep selects the
+second, and this decision cascades down recursively to the atomic expressions.
+
+My implementation is crude; the r-grep: and r-egrep: procedures should be
+internal to the primary r: procedures (using let), and the whole process could
+be implemented as a funtion.
+
+c) (not, uh, fully tested...)
+
+; the following are only the expressions which had to be rewritten...
+
+(define (r-grep:seq . exprs)
+ (let ((choose-grep (lambda (x)
+ (cond
+ ((string? x) x)
+ ((list? x) (car x))))))
+ (apply string-append (map choose-grep exprs))))
+
+(r-grep:seq "a" "b" '("grep" "egrep"))
+; abgrep
+
+(define (r-egrep:seq . exprs)
+ (let ((choose-egrep (lambda (x)
+ (cond
+ ((string? x) x)
+ ((list? x) (cadr x))))))
+ (apply string-append (map choose-egrep exprs))))
+
+#| test:
+(r-grep:seq "a" "b" '("grep" "egrep"))
+; "abgrep"
+(r-egrep:seq "a" "b" '("grep" "egrep"))
+; "abegrep"
+|#
+
+(define (r:seq . exprs)
+ (cons (lambda () (apply r-grep:seq exprs))
+ (cons (lambda () (apply r-egrep:seq exprs)) '())))
+
+#| test:
+(r:seq "a" "b" '("grep" "egrep"))
+;Value: (#[compound-procedure 14] #[compound-procedure 15])
+((car (r:seq "a" "b" '("grep" "egrep"))))
+; "abgrep"
+((cadr (r:seq "a" "b" '("grep" "egrep"))))
+; "abegrep"
+|#
+
+(define (r-egrep:alt . exprs)
+ (let ((choose-egrep (lambda (x)
+ (cond
+ ((string? x) x)
+ ((list? x) ((cadr x)))))))
+ (if (pair? exprs)
+ (apply r-egrep:seq
+ (cons (choose-egrep (car exprs))
+ (append-map (lambda (expr)
+ (list "|" (choose-egrep expr)))
+ (cdr exprs))))
+ (r-egrep:seq))))
+
+(define (r-grep:alt . exprs)
+ (let ((choose-grep (lambda (x)
+ (cond
+ ((string? x) x)
+ ((list? x) ((car x)))))))
+ (if (pair? exprs)
+ (apply r-grep:seq
+ (cons (choose-grep (car exprs))
+ (append-map (lambda (expr)
+ (list "\\|" (choose-grep expr)))
+ (cdr exprs))))
+ (r-grep:seq))))
+
+(define (r:alt . exprs)
+ (cons (lambda () (apply r-grep:alt exprs))
+ (cons (lambda () (apply r-egrep:alt exprs)) '())))
+
+#| test:
+(r-egrep:alt "a" (r:dot) "c")
+;Value: "a|.|c"
+(r:alt "asdf" (r:dot))
+;Value: (#[compound-procedure 16] #[compound-procedure 17])
+((car (r:alt "asdf" (r:dot))))
+;Value: "asdf\\|."
+((cadr (r:alt "asdf" (r:dot))))
+;Value: "asdf|."
+|#
+
+(define (r-grep:repeat min max expr)
+ (let ((choose-grep (lambda (x)
+ (cond
+ ((string? x) x)
+ ((list? x) ((car x)))))))
+ (if (not (exact-nonnegative-integer? min))
+ (error "Min must be non-negative integer:" min))
+ (if max
+ (begin
+ (if (not (exact-nonnegative-integer? max))
+ (error "Max must be non-negative integer (or null):" max))
+ (if (not (<= min max))
+ (error "Min not less than max:" min max))))
+ (cond
+ ((not max) (r-grep:seq (r-grep:subexp (choose-grep expr))
+ (string-append "\\{" (number->string min) ",\\}")))
+ ((= max min) (r-grep:seq (r-grep:subexp (choose-grep expr))
+ (string-append "\\{" (number->string min) "\\}")))
+ (else (r-grep:seq (r-grep:subexp (choose-grep expr))
+ (string-append "\\{"
+ (number->string min)
+ ","
+ (number->string max)
+ "\\}"))))))
+
+(define (r-egrep:repeat min max expr)
+ (let ((choose-egrep (lambda (x)
+ (cond
+ ((string? x) x)
+ ((list? x) ((cadr x)))))))
+ (if (not (exact-nonnegative-integer? min))
+ (error "Min must be non-negative integer:" min))
+ (if max
+ (begin
+ (if (not (exact-nonnegative-integer? max))
+ (error "Max must be non-negative integer (or null):" max))
+ (if (not (<= min max))
+ (error "Min not less than max:" min max))))
+ (cond
+ ((not max) (r-egrep:seq (r-egrep:subexp (choose-egrep expr))
+ (string-append "{" (number->string min) ",}")))
+ ((= max min) (r-egrep:seq (r-egrep:subexp (choose-egrep expr))
+ (string-append "{" (number->string min) "}")))
+ (else (r-egrep:seq (r-egrep:subexp (choose-egrep expr))
+ (string-append "{"
+ (number->string min)
+ ","
+ (number->string max)
+ "}"))))))
+
+(define (r:repeat min max expr)
+ (cons (lambda () (r-grep:repeat min max expr))
+ (cons (lambda () (r-egrep:repeat min max expr)) '())))
+
+(define (r-grep:subexp . exprs)
+ (string-append "\\(" (apply string-append
+ (map (lambda (x)
+ (cond
+ ((string? x) x)
+ ((pair? x) ((car x)))))
+ exprs)) "\\)"))
+
+(define (r-egrep:subexp . exprs)
+ (string-append "(" (apply string-append
+ (map (lambda (x)
+ (cond
+ ((string? x) x)
+ ((pair? x) ((cadr x)))))
+ exprs)) ")"))
+
+(define (r:subexp . exprs)
+ (cons (lambda () (apply r-grep:subexp exprs))
+ (cons (lambda () (apply r-egrep:subexp exprs)) '())))
+
+
+#| test:
+(r:subexp "a" "b" (r:dot))
+;Value: (#[compound-procedure 25] #[compound-procedure 26])
+((cadr (r:subexp "a" "b" (r:dot))))
+;Value: "(ab.)"
+((car (r:subexp "a" "b" (r:dot))))
+;Value: "\\(ab.\\)"
+|#
+
+(define (r:grep expr filename)
+ (r:grep-like "grep" '() ((car expr)) filename))
+
+(define (r:egrep expr filename)
+ (if (eq? microcode-id/operating-system 'nt)
+ (r:grep-like "grep" '("-E") ((cadr expr)) filename)
+ (r:grep-like "egrep" '() ((cadr expr)) filename)))
+
+(pp (r:grep (r:seq (r:repeat 2 3 (r:quote "cat")) (r:+ (r:quote "dog")))
+ "tests.txt"))
+
diff --git a/ps01_grep/ps01_work.scm b/ps01_grep/ps01_work.scm
new file mode 100644
index 0000000..7338f20
--- /dev/null
+++ b/ps01_grep/ps01_work.scm
@@ -0,0 +1,331 @@
+;;; 6.945 Problem Set #1 - workfile
+;;; 02/10/2009
+;;; Bryan Newbold <bnewbold@mit.edu>
+
+(load "regexp.scm")
+
+
+(pp (r:grep (r:seq " "
+ (r:repeat 3 5 (r:alt (r:quote "cat") (r:quote "dog")))
+ (r:eol))
+ "tests.txt"))
+
+;-------------------------------------------------------------------
+; 1.1: define r:* and r:+
+
+(define (r:+ pattern)
+ (r:repeat 1 #f pattern))
+
+(define (r:* pattern)
+ (r:repeat 0 #f pattern))
+
+(pp (r:grep (r:seq (r:quote "dog")
+ (r:+ (r:quote "cat")))
+ "tests.txt"))
+
+;("[09]. catdogcat" "[11]. dogdogcatdogdog" "[13]. acatdogdogcats")
+
+(pp (r:grep (r:seq (r:quote "cat")
+ (r:+ (r:char-from (string->char-set "ogd")))
+ (r:quote "cat"))
+ "tests.txt"))
+
+;("[09]. catdogcat" "[13]. acatdogdogcats")
+
+(pp (r:grep (r:seq " "
+ (r:+ (r:quote "cat"))
+ (r:eol))
+ "tests.txt"))
+; #f
+
+(pp (r:grep (r:* (r:quote "something that isn't there"))
+ "tests.txt"))
+; matches all
+
+(pp (r:grep (r:repeat 2 3 (r:* (r:char-from (string->char-set "tca"))))
+ "tests.txt"))
+; matches all: nothing repeated several times
+
+(pp (r:grep (r:seq (r:quote "dog")
+ (r:+ (r:quote "cat"))
+ (r:quote "dog"))
+ "tests.txt"))
+
+;("[11]. dogdogcatdogdog")
+
+(pp (r:grep (r:seq (r:quote "dog")
+ (r:* (r:quote "cat"))
+ (r:quote "dog"))
+ "tests.txt"))
+#|
+("[10]. catcatdogdog" "[11]. dogdogcatdogdog"
+ "[12]. catcatcatdogdogdog"
+ "[13]. acatdogdogcats"
+ "[14]. ifacatdogdogs"
+ "[15]. acatdogdogsme")
+|#
+
+;-------------------------------------------------------------------------
+; 1.2
+
+(define (r:repeat min max expr)
+ (if (not (exact-nonnegative-integer? min))
+ (error "Min must be non-negative integer:" min))
+ (if max
+ (begin
+ (if (not (exact-nonnegative-integer? max))
+ (error "Max must be non-negative integer (or null):" max))
+ (if (not (<= min max))
+ (error "Min not less than max:" min max))))
+ (cond
+ ((not max) (r:seq expr (string-append "\\{" (number->string min) ",\\}")))
+ ((= max min) (r:seq expr (string-append "\\{" (number->string min) "\\}")))
+ (else (r:seq expr (string-append "\\{"
+ (number->string min)
+ ","
+ (number->string max)
+ "\\}")))))
+#| tests:
+(pp (r:grep (r:repeat 2 3 (r:quote "cat"))
+ "tests.txt"))
+(pp (r:grep (r:repeat 2 #f (r:quote "cat"))
+ "tests.txt"))
+(pp (r:grep (r:repeat 1 1 (r:quote "cat"))
+ "tests.txt"))
+(pp (r:grep (r:repeat 4 3 (r:quote "cat"))
+ "tests.txt"))
+(r:repeat 2 3 (r:quote "cat"))
+|#
+
+;---------------------------------------------------------------------------
+; 1.3
+
+(define (r:subexp . exprs)
+ (string-append "\\(" (apply string-append exprs) "\\)"))
+
+(define (r:seq . exprs)
+ (apply string-append exprs))
+
+(r:seq " " (r:subexp "sdf") "dddd")
+
+(define (r:repeat min max expr)
+ (if (not (exact-nonnegative-integer? min))
+ (error "Min must be non-negative integer:" min))
+ (if max
+ (begin
+ (if (not (exact-nonnegative-integer? max))
+ (error "Max must be non-negative integer (or null):" max))
+ (if (not (<= min max))
+ (error "Min not less than max:" min max))))
+ (cond
+ ((not max) (r:seq (r:subexp expr)
+ (string-append "\\{" (number->string min) ",\\}")))
+ ((= max min) (r:seq (r:subexp expr)
+ (string-append "\\{" (number->string min) "\\}")))
+ (else (r:seq (r:subexp expr)
+ (string-append "\\{"
+ (number->string min)
+ ","
+ (number->string max)
+ "\\}")))))
+
+(define (r:alt . exprs)
+ (if (pair? exprs)
+ (r:subexp (apply r:seq
+ (cons (car exprs)
+ (append-map (lambda (expr)
+ (list "\\|" expr))
+ (cdr exprs)))))
+ (r:seq)))
+
+(r:alt "first thing" "second thing")
+
+;----------------------------------------------------------------------------
+; 1.4
+
+(define (r:backref n)
+ (string-append "\\" (number->string n)))
+
+;----------------------------------------------------------------------------
+; 1.5
+
+; the following are only the expressions which had to be rewritten...
+
+(define (r-grep:seq . exprs)
+ (let ((choose-grep (lambda (x)
+ (cond
+ ((string? x) x)
+ ((list? x) (car x))))))
+ (apply string-append (map choose-grep exprs))))
+
+(r-grep:seq "a" "b" '("grep" "egrep"))
+; abgrep
+
+(define (r-egrep:seq . exprs)
+ (let ((choose-egrep (lambda (x)
+ (cond
+ ((string? x) x)
+ ((list? x) (cadr x))))))
+ (apply string-append (map choose-egrep exprs))))
+
+#| test:
+(r-grep:seq "a" "b" '("grep" "egrep"))
+; "abgrep"
+(r-egrep:seq "a" "b" '("grep" "egrep"))
+; "abegrep"
+|#
+
+(define (r:seq . exprs)
+ (cons (lambda () (apply r-grep:seq exprs))
+ (cons (lambda () (apply r-egrep:seq exprs)) '())))
+
+#| test:
+(r:seq "a" "b" '("grep" "egrep"))
+;Value: (#[compound-procedure 14] #[compound-procedure 15])
+((car (r:seq "a" "b" '("grep" "egrep"))))
+; "abgrep"
+((cadr (r:seq "a" "b" '("grep" "egrep"))))
+; "abegrep"
+|#
+
+(define (r-egrep:alt . exprs)
+ (let ((choose-egrep (lambda (x)
+ (cond
+ ((string? x) x)
+ ((list? x) ((cadr x)))))))
+ (if (pair? exprs)
+ (apply r-egrep:seq
+ (cons (choose-egrep (car exprs))
+ (append-map (lambda (expr)
+ (list "|" (choose-egrep expr)))
+ (cdr exprs))))
+ (r-egrep:seq))))
+
+(define (r-grep:alt . exprs)
+ (let ((choose-grep (lambda (x)
+ (cond
+ ((string? x) x)
+ ((list? x) ((car x)))))))
+ (if (pair? exprs)
+ (apply r-grep:seq
+ (cons (choose-grep (car exprs))
+ (append-map (lambda (expr)
+ (list "\\|" (choose-grep expr)))
+ (cdr exprs))))
+ (r-grep:seq))))
+
+(define (r:alt . exprs)
+ (cons (lambda () (apply r-grep:alt exprs))
+ (cons (lambda () (apply r-egrep:alt exprs)) '())))
+
+#| test:
+(r-egrep:alt "a" (r:dot) "c")
+;Value: "a|.|c"
+(r:alt "asdf" (r:dot))
+;Value: (#[compound-procedure 16] #[compound-procedure 17])
+((car (r:alt "asdf" (r:dot))))
+;Value: "asdf\\|."
+((cadr (r:alt "asdf" (r:dot))))
+;Value: "asdf|."
+|#
+
+(define (r-grep:repeat min max expr)
+ (let ((choose-grep (lambda (x)
+ (cond
+ ((string? x) x)
+ ((list? x) ((car x)))))))
+ (if (not (exact-nonnegative-integer? min))
+ (error "Min must be non-negative integer:" min))
+ (if max
+ (begin
+ (if (not (exact-nonnegative-integer? max))
+ (error "Max must be non-negative integer (or null):" max))
+ (if (not (<= min max))
+ (error "Min not less than max:" min max))))
+ (cond
+ ((not max) (r-grep:seq (r-grep:subexp (choose-grep expr))
+ (string-append "\\{" (number->string min) ",\\}")))
+ ((= max min) (r-grep:seq (r-grep:subexp (choose-grep expr))
+ (string-append "\\{" (number->string min) "\\}")))
+ (else (r-grep:seq (r-grep:subexp (choose-grep expr))
+ (string-append "\\{"
+ (number->string min)
+ ","
+ (number->string max)
+ "\\}"))))))
+
+
+(define (r-egrep:repeat min max expr)
+ (let ((choose-egrep (lambda (x)
+ (cond
+ ((string? x) x)
+ ((list? x) ((cadr x)))))))
+ (if (not (exact-nonnegative-integer? min))
+ (error "Min must be non-negative integer:" min))
+ (if max
+ (begin
+ (if (not (exact-nonnegative-integer? max))
+ (error "Max must be non-negative integer (or null):" max))
+ (if (not (<= min max))
+ (error "Min not less than max:" min max))))
+ (cond
+ ((not max) (r-egrep:seq (r-egrep:subexp (choose-egrep expr))
+ (string-append "{" (number->string min) ",}")))
+ ((= max min) (r-egrep:seq (r-egrep:subexp (choose-egrep expr))
+ (string-append "{" (number->string min) "}")))
+ (else (r-egrep:seq (r-egrep:subexp (choose-egrep expr))
+ (string-append "{"
+ (number->string min)
+ ","
+ (number->string max)
+ "}"))))))
+
+(define (r:repeat min max expr)
+ (cons (lambda () (r-grep:repeat min max expr))
+ (cons (lambda () (r-egrep:repeat min max expr)) '())))
+
+(define (r-grep:subexp . exprs)
+ (string-append "\\(" (apply string-append
+ (map (lambda (x)
+ (cond
+ ((string? x) x)
+ ((pair? x) ((car x)))))
+ exprs)) "\\)"))
+
+(define (r-egrep:subexp . exprs)
+ (string-append "(" (apply string-append
+ (map (lambda (x)
+ (cond
+ ((string? x) x)
+ ((pair? x) ((cadr x)))))
+ exprs)) ")"))
+
+(define (r:subexp . exprs)
+ (cons (lambda () (apply r-grep:subexp exprs))
+ (cons (lambda () (apply r-egrep:subexp exprs)) '())))
+
+#| test:
+(r:subexp "a" "b" (r:dot))
+;Value: (#[compound-procedure 25] #[compound-procedure 26])
+((cadr (r:subexp "a" "b" (r:dot))))
+;Value: "(ab.)"
+((car (r:subexp "a" "b" (r:dot))))
+;Value: "\\(ab.\\)"
+|#
+
+(define (r:grep expr filename)
+ (r:grep-like "grep" '() ((car expr)) filename))
+
+(define (r:egrep expr filename)
+ (if (eq? microcode-id/operating-system 'nt)
+ (r:grep-like "grep" '("-E") ((cadr expr)) filename)
+ (r:grep-like "egrep" '() ((cadr expr)) filename)))
+
+(pp (r:grep (r:seq (r:repeat 2 3 (r:quote "cat")) (r:+ (r:quote "dog")))
+ "tests.txt"))
+
+
+
+
+
+