Regular Expressions

an atom is one of:

  (re)    match of re, reported         (?:re) same, not reported
  ()      empty string, reported        (?:)   empty string, not reported
  [chars] bracket expression            .      any single character
  \k      non-alnum k as ordinary char  \c     alnum c: an escape
  {       { itself; bound before digit  x      any other char: itself

a single quantifier may follow an atom. it is:

  regular quantifiers   "bounds"
  * (0 or more)         {m}   (exactly m)
  + (1 or more)         {m,}  (m or more)
  ? (0 or 1)            {m,n} (m-n inclusive, m<=n)
  any quantifier followed by an extra ? is non-greedy
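
for illustration, a small tclsh sketch of greedy, non-greedy and bound
quantifiers. the sample strings and variable names are made up here and
the default (advanced) RE syntax is assumed:

```tcl
set s "<b>bold</b>"
regexp {<.+>}  $s greedy         ;# greedy: takes the longest match
regexp {<.+?>} $s lazy           ;# extra ? makes the quantifier non-greedy
regexp {a{2,3}} "aaaa" bounded   ;# bound: 2 to 3 a's, as many as possible
puts $greedy    ;# <b>bold</b>
puts $lazy      ;# <b>
puts $bounded   ;# aaa
```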

a constraint matches empty string under certain conditions.

  ^ line start  (?=re) positive lookahead - point where re begins
  $ line end    (?!re) negative lookahead - where no re begins
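
a hedged sketch of the constraints above, with invented sample strings.
it also shows that ^ only matches after a newline when newline-sensitive
matching is on (here via the regexp -line switch):

```tcl
# lookahead tests what follows without consuming it
regexp -indices {foo(?=bar)} "foobaz foobar" pos   ;# only the foo before bar
regexp -indices {foo(?!bar)} "foobar foobaz" neg   ;# only the foo not before bar
puts $pos   ;# 7 9
puts $neg   ;# 7 9

# without -line, ^ matches only at the start of the string
set text "x\nx"
set plain [regexp -all {^x} $text]         ;# 1
set lines [regexp -all -line {^x} $text]   ;# 2
puts "$plain $lines"
```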

constraint escapes

   \A   string start             \Z   string end
   \m   word start               \M   word end
   \y   word beginning/end       \Y   not word beginning/end
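
a quick sketch of the word-boundary escapes, counting matches with
regexp -all. the sample string and variable names are illustrative only:

```tcl
set s "cat concat catalog"
set starts [regexp -all {\mcat} $s]     ;# 2: cat, catalog
set ends   [regexp -all {cat\M} $s]     ;# 2: cat, concat
set whole  [regexp -all {\ycat\y} $s]   ;# 1: the standalone cat
set inside [regexp -all {\Ycat} $s]     ;# 1: the cat inside concat
puts "$starts $ends $whole $inside"
```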

bracket expressions

  [abc]   1 char in set
  [^abc]  1 char not in set
  [a-z]   range (inclusive)

  rules inside them:
    every char is a literal except -, ], a leading ^, \ escapes, and some [... combos.
    for a literal - or ], use a collating element or put \ in front
    [.ba.]  collating element
    [[:<:]], [[:>:]]  empty strings at start/end of word (alnum or _)
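
a minimal sketch of the bracket-expression forms above. the test strings
are invented, and the literal - and ] use the backslash form:

```tcl
set in_set  [regexp {^[abc]+$} "cab"]          ;# 1: only chars from the set
set not_set [regexp {^[^abc]+$} "xyz"]         ;# 1: no char from the set
set ranges  [regexp {^[a-f0-9]+$} "c0ffee"]    ;# 1: two inclusive ranges
set sample  {a-]}
set escaped [regexp {^[a\-\]]+$} $sample]      ;# 1: \- and \] are literals
puts "$in_set $not_set $ranges $escaped"
```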

standard character classes

  alpha,upper,lower -> letters            (x)digit -> (hexa)decimal digit
  alnum             -> letter or digit    punct    -> punctuation
  space             -> white space        blank    -> space or tab
  graph             -> visible char       cntrl    -> control char
  print             -> graph or space

shortcuts

 \d  [[:digit:]]       \D    [^[:digit:]]
 \s  [[:space:]]       \S    [^[:space:]]
 \w  [[:alnum:]_]      \W    [^[:alnum:]_]

 inside a bracket expression \d, \s, \w lose their outer brackets,
 and \D, \S, \W are illegal.
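
a short sketch of character classes and their shorthand escapes, with
made-up inputs. the [\d.] case uses \d inside a bracket expression:

```tcl
set ident  [regexp {^[[:alpha:]_][[:alnum:]_]*$} "var_1"]   ;# 1
set word   [regexp {^\w+$} "var_1"]                          ;# 1
set number [regexp {^[\d.]+$} "3.14"]                        ;# 1
regsub -all {\s+} "a  b\t c" " " squeezed   ;# runs of white space -> one space
puts "$ident $word $number"
puts $squeezed   ;# a b c
```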

Backreferences

   ([bc])\1    matches bb or cc but not bc; subexprs are numbered by leading '('
   \m, \mnn    m nonzero digit, nn more digits, mnn <= number of capturing ')' seen so far
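
a small sketch of back references. the repeated-word pattern and the
variable names are illustrative assumptions, not part of the reference:

```tcl
set same  [regexp {^([bc])\1$} "bb"]   ;# 1: \1 repeats what ([bc]) matched
set mixed [regexp {^([bc])\1$} "bc"]   ;# 0: c is not the same as b
# find a doubled word: \1 must equal the first captured word
regexp {\m(\w+)\s+\1\M} "this is is a test" pair word
puts "$same $mixed"
puts $pair   ;# is is
puts $word   ;# is
```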

Metasyntax (defaults first)

  (?xyz)      affects rest of RE after ')'
  b,e,q       rest of RE is BRE,ERE,literal chars
  c,i         case-sensitive, case-insensitive
  s,n,w,p     (non), yes, inverse partial, and partial
                 newline-sensitive (m historical synonym for n)
  t,x         (tight), expanded syntax:
               (ws and chars between # and next \n or RE end ignored)
     ws = any [[:space:]] char; ws or # is kept when preceded by '\' or in
                  a bracket expr.
                  both are illegal in multichar symbols like '(?:' or '\('
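
a hedged sketch of embedded options at the start of an RE. inputs and
variable names are invented for the example:

```tcl
set ci    [regexp {(?i)hello} "HeLLo"]     ;# 1: i = case-insensitive
set nl    [regexp -all {(?n)^x} "x\nx"]    ;# 2: n = newline-sensitive
set lit0  [regexp {(?q)a.b} "axb"]         ;# 0: q = rest is literal, . is a dot
set lit1  [regexp {(?q)a.b} "a.b"]         ;# 1
# x = expanded syntax: white space and # comments in the RE are ignored
set loose [regexp {(?x) \d{3} - \d{4}   # local number} "555-1234"]   ;# 1
puts "$ci $nl $lit0 $lit1 $loose"
```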

regsub ?switches? exp string subSpec ?varName?

 exp     - regular expression matched against string.
 varName - receives a copy of string with each matching part replaced by
           subSpec; regsub then returns the number of substitutions made.
 subSpec - replaces the matching part of string.
           & and \0 -> the matching part of string.
           \n (n is digit 1..9) -> match for n-th () subexpr of exp
           \ in front of &, \0, \n or \ makes it literal; enclose subSpec
           in braces if it contains backslashes.
 -all    - every match in string is found and replaced, each using its own
           & and \n values (default: first match only).
 -nocase - matching is case-insensitive; substituted text keeps original case.
 -start index - matching begins at character index; ^ no longer matches
           the start of a line there, but \A still matches at index.
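
a minimal sketch of regsub with the switches above. strings and variable
names are made up, and every subSpec sits in braces so the Tcl parser
leaves its backslashes alone:

```tcl
# \1 and \2 refer to the two () subexpressions; -all replaces every match
set n [regsub -all {(\w+)@(\w+)} "bob@home, amy@work" {\2:\1} swapped]
puts $n         ;# 2
puts $swapped   ;# home:bob, work:amy

regsub {world} "hello world" {<&>} tagged              ;# & = matched text
regsub -all {&} "a&b" {\&amp;} escaped                 ;# \& = literal &
regsub -all -nocase {tcl} "Tcl and TCL" {[&]} marked   ;# original case kept
regsub -start 4 {a} "banana" {A} late                  ;# search starts at index 4
puts $tagged    ;# hello <world>
puts $escaped   ;# a&amp;b
puts $marked    ;# [Tcl] and [TCL]
puts $late      ;# bananA
```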
