Regular Expressions
Regex Fundamentals
- Delimiters: Characters used to mark the beginning and end of a pattern.
  Ex: the slashes in `/abc/`
- Literal Characters: Characters without special meaning. Ex: a to z, 0 to 9
- Meta-Characters: Characters with special meaning.
  Ex: `. ^ $ * + ? {} [] \ | ()`
  - `.` matches any single character except line terminators (`\n`, `\r`).
  - `\` escapes a special character so that it is treated literally, or
    introduces a special sequence (`\n`, `\t`, etc.).
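To make the difference between a metacharacter and its escaped form concrete, here is a minimal sketch using JavaScript regex literals, where `/` acts as the delimiter. The sample strings and the variable names are illustrative assumptions, not part of the original notes.

```javascript
// Unescaped dot: matches any single character except a line terminator.
const anyChar = /a.c/;
// Escaped dot: matches only a literal "." character.
const literalDot = /a\.c/;

const dotResults = {
  anyCharOnAbc: anyChar.test("abc"),      // true: "." matches "b"
  anyCharOnNewline: anyChar.test("a\nc"), // false: "." does not match "\n"
  literalOnAbc: literalDot.test("abc"),   // false: "b" is not a literal "."
  literalOnDot: literalDot.test("a.c"),   // true
};
console.log(dotResults);
```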
- Flags/Modifiers
  - `g` (global): Searches for all occurrences of the pattern within the
    text, rather than just the first one.
  - `i` (case-insensitive): Ignores case differences when matching letters.
  - `m` (multi-line): Makes the `^` and `$` anchors match at the beginning
    and end of each line within the text, rather than just the beginning
    and end of the entire text.
  - `s` (dotAll): Allows the dot (`.`) to match newline characters (`\n`),
    which it doesn't do by default.
  - `u` (unicode): Treats the pattern and the input as sequences of Unicode
    code points rather than UTF-16 code units, and enables the `\u{...}`
    and `\p{...}` escapes.
  - `y` (sticky): Attempts a match only at the position stored in the
    regex's `lastIndex` property (normally where the previous match ended).
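The effect of each flag is easiest to see side by side. The following is a small sketch, assuming JavaScript regex literals; the sample text and variable names are made up for illustration.

```javascript
const text = "Cat\ncat\nCAT";

// g: return every match instead of only the first.
const lowerOnly = text.match(/cat/g);            // ["cat"]
// i: ignore case (combined with g here).
const anyCase = text.match(/cat/gi);             // ["Cat", "cat", "CAT"]
// m: ^ and $ also match at line boundaries.
const perLine = text.match(/^cat$/gim);          // ["Cat", "cat", "CAT"]
const wholeTextOnly = text.match(/^cat$/gi);     // null without m
// s: the dot also matches "\n".
const dotAll = /Cat.cat/s.test(text);            // true
const dotDefault = /Cat.cat/.test(text);         // false
// u: "." consumes a whole code point, not a single UTF-16 code unit.
const emoji = "\u{1F600}";
const unicodeDot = /^.$/u.test(emoji);           // true
const codeUnitDot = /^.$/.test(emoji);           // false
// y: match only at lastIndex.
const sticky = /cat/y;
sticky.lastIndex = 4;                            // "cat" starts at index 4
const stickyHit = sticky.test(text);             // true
sticky.lastIndex = 5;
const stickyMiss = sticky.test(text);            // false
```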
- Character Classes
  - Defining sets of characters to match `[abc]`.
  - Ranges within character classes `[a-z]` or `[0-9]`.
  - Combined ranges within character classes `[a-z0-9]`.
  - Negated character classes `[^abc]`.
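As a quick illustration of the four kinds of character class above, here is a hedged sketch; the sample string and variable names are assumptions chosen for the example.

```javascript
const sample = "bat cat 42 Rat";

// Set: "b" or "c" followed by "at".
const setMatches = sample.match(/[bc]at/g);     // ["bat", "cat"]
// Range: any single digit.
const rangeMatches = sample.match(/[0-9]/g);    // ["4", "2"]
// Combined ranges: runs of lowercase letters or digits.
const combined = sample.match(/[a-z0-9]+/g);    // ["bat", "cat", "42", "at"]
// Negated: runs of anything that is not a lowercase letter or a space.
const negated = sample.match(/[^a-z ]+/g);      // ["42", "R"]
```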
- Quantifiers
  - `*` matches 0 or more occurrences.
  - `+` matches 1 or more occurrences.
  - `?` matches 0 or 1 occurrence.
  - `{n}` matches exactly `n` occurrences.
  - `{n,}` matches `n` or more occurrences.
  - `{n,m}` matches between `n` and `m` occurrences.
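One way to compare the quantifiers is to run each against the same set of inputs. This is a minimal sketch; the `matching` helper and the input list are hypothetical names introduced only for this example.

```javascript
const inputs = ["", "a", "aa", "aaa", "aaaa"];
// Hypothetical helper: keep the inputs that the whole-string pattern accepts.
const matching = (re) => inputs.filter((s) => re.test(s));

const star = matching(/^a*$/);           // ["", "a", "aa", "aaa", "aaaa"]
const plus = matching(/^a+$/);           // ["a", "aa", "aaa", "aaaa"]
const optional = matching(/^a?$/);       // ["", "a"]
const exactlyTwo = matching(/^a{2}$/);   // ["aa"]
const atLeastTwo = matching(/^a{2,}$/);  // ["aa", "aaa", "aaaa"]
const twoToThree = matching(/^a{2,3}$/); // ["aa", "aaa"]
```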
- Anchors
  - `^` asserts the position at the start of the string.
  - `$` asserts the position at the end of the string.
  - The `m` (multi-line) flag is not an anchor, but it changes `^` and `$`
    to match at the beginning and end of each line within the text, rather
    than just the beginning and end of the entire text.
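The interaction between the anchors and the `m` flag can be sketched as follows, assuming a three-line sample string invented for the example.

```javascript
const lines = "first\nsecond\nthird";

// Without m, ^ and $ refer to the whole string.
const stringStart = lines.match(/^\w+/g);    // ["first"]
const stringEnd = lines.match(/\w+$/g);      // ["third"]
const wholeString = lines.match(/^\w+$/g);   // null: "\n" is not a word character
// With m, ^ and $ also match at every line boundary.
const everyLine = lines.match(/^\w+$/gm);    // ["first", "second", "third"]
```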
- Predefined/Shorthand Character Classes
  - `\d` matches any digit (equivalent to `[0-9]`).
  - `\D` matches any non-digit.
  - `\w` matches any word character (equivalent to `[A-Za-z0-9_]`).
  - `\W` matches any non-word character.
  - `\s` matches any whitespace character (spaces, tabs, line breaks).
  - `\S` matches any non-whitespace character.
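A short sketch of the shorthand classes applied to one string; the sample text and variable names are illustrative assumptions.

```javascript
const order = "Order 66,\tshipped_ok!";

const digits = order.match(/\d+/g);       // ["66"]
const nonDigits = order.match(/\D+/g);    // ["Order ", ",\tshipped_ok!"]
const words = order.match(/\w+/g);        // ["Order", "66", "shipped_ok"]
const nonWords = order.match(/\W+/g);     // [" ", ",\t", "!"]
const spaces = order.match(/\s/g);        // [" ", "\t"]
const nonSpaces = order.match(/\S+/g);    // ["Order", "66,", "shipped_ok!"]
```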
- Alternation
  - Using the vertical bar `|` to specify alternatives. Ex: `cat|dog`
    matches either `cat` or `dog`.
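Alternation has the lowest precedence of the regex operators, which is a common source of surprises. The sketch below uses made-up sample strings to show both plain alternation and the effect of grouping it.

```javascript
const pets = "cat dog bird catfish";
// Either alternative may match at each position.
const either = pets.match(/cat|dog/g);            // ["cat", "dog", "cat"]

// Grouped: the alternation covers only the single letter.
const groupedGrey = /^gr(a|e)y$/.test("grey");    // true
const groupedGrab = /^gr(a|e)y$/.test("grab");    // false
// Ungrouped: parsed as (^gra) OR (ey$), so "grab" is accepted.
const ungroupedGrab = /^gra|ey$/.test("grab");    // true
```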
- Groups and Capturing
  - Grouping with parentheses `()`.
  - Capturing groups `(...)` and numbered backreferences `\1`, `\2`.
  - Non-capturing groups `(?:...)`.
  - Named capture groups `(?<name>...)`.
  - Backreferences to named capture groups by name: `\k<name>`.
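The group syntaxes above can be sketched with a date string and a few short inputs. The group names (`year`, `month`, `quote`) and the sample strings are assumptions chosen for illustration.

```javascript
// Capturing groups are numbered from 1 in the match result.
const dateMatch = "2024-03-15".match(/(\d{4})-(\d{2})-(\d{2})/);
const [year, month, day] = [dateMatch[1], dateMatch[2], dateMatch[3]];

// \1 refers back to whatever group 1 captured (a repeated word here).
const hasRepeat = /\b(\w+) \1\b/.test("hello hello world");   // true

// A non-capturing group groups without adding an entry to the result.
const nonCapturing = "abab".match(/(?:ab)+/);                 // ["abab"]

// Named groups are exposed on the `groups` property.
const named = "2024-03-15".match(/(?<year>\d{4})-(?<month>\d{2})/);
const namedYear = named.groups.year;                          // "2024"

// \k<name> refers back to a named group (matching quote characters here).
const quoted = /(?<quote>['"]).*?\k<quote>/;
const balanced = quoted.test(`say "hi"`);                     // true
const unbalanced = quoted.test(`say "hi'`);                   // false
```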
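- Word Boundaries
  - Word boundaries `\b`: match at a position between a word character
    (`\w`) and a non-word character or the start or end of the string.
  - Non-word boundaries `\B`: match at any position that is not a word
    boundary.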
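A small sketch of `\b` and `\B` on a phrase in which the same letters appear as a whole word, as a prefix, and inside a word. The phrase is an invented example.

```javascript
const phrase = "cat concat catalog";

// \b on both sides: only the standalone word.
const wholeWord = phrase.match(/\bcat\b/g);   // ["cat"]
// \b on the left only: "cat" and the start of "catalog".
const wordStart = phrase.match(/\bcat/g);     // ["cat", "cat"]
// \B on the left: only the "cat" inside "concat".
const insideWord = phrase.match(/\Bcat/g);    // ["cat"]
```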
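- Lookaheads and Lookbehinds (Lookarounds)
  - Positive lookahead `pattern1(?=pattern2)`: matches `pattern1` only if
    it is followed by `pattern2`.
  - Negative lookahead `pattern1(?!pattern2)`: matches `pattern1` only if
    it is not followed by `pattern2`.
  - Positive lookbehind `(?<=pattern1)pattern2`: matches `pattern2` only if
    it is preceded by `pattern1`.
  - Negative lookbehind `(?<!pattern1)pattern2`: matches `pattern2` only if
    it is not preceded by `pattern1`.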
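The four lookarounds can be compared on one string. This is a sketch under the assumption of a runtime that supports lookbehind (ES2018 or later); the price string is invented. The `\b` anchors in the negative cases stop the engine from backtracking to a shorter number such as `30`.

```javascript
const prices = "USD 100, EUR 200, 300 USD";

// Positive lookahead: digits followed by " USD".
const beforeUsd = prices.match(/\d+(?= USD)/g);          // ["300"]
// Negative lookahead: whole numbers not followed by " USD".
const notBeforeUsd = prices.match(/\b\d+\b(?! USD)/g);   // ["100", "200"]
// Positive lookbehind: digits preceded by "USD ".
const afterUsd = prices.match(/(?<=USD )\d+/g);          // ["100"]
// Negative lookbehind: whole numbers not preceded by "USD ".
const notAfterUsd = prices.match(/(?<!USD )\b\d+/g);     // ["200", "300"]
```

In every case the lookaround text is checked but not included in the match.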
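- Debugging Regular Expressions
  - Keep track of the current input position: lookarounds are zero-width,
    so they check the text around the current position without consuming
    it.
  - Lookahead placed before the main pattern, `(?=pattern2)pattern1`: both
    patterns must match starting at the same position.
  - Lookbehind placed after the main pattern, `pattern1(?<=pattern2)`:
    `pattern2` must match the text that ends where `pattern1` ended.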
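When debugging, it can help to print where the engine is and to test lookarounds in these less common placements. The sketch below is one possible way to do that in JavaScript; the inputs and variable names are assumptions made for the example.

```javascript
// With the g flag, lastIndex records the current position after each match.
const scanner = /\d+/g;
const trace = [];
let hit;
while ((hit = scanner.exec("a1 b22 c333")) !== null) {
  trace.push([hit[0], hit.index, scanner.lastIndex]);
}
// trace: [["1", 1, 2], ["22", 4, 6], ["333", 8, 11]]

// Lookahead first: \w+ may only start where the next character is a digit.
const digitLed = "abc 9lives".match(/(?=\d)\w+/)[0];            // "9lives"

// Lookbehind last: the text consumed by \w+ must end in "ing".
const ingWords = "walk singing talks".match(/\w+(?<=ing)/g);    // ["singing"]
```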
- Substitutions or Replacements
  - `$&` inserts the matched substring.
  - `` $` `` inserts the portion of the string that precedes the matched
    substring.
  - `$'` inserts the portion of the string that follows the matched
    substring.
  - `$n` inserts the nth (1-indexed) capturing group, where `n` is a
    positive integer less than 100.
  - `$<name>` inserts the named capturing group, where `name` is the group
    name.
  - `$$` inserts a literal `$`.
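Each replacement pattern above can be tried with `String.prototype.replace`. This is a minimal sketch; the input strings and the group names `first` and `last` are illustrative assumptions.

```javascript
const fullName = "John Smith";

// $n: numbered groups.
const swapped = fullName.replace(/(\w+) (\w+)/, "$2, $1");    // "Smith, John"
// $<name>: named groups.
const namedSwap = fullName.replace(
  /(?<first>\w+) (?<last>\w+)/,
  "$<last> $<first>"
);                                                            // "Smith John"
// $&: the whole match.
const wrapped = fullName.replace(/Smith/, "[$&]");            // "John [Smith]"
// $`: the text before the match ("a" replaces "b").
const withBefore = "abc".replace(/b/, "$`");                  // "aac"
// $': the text after the match ("c" replaces "b").
const withAfter = "abc".replace(/b/, "$'");                   // "acc"
// $$: a literal dollar sign, followed here by the match itself.
const withDollar = "cost: 5".replace(/\d+/, "$$$&");          // "cost: $5"
```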
- Regex in JavaScript