Regular Expressions

Regex Fundamentals

  1. Delimiters: Characters used to mark the beginning and end of a pattern.
  2. Literal Characters: Characters without special meaning. Ex: a to z, 0 to 9
  3. Meta-Characters: Characters with special meaning. Ex:   . ^ $ * + ? {} [] \ | ()
  4. `.` is a metacharacter that matches any single character except line terminators (\n, \r, and in JavaScript also \u2028 and \u2029).
  5. `\` is a metacharacter that either escapes another metacharacter so it is matched literally (for example, `\.` matches a literal dot) or introduces a special sequence (\n, \t, \d, etc.).
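
As a small illustration of items 1 to 5, the sketch below writes one pattern with both JavaScript delimiter styles and contrasts an unescaped dot with an escaped one. The variable names and sample strings are made up for this example.

```javascript
// Two equivalent ways to write the same pattern in JavaScript.
const literalForm = /a\.c/i;                      // slashes delimit the pattern; i is a flag
const constructedForm = new RegExp("a\\.c", "i"); // inside a string, the backslash itself must be escaped

// An unescaped dot matches any single character except a line terminator.
const dotMatchesLetter = /a.c/.test("abc");    // true
const dotMatchesNewline = /a.c/.test("a\nc");  // false

// An escaped dot matches only a literal ".".
const escapedMatchesDot = /a\.c/.test("a.c");     // true
const escapedMatchesLetter = /a\.c/.test("abc");  // false
```
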
  6. Flags/Modifiers
    • `g` (global): Searches for all occurrences of the pattern within the text, rather than just the first one.
    • `i` (case-insensitive): Ignores case differences when matching letters.
    • `m` (multi-line): Changes the behavior of ^ and $ anchors to match the beginning and end of each line within the text, rather than just the beginning and end of the entire text.
    • `s` (dotAll): Allows the dot (.) to match newline characters (\n), which it doesn't do by default.
    • `u` (unicode): Enables full Unicode support for the regex pattern.
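    • `y` (sticky): Attempts a match only at the exact position stored in the regex's `lastIndex` property, which is where the previous successful match ended; it does not scan ahead for a later match.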
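
The flags above can be compared side by side. This is a minimal sketch; the sample text and variable names are assumptions chosen for the example.

```javascript
// Sample text with three lines (hypothetical input).
const flagText = "Cat\ncat\nCAT";

// g and i: without g only the first match is returned; i ignores case.
const firstOnly = flagText.match(/cat/i)[0];   // "Cat"
const allMatches = flagText.match(/cat/gi);    // ["Cat", "cat", "CAT"]

// m: ^ and $ match at the start and end of every line.
const wholeStringOnly = /^cat$/i.test(flagText);          // false
const perLineCount = flagText.match(/^cat$/gim).length;   // 3

// s: the dot also matches newline characters.
const dotDefault = /Cat.cat/.test(flagText);   // false
const dotAll = /Cat.cat/s.test(flagText);      // true

// u: the dot consumes a whole code point instead of a single UTF-16 code unit.
const emojiWithoutU = /^.$/.test("😀");    // false
const emojiWithU = /^.$/u.test("😀");      // true

// y: the match must begin exactly at lastIndex.
const sticky = /cat/y;
sticky.lastIndex = 4;                          // index 4 is the "c" of the second line
const stickyHit = sticky.test(flagText);       // true
sticky.lastIndex = 3;                          // index 3 is the first "\n"
const stickyMiss = sticky.test(flagText);      // false
```
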
  7. Character Classes
    • Defining sets of characters to match [abc].
    • Ranges within character classes [a-z] or [0-9].
    • Combined ranges within character classes [a-z0-9].
    • Negated character classes [^abc].
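
A short sketch of the four character-class forms listed above, using made-up sample strings:

```javascript
// Set: any one of the listed characters.
const vowels = "regex".match(/[aeiou]/g);        // ["e", "e"]

// Range: any lowercase ASCII letter.
const lowercase = "aB3_".match(/[a-z]/g);        // ["a"]

// Combined ranges: lowercase letters or digits.
const lowerOrDigit = "aB3_".match(/[a-z0-9]/g);  // ["a", "3"]

// Negated class: any character that is NOT a, b, or c.
const notAbc = "abcd".match(/[^abc]/g);          // ["d"]
```
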
  8. Quantifiers
    • `*` matches 0 or more occurrences.
    • `+` matches 1 or more occurrences.
    • `?` matches 0 or 1 occurrence.
    • `{n}` matches exactly `n` occurrences.
    • `{n,}` matches `n` or more occurrences.
    • `{n,m}` matches between `n` and `m` occurrences.
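
Each quantifier above can be checked against a small input. The patterns and inputs below are illustrative only.

```javascript
// * allows zero occurrences of "b"; + requires at least one.
const zeroOrMore = /^ab*$/.test("a");   // true
const oneOrMore = /^ab+$/.test("a");    // false

// ? makes the preceding "u" optional.
const optionalU = /^colou?r$/.test("color");  // true

// {n}: exactly four digits.
const exactlyFour = /^\d{4}$/.test("2024");   // true

// {n,}: at least two digits.
const atLeastTwo = /^\d{2,}$/.test("7");      // false

// {n,m}: whole numbers that are two or three digits long.
const twoOrThree = "1 22 333 4444".match(/\b\d{2,3}\b/g);  // ["22", "333"]
```
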
  9. Anchors
    • `^` asserts the position at the start of the string.
    • `$` asserts the position at the end of the string.
    • Note: the `m` (multi-line) flag is not an anchor itself, but it changes `^` and `$` so that they match at the start and end of each line rather than only at the start and end of the entire text.
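
The effect of the anchors, with and without the multi-line flag, can be sketched as follows. The two-line sample text is an assumption for the example.

```javascript
const anchorText = "first line\nsecond line";

// Without m, ^ matches only at the very start of the string.
const startsWithFirst = /^first/.test(anchorText);      // true
const startsWithSecond = /^second/.test(anchorText);    // false

// With m, ^ also matches right after each line break.
const lineStartsWithSecond = /^second/m.test(anchorText);  // true

// Without m, $ matches only at the very end; with m, at the end of each line.
const endMatches = anchorText.match(/line$/g).length;        // 1
const endMatchesPerLine = anchorText.match(/line$/gm).length; // 2
```
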
  10. Predefined/Shorthand Character Classes
    • `\d` matches any digit (equivalent to [0-9]).
    • `\D` matches any non-digit.
    • `\w` matches any word character (equivalent to [A-Za-z0-9_]).
    • `\W` matches any non-word character.
    • `\s` matches any whitespace character (spaces, tabs, line breaks).
    • `\S` matches any non-whitespace character.
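
A brief sketch of the shorthand classes applied to a made-up string:

```javascript
const sample = "Order #42: 3 items\t(total $9)";

// \d+ collects runs of digits.
const digitRuns = sample.match(/\d+/g);   // ["42", "3", "9"]

// \w+ collects runs of letters, digits, and underscores.
const wordRuns = sample.match(/\w+/g);    // ["Order", "42", "3", "items", "total", "9"]

// \s+ matches any run of spaces, tabs, or line breaks.
const pieces = "a b\tc\nd".split(/\s+/);  // ["a", "b", "c", "d"]

// \D removes everything that is not a digit.
const digitsOnly = "a1b2".replace(/\D/g, "");  // "12"

// \S+ collects runs of non-whitespace characters.
const tokens = "x=1  y=2".match(/\S+/g);  // ["x=1", "y=2"]
```
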
  11. Alternation
    • Using the vertical bar `|` to specify alternatives
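
One detail worth showing for alternation is that `|` has the lowest precedence, so anchors bind to a single alternative unless the alternatives are grouped. The sketch below uses illustrative inputs.

```javascript
// Match either alternative anywhere in the text.
const pets = "cat dog bird".match(/cat|dog/g);   // ["cat", "dog"]

// Ungrouped: this reads as (^cat) OR (dog$), so "catalog" passes.
const ungrouped = /^cat|dog$/.test("catalog");      // true

// Grouped: the whole string must be exactly "cat" or "dog".
const grouped = /^(?:cat|dog)$/.test("catalog");    // false
```
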
  12. Groups and Capturing
    • Grouping with parentheses `()`.
    • Capturing groups `(...)` and numbered backreferences `\1`, `\2`, which match the same text that the corresponding group captured.
    • Non-capturing groups `(?:...)`.
    • Named capture groups `(?<name>...)`.
    • Backreferences to named capture groups `\k<name>`.
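
The group types above might be used as in the following sketch. The date string and the group names `year`, `month`, and `word` are hypothetical choices for the example.

```javascript
// Numbered capturing groups are available by index on the match result.
const dateMatch = "2024-05-17".match(/(\d{4})-(\d{2})-(\d{2})/);
const yearByIndex = dateMatch[1];   // "2024"
const dayByIndex = dateMatch[3];    // "17"

// Named groups are available on the .groups property.
const namedMatch = "2024-05-17".match(/(?<year>\d{4})-(?<month>\d{2})/);
const monthByName = namedMatch.groups.month;   // "05"

// A non-capturing group groups without adding an entry to the result.
const nonCapturing = "abab".match(/(?:ab)+/);
const resultLength = nonCapturing.length;      // 1 (only the full match)

// Backreferences match the same text a group captured earlier.
const repeatedWord = /\b(\w+) \1\b/.test("the the cat");               // true
const repeatedNamed = /\b(?<word>\w+) \k<word>\b/.test("the cat");     // false
```
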
  13. Word Boundaries
    • Word boundaries `\b`
    • Non-word boundaries `\B`
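
A minimal sketch of `\b` and `\B`, using a made-up sentence in which "cat" appears as a whole word, as a suffix, and as a prefix:

```javascript
const boundaryText = "cat concat catalog";

// \b on both sides: only the standalone word matches.
const wholeWord = boundaryText.match(/\bcat\b/g);   // ["cat"]

// \B before and \b after: "cat" only where it ends a longer word.
const suffixOnly = boundaryText.replace(/\Bcat\b/g, "CAT");  // "cat conCAT catalog"
```
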
  14. Lookaheads and Lookbehinds (Lookarounds)
    • Positive lookahead `pattern1(?=pattern2)`
    • Negative lookahead `pattern1(?!pattern2)`
    • Positive lookbehind `(?<=pattern1)pattern2`
    • Negative lookbehind `(?<!pattern1)pattern2`
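
The four lookarounds can be compared on one input. This sketch assumes an engine with lookbehind support (ES2018 or later); the price string is invented for the example.

```javascript
const prices = "USD 100, EUR 200, 300 USD";

// Positive lookahead: digits that are followed by " USD".
const beforeUsd = prices.match(/\d+(?= USD)/g);          // ["300"]

// Negative lookahead: whole numbers that are not followed by " USD".
const notBeforeUsd = prices.match(/\b\d+\b(?! USD)/g);   // ["100", "200"]

// Positive lookbehind: digits that are preceded by "USD ".
const afterUsd = prices.match(/(?<=USD )\d+/g);          // ["100"]

// Negative lookbehind: whole numbers that are not preceded by "USD ".
const notAfterUsd = prices.match(/(?<!USD )\b\d+/g);     // ["200", "300"]
```

The `\b` in the negative cases keeps the engine from backtracking to a partial number such as "10" and reporting that instead.
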
  15. Debugging Regular Expressions
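    • The current input position: lookarounds are zero-width assertions, so they are checked at the current position and consume no characters.
    • Lookahead placed before the main pattern `(?=pattern2)pattern1`: both patterns are tested starting from the same position.
    • Lookbehind placed after the main pattern `pattern1(?<=pattern2)`: pattern2 is tested against the text that ends where pattern1 ended.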
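
To make the role of the current position concrete, the sketch below places a lookahead before the main pattern and a lookbehind after it. The patterns and inputs are illustrative assumptions, not taken from the outline.

```javascript
// Lookahead before the main pattern: from position 0, first assert that a
// digit exists somewhere ahead, then match six or more word characters
// starting from that same position.
const needsDigit = /^(?=.*\d)\w{6,}$/;
const withDigit = needsDigit.test("abc123");     // true
const withoutDigit = needsDigit.test("abcdef");  // false

// Lookbehind after the main pattern: match a word first, then assert that
// the three characters just consumed are "ing".
const endsInIng = "sing ring rang".match(/\b\w+(?<=ing)\b/g);  // ["sing", "ring"]

// Zero-width check: the lookahead matches at index 0 but consumes nothing.
const zeroWidth = /(?=abc)/.exec("abc");
const matchedText = zeroWidth[0];     // ""
const matchedIndex = zeroWidth.index; // 0
```
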
  16. Substitutions or Replacements
    • $& Inserts the matched substring.
    • $` Inserts the portion of the string that precedes the matched substring.
    • $' Inserts the portion of the string that follows the matched substring.
    • $n Inserts the nth (1-indexed) capturing group where n is a positive integer less than 100.
    • $<name> Inserts the named capturing group, where name is the group name.
    • $$ Inserts a "$".
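
Each replacement pattern above can be tried with `String.prototype.replace`. The inputs and the group names `y` and `m` are made up for this sketch.

```javascript
// $n: reorder numbered groups.
const swapped = "John Smith".replace(/(\w+) (\w+)/, "$2, $1");   // "Smith, John"

// $&: the matched substring itself.
const bracketed = "abc".replace(/b/, "[$&]");   // "a[b]c"

// $`: the text before the match.
const withPrefix = "abc".replace(/b/, "$`");    // "aac"

// $': the text after the match.
const withSuffix = "abc".replace(/b/, "$'");    // "acc"

// $<name>: a named group.
const monthFirst = "2024-05".replace(/(?<y>\d{4})-(?<m>\d{2})/, "$<m>/$<y>");  // "05/2024"

// $$: a literal dollar sign, here followed by group 1.
const withDollar = "cost: 5".replace(/(\d+)/, "$$$1");   // "cost: $5"
```
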
  17. Regex in JavaScript