What is a Regular Expression?

A regular expression (regex or regexp) is a sequence of characters that defines a search pattern. Regular expressions are used for string searching, validation, text extraction, and replacement. Nearly every mainstream programming language supports them, which makes regex an essential tool for any developer who processes text.

Regex Syntax Quick Reference

  • . — Matches any character except newline
  • ^ / $ — Start / end of string (or line in multiline mode)
  • * / + / ? — 0 or more / 1 or more / 0 or 1
  • {n,m} — Between n and m repetitions
  • [abc] — Character class — matches a, b, or c
  • [^abc] — Negated character class
  • (abc) — Capturing group
  • (?:abc) — Non-capturing group
  • \d / \w / \s — Digit / word character / whitespace
  • a|b — Alternation — matches a or b
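
A few of the constructs above can be tried out in a short Python sketch. The sample strings are made up for illustration, and Python's re module is only one flavor, so other engines may differ in details.

```python
import re

# [abc] with +: one or more characters drawn from a, b, c.
only_abc = re.fullmatch(r"[abc]+", "abcab") is not None   # True
has_other = re.fullmatch(r"[abc]+", "abd") is not None    # False

# {n,m}: between 2 and 3 digits.
two_digits = re.fullmatch(r"\d{2,3}", "42") is not None   # True
one_digit = re.fullmatch(r"\d{2,3}", "4") is not None     # False

# (...) captures its text, (?:...) only groups for alternation.
title_match = re.match(r"(?:Mr|Ms)\. (\w+)", "Ms. Lovelace")
captured = title_match.groups()                           # ('Lovelace',)

# a|b alternation, with findall collecting every match.
pets = re.findall(r"cat|dog", "cat, dog, bird")           # ['cat', 'dog']
```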

Common Regex Patterns

  • Email: [\w.+-]+@[\w-]+\.[a-zA-Z]{2,}
  • URL: https?://[\w\-._~:/?#[\]@!$&'()*+,;=%]+
  • IPv4: (\d{1,3}\.){3}\d{1,3}
  • Phone (US): (\+1\s?)?\(?\d{3}\)?[\s.-]\d{3}[\s.-]\d{4}
  • Hex color: #([a-fA-F0-9]{6}|[a-fA-F0-9]{3})
  • Date (YYYY-MM-DD): \d{4}-\d{2}-\d{2}
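
As a rough illustration, three of the patterns above can be used to pull values out of free text in Python. The sample sentence is invented, and these patterns only check shape, not validity.

```python
import re

HEX_COLOR = re.compile(r"#([a-fA-F0-9]{6}|[a-fA-F0-9]{3})")
DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
IPV4 = re.compile(r"(\d{1,3}\.){3}\d{1,3}")

text = "Updated 2024-03-15: link color #1a2B3c, border #fff, host 192.168.0.1"

# No capturing group in DATE, so findall returns the full matches.
dates = DATE.findall(text)

# HEX_COLOR has a capturing group, so findall would return only the group
# contents; finditer plus group(0) keeps the leading '#'.
colors = [m.group(0) for m in HEX_COLOR.finditer(text)]

# search returns the first match, or None if there is none.
ip = IPV4.search(text).group(0)
```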

Mastering Regular Expressions: Patterns Every Developer Should Know

Regular expressions are one of the most powerful tools in a developer's toolkit, but they're also one of the most misunderstood. The key to writing effective regex is understanding the building blocks and combining them methodically rather than trying to write complex patterns from scratch.

Essential Regex Patterns for Common Tasks

  • Email Validation: ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$ - Covers the most common email formats, but not every address the RFCs allow. For production use, consider a dedicated email validation library rather than relying on regex alone.
  • URL Matching: https?://[^\s/$.?#].[^\s]* - Matches HTTP and HTTPS URLs. Simple but effective for extracting links from text.
  • IPv4 Address: \b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b - Matches the dotted-quad shape of an IPv4 address. Note: this doesn't validate ranges (e.g., 999.999.999.999 would match).
  • Date Format (YYYY-MM-DD): \d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01]) - Matches the ISO 8601 calendar date format with basic month/day validation. It does not check days per month, so 2023-02-31 would still match.
  • Password Strength: ^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$ - Requires at least 8 characters with uppercase, lowercase, digit, and special character. Only the listed special characters (@$!%*?&) are accepted.
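
Because the IPv4 pattern checks only the shape of the address, one option is to pair it with a numeric range check in code. The sketch below does that in Python and also wraps the password pattern. The helper names is_ipv4 and is_strong_password are hypothetical, chosen here for illustration.

```python
import re

IPV4_SHAPE = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")
PASSWORD = re.compile(
    r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$"
)

def is_ipv4(candidate: str) -> bool:
    """Check the dotted-quad shape with regex, then range-check each octet."""
    if IPV4_SHAPE.fullmatch(candidate) is None:
        return False
    return all(int(octet) <= 255 for octet in candidate.split("."))

def is_strong_password(candidate: str) -> bool:
    """Apply the password pattern to the whole string."""
    return PASSWORD.fullmatch(candidate) is not None
```

With these helpers, is_ipv4("999.999.999.999") returns False even though the regex alone would accept it, and is_strong_password("Passw0rd#") returns False because # is not in the pattern's set of special characters.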

Regex Performance: Avoiding Catastrophic Backtracking

Catastrophic backtracking occurs when a regex engine gets stuck trying exponentially many combinations on certain inputs. This can freeze your application or cause ReDoS (Regular Expression Denial of Service) attacks.

  • Dangerous pattern: (a+)+$ — On input "aaaaaaaaaaaaaaaaab", this takes exponential time because the engine tries every possible way to split the a's between the inner and outer groups.
  • Safe alternative: a+$ — Flatten nested quantifiers when possible.
  • Rule of thumb: Avoid nesting quantifiers (like (x+)+, (x*)*, or (x+)*) unless you're certain the inner pattern cannot match the same characters as the outer repetition.

In production applications, set a timeout for regex operations where the engine supports one, and consider engines that guarantee linear-time matching, such as RE2 (a C++ library), Go's regexp package, or Rust's regex crate.
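
The flattening advice can be sanity-checked in Python. The sketch below confirms that the nested and flattened patterns agree on a handful of short inputs. The inputs are kept short on purpose, because the nested pattern slows down sharply on long runs of a followed by a non-matching character.

```python
import re

NESTED = re.compile(r"(a+)+$")  # nested quantifier: risky on long near-misses
FLAT = re.compile(r"a+$")       # flattened: no nested repetition

# Short samples only; do not try the nested pattern on long inputs.
samples = ["a", "aaaa", "aaab", "baaa", "b", ""]

# For each sample, do both patterns agree on whether there is a match?
agreement = [
    (NESTED.search(s) is not None) == (FLAT.search(s) is not None)
    for s in samples
]

# The flattened pattern still rejects the near-miss and finds a trailing run.
near_miss = FLAT.search("aaab")
trailing_run = FLAT.search("baaa").group(0)
```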

Regex Flavors: Key Differences Across Languages

Not all regex engines are the same. Here are important differences:

  • JavaScript: No lookbehind support until ES2018. Use /pattern/flags syntax. The g flag is stateful with lastIndex.
  • Python: Uses the re module. Supports named groups with (?P<name>...) syntax (note the P).
  • Go: The regexp package uses RE2 syntax with no backtracking, so matching is guaranteed to run in linear time, but lookahead, lookbehind, and backreferences are not supported.
  • Java: Full PCRE-like support including possessive quantifiers (a++) and atomic groups.
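
To illustrate one of these flavor differences, here is a small Python sketch of named groups using the (?P<name>...) syntax. The sample sentence is made up.

```python
import re

DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

match = DATE.search("Released on 2024-03-15.")

# Named groups can be read by name or collected into a dict.
year = match.group("year")
parts = match.groupdict()
```

Here year is '2024' and parts maps each group name to its matched text.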

Frequently Asked Questions about Regular Expressions

What is the difference between .* and .+ in regex?

The asterisk (*) means "zero or more" of the preceding element, while plus (+) means "one or more." The pattern .* matches any string including empty strings, while .+ requires at least one character. Both are greedy by default — they match as much as possible. Add ? to make them lazy (match as little as possible): .*? and .+?.
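
The zero-versus-one distinction shows up most clearly on empty input. A short Python sketch, using a hypothetical key=value line as the example:

```python
import re

# .* is satisfied by zero characters, .+ is not.
star_empty = re.fullmatch(r".*", "") is not None   # True
plus_empty = re.fullmatch(r".+", "") is not None   # False

# Parsing "key=value": (.*) allows an empty value, (.+) requires one.
optional_value = re.fullmatch(r"(\w+)=(.*)", "name=")
required_value = re.fullmatch(r"(\w+)=(.+)", "name=")

optional_groups = optional_value.groups()          # ('name', '')
```

In this sketch required_value is None, because (.+) finds nothing after the equals sign.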

What do the regex flags g, i, m, and s do?

The g (global) flag finds all matches instead of stopping after the first. The i flag makes the match case-insensitive. The m (multiline) flag makes ^ and $ match the start and end of each line rather than the entire string. The s (dotAll) flag makes the dot match newline characters. Combine flags as needed: /pattern/gim.
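
The /pattern/flags notation above is JavaScript-style. As a rough Python equivalent, the i, m, and s flags map to re.I, re.M, and re.S, and there is no g flag: re.search returns the first match while re.findall returns all of them. The sample text is invented.

```python
import re

text = "Error: disk full\nerror: retry\nok"

# i: case-insensitive. search stops at the first match, findall returns all.
first_only = re.search(r"error", text, re.I).group(0)
all_matches = re.findall(r"error", text, re.I)

# m: ^ matches at the start of every line instead of only the string start.
without_m = re.findall(r"^\w+", text)
with_m = re.findall(r"^\w+", text, re.M)

# s: the dot also matches the newline between "full" and "error".
without_s = re.search(r"full.error", text)
with_s = re.search(r"full.error", text, re.S).group(0)
```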

How do I match a literal dot, parenthesis, or other special characters?

Escape special regex characters with a backslash. A literal dot is \. (without the backslash, a dot matches any character). Literal parentheses are \( and \). Other characters that need escaping: \ ^ $ | ? * + { } [ ]. For example, to match the domain example.com literally, write /example\.com/; otherwise the dot would match any character.
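
In Python, the effect of escaping can be checked directly, and re.escape can add the backslashes for you when the literal text comes from a variable. A minimal sketch with made-up inputs:

```python
import re

# Unescaped dot is a wildcard, so it also accepts "exampleXcom".
unescaped_hit = re.fullmatch(r"example.com", "exampleXcom") is not None

# Escaped dot matches only a literal ".".
escaped_hit = re.fullmatch(r"example\.com", "exampleXcom") is not None
escaped_ok = re.fullmatch(r"example\.com", "example.com") is not None

# re.escape builds a pattern that matches the given text literally.
literal = "(1+1)=2?"
literal_ok = re.fullmatch(re.escape(literal), literal) is not None
```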

What is the difference between a greedy and a lazy quantifier?

Greedy quantifiers (*, +, {n,m}) try to match as much as possible while still allowing the overall pattern to match. Lazy (non-greedy) quantifiers (*?, +?, {n,m}?) match as little as possible. For example, given <b>bold</b>, the greedy /<.*>/ matches the entire string, while the lazy /<.*?>/ matches only <b>.
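
The <b>bold</b> example can be reproduced in Python as a quick check. The negated character class at the end is a common alternative to a lazy quantifier, added here for comparison.

```python
import re

html = "<b>bold</b>"

# Greedy: .* runs to the last ">" in the string.
greedy = re.search(r"<.*>", html).group(0)

# Lazy: .*? stops at the first ">" that lets the pattern match.
lazy = re.search(r"<.*?>", html).group(0)

# Collecting every tag, with a lazy quantifier or a negated class.
lazy_tags = re.findall(r"<.*?>", html)
negated_tags = re.findall(r"<[^>]*>", html)
```

Here greedy is the entire string and lazy is '<b>', while both findall calls return the opening and closing tags separately.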