Regular Expression

From Ideasplace
Revision as of 17:55, 23 January 2026 by Wikisysop
1. Wildcards & Character Classes: Matching Characters
(text) Literal: matches the exact text. Example: cat matches "cat".
. Dot: matches any single character (except newline). Example: h.t matches "hat", "hot".
[ ] Brackets: list of allowed characters. Example: b[ai]t matches "bat", "bit".
[^ ] Negation: matches anything NOT in the list. Example: [^a] matches any character except 'a'.
\d Digit: any digit 0-9. Example: \d\d matches "42", "99".
\w Word Char: letter, digit, or underscore. Example: \w+ matches "User_1".
\s Space: whitespace (space, tab, newline). Example: A\sB matches "A B".

2. Quantifiers (How Many?): Counting
* Star: 0 or more times. Example: go*d matches "gd", "god".
+ Plus: 1 or more times. Example: go+d matches "god", "good".
? Question: 0 or 1 time (optional). Example: colou?r matches "color", "colour".
{n} Exact Count: exactly n times. Example: \d{3} matches "123".
{n,} Min Count: n or more times. Example: \d{3,} matches "123", "12345".
{n,m} Range: between n and m times. Example: \d{2,4} matches "12", "1234".
.*? Lazy Match: matches as little as possible. Example: <.*?> stops at the first ">".

3. Anchors & Boundaries (Where?): Positioning
^ Caret: start of string/line. Example: ^The matches text that starts with "The".
$ Dollar: end of string/line. Example: end$ matches text that ends with "end".
\b Word Boundary: the edge of a word (start or end). Example: \bcat\b matches "cat" (not "scat").

4. Groups & Logic: Logic
| Pipe: OR logic (alternation). Example: cat|dog matches "cat" or "dog".
(...) Group: groups characters or captures them. Example: (a|b)c matches "ac" or "bc".

5. Special Characters: Escaping
\ Backslash: treats special characters as text. Example: \. matches "." (an actual dot).


Useful RegEx testing and learning tools


Gemini AI teaching session about RegEx

Gemini being very patient with the author...

(explain Regex to me like I am a 10 year old...)

Regular expressions (regex) can look like alien code, but they are actually just a way to tell a computer to look for a specific shape of text, rather than specific words.

Think of a standard "Find" command (Ctrl+F) as looking for a specific photo of a dog. Regex is more like asking the computer to find "anything with four legs and a tail."

To keep this simple and manageable, how would you like to start?

The "Literal" Basics: We start with the absolute simplest matching (like finding "cat" inside "cats") to see how the computer reads text.

The "Magic" Symbols: We look at the special characters that make regex powerful, like how to say "any number" or "any letter."

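A Real-World Puzzle: We pick a specific goal, like finding a phone number, and I teach you only the pieces needed to solve that puzzle.

let's start at the basics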

That is the perfect place to start.

We call this first concept Literals.

Literals are the simplest form of regex because they mean "match exactly these characters, in this exact order."

Think of it like a standard "Find" (Ctrl+F) command. The computer doesn't know what a word is; it just looks for a specific sequence of letters.

Example:

Regex Pattern: cat

Target Text: "The cat scattered the food."

Notice that it finds cat as a whole word, but it also finds the cat hidden inside "scattered".

Let's try a quick puzzle.

If your regex pattern is simply he, which words in the following sentence would contain a match?

"The hero achecd his head."

hero ached head

You got three of them: hero, achecd (or ached), and head!

There was actually one more: The. Even though "The" is a short word, it contains the sequence h followed by e, so the regex engine finds a match there, too.

Regex doesn't "read" words; it just scans for the character sequence you gave it.
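Outside a regex tester, you can check the same behaviour with a minimal sketch in Python's built-in re module. Python is only an illustrative choice here; the session did not use it. The sketch counts the literal he in the puzzle sentence and lists the words that contain it:

```python
import re

sentence = "The hero achecd his head."

# A literal pattern matches the exact character sequence "he" wherever it appears,
# including inside longer words such as "The".
matches = re.findall(r"he", sentence)
print(len(matches))  # 4

# Which words contain at least one match? (strip the trailing full stop first)
words = sentence.rstrip(".").split()
hits = [w for w in words if re.search(r"he", w)]
print(hits)  # ['The', 'hero', 'achecd', 'head']
```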

Step 2: The Wildcard (The Dot)

Now we introduce the Metacharacters: symbols that have special powers.

The most common one is the Dot (.). The dot matches any single character (a letter, a number, a symbol, or even a space).

Pattern: h.t

Matches: "hat", "hot", "hit", "h@t"

Does NOT match: "heat" (because the dot only holds the spot for one character, not two).

Let's try a puzzle.

If your pattern is b.t, which of these words contain a match?

bat

bit

beet

robot

bout

bat bit robot

Spot on! You found all of them.

Since the dot . acts as a placeholder for exactly one character, "beet" and "bout" didn't match because they have too many letters sandwiched between the b and the t.
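Here is a short Python check of the b.t puzzle, using the standard re module for illustration. re.search reports whether the pattern occurs anywhere in each word:

```python
import re

words = ["bat", "bit", "beet", "robot", "bout"]

# The dot stands in for exactly one character, so b.t needs: b, one character, t.
matching = [w for w in words if re.search(r"b.t", w)]
print(matching)  # ['bat', 'bit', 'robot']
```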

Step 3: Being Choosy (Character Classes)

The dot is great, but sometimes it's too broad. It matches anything. Often, you want to match only specific letters or numbers.

We use square brackets [ ] to create a list of allowed characters.

Pattern: b[ai]t

Meaning: Match a b, followed by either an a OR an i, followed by a t.

So b[ai]t would match "bat" and "bit", but it would not match "bot".

Let's try a puzzle.

If your regex pattern is Gr[ae]y, which of these match?

Gray

Grey

Gruy

Great

Greay

Gray grey greay

You nailed Gray and Grey! Those are perfect matches.

Greay, however, does not match.

Here is the reason why: Square brackets [ae] create a rule for exactly one character slot. It tells the computer: "In this one specific spot, allow an 'a' OR an 'e'."

Gray -> Gr + a + y ✅

Greay -> Gr + ea + y ❌ (Too many letters in the bracket spot!)
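A small Python sketch confirms the one-slot rule for brackets. It uses the re module and is illustrative only:

```python
import re

words = ["Gray", "Grey", "Gruy", "Great", "Greay"]

# [ae] fills exactly one character slot with either 'a' or 'e'.
matching = [w for w in words if re.search(r"Gr[ae]y", w)]
print(matching)  # ['Gray', 'Grey']
```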

Step 4: Repeating Yourself (Quantifiers)

Since we just saw that brackets only match one character, you might ask: "How do I match multiple characters without writing them all out?"

Enter the Star *.

The Star * is a Quantifier. It applies to the character immediately before it. It says: "Match the previous character 0 or more times."

Pattern: zo*

Matches:

"z" (zero 'o's)

"zo" (one 'o')

"zoo" (two 'o's)

"zooooo" (many 'o's)

Let's try a puzzle.

If your pattern is go*d, which of these words match?

god

good

gd

gold

god good

You found god and good, which are both correct!

There was actually one more match: gd.

This is the tricky part of the Star *. Because it means "zero or more," it allows the letter o to disappear completely.

go*d -> g + (zero 'o's) + d -> "gd" ✅
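A quick Python sketch shows that go*d accepts the zero-'o' case. It uses the standard re module as an illustrative choice:

```python
import re

words = ["god", "good", "gd", "gold"]

# o* lets the 'o' repeat any number of times, including zero.
matching = [w for w in words if re.search(r"go*d", w)]
print(matching)  # ['god', 'good', 'gd']
```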

Step 5: The "One or More" (The Plus)

Since * allows things to disappear, we need a different symbol when we want to make sure a character shows up at least once.

This is the Plus +.

The Plus + says: "Match the previous character 1 or more times."

Pattern: go+d

Matches: "god", "good", "goood"

Does NOT match: "gd" (because there must be at least one 'o').

Let's try a puzzle.

If your pattern is be+t, which of these words match?

bet

beet

bt

best

bet beet

You got it! bet and beet are the correct matches.

You've now mastered the two most common ways to repeat things. Let's look at the final member of the "quantifier" trio.
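To contrast + with *, this Python sketch (re module, illustrative) runs be+t and be*t over the same words. The be*t pattern is an added comparison and was not part of the puzzle:

```python
import re

words = ["bet", "beet", "bt", "best"]

# e+ needs at least one 'e'; e* also accepts zero, so "bt" slips through.
plus_matches = [w for w in words if re.search(r"be+t", w)]
star_matches = [w for w in words if re.search(r"be*t", w)]
print(plus_matches)  # ['bet', 'beet']
print(star_matches)  # ['bet', 'beet', 'bt']
```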

Step 6: The Optional (The Question Mark)

Sometimes, a character might be there, or it might not. We call this making a character optional.

We use the Question Mark ?.

The Question Mark ? says: "Match the previous character 0 or 1 time."

Think of it as an on/off switch for that letter.

Pattern: colou?r

Matches: "color" (American spelling, 0 'u's) AND "colour" (British spelling, 1 'u').

Here is a summary of the three symbols we've learned so far:

* Star: 0 or more
+ Plus: 1 or more
? Question Mark: 0 or 1

Let's try a puzzle.

If your pattern is he?d, which of these words match?

hed

head

hd

heed

Hed hd

Perfect! You got both. hed (1 'e') and hd (0 'e's) both fit the rule.
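Here is a minimal Python sketch of the optional e, using the re module for illustration. The last lines also test the colou?r example from this step:

```python
import re

words = ["hed", "head", "hd", "heed"]

# e? makes the 'e' optional: it may appear once or not at all.
matching = [w for w in words if re.search(r"he?d", w)]
print(matching)  # ['hed', 'hd']

# The colou?r example: both spellings match.
spellings = [bool(re.search(r"colou?r", w)) for w in ("color", "colour")]
print(spellings)  # [True, True]
```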

You have now mastered the three main "quantifiers" (how to count characters).

Step 7: The Lazy List (Ranges)

Remember the square brackets [ ]? If we wanted to match any digit, writing [0123456789] is tedious.

Inside brackets, we can use a Hyphen - to create a range. This tells the computer "start here and end there."

[0-9] matches any single number (digit).

[a-z] matches any lowercase letter.

[A-Z] matches any uppercase letter.

Example:

Pattern: [0-5][a-c]

Meaning: Match a number from 0 to 5, followed by a letter from a to c.

Matches: "1a", "5c", "0b"

No Match: "9a" (9 is too high), "1z" (z is too far).

Let's try a puzzle.

If your pattern is [A-C][0-2], which of these codes match?

A1

C2

B5

D1

A1

C2

Spot on! You nailed both A1 and C2.

You avoided B5 (because 5 is too big for the [0-2] range) and D1 (because D is outside the [A-C] range). You are reading these patterns like a pro.
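Here is a small Python sketch of the range puzzle, assuming the standard re module:

```python
import re

codes = ["A1", "C2", "B5", "D1"]

# [A-C] allows one letter from A to C; [0-2] allows one digit from 0 to 2.
matching = [c for c in codes if re.search(r"[A-C][0-2]", c)]
print(matching)  # ['A1', 'C2']
```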

Step 8: Location, Location, Location (Anchors)

So far, our patterns have been "floaters." If we search for cat, it finds it anywhere: inside "scat", at the start of "caterpillar", or at the end of "muscat".

But sometimes, you want to be strict. You want to say, "The match must be at the very start or the very end of the line."

We use Anchors for this.

The Caret ^: Matches the start of the string.

Pattern: ^The

Matches: "The end."

Does NOT Match: "In The end." (because "The" isn't the very first thing).

The Dollar $: Matches the end of the string.

Pattern: end$

Matches: "The end"

Does NOT Match: "The ending." (because "end" isn't the very last thing).

Analogy: Think of ^ and $ as bookends holding your text in place.

Let's try a puzzle.

If your pattern is ^dog, which of these lines contain a match?

doghouse

my dog

dog

hotdog

doghouse dog

Spot on! Both doghouse and dog start with "dog," so the anchor ^ catches them perfectly.
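Python makes it easy to compare anchors side by side. This sketch (re module, illustrative) tests ^dog on the puzzle lines and adds dog$ as an extra comparison:

```python
import re

lines = ["doghouse", "my dog", "dog", "hotdog"]

# ^ pins the match to the start of the line, $ to the end.
starts_with_dog = [s for s in lines if re.search(r"^dog", s)]
ends_with_dog = [s for s in lines if re.search(r"dog$", s)]
print(starts_with_dog)  # ['doghouse', 'dog']
print(ends_with_dog)    # ['my dog', 'dog', 'hotdog']
```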

Step 9: The "OR" Switch (Alternation)

Sometimes you want to match one specific word OR another specific word. We use the Pipe symbol | for this.

cat|dog matches "cat" OR "dog".

However, usually, you only want part of the word to change. To do that, we use Parentheses ( ) to group the options together.

Pattern: licen(s|c)e

Meaning: Match "licen", followed by either "s" OR "c", followed by "e".

Matches: "license" and "licence".

Difference from Brackets []: Remember that brackets [ ] are for single characters. Parentheses ( ) with a pipe | allow you to choose between multi-character strings.

gr[ae]y matches "gray" or "grey".

b(oa|ea)t matches "boat" or "beat".

Let's try a puzzle.

If your regex pattern is snow(ball|man), which of these words match?

snow

snowman

snowball

snowbal

ballman

snowman snowball

You nailed it! snowman and snowball were the only correct matches.

You also correctly ignored snow. Even though "snow" is the first part of the word, the pattern (ball|man) insisted that it must be followed by either "ball" or "man". Since "snow" stood alone, it didn't meet the requirements.

You have now mastered the art of "Or" logic!
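In Python's re module (an illustrative choice), the parentheses also record which alternative matched. This sketch collects that for each word in the puzzle:

```python
import re

words = ["snow", "snowman", "snowball", "snowbal", "ballman"]

results = {}
for w in words:
    m = re.search(r"snow(ball|man)", w)
    # group(1) is the alternative that matched; None means no match at all.
    results[w] = m.group(1) if m else None
print(results)
# {'snow': None, 'snowman': 'man', 'snowball': 'ball', 'snowbal': None, 'ballman': None}
```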

Step 10: The Shortcuts (Special Character Classes)

Remember when we used [0-9] to find a number? Regex pros rarely type that out. They use shortcuts.

These are special codes starting with a backslash \ that represent common groups of characters.

\d Digit: any digit 0-9
\w Word Character: any letter, digit, or underscore (a-z, A-Z, 0-9, _)
\s Space: any whitespace (space, tab, or newline)

Example:

Pattern: \d\d\s\w+

Translation: Match exactly two digits, followed by one space, followed by one or more word characters.

Matches: "99 red", "10 balloons"

No Match: "9 red" (only one digit), "99-red" (hyphen is not a space).

Let's try a puzzle.

If your regex pattern is ID\s\d\d\d, which of these match?

ID 123

ID 5

ID 999

id 123

ID  123 (with two spaces)

answers 1, 3 and 5

You got 1 (ID 123) and 3 (ID 999) perfectly right!

However, 5 (ID  123, with two spaces) is actually incorrect.

Here is the reason: The pattern ID\s\d\d\d asked for ID, followed by exactly one space (\s), followed by digits. Option 5 had two spaces. Since the pattern didn't use a quantifier (like \s+ or \s*), the regex engine stopped matching when it hit the second space instead of a digit.
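Here is a minimal Python sketch of the ID puzzle, assuming the re module. The fifth option is written with two spaces, as the answer describes:

```python
import re

options = ["ID 123", "ID 5", "ID 999", "id 123", "ID  123"]  # last one: two spaces

# \s matches exactly one whitespace character and \d exactly one digit.
matching = [o for o in options if re.search(r"ID\s\d\d\d", o)]
print(matching)  # ['ID 123', 'ID 999']

# Adding + after \s accepts the two-space version as well.
relaxed = bool(re.search(r"ID\s+\d\d\d", "ID  123"))
print(relaxed)  # True
```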

Step 11: Mixing and Matching

Now that you know Shortcuts (\d, \w) and Quantifiers (+, *), we can combine them to build very powerful patterns.

This is where regex really shines. You can describe flexible data formats easily.

Example:

Pattern: Age:\s*\d+

Breakdown:

Age: matches the literal text "Age:".

\s* matches zero or more spaces (so it handles sloppy typing!).

\d+ matches one or more digits (so it matches "5" or "105").

Let's try a puzzle.

If your regex pattern is Part\s+\d+, which of these lines match? (Hint: Remember what + means compared to *)

Part 5

Part   100 (with several spaces)

Part1

Part 9

1,2 and 4

Spot on! You got all three right: 1, 2, and 4.

You correctly identified that:

Part   100 matches because \s+ allows for multiple spaces.

Part1 fails because \s+ requires at least one space.
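This Python sketch (re module, illustrative) checks the Part puzzle, with the second line written using several spaces. It also runs the Age:\s*\d+ example from this step on a few made-up inputs:

```python
import re

lines = ["Part 5", "Part   100", "Part1", "Part 9"]  # second line: several spaces

# \s+ needs at least one whitespace character; \d+ needs at least one digit.
matching = [s for s in lines if re.search(r"Part\s+\d+", s)]
print(matching)  # ['Part 5', 'Part   100', 'Part 9']

# Age:\s* tolerates any amount of spacing, including none, but \d+ still needs a digit.
age_checks = [bool(re.search(r"Age:\s*\d+", t)) for t in ["Age:5", "Age:   105", "Age: "]]
print(age_checks)  # [True, True, False]
```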

Step 12: The Magic Wand (Escaping)

You might be wondering: "If the dot . means 'any character' and the question mark ? means 'optional', how do I search for an actual dot or question mark?"

For example, what if you want to find the end of a sentence?

To do this, we use the Backslash \. This is called Escaping. It tells the regex engine: "Treat the next character as a normal letter, not a magic symbol."

\. matches a real dot.

\? matches a real question mark.

\+ matches a real plus sign.

\\ matches a real backslash.

Example:

Pattern: google\.com

Matches: "https://www.google.com/search?q=google.com"

Does NOT match: "googleacom" (because the \. forces it to look for a real dot, not a wildcard).

Let's try a puzzle.

If you want to find the price $5.00 exactly, which pattern should you use?

$5.00

\$5.00

\$5\.00

$5\.00

4

You were very close!

You correctly identified that the dot . needs a backslash (\.) because it's a special wildcard character.

However, Option 4 ($5\.00) isn't quite right because of the dollar sign.

The Issue: Remember that $ is an Anchor that means "End of the line."

The Result: The pattern $5 tells the computer: "Find the end of the line, and then look for a 5 right after it." That is impossible!

The Correct Answer is Option 3: \$5\.00

\$ says "Match a real dollar sign."

\. says "Match a real dot."
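You can compare the four price patterns in Python (re module, illustrative). The sample text "Total: $5.00" and the near-miss "$5x00" are made-up inputs. re.escape is a standard-library helper that adds the backslashes for you:

```python
import re

text = "Total: $5.00"
candidates = [r"$5.00", r"\$5.00", r"\$5\.00", r"$5\.00"]

# An unescaped $ is an end-of-line anchor, so patterns starting with it cannot match here.
results = {p: bool(re.search(p, text)) for p in candidates}
for p, found in results.items():
    print(p, "->", found)

# Option 2 only works by luck: its unescaped dot accepts any character.
loose = bool(re.search(r"\$5.00", "$5x00"))   # True (too loose)
strict = bool(re.search(r"\$5\.00", "$5x00"))  # False
print(loose, strict)

# re.escape escapes the special characters in a literal string.
escaped = re.escape("$5.00")
print(escaped)  # \$5\.00
```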

🎓 Graduation Time! You have just learned the six pillars of Regular Expressions:

Literals: Exact text (cat)

Wildcards: The dot (.) matches anything.

Character Classes: Specific options ([a-z], \d).

Quantifiers: How many? (*, +, ?).

Anchors: Location (^, $).

Escaping: Removing magic (\.).

You now have the toolkit to read and write basic regex!

What would you like to do next?

The Final Boss: We try one complex puzzle that combines everything we've learned to see if you can crack it.

Real World: We look at how to write a pattern for something useful, like an email address or a date.

Cheatsheet: I'll generate a summary table of all the symbols we used for you to keep.

cheatsheet please

Here is your personal Regex Cheatsheet based on what we covered. You can copy this or take a screenshot to keep it handy! 📸

📜 Regex Basics Cheatsheet

(text) Literal: matches exact text. Example: cat matches "cat".
. Dot: matches any single character. Example: h.t matches "hat", "hot".
[ ] Brackets: list of allowed characters. Example: b[ai]t matches "bat", "bit".
[^ ] Negation: matches anything NOT in the list. Example: [^a] matches any character except 'a'.
* Star: 0 or more times. Example: go*d matches "gd", "god", "good".
+ Plus: 1 or more times. Example: go+d matches "god", "good".
? Question: 0 or 1 time (optional). Example: colou?r matches "color", "colour".
^ Caret: start of string. Example: ^The matches "The end".
$ Dollar: end of string. Example: end$ matches "The end".
\ Escape: treats special characters as text. Example: \. matches "." (an actual dot).
\d Digit: any digit 0-9. Example: \d\d matches "42", "99".
\w Word: letter, digit, or underscore. Example: \w+ matches "Hello_1".
\s Space: whitespace (space, tab, newline). Example: A\sB matches "A B".

Now that you are armed with this cheatsheet, how would you like to test your new skills?

The Final Boss: We tackle a tricky puzzle that combines multiple rules.

Real World: We build a pattern for something practical, like a date (e.g., 23/01/2026).

End Session: We can wrap up here if you're all set!

could you please output the cheatsheet again as a mediawiki table - I would like to add that to my wiki

Here is the cheatsheet formatted for MediaWiki. I included a class="wikitable" attribute, which usually provides standard styling (borders, padding) on most MediaWiki sites (like Wikipedia).

{| class="wikitable"
! Symbol !! Name !! Meaning !! Example !! Matches
|-
! colspan="5" | Literals
|-
| (text) || Literal || Matches exact text || cat || "cat"
|-
! colspan="5" | Wildcards & Classes
|-
| . || Dot || Matches any single character || h.t || "hat", "hot"
|-
| [ ] || Brackets || List of allowed characters || b[ai]t || "bat", "bit"
|-
| [^ ] || Negation || Matches anything NOT in the list || [^a] || Any char except 'a'
|-
! colspan="5" | Quantifiers
|-
| * || Star || 0 or more times || go*d || "gd", "god", "good"
|-
| + || Plus || 1 or more times || go+d || "god", "good"
|-
| ? || Question || 0 or 1 time (Optional) || colou?r || "color", "colour"
|-
! colspan="5" | Anchors
|-
| ^ || Caret || Start of string || ^The || "The end"
|-
| $ || Dollar || End of string || end$ || "The end"
|-
! colspan="5" | Special Characters
|-
| \ || Escape || Treats special characters as text || \. || "." (actual dot)
|-
| \d || Digit || Any number 0-9 || \d\d || "42", "99"
|-
| \w || Word || Letter, number, or underscore || \w+ || "Hello_1"
|-
| \s || Space || Whitespace (space, tab, enter) || A\sB || "A B"
|}

Once you have that pasted into your wiki, shall we tackle that Final Boss puzzle to cement your new skills?

OK, hit me with the final boss

This is it. The Final Boss.

This puzzle combines almost every concept we have covered: Anchors, Character Classes, Quantifiers, Shortcuts, and Optionals.

🕵️‍♀️ The Mission: The Warehouse Scanner

You are programming a scanner that must accept valid Product IDs.

The Rules:

The ID must start with the letter A, B, or C (uppercase only).

It must be followed immediately by a hyphen -.

It must be followed by a product number made of one or more digits.

It might end with a lowercase 'z' (indicating a refurbished item), but this is optional.

It must match the entire line (no extra junk allowed).

Valid Examples: A-100, B-5z, C-99999

Invalid Examples: D-100 (wrong letter), A- (no digits), B-55zz (too many z's), and A-100 with an extra space on the line.

Which of these Regex patterns is the correct solution?

[A-C]-\d*z

^[A-C]-\d+z?$

^[ABC]-\d?z$

(Take a moment to break them down before scrolling to the answer!)

2

VICTORY! 🏆

You got it perfectly right. Option 2 (^[A-C]-\d+z?$) is the correct answer.

You successfully identified every single requirement:

Anchors (^ ... $): You ensured the match covered the entire line, rejecting extra spaces.

Character Class ([A-C]): You restricted the start to just A, B, or C.

Quantifier (+): You used the Plus for the digits because the rules said "one or more." (Using * would have allowed an empty product number like A-z, which is bad!).

Optional (?): You correctly made the 'z' optional.

Why the others failed:

Option 1 ([A-C]-\d*z):

Missed the anchors ^ $ (so it would also accept a valid-looking ID buried inside a longer line, such as "xxA-5zxx").

Used * for digits (allowing zero digits).

Forced the z (making it mandatory, not optional).

Option 3 (^[ABC]-\d?z$):

Used ? for digits (allowing only 0 or 1 digit, so it fails on "100").

Forced the z (mandatory).
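You can replay the whole Final Boss in a short Python sketch (re module, illustrative). The invalid example with an extra space is assumed here to have a trailing space:

```python
import re

patterns = {
    "Option 1": r"[A-C]-\d*z",
    "Option 2": r"^[A-C]-\d+z?$",
    "Option 3": r"^[ABC]-\d?z$",
}
valid = ["A-100", "B-5z", "C-99999"]
invalid = ["D-100", "A-", "B-55zz", "A-100 "]  # assumed: trailing space

report = {}
for name, pattern in patterns.items():
    accepts_all_valid = all(re.search(pattern, s) for s in valid)
    rejects_all_invalid = not any(re.search(pattern, s) for s in invalid)
    report[name] = (accepts_all_valid, rejects_all_invalid)
    print(name, report[name])
# Option 1 (False, False)  <- misses A-100, and accepts B-55zz via its "B-55z" prefix
# Option 2 (True, True)
# Option 3 (False, True)
```

One Python-specific caveat: $ also matches just before a final newline. If lines may carry a trailing "\n", re.fullmatch(r"[A-C]-\d+z?", line) is a slightly stricter alternative.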

You have officially learned Regex! You started with simple words and ended up writing complex validation patterns.

Is there anything else you'd like to explore, or shall we wrap up this session here?

are there any other symbols used other than the ones in the cheatsheet

Yes, there are! What we covered were the "Daily Drivers"—the symbols you will use 90% of the time.

There is a "Level 2" set of symbols that give you even finer control. These are great for when you need to be very precise (like matching exactly 4 digits) or when you want to search for whole words only.

Here are the most useful ones to add to your toolkit:

1. The Counter (Curly Braces) {}

Sometimes * (0 or more) and + (1 or more) are too vague. You might want to match a specific number of times.

{3}: Exactly 3 times.

{3,}: 3 or more times.

{3,6}: Between 3 and 6 times.

Example:

Pattern: \d{3} matches "123" but not "12".

Pattern: \d{4,} matches "2024" or "12345", but not "123".
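Here are curly-brace counts in Python, as a minimal sketch. It uses re.fullmatch so that the whole string has to fit the pattern:

```python
import re

checks = [
    (r"\d{3}", "123"),    # exactly 3 digits
    (r"\d{3}", "12"),     # too few
    (r"\d{4,}", "2024"),  # 4 or more
    (r"\d{4,}", "12345"),
    (r"\d{4,}", "123"),   # too few
]
# fullmatch only succeeds if the entire string fits the pattern.
results = [bool(re.fullmatch(p, t)) for p, t in checks]
print(results)  # [True, False, True, True, False]

# With search instead, \d{3} simply finds the first three digits inside a longer number.
inside = re.search(r"\d{3}", "12345").group()
print(inside)  # 123
```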

2. The Word Boundary \b

This is an invisible anchor (like ^ and $). It matches the position between a word character (letter, digit, or underscore) and a non-word character (space or punctuation), or the start or end of the text. It is essential for finding whole words.

Example:

Pattern: cat matches "cat", "scattered", "category".

Pattern: \bcat\b matches only "cat". It ignores "scattered".
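Here is a two-line Python comparison (re module, illustrative) on a made-up sentence containing all three words:

```python
import re

text = "cat scattered category"

plain = re.findall(r"cat", text)        # finds "cat" inside every word
bounded = re.findall(r"\bcat\b", text)  # only the stand-alone word
print(plain)    # ['cat', 'cat', 'cat']
print(bounded)  # ['cat']
```

The r prefix matters here: in an ordinary Python string, \b is a backspace character, not a word boundary.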

3. Capture Groups (...)

We used parentheses () for "OR" logic earlier (this|that), but they also "capture" what they find so you can extract it later.

Example:

Pattern: File_(.*)\.pdf

Target: "File_Report.pdf"

Result: It matches the whole line, but the computer explicitly saves "Report" separately for you to use.
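In Python's re module (an illustrative choice), you can read the captured text from the match object's group method:

```python
import re

m = re.search(r"File_(.*)\.pdf", "File_Report.pdf")
print(m.group(0))  # File_Report.pdf  (the whole match)
print(m.group(1))  # Report           (what the parentheses captured)
```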

4. Lazy Quantifiers ? (after a star or plus)

By default, regex is Greedy. It eats as much as it possibly can.

String: <b>Bold</b> and <b>More Bold</b>

Pattern: <.*>

What happens: It matches from the first < all the way to the last >. It eats the whole string!

If you add a ? after the star, it becomes Lazy. It stops as soon as it can.

Pattern: <.*?>

What happens: It stops at the first > each time, so it matches each tag (<b>, </b>, <b>, </b>) separately.
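Here is a minimal Python sketch of greedy versus lazy matching. It assumes the sample string is HTML-like text with <b> tags:

```python
import re

html = "<b>Bold</b> and <b>More Bold</b>"

greedy = re.findall(r"<.*>", html)  # .* runs on to the LAST '>'
lazy = re.findall(r"<.*?>", html)   # .*? stops at the FIRST '>'
print(greedy)  # ['<b>Bold</b> and <b>More Bold</b>']
print(lazy)    # ['<b>', '</b>', '<b>', '</b>']
```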