Learning Regular Expression with visual guide

Hey folks 👋🏻,

Well, this is my first article. I am doing the #100DaysOfCode challenge, where I am learning must-know things as a developer, and this is my week 1 learning. There might be a ton of mistakes as I go along writing it out, so please give me feedback so that I can work on it.

So, let’s start!!!

Introduction

Whether you are a frontend or backend developer, you will come across regular expressions at some point in your career. I remember the first time I used form validation to validate passwords: I had no idea what /^[a-z0-9_-]{6,18}$/ did. I just copied and pasted it from Google.

So this blog is my attempt to explain regular expressions with visuals.

What is Regular Expression?

Regular expressions are a way to describe patterns in string data. They are both terribly awkward and extremely useful. Regular expressions are difficult to learn: they have a very compact syntax that ends up looking like gibberish. However, they are extremely powerful when it comes to form validation, finding and replacing strings, and searching through a body of text. Properly understanding regular expressions will make you a more effective programmer.

Create a Regular Expression

//Long Syntax --- new RegExp("expression","flags");
const longRegExp = new RegExp("[A-Z]+", 'g');
//Short Syntax --- /expression/flags
const shortRegExp = /[A-Z]+/g;

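In the short (literal) syntax, the slashes /.../ tell JavaScript that you are creating a regular expression. The long syntax takes the pattern as a string, which is useful when the pattern is only known at runtime.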
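To make the difference between the two syntaxes concrete, here is a small sketch. One detail worth knowing: inside a string, the backslash itself has to be escaped, so the pattern \d+ becomes "\\d+" in the constructor. The variable searchWord is a hypothetical stand-in for user input; if such input could contain special characters like . or +, they would need escaping first.

```javascript
// The literal and the constructor build the same pattern.
const literal = /\d+/g;
// In a string, the backslash must be escaped: \d becomes "\\d".
const fromString = new RegExp("\\d+", "g");

console.log(literal.source, fromString.source); // \d+ \d+
console.log("a1b22c333".match(literal));        // [ '1', '22', '333' ]
console.log("a1b22c333".match(fromString));     // [ '1', '22', '333' ]

// The constructor is handy when the pattern is only known at runtime.
// `searchWord` is a hypothetical user input used for illustration.
const searchWord = "habit";
const dynamic = new RegExp(searchWord, "i");
console.log(dynamic.test("Atomic Habits")); // true
```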

regex-work.png

Character classes

Character   Meaning
\w          Word character: a letter, digit or underscore
\W          NOT a word character
\d          Digit
\D          NOT a digit
\s          Whitespace
\S          NOT whitespace
\t          Tab
\n          Line break (newline)
.           Any character (except newline)
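A quick way to get a feel for these classes is to run them against a short string that mixes words, digits and a tab. The sample string below is made up for illustration.

```javascript
// A small made-up string with words, a digit run and a tab.
const text = "Order #42\tshipped";

console.log(text.match(/\d/g));  // [ '4', '2' ]
console.log(text.match(/\w+/g)); // [ 'Order', '42', 'shipped' ]
console.log(/\t/.test(text));    // true

// . matches any character except a line break.
console.log(/a.b/.test("a-b"));  // true
console.log(/a.b/.test("a\nb")); // false
```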

Flags

Flag   Meaning
g      Global: find all matches in the string, not just the first.
i      Case-insensitive search: no difference between A and a.
m      Multiline mode: ^ and $ match at the start and end of each line.
s      Enables “dotall” mode, which allows a dot . to match the newline character \n.
u      Enables full Unicode support (😊, 🤩...).
y      “Sticky” mode: searches only at the exact position given by regExp.lastIndex.
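The effect of the less obvious flags is easiest to see on a two-line string. This sketch compares each pattern with and without its flag; the string is an arbitrary example.

```javascript
const text = "first line\nsecond line";

// m: without it, ^ only matches at the very start of the string.
console.log(text.match(/^\w+/g));  // [ 'first' ]
console.log(text.match(/^\w+/gm)); // [ 'first', 'second' ]

// s: without it, the dot cannot cross the line break.
console.log(/line.second/.test(text));  // false
console.log(/line.second/s.test(text)); // true

// i: letter case is ignored.
console.log(/FIRST/i.test(text)); // true

// y: the match must start exactly at lastIndex ("line" starts at index 6).
const sticky = /line/y;
sticky.lastIndex = 6;
console.log(sticky.test(text)); // true
sticky.lastIndex = 0;
console.log(sticky.test(text)); // false, y does not search ahead from index 0
```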

Brackets and Grouping

setsAndRange.png

  1. [...]: Sets of characters

    Say you want to match any digit. In a regular expression, putting a set of characters between square brackets makes that part of the expression match any one of the characters between the brackets.

     //1. Test whether the string contains a digit.
     const str = "The Independence Year 1947";
     const regExp = /[0123456789]/; // works, but a range (see below) is recommended
     console.log(regExp.test(str));
     // ✅ Output →  true
    
     // The characters in the set correspond to exactly one character in the match.
     //Example 2. Find "Free", then [d or o], then "m"
     const str2 = "Freedom";
     const regExp2 = /Free[do]m/; // matches "Freedm" or "Freeom", not "Freedom"
     console.log(regExp2.test(str2));
     // ❌ Output → false
    
  2. [.-.]: Range

    There is a shorter way to write /[0123456789]/: a range of characters, /[0-9]/.

    • \d – is the same as [0-9],
    • \w – is the same as [a-zA-Z0-9_],
    • \s – is roughly the same as [\t\n\v\f\r ]; in JavaScript it also matches other Unicode space characters, such as the no-break space.

      const str = "The Independence Year 1947";
      const regExp = /[0-9]/;
      console.log(regExp.test(str));
      // ✅ Output →  true
      
  3. [^...]: Excluding range

    • [^aeiou] – any character except 'a', 'e', 'i', 'o' or 'u'.
    • [^0-9] – any character except a digit, the same as \D.
    • [^\s] – any non-space character, same as \S.

      const str = "abc";
      const regExp = /[^A-Za-z0-9]/; // any character that is not a letter or digit
      console.log(regExp.test(str)); // "abc" has no special character
      // ❌ Output →  false
      
  4. (...): Capturing Group

    Say you want to apply an operator like * or + to more than one character at a time. Then you can group those characters with ().

     const regExp = /woo+(hoo+)+/i;
     console.log(regExp.test("Woohoooohoohooo"));
     // ✅ Output →  true
    
     //Domain name match
     const domain = "example.com";
     const domainExp = /(\w+\.)+\w+/; // no g flag needed for a single test()
     console.log(domainExp.test(domain));
     // ✅ Output →  true
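The items above can be tried together in one short sketch: a range next to its shorthand class, the wider meaning of \s, an excluding set used to strip characters, and reading the captured groups back out of a match. The phone number and domain are the same sample strings used elsewhere in this article; str.replace is covered in the Methods section.

```javascript
// A range and its shorthand class match the same characters.
const year = "The Independence Year 1947";
console.log(year.match(/[0-9]+/)[0]); // 1947
console.log(year.match(/\d+/)[0]);    // 1947

// \s is wider than the short list: it also matches the no-break space U+00A0.
console.log(/\s/.test("\u00A0"));            // true
console.log(/[\t\n\v\f\r ]/.test("\u00A0")); // false

// An excluding set plus the g flag: remove everything that is not a digit.
const phone = "+7(902)-223-25-87";
console.log(phone.replace(/[^0-9]/g, "")); // 79022232587

// Each pair of parentheses captures what it matched, numbered from 1.
const [, site, tld] = "example.com".match(/(\w+)\.(\w+)/);
console.log(site, tld); // example com
```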
    

Anchors: Word and string boundaries

anchors.png

  • ^: If you want to enforce that a string must start with a specific pattern or string, use ^.

      const regExp = /^tiny/i;
    
      const str = "Tiny habits make a big difference.";
      console.log(regExp.test(str) ); 
      // ✅ Output →  true
    
      const str1 = "The Tiny habits make a big difference.";
      console.log(regExp.test(str1) );
      // ❌ Output →  false
    
  • $: If you want to enforce that a string must end with a specific pattern or string, use $.

      const regExp = /progress$/i;
    
      const str = "Goals are good for setting a direction, but systems are best for making progress";
      console.log(regExp.test(str) ); 
      // ✅ Output →  true
    
      const str1 = "Goals are good for setting a direction, but systems are best for making progress everyday";
      console.log(regExp.test(str1) );
      // ❌ Output →  false
    

    Testing for a full match: ^ and $ are mostly used together to test that the match spans the whole string.

    Example: Let’s check whether or not a string is a number only.

      const regExp = /^\d+$/; // the i flag is not needed: digits have no case
    
      const str = "123456789";
      console.log(regExp.test(str) ); 
      // ✅ Output →  true
    
      const str1 = "1234e122";
      console.log(regExp.test(str1) );
      // ❌ Output →  false
    
  • \b: Word boundary

    This lets you check whether a position is at the beginning or at the end of a word:

      const regExp = /\bworld\b/i;
    
      console.log(regExp.test("Hello, World!")); // here "World" is a standalone word
      // ✅ Output →  true
    
      console.log(regExp.test("Hello WorldMap!")); // here "World" is part of a longer word
      // ❌ Output →  false
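With anchors covered, the password rule from the introduction becomes readable. Here is a sketch that breaks /^[a-z0-9_-]{6,18}$/ into its parts and tests a few made-up passwords; {6,18} is a quantifier, explained in the next section.

```javascript
// The password rule from the introduction: /^[a-z0-9_-]{6,18}$/
//   ^ and $     → the whole string must match
//   [a-z0-9_-]  → lowercase letters, digits, underscore or hyphen
//   {6,18}      → between 6 and 18 of those characters
const passwordRule = /^[a-z0-9_-]{6,18}$/;

console.log(passwordRule.test("my_pass-123")); // true
console.log(passwordRule.test("abc"));         // false (too short)
console.log(passwordRule.test("MyPass123"));   // false (uppercase is not in the set)
console.log(passwordRule.test("my pass 123")); // false (space is not in the set)

// Without the anchors, any run of 6 allowed characters anywhere is enough.
console.log(/[a-z0-9_-]{6,18}/.test("!!! secret !!!")); // true
```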
    

Quantifiers

Quantifiers.png

  1. x? : Zero or one occurrence

    str.match is explained in the Methods section 👇🏼. For now, just know that with the g flag it returns an array of all the matches.

    const str = "Life is Art Live yours in colour";
    console.log(str.match(/colou?r/g)); // matches both "color" and "colour"
    // Output → ["colour"]
    
  2. x* : Zero or more occurrences

     const str = "255 25 2";
     console.log(str.match(/\d5*/g));
     // Output → ['255', '25', '2']
    
  3. x+ : One or more occurrences

     const str = "255 25 2";
     console.log(str.match(/\d5+/g));
     // Output → ['255', '25']
    
  4. x{n} : n occurrences

     const apples = "🍎🍎🍎";
     console.log(apples.match(/🍎{3}/gu)); // u → treats the emoji 🍎 as a single character
     // Output → ["🍎🍎🍎"]
    
     const address = "Street: 432,SomeStreet, Local, City: Pune, Zip code:  411001";
     //now if you want to get zip code
     console.log(address.match(/\d{6}/g));
     // Output → ['411001']
    
  5. x{n,m} : From n to m occurrences (n → min, m → max)

    If you use {n,}, it matches n or more occurrences.

     const visitStr = "I visited a beautiful place on 01-30-2010.";
     const regExp = /\d{1,2}-\d{1,2}-\d{4}/g; 
     console.log(visitStr.match(regExp)); // works with both M-D-YYYY and MM-DD-YYYY dates
     // Output → ['01-30-2010']
    
     const str = "+7(902)-223-25-87";
     const numbers = str.match(/\d{1,}/g);
     console.log(numbers); 
     // Output → ['7', '902', '223', '25', '87']
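The short quantifiers are just abbreviations of the {n,m} form: * is {0,}, + is {1,} and ? is {0,1}. This sketch reruns the earlier examples with the long form to show the results are the same.

```javascript
const str = "255 25 2";
// x* is the same as x{0,}
console.log(str.match(/\d5{0,}/g)); // [ '255', '25', '2' ]
// x+ is the same as x{1,}
console.log(str.match(/\d5{1,}/g)); // [ '255', '25' ]

// x? is the same as x{0,1}
const spellings = "color colour colouur";
console.log(spellings.match(/colou{0,1}r/g)); // [ 'color', 'colour' ]
```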
    

Methods

  1. str.search(regExp): At what index is the match?

    The method str.search(regExp) returns the position of the first match or -1 if not found:

     const str = "Nothing changes if nothing changes";
     const regExp1 = /changes/;
     console.log(str.search(regExp1));
     // ✅ Output → 8 (first match position)
    
     const regExp2 = /life/;
     console.log(str.search(regExp2));
     // ❌ Output → -1
    
  2. str.match(regExp) : Getting all group 0 captures

    The method str.match(regExp) finds matches for regExp in the string str.

    It has 2 modes:

    • If the regex doesn’t have the g flag: it returns the first match as an array with capturing groups and the properties index (position of the match) and input (the input string, equal to str):

        const str = "Take small steps Everyday and you'll eventually get there.";
        const regExp = /every(day)/i;
        const result = str.match(regExp);
      
        console.log( result[0] );     // Output → Everyday (full match)
        console.log( result[1] );     // Output → day (first capturing group)
        console.log( result.length ); // Output → 2
      
        // Additional information:
        console.log( result.index );  // Output → 17 (match position)
        console.log( result.input );  // Output → Take small steps Everyday and you'll eventually get there. (source string)
      
    • If the regex has the g flag: it returns an array of all matches as strings, without capturing groups and other details.

        const str = "Take small steps Everyday and you'll eventually get there.";
        const regExp = /every(day)/gi; // i is needed: "Everyday" starts with a capital E
        const result = str.match(regExp);
      
        console.log( result );        // Output → ["Everyday"]
        console.log( result[0] );     // Output → Everyday (full match)
        console.log( result.length ); // Output → 1
      

      If there are no matches, null is returned, whether or not the g flag is set.

  3. str.matchAll(regExp): Getting an iterable over all match objects [ES2020]

    The method matchAll() must be called with the g flag; otherwise it throws a TypeError. It returns an iterable object with the matches instead of an array. You can turn it into a regular array with Array.from, or loop over it with for..of. Every match is returned as an array with capturing groups (the same format as str.match without the g flag).

     const book = "Atomic Habits An Easy & Proven Way to Build Good Habits & Break Bad Ones";
     const regex = /Habi(t)[a-z]/g;
     const result = book.matchAll(regex);
    
     console.log(result); // Output → RegExpStringIterator{} → object RegExp String Iterator
    
     for (const res of result) console.log(res);
    
     //Output →['Habits', 't', index: 7, input: 'Atomic Habits An Easy & Proven Way to Build Good Habits & Break Bad Ones', groups: undefined]
     //['Habits', 't', index: 49, input: 'Atomic Habits An Easy & Proven Way to Build Good Habits & Break Bad Ones', groups: undefined]
    

    If there are no results, it returns an empty iterable object instead of null.

  4. str.replace(str|regExp, str|function):

    If you want to not only search and match but also replace strings, the replace() method will do the job.

    • Without the g flag (or when the first argument is a plain string), only the first occurrence is replaced:

        const date = "26-02-2022";
        // replace the first dash with a slash
        const result = date.replace("-", "/");
        console.log(result);
        // Output → 26/02-2022
      
    • With the g flag, all occurrences are replaced:

        const date = "26-02-2022";
        // replace all dashes with a slash (a regex literal, not the string "/-/g")
        const result = date.replace(/-/g, "/");
        console.log(result);
        // Output → 26/02/2022
      

      💪🏻 The real power of replace comes into play when you refer to matched groups in the replacement string, or pass a function as the second parameter.

      For example, say you have a comma-separated list of author names in the format Lastname Firstname. If you want to swap the names and remove the commas to get Firstname Lastname, one per line, you can use the following code:

      const authors = 'Clear James, Holiday Ryan, Housel Morgan';
      const authorRegExp = /(\w+) (\w+),?\s*/g; // ,? → zero or one comma, \s* → the spaces after it, \w → word character
      const rearrangeName = authors.replace(authorRegExp, "$2 $1\n");
      console.log(rearrangeName);
      // Output → James Clear
      //         Ryan Holiday
      //         Morgan Housel
      

      Suppose you want to highlight some words in a quote by turning them UPPERCASE. You can pass a function as the second argument.

      const str = "Aim for the moon. If you miss, you may hit a star.";
      const result = str.replace(/moon|star/gi, match => match.toUpperCase()); // the pipe character (|) denotes a choice
      console.log(result);
      // Output → Aim for the MOON. If you miss, you may hit a STAR.
      
  5. regExp.exec(str): Capturing groups

    The method regExp.exec(str) returns a match for regExp in the string str. Unlike the previous methods, it’s called on a regex, not on a string.

    There are two ways in which exec works:

    • If the regex doesn’t have the g flag: you get a match object for the first match

      If there’s no g, then regExp.exec(str) returns the first match, just like str.match(regExp).

        const str = "You must MAKE a change to SEE a change";
        const regExp1 = /change/;
        console.log(regExp1.exec(str));
        // Output → {
        //  0: "change"
        //    groups: undefined
        //    index: 16
        //    input: "You must MAKE a change to SEE a change"
        //    length: 1
        // }
      
    • If the regex has the g flag: you can loop over the matches

      • A call to regExp.exec(str) returns the first match and saves the position immediately after it in the property regExp.lastIndex.
      • The next such call starts the search from position regExp.lastIndex, returns the next match and saves the position after it in regExp.lastIndex.
      • …And so on.
      • If there are no more matches, regExp.exec returns null and resets regExp.lastIndex to 0.

        const str = "You must MAKE a change to SEE a change";
        const regExp = /change/ig;
        let match;
        while ((match = regExp.exec(str)) !== null) {
          console.log(`Found ${match[0]} at ${match.index}, Next starts at ${regExp.lastIndex}.`);
        }
        // Output → Found change at 16, Next starts at 22.
        //          Found change at 32, Next starts at 38.
        

        Before str.matchAll [ES2020] was added to JavaScript, calls to regExp.exec in a loop were used to get all matches with groups.

  6. regExp.test(str): Is there a match?

    The method regExp.test(str) looks for at least one match; if one is found, it returns true, otherwise false.

     const str = "Hello World";
     const regExp1 = /hello/i; // i - case insensitive
     console.log(regExp1.test(str));
     // ✅ Output → true 
    
     const regExp2 = /script/;
     console.log(regExp2.test(str));
     // ❌ Output → false
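One thing to watch out for: when the regex has the g flag, test() uses and updates lastIndex exactly like exec does, so calling it repeatedly on the same string can give different answers. This sketch makes that visible with a made-up /o/g pattern.

```javascript
const str = "Hello World";
const globalRegExp = /o/g;

// With g, test() starts at lastIndex and moves it past each match.
console.log(globalRegExp.test(str), globalRegExp.lastIndex); // true 5
console.log(globalRegExp.test(str), globalRegExp.lastIndex); // true 8
console.log(globalRegExp.test(str), globalRegExp.lastIndex); // false 0

// Without g, test() always searches from the beginning.
const plainRegExp = /o/;
console.log(plainRegExp.test(str), plainRegExp.test(str)); // true true
```

If you only need a yes/no answer, leaving out the g flag avoids this surprise.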
    

Look-around

There are many cases when you want to find a pattern only if it is followed or preceded by another pattern. The special syntax for this is called “lookahead” and “lookbehind”, together referred to as “look-around”.

lookAround.png

Lookahead: match a pattern only if it is followed by another pattern

  • Positive Lookahead: (?=«pattern») matches if pattern matches what comes next.

      const regExp = /James(?= Clear)/;
    
      const str = "James is a writer.";
      console.log(regExp.test(str));
      // ❌ Output → false
    
      const str1 = "James Clear is the author of the bestselling book.";
      console.log(regExp.test(str1));
      // ✅ Output → true
    
  • Negative Lookahead: (?!«pattern») matches if pattern does not match what comes next.

      const regExp = /James(?! Clear)/;
    
      const str = "James is a writer.";
      console.log(regExp.test(str));
      // ✅ Output → true
    
      const str1 = "James Clear is the author of the bestselling book.";
      console.log(regExp.test(str1));
      // ❌ Output → false
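A lookahead is only checked; it is not part of the match itself. This sketch shows that the match is just "James", and then uses the same idea to pick out a number only when a % sign follows it (the stats string is made up).

```javascript
const str = "James Clear is the author of the bestselling book.";
// " Clear" is checked but not included in the match.
const match = str.match(/James(?= Clear)/);
console.log(match[0], match.index); // James 0

// Practical use: digits only when they are followed by "%".
const stats = "Growth 37% in 12 months";
console.log(stats.match(/\d+(?=%)/g)); // [ '37' ]
```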
    

Lookbehind: match a pattern only if it is preceded by another pattern

  • Positive Lookbehind: (?<=«pattern») matches if pattern matches what came before.

      const regExp = /(?<=James) Clear/;
    
      const str = "James is a writer.";
      console.log(regExp.test(str));
      // ❌ Output → false 
    
      const str1 = "James Clear is the author of the bestselling book.";
      console.log(regExp.test(str1));
      // ✅ Output → true
    
  • Negative Lookbehind: (?<!«pattern») matches if pattern does not match what came before.

      const regExp = /(?<!James) Clear/;
    
      const str = "Ray Clear is a writer.";
      console.log(regExp.test(str));
      // ✅ Output → true 
    
      const str1 = "James Clear is the author of the bestselling book.";
      console.log(regExp.test(str1));
      // ❌ Output → false
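Lookbehind is handy for pulling out values that follow a marker without including the marker. This sketch uses a made-up price list: the positive lookbehind keeps dollar amounts, and the negative one keeps the amounts that are not preceded by $. The \b in the second pattern stops it from matching the "5" inside "25".

```javascript
const prices = "Book: $25, Shipping: €5, Tax: $3";

// Positive lookbehind: digits preceded by "$" (the $ is not in the match).
console.log(prices.match(/(?<=\$)\d+/g)); // [ '25', '3' ]

// Negative lookbehind: whole numbers NOT preceded by "$".
console.log(prices.match(/(?<!\$)\b\d+/g)); // [ '5' ]
```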
    

Backtracking

As the name suggests, the regex engine can go back. When it enters a branch (for example an alternative or a quantifier), it remembers its current position so that it can go back and try another option if the current one does not work out.

To find a match, the regex engine will consume characters one by one. When a partial match begins, the engine will remember the start position so it can go back in case the following characters don't complete the match.

  • If the match is complete, there is no backtracking.
  • If the match isn't complete, the engine will backtrack the string (like when you rewind an old tape) to try to find a whole match.

Let’s take one example, \d{2}[a-z]{2}, where the first two characters must be digits (\d) followed by two lowercase letters (a-z), and try to match it against the string abc123def.

backtracking.png

⚠️ In the above example, there is only one backtrack, which is fine. But some regular expressions that look simple can take a very long time to execute, and can even “hang” the JavaScript engine. In that case, the web browser suggests killing the script and reloading the page. Not a good thing for sure.

For server-side JavaScript, such a regExp may hang the server process, which is even worse. So be careful with backtracking.
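You can confirm where the backtracking example ends up by running it. The attempt that starts at "12" fails because "3" is not a letter, so the engine gives up that start position and finds the match one character later.

```javascript
const regExp = /\d{2}[a-z]{2}/;
const match = "abc123def".match(regExp);

// "12" is followed by "3", not a letter, so the attempt at index 3 fails;
// the next attempt at index 4 succeeds with "23de".
console.log(match[0], match.index); // 23de 4
```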

Using REGEX in VSCode

VSCode has a nice feature: its search tool can search using regular expressions. Press cmd+f (on a Mac, or ctrl+f on Windows) to open the search tool, and then press cmd+option+r (alt+r on Windows) to enable regex search.

For example, say you have a large .json file where you want to change the date format. You can replace all the dates at once using VSCode’s regex search and replace option.

Regex to find all dates in YYYYMMDD format: ((19|20)[0-9]{2})(0[0-9]|1[0-2])([0-2][0-9]|3[0-1]). This regex has four groups: $1 is the year, $2 is the (19|20) part nested inside it, $3 is the month and $4 is the day. To get DD-MM-YYYY, replace the matches with the group reference $4-$3-$1.

vscode.png
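The same search and replace can be done in JavaScript with str.replace, since it uses the same $n group references. The JSON line below is a hypothetical sample. Note that the nested (19|20) group takes number 2, so the day ends up as $4.

```javascript
// A hypothetical JSON line with dates in YYYYMMDD format.
const json = '{"start": "20220226", "end": "19991231"}';
const dateRegExp = /((19|20)[0-9]{2})(0[0-9]|1[0-2])([0-2][0-9]|3[0-1])/g;

// $1 = year, $2 = century (19|20), $3 = month, $4 = day
console.log(json.replace(dateRegExp, "$4-$3-$1"));
// {"start": "26-02-2022", "end": "31-12-1999"}
```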

Conclusion

So, this concludes Regular Expressions (Zero to Hero). There are many more things you can try with regex. I tried my best to explain the ideas, and I hope you learned something new today.

Happy Learning 👩🏼‍💻

References and Learning Resources

Book:

https://eloquentjavascript.net/09_regexp.html

Articles:

https://fireship.io/lessons/regex-cheat-sheet-js/

https://javascript.info/regular-expressions

https://flaviocopes.com/javascript-regular-expressions

https://www.janmeppe.com/blog/regex-for-noobs/

Learning playground:

https://regexlearn.com/playground

https://www.freecodecamp.org/learn/javascript-algorithms-and-data-structures/regular-expressions/

It is possible that I forgot to mention some references.