About Farhan Khwaja

Farhan is a software engineer working in the retail domain. He is an ETL/UNIX/Teradata developer. He is also the founder and editor of Code 2 Learn.

Regular Expressions in Java – Soft Introduction

A regular expression is a kind of pattern that can be applied to text (a String, in Java). Java provides the java.util.regex package for pattern matching with regular expressions. Java's regular expression syntax is very similar to Perl's and is easy to learn.

A regular expression either matches the text (or a part of it) or fails to match.
* If a regular expression matches a part of the text, we can find out which part.
** If the regular expression is complex, we can also find out which part of the regular expression matched which part of the text.

A First Example

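The regular expression “[a-z]+” matches any sequence of one or more lowercase letters.
[a-z] means any character from a to z, inclusive, and + means “one or more”.

Suppose we supply the string “code 2 learn java tutorial”. The pattern matches the words code, learn, java and tutorial, but not the digit 2 or the spaces.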

How to do it in Java

First, you must compile the pattern:
import java.util.regex.*;
Pattern p = Pattern.compile("[a-z]+");

Next, you must create a matcher for the text by calling the pattern's matcher method:
Matcher m = p.matcher("code 2 learn java tutorial");


Neither Pattern nor Matcher has a public constructor; we create both through methods of the Pattern class.

Pattern Class: A Pattern object is a compiled representation of a regular expression. The Pattern class provides no public constructors. To create a pattern, you must first invoke one of its public static compile methods, which will then return a Pattern object. These methods accept a regular expression as the first argument.

Matcher Class: A Matcher object is the engine that interprets the pattern and performs match operations against an input string. Like the Pattern class, Matcher defines no public constructors. You obtain a Matcher object by invoking the matcher method on a Pattern object.
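As a rough illustration of the two paragraphs above, the sketch below compiles a pattern once, reuses it for several inputs, and uses the second compile overload that takes a flag. The class name CompileDemo and the helper isWord are made up for this example; matches() is described in the list that follows.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CompileDemo {
    // Compiled once and reused; Pattern objects are immutable.
    static final Pattern WORD = Pattern.compile("[a-z]+");
    // Second compile overload: the flag makes the same expression ignore case.
    static final Pattern WORD_ANY_CASE =
            Pattern.compile("[a-z]+", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: true if the whole text is one run of letters.
    static boolean isWord(Pattern p, String text) {
        Matcher m = p.matcher(text); // a fresh Matcher for each input
        return m.matches();
    }

    public static void main(String[] args) {
        System.out.println(isWord(WORD, "java"));          // true
        System.out.println(isWord(WORD, "Java"));          // false
        System.out.println(isWord(WORD_ANY_CASE, "Java")); // true
    }
}
```

Keeping the compiled Pattern in a constant avoids recompiling the expression on every call.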

With the steps above done and the matcher m in hand, we can check whether a match was found and, if so, at which index positions it starts and ends.

m.matches() returns true if the pattern matches the entire string, and false otherwise.
m.lookingAt() returns true if the pattern matches at the beginning of the string, and false otherwise.
m.find() returns true if the pattern matches any part of the text, and false otherwise. Each further call to m.find() continues the search from where the previous match ended.
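The difference between the three methods is easiest to see side by side. The following is a small sketch; the class MatchKinds and its describe helper are invented for the illustration and use a fresh Matcher for each call so that the results do not influence each other.

```java
import java.util.regex.Pattern;

class MatchKinds {
    // Hypothetical helper: reports the result of all three match methods.
    static String describe(String regex, String text) {
        Pattern p = Pattern.compile(regex);
        boolean whole = p.matcher(text).matches();     // entire string
        boolean prefix = p.matcher(text).lookingAt();  // beginning only
        boolean anywhere = p.matcher(text).find();     // any part
        return "matches=" + whole + " lookingAt=" + prefix + " find=" + anywhere;
    }

    public static void main(String[] args) {
        System.out.println(describe("[a-z]+", "code"));
        // matches=true lookingAt=true find=true
        System.out.println(describe("[a-z]+", "code 2 learn"));
        // matches=false lookingAt=true find=true
        System.out.println(describe("[a-z]+", "2 learn"));
        // matches=false lookingAt=false find=true
    }
}
```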

Finding what was matched

After a successful match, m.start() will return the index of the first character matched and m.end() will return the index of the last character matched, plus one.

If no match was attempted, or if the match was unsuccessful, m.start() and m.end() will throw an IllegalStateException.
– This is a RuntimeException, so you are not required to catch it.
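A simple way to stay clear of the exception is to call m.start() and m.end() only after a match method has returned true. The sketch below shows both the safe path and the failing one; StartEndDemo and its two helpers are names chosen just for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StartEndDemo {
    // Hypothetical helper: "start-end" of the first match, or "no match".
    static String firstSpan(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (m.find()) {
            // Safe: find() succeeded, so start() and end() are defined.
            return m.start() + "-" + m.end();
        }
        return "no match";
    }

    // Calls start() before any match attempt to show the exception.
    static boolean startThrowsBeforeMatch() {
        Matcher m = Pattern.compile("[a-z]+").matcher("code");
        try {
            m.start();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(firstSpan("[a-z]+", "2 learn")); // 2-7
        System.out.println(firstSpan("[0-9]+", "code"));    // no match
        System.out.println(startThrowsBeforeMatch());       // true
    }
}
```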

It may seem strange that m.end() returns the index of the last character matched plus one, but this is just what most String methods require.
– For example, "Now is the time".substring(m.start(), m.end()) will return exactly the matched substring.
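Matcher also offers group(), which returns the matched text directly, so the substring call is not strictly needed. The sketch below, with the made-up class GroupVsSubstring, extracts the match both ways and checks that they agree.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupVsSubstring {
    // Hypothetical helper: first match of regex in text, or null if none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (!m.find()) {
            return null;
        }
        String viaSubstring = text.substring(m.start(), m.end());
        String viaGroup = m.group(); // same text, without index arithmetic
        if (!viaSubstring.equals(viaGroup)) {
            throw new IllegalStateException("unexpected mismatch");
        }
        return viaGroup;
    }

    public static void main(String[] args) {
        // "the" occupies indexes 7..9, so start() is 7 and end() is 10.
        System.out.println(firstMatch("t[a-z]+", "Now is the time")); // the
    }
}
```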

Java Program :

import java.util.regex.*;

public class RegexTest {
   public static void main(String[] args) {
      String pattern = "[a-z]+";
      String text = "code 2 learn java tutorial";
      Pattern p = Pattern.compile(pattern);
      Matcher m = p.matcher(text);
      // Print each run of lowercase letters, followed by "*"
      while (m.find()) {
          System.out.print(text.substring(m.start(), m.end()) + "*");
      }
   }
}

Output: code*learn*java*tutorial*

Additional Methods

If m is a matcher, then

m.replaceFirst(replacement) returns a new String where the first substring matched by the pattern has been replaced by replacement
m.replaceAll(replacement) returns a new String where every substring matched by the pattern has been replaced by replacement
m.find(startIndex) resets this matcher and then looks for the next pattern match, starting at the specified index
m.reset() resets this matcher
m.reset(newText) resets this matcher and gives it new text to examine (any CharSequence, such as a String, StringBuffer, or CharBuffer)
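The methods in this list can be tried out with a short sketch like the one below. The class MoreMethods, its helpers and the sample text are assumptions made for the illustration; only the Matcher calls themselves come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MoreMethods {
    static final Pattern DIGITS = Pattern.compile("[0-9]+");

    static String maskFirst(String text) {
        return DIGITS.matcher(text).replaceFirst("#");
    }

    static String maskAll(String text) {
        return DIGITS.matcher(text).replaceAll("#");
    }

    // find(int) resets the matcher and searches from startIndex onwards.
    static String findFrom(String text, int startIndex) {
        Matcher m = DIGITS.matcher(text);
        return m.find(startIndex) ? m.group() : null;
    }

    // reset(CharSequence) lets one Matcher be reused for new input.
    static int countMatches(Matcher m, CharSequence text) {
        m.reset(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "room 12, floor 3, desk 47";
        System.out.println(maskFirst(text));   // room #, floor 3, desk 47
        System.out.println(maskAll(text));     // room #, floor #, desk #
        System.out.println(findFrom(text, 7)); // 3
        Matcher m = DIGITS.matcher("");
        System.out.println(countMatches(m, text));                           // 3
        System.out.println(countMatches(m, new StringBuilder("no digits"))); // 0
    }
}
```

Note that "$" and "\" have a special meaning in the replacement string; Matcher.quoteReplacement can be used when the replacement should be taken literally.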

Regular Expression Syntax

The table below lists the commonly used regular expression metacharacters available in Java:

Subexpression    Matches
^                Beginning of the input; with the MULTILINE flag, the beginning of each line.
$                End of the input (before a final line terminator, if any); with the MULTILINE flag, the end of each line.
.                Any single character except a line terminator. With the DOTALL flag, (?s), it matches line terminators as well.
[...]            Any single character in brackets.
[^...]           Any single character not in brackets.
\A               Beginning of the entire string.
\z               End of the entire string.
\Z               End of the entire string, except for an allowable final line terminator.
re*              0 or more occurrences of the preceding expression.
re+              1 or more occurrences of the preceding expression.
re?              0 or 1 occurrence of the preceding expression.
re{n}            Exactly n occurrences of the preceding expression.
re{n,}           n or more occurrences of the preceding expression.
re{n,m}          At least n and at most m occurrences of the preceding expression.
a|b              Either a or b.
(re)             Groups regular expressions and remembers the matched text.
(?:re)           Groups regular expressions without remembering the matched text.
(?>re)           Independent pattern, matched without backtracking.
\w               Word characters.
\W               Nonword characters.
\s               Whitespace. Equivalent to [ \t\n\x0B\f\r].
\S               Nonwhitespace.
\d               Digits. Equivalent to [0-9].
\D               Nondigits.
\G               The point where the last match finished.
\n               Back-reference to capture group number n (for example \1).
\b               Word boundary. For a backspace character (0x08), use \x08.
\B               Nonword boundary.
\n, \t, etc.     Newlines, tabs and other escaped control characters.
\Q               Escapes (quotes) all characters up to \E.
\E               Ends quoting begun with \Q.
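A few of these constructs can be seen working together in the sketch below: capturing groups with exact-count quantifiers, a back-reference, and \Q...\E quoting. The class SyntaxDemo, its helper methods and the sample inputs are invented for this example. Remember that each backslash has to be doubled inside a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SyntaxDemo {
    // (\d{4})-(\d{2})-(\d{2}): three capturing groups, exact-count quantifiers.
    static String year(String date) {
        Matcher m = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})").matcher(date);
        return m.matches() ? m.group(1) : null;
    }

    // \b(\w+)\s+\1\b: the back-reference \1 must repeat what group 1 matched.
    static String repeatedWord(String text) {
        Matcher m = Pattern.compile("\\b(\\w+)\\s+\\1\\b").matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // \Q...\E quotes the "+" so that "1+1" is matched literally.
    static boolean containsOnePlusOne(String text) {
        return Pattern.compile("\\Q1+1\\E").matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(year("2014-03-09"));                 // 2014
        System.out.println(repeatedWord("this is is a test"));  // is
        System.out.println(containsOnePlusOne("1+1=2"));        // true
        System.out.println(containsOnePlusOne("11=2"));         // false
    }
}
```

Without the quoting, the expression 1+1 would mean "one or more 1s followed by a 1" and would match "11" instead.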

Reference: Regular Expressions in Java from our JCG partner Farhan Khwaja at the Code 2 Learn blog.


Java Code Geeks and all content copyright © 2010-2014, Exelixis Media Ltd | Terms of Use | Privacy Policy
