Regular Expressions in Java

Package com.stevesoft.pat version 1.5.3

Tutorial Part 2

Pattern Elements:

(), (?:), (?!), (?=)


You can use parentheses in a variety of ways. One is to group a set of pattern elements. In this example, "{2,}" applies to the entire subpattern "foo", so the input must contain at least "foofoo" for the pattern to match.
Regex r = new Regex("(foo){2,}");
 
r.search("foo");
System.out.println(""+r.didMatch());
// Prints "false"
 
r.search("foofoofoo");
System.out.println(r.stringMatched());
// Prints "foofoofoo"
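For readers who do not have the package installed, the same grouping behavior can be sketched with the standard java.util.regex classes. This is a stand-in under the assumption that both engines treat "(foo){2,}" alike, not the com.stevesoft.pat API; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupQuantifierSketch {
    // Returns the first match of (foo){2,} in the input, or null if there is none.
    // Uses java.util.regex as a stand-in for the Regex class in the tutorial.
    static String firstRun(String input) {
        Matcher m = Pattern.compile("(foo){2,}").matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstRun("foo"));         // prints "null": one "foo" is not enough
        System.out.println(firstRun("foofoofoo"));   // prints "foofoofoo"
        System.out.println(firstRun("a foofoo b"));  // prints "foofoo"
    }
}
```

Without the parentheses, "foo{2,}" would apply the quantifier only to the final "o".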
Another use for parentheses is pulling out matching subpatterns. These subpatterns are called "backreferences."
Regex r = new Regex("[abc]([def])");
r.search("==> be <==");
System.out.println(""+r.didMatch());
// Prints "true"
 
System.out.println(r.stringMatched());
// Prints "be"
 
System.out.println(r.stringMatched(1));
// Prints "e"
// This is the contents of the first backreference
Each pair of parentheses has a number, and the part of the matched string that falls within that pair is the backreference with that number.
Regex r = new Regex("([abc])([def])");
r.search("==> be <==");
 
System.out.println(r.stringMatched(1));
// Prints "b"
// This is the contents of the first backreference
 
System.out.println(r.stringMatched(2));
// Prints "e"
// This is the contents of the second backreference
The backreferences are given numbers according to the position of the "(" character in the pattern. The leftmost gets 1, the next one to the right gets 2, and so on. This is especially useful to know if you like to nest backreferences.
Regex r = new Regex("(ab(cd))ef");
r.search("==>abcdef<==");
 
System.out.println(r.stringMatched());
// Prints "abcdef"
 
System.out.println(r.stringMatched(1));
// Prints "abcd"
 
System.out.println(r.stringMatched(2));
// Prints "cd"
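The numbering rule can be checked by listing every group of a match in order. The sketch below uses java.util.regex, which numbers groups by the position of "(" in the same way; it is an illustration with hypothetical names, not part of com.stevesoft.pat.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberingSketch {
    // Returns group 0 (the whole match) followed by groups 1..n,
    // or an empty list when the pattern is not found.
    static List<String> groups(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.find()) {
            for (int i = 0; i <= m.groupCount(); i++) {
                out.add(m.group(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The outer "(" comes first, so it is group 1; the nested one is group 2.
        System.out.println(groups("(ab(cd))ef", "==>abcdef<=="));
        // prints "[abcdef, abcd, cd]"
    }
}
```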
Please note how the following patterns behave, as they bring out a few subtleties of pattern writing:
Regex r = new Regex("(a)+b*");
r.search("==>aaaabbb<==");
System.out.println(r.stringMatched(1));
// Prints "a"
// Note that the subpattern is just the
// literal character "a" so that is what
// the backreference sees.
 
r = new Regex("(a+)b*");
r.search("==>aaaabbb<==");
System.out.println(r.stringMatched(1));
// Prints "aaaa"
// Now the () contains the + as well, so
// all the matching a's are returned in
// the backreference.
 
r = new Regex("([abc])+");
r.search("==>aaabbbc<==");
System.out.println(r.stringMatched(1));
// Prints "c"
// When you have something of the form (...)+
// or (...)*, the backreference returns the
// last thing that matched.
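If the whole repeated run is wanted rather than its last iteration, one common approach is to wrap the repetition in an outer group. The sketch below shows this with java.util.regex, which also keeps only the last iteration of a repeated group; whether com.stevesoft.pat handles the wrapped form identically is an assumption, and the names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupSketch {
    // Returns group 1 of the first match, or null if the pattern is not found.
    static String group1(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(group1("(a)+b*", "==>aaaabbb<=="));        // prints "a"
        System.out.println(group1("(a+)b*", "==>aaaabbb<=="));        // prints "aaaa"
        System.out.println(group1("([abc])+", "==>aaabbbc<=="));      // prints "c"
        // Outer group captures the whole run; the inner group is non-capturing.
        System.out.println(group1("((?:[abc])+)", "==>aaabbbc<=="));  // prints "aaabbbc"
    }
}
```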
Note: You can also use methods left(1) and right(1) to get the text to the left and right of backreference one just as you can use the methods left() and right() to get the text to the left and right of the entire match.
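As a rough illustration of what "text to the left and right of a backreference" means, the sketch below computes both with java.util.regex and substring. It assumes left(n) covers all input before the group and right(n) all input after it; that reading of the package's methods is an assumption, and the helper names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LeftRightSketch {
    // Input text before group n of the first match; null if there is no match
    // or the group did not take part in it.
    static String leftOf(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find() || m.start(group) < 0) {
            return null;
        }
        return input.substring(0, m.start(group));
    }

    // Input text after group n of the first match; null under the same conditions.
    static String rightOf(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find() || m.end(group) < 0) {
            return null;
        }
        return input.substring(m.end(group));
    }

    public static void main(String[] args) {
        String in = "==> be <==";
        System.out.println("[" + leftOf("[abc]([def])", in, 1) + "]");   // prints "[==> b]"
        System.out.println("[" + rightOf("[abc]([def])", in, 1) + "]");  // prints "[ <==]"
        // Group 0 is the whole match, so this is the text left of "be".
        System.out.println("[" + leftOf("[abc]([def])", in, 0) + "]");   // prints "[==> ]"
    }
}
```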

Another use of parentheses is to select one of a set of patterns. The character "|" separates the alternatives. For example:

Regex r = new Regex("(apple|banana|pear|orange)");
r.search("apple");
System.out.println(""+r.didMatch());
// Prints "true"
r.search("orange");
System.out.println(""+r.didMatch());
// Prints "true"
r.search("grape");
System.out.println(""+r.didMatch());
// Prints "false"
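Because an alternation also produces a backreference, it can report which alternative was found. The sketch below shows that with java.util.regex, and also contrasts an unanchored search (find) with a whole-string test (matches). The names are hypothetical, and the assumption is that search() in the package behaves like find(), as the "==> be <==" example earlier suggests.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationSketch {
    private static final Pattern FRUIT =
            Pattern.compile("(apple|banana|pear|orange)");

    // Returns the alternative found anywhere in the input, or null if none is.
    static String whichFruit(String input) {
        Matcher m = FRUIT.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    // True only when the entire input is one of the alternatives.
    static boolean isExactlyFruit(String input) {
        return FRUIT.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(whichFruit("a ripe pear"));       // prints "pear"
        System.out.println(whichFruit("grape"));             // prints "null"
        System.out.println(whichFruit("pineapple"));         // prints "apple"
        System.out.println(isExactlyFruit("pineapple"));     // prints "false"
        System.out.println(isExactlyFruit("orange"));        // prints "true"
    }
}
```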
If you just want the grouping ability of () but are not interested in getting a backreference, it is faster and more efficient to use (?:) instead. (By the way, if speed in matching, as opposed to compiling, is what you're after, you should always call the optimize() method or include "(?o)" near the front of your pattern.)
Regex r1 = new Regex("(?:foo){2,}");
// is the same as
Regex r2 = new Regex("(foo){2,}");
// except that r1 produces no backreference.
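One visible effect of (?:) is that it does not take up a group number. The sketch below counts groups with java.util.regex, used here as a stand-in on the assumption that it numbers groups the same way; it says nothing about the speed claim, and the names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonCapturingSketch {
    // Number of capturing groups declared in the pattern.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // Returns group 1 of the first match, or null if the pattern is not found.
    static String group1(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(groupCount("(foo){2,}"));      // prints "1"
        System.out.println(groupCount("(?:foo){2,}"));    // prints "0"
        // The non-capturing group is skipped, so "(cd)" is group 1.
        System.out.println(group1("(?:ab)(cd)", "abcd")); // prints "cd"
    }
}
```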
The pattern (?=) can be used to look ahead in the text being searched without consuming it, as it is always a zero-length match. Otherwise, it behaves as (?:).
Regex r = new Regex("(?i)foo(?=bar)");
 
r.search("Foo or foobar?");
System.out.println(r.stringMatched());
// Prints "foo"
// Matches the lowercase "foo" because it is
// followed by "bar". The lookahead itself is
// zero-width, so "bar" is not part of the
// matched string.
 
r = new Regex("(?i)foo");
r.search("Foo or foobar?");
System.out.println(r.stringMatched());
// Prints "Foo"
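"Zero-length" can be made concrete by looking at where the match ends. The following sketch reports the start and end offsets of the first match using java.util.regex (a stand-in with hypothetical names, assuming the package's lookahead consumes nothing in the same way).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadSketch {
    // Returns {start, end} of the first match, or null if there is none.
    static int[] span(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? new int[] { m.start(), m.end() } : null;
    }

    public static void main(String[] args) {
        String in = "Foo or foobar?";
        int[] look = span("(?i)foo(?=bar)", in);
        int[] plain = span("(?i)foobar", in);
        // With the lookahead the match stops before "bar".
        System.out.println(look[0] + "-" + look[1]);    // prints "7-10"
        // Without it, "bar" is consumed as part of the match.
        System.out.println(plain[0] + "-" + plain[1]);  // prints "7-13"
    }
}
```

Because the match ends at offset 10, a later search that resumes from there can still see "bar".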
The pattern element (?!) also provides zero-width lookahead, but it succeeds only if the subpattern fails to match at that position.
Regex r = new Regex("(?i)foo(?!bar)");
r.search("Foobar or foo?");
System.out.println(r.stringMatched());
// Prints "foo"
// Cannot match on "Foo" because it is followed
// by bar.
 
r = new Regex("(?i)foo");
r.search("Foobar or foo?");
System.out.println(r.stringMatched());
// Prints "Foo"
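Note that (?!bar) rejects only "bar" itself: any other following text, or the end of the input, lets the match succeed. The sketch below lists the start offset of every match using java.util.regex; it is an illustration with hypothetical names, assuming the package applies the same rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NegativeLookaheadSketch {
    // Start offsets of every match of the pattern in the input.
    static List<Integer> starts(String regex, String input) {
        List<Integer> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.start());
        }
        return out;
    }

    public static void main(String[] args) {
        String in = "foobar foo foobaz";
        // "foo" at 0 is followed by "bar" and is skipped; "foobaz" still counts.
        System.out.println(starts("foo(?!bar)", in));  // prints "[7, 11]"
        System.out.println(starts("foo(?=bar)", in));  // prints "[0]"
        // Nothing follows "foo" here, so the negative lookahead succeeds.
        System.out.println(starts("foo(?!bar)", "foo"));  // prints "[0]"
    }
}
```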
Review: Parentheses have three basic functions:
  • Grouping of patterns
  • Producing backreferences
  • Selecting one of a set of patterns to match
They also come in three special forms:
  • (?: ... ) is like ( ... ) except no backreference is produced.
  • (?= ... ) is like (?: ... ) except it produces a match of zero width.
  • (?! ... ) is like (?= ... ) but it only matches if the pattern inside is not found.
