Regular Expressions in Java

Package com.stevesoft.pat version 1.5.3

Tutorial Part 6

Things you can't do in Perl: (?@()), (?>1), (?Q), (??w), (??d), ...

Up until now, you've been learning about the same basic patterns that are available in Perl 5. Package pat offers some extensions to this framework, as well as the ability to add your own patterns.

Suppose you wish to match the word "foo" but not if it occurs inside single or double quotes. There is a shorthand for this sort of thing in package pat. You can simply say

// Requires: import com.stevesoft.pat.Regex;
Regex r = new Regex("(?Q)foo");
r.search("one 'foo' two \"foo\" three foo.");
System.out.println(r.left() + ">>" + r.stringMatched() + "<<" + r.right());
// Prints: one 'foo' two "foo" three >>foo<<.
When the Q flag is set, package pat acts as if each character inside single or double quotes had a value outside the range possible for a character, so a literal such as foo cannot match there. A quote extends from the first occurrence of ' or " until a matching quote that is not preceded by a backslash.
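
To make the quoting rule concrete, here is a minimal sketch in plain Java of the behaviour described above. It is not package pat's implementation: the class QuoteAwareSearch and its method indexOutsideQuotes are hypothetical names, and the sketch assumes the search word itself contains no quote characters.

```java
// Hypothetical sketch of the idea behind (?Q); NOT package pat's code.
// Finds the first occurrence of a literal word that lies outside
// single- or double-quoted text. Assumes the word contains no quotes.
final class QuoteAwareSearch {

    /** Returns the index of the first occurrence of word outside quotes, or -1. */
    static int indexOutsideQuotes(String text, String word) {
        char openQuote = 0; // 0 means "not currently inside quotes"
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (openQuote != 0) {
                // Inside quotes: a backslash escapes the next character,
                // and only the same quote character closes the quote.
                if (c == '\\') {
                    i++;
                } else if (c == openQuote) {
                    openQuote = 0;
                }
            } else if (c == '\'' || c == '"') {
                openQuote = c;      // a quoted region starts here
            } else if (text.startsWith(word, i)) {
                return i;           // match found outside any quotes
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String s = "one 'foo' two \"foo\" three foo.";
        int at = indexOutsideQuotes(s, "foo");
        System.out.println(s.substring(0, at) + ">>foo<<" + s.substring(at + 3));
        // prints: one 'foo' two "foo" three >>foo<<.
    }
}
```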

Suppose you want to match balanced parentheses. For example, suppose you want to fish out a function call and its arguments from some C code you're parsing. Package pat lets you do this in a simple, straightforward way.

Regex r = new Regex("foo(?@())");
r.search("a=7; foo( x=2+4, bar(9) ); output(8);");
System.out.println(r.stringMatched());
// Prints: foo( x=2+4, bar(9) )
 
// For comparison....
 
r = new Regex("foo\\(.*\\)");
r.search("a=7; foo( x=2+4, bar(9) ); output(8);");
System.out.println(r.stringMatched());
// Prints: foo( x=2+4, bar(9) ); output(8)
 
// and...
 
r = new Regex("foo\\(.*?\\)");
r.search("a=7; foo( x=2+4, bar(9) ); output(8);");
System.out.println(r.stringMatched());
// Prints: foo( x=2+4, bar(9)
As you can see, of the three attempts to match balanced parentheses, only the first did what we wanted. The "(?@())" pattern element can also match square brackets (in the form "(?@[])") and curly brackets (in the form "(?@{})"). If that isn't convenient enough, you can write your own variation of this pattern element and teach Regex to understand it (see deriv3.java; for another example of designing your own pattern element, see deriv2.java).
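
The idea behind "(?@())" can be sketched as a depth counter: after the literal prefix, consume the opening delimiter, count nesting up and down, and stop when the count returns to zero. The helper below (BalancedMatch.findBalanced, a hypothetical name) is plain Java, not package pat's code; it assumes distinct open and close characters and ignores quotes and escapes.

```java
// Hypothetical sketch of what "foo(?@())" does; NOT package pat's code.
// Assumes open != close; quoting and escaping are ignored.
final class BalancedMatch {

    /**
     * Returns the first substring consisting of prefix followed by a
     * balanced open/close group, or null if there is none.
     */
    static String findBalanced(String text, String prefix, char open, char close) {
        int from = 0;
        while (true) {
            int start = text.indexOf(prefix, from);
            if (start < 0) {
                return null;                 // no more candidates
            }
            int i = start + prefix.length();
            if (i < text.length() && text.charAt(i) == open) {
                int depth = 0;
                for (; i < text.length(); i++) {
                    char c = text.charAt(i);
                    if (c == open) {
                        depth++;             // one level deeper
                    } else if (c == close && --depth == 0) {
                        return text.substring(start, i + 1); // balanced
                    }
                }
            }
            from = start + 1; // unbalanced or no delimiter: try the next occurrence
        }
    }

    public static void main(String[] args) {
        String code = "a=7; foo( x=2+4, bar(9) ); output(8);";
        System.out.println(findBalanced(code, "foo", '(', ')'));
        // prints: foo( x=2+4, bar(9) )
    }
}
```

Unlike the greedy ".*" and lazy ".*?" attempts above, the counter knows that the ")" after "9" closes the inner "bar(" rather than the outer "foo(".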

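In Perl you can match an x followed by an a using the pattern "x(?=a)", but what if you want to match an x only if it is preceded by an a? In package pat you can write "x(?<2)a(?>1)". The element "(?<2)" means go back 2 characters, and the element "(?>1)" means skip ahead 1 character. After the x has matched, the current position is just past the x; going back 2 lands on the character before the x, where the "a" must match, and skipping ahead 1 then moves past the x again.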

Regex r = new Regex("x(?<2)a(?>1)");
r.search(" -- ax -- ");
System.out.println(r.stringMatched());
// Prints: x
Use this pattern element with caution. It works as advertised, but it's sometimes hard to predict exactly what a pattern using it will match.
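
The cursor movements of "x(?<2)a(?>1)" can be simulated step by step in plain Java. The method simulatePat below is a hypothetical illustration of the position arithmetic described above, not package pat's matcher. For comparison, withLookbehind uses the zero-width lookbehind "(?<=a)x" that the standard java.util.regex package (Java 1.4 and later) supports.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookbehindDemo {

    /**
     * Hypothetical simulation of "x(?<2)a(?>1)" as explicit cursor moves.
     * Returns the start index of the first match, or -1.
     */
    static int simulatePat(String text) {
        for (int start = 0; start < text.length(); start++) {
            if (text.charAt(start) != 'x') continue;
            int pos = start + 1;  // "x" matched: cursor is just past the x
            pos -= 2;             // "(?<2)": back 2, onto the char before the x
            if (pos < 0 || text.charAt(pos) != 'a') continue;
            pos += 1;             // "a" matched: cursor is back on the x
            pos += 1;             // "(?>1)": skip ahead 1, past the x again
            // The matched text is text.substring(start, pos), i.e. "x".
            return start;
        }
        return -1;
    }

    /** Standard-library equivalent using a zero-width lookbehind. */
    static int withLookbehind(String text) {
        Matcher m = Pattern.compile("(?<=a)x").matcher(text);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        String s = " -- ax -- ";
        System.out.println(simulatePat(s));    // prints: 5
        System.out.println(withLookbehind(s)); // prints: 5
    }
}
```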

The shorthand "\\w" is convenient: combined with "+", as in "\\w+", it will generally match any word, as long as that word is in English. This pattern would probably not do what you want if it encountered the German word fünf or the French word français. One solution is to use the pattern element "(??w)", which acts like "\\w" but recognizes word characters from the whole Unicode range. Likewise, the pattern element "(??d)" matches all Unicode digits, not just 0-9 like "\\d". To learn more, see unicode.html.
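
For comparison, java.util.regex in the standard library behaves the same way by default: "\\w" and "\\d" cover only ASCII. The sketch below assumes Java 7 or later, where the Pattern.UNICODE_CHARACTER_CLASS flag makes both Unicode-aware. findAll is a hypothetical helper name, and this is the standard library, not the "(??w)" and "(??d)" elements of package pat.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UnicodeWords {

    /** Returns every match of regex in text, compiled with the given flags. */
    static List<String> findAll(String regex, int flags, String text) {
        Matcher m = Pattern.compile(regex, flags).matcher(text);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file ASCII-only.
        String words = "f\u00fcnf fran\u00e7ais";          // "fünf français"
        int unicode = Pattern.UNICODE_CHARACTER_CLASS;      // Java 7+

        System.out.println(findAll("\\w+", 0, words));       // [f, nf, fran, ais]
        System.out.println(findAll("\\w+", unicode, words)); // [fünf, français]

        // U+0663 and U+0664 are ARABIC-INDIC DIGIT THREE and FOUR.
        String digits = "12 \u0663\u0664";
        System.out.println(findAll("\\d+", 0, digits));       // [12]
        System.out.println(findAll("\\d+", unicode, digits)); // [12, ٣٤]
    }
}
```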

A final trick: what if you want to swap two words in a piece of text, say, replace every occurrence of foo with bar and vice versa? You can do this with the Transformer class.

// Requires: import com.stevesoft.pat.Transformer;
Transformer t = new Transformer(true);
t.add("s/foo/bar/");
t.add("s/bar/foo/");
System.out.println(t.replaceAll("Here is foo, here is bar."));
// Prints: Here is bar, here is foo.
To understand this better, and to see another example, look at trans.java.
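
Why not simply apply two substitutions in a row? The sketch below, which uses java.util.regex rather than package pat, shows that the second substitution undoes the first, and that replacing each match during a single pass over the text performs the swap. swap and naiveSwap are hypothetical helper names; this makes no claim about how Transformer works internally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SwapWords {

    /** Two sequential replacements: the second also rewrites the first's output. */
    static String naiveSwap(String s) {
        return s.replaceAll("foo", "bar").replaceAll("bar", "foo");
    }

    /** One pass: each match of "foo|bar" is replaced exactly once. */
    static String swap(String s) {
        Matcher m = Pattern.compile("foo|bar").matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = m.group().equals("foo") ? "bar" : "foo";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out); // copy the text after the last match
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "Here is foo, here is bar.";
        System.out.println(naiveSwap(s)); // prints: Here is foo, here is foo.
        System.out.println(swap(s));      // prints: Here is bar, here is foo.
    }
}
```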

Well, that's it for now. More tutorials on other details of package pat will probably be placed here in the future. In any event, now you know the basics of how to use package pat. Happy Programming!

