Tutorial Part 5
Replacing Text
Up until now we have focused on how to write a pattern
to match something. However, a Regex can also be used to
replace text. Here is an example of a Regex being used
to change "foo" to "bar":
Regex r = new Regex("foo","bar");
System.out.println(r.replaceFirst("foo and foo again!"));
// prints "bar and foo again!"
System.out.println(r.replaceAll("foo and foo again!"));
// prints "bar and bar again!"
The second argument to the constructor of Regex is the
replacement rule. Like the pattern itself, the
replacement rule has some special syntax. The sequence
"$&" refers to the current match. So, to put square
brackets around either the word "foo" or "bar" all we need
to do is this:
Regex r = new Regex("(?:foo|bar)","[$&]");
System.out.println(r.replaceAll("foo or bar"));
// prints "[foo] or [bar]"
In the replacement rule, the square brackets are just literal
text with no special meaning. All the special sequences in
replacement rules begin with either a $ or a \, unlike
patterns, which have a wider variety of special characters.
Note that the replacement rule would work the same way if we had
written it as "[${&}]", or "[$MATCH]", or even "[${MATCH}]".
Putting in the {}'s lets you specify exactly which characters
make up the name in the replacement rule, and one is allowed to use
"MATCH" instead of "&" simply because some people think that an
English word is easier to read than a symbol like "&". (I can't
think why.)
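For readers who know the standard library better, here is a rough analog written with java.util.regex rather than package pat. It is only a sketch for comparison: in java.util.regex the whole match is written "$0" in the replacement string, and the "$&" and "$MATCH" forms described above are not available there. The class and method names are made up for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; uses java.util.regex, not package pat.
class WholeMatchDemo {
    private static final Pattern FOO_OR_BAR = Pattern.compile("(?:foo|bar)");

    // Wrap every occurrence of "foo" or "bar" in square brackets.
    static String bracketAll(String text) {
        // "$0" is java.util.regex's name for the whole match.
        return FOO_OR_BAR.matcher(text).replaceAll("[$0]");
    }

    // Wrap only the first occurrence.
    static String bracketFirst(String text) {
        return FOO_OR_BAR.matcher(text).replaceFirst("[$0]");
    }

    public static void main(String[] args) {
        System.out.println(bracketAll("foo or bar"));   // prints "[foo] or [bar]"
        System.out.println(bracketFirst("foo or bar")); // prints "[foo] or bar"
    }
}
```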
The next trick you might be interested in learning is how to refer
to a backreference in a replacement rule. The following rule makes
sure that there is white space around a "+" sign:
Regex r = new Regex("(\\S)\\+(\\S)","${1} + ${2}");
System.out.println(r.replaceAll("3+4=7, 2+5=7, 1 + 6=7"));
// prints "3 + 4=7, 2 + 5=7, 1 + 6=7"
The pattern "\\S", as you may recall, matches anything that is not
white space. Thus, the pattern will match inside the String two times.
The first time it matches on "3+4": the first backreference is "3"
and the second backreference is "4". The second time
it matches on "2+5", with "2" in the first backreference and "5" in
the second. The text "1 + 6" is left alone because its "+" already
has white space on both sides. Note: Instead of "${1}" one can use
"$1" or "\\1" to refer to the backreference.
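The same idea can be sketched with the standard java.util.regex classes, where "$1" and "$2" name the first and second groups in a replacement string. This is an analog for comparison, not package pat itself, and the names BackrefDemo and spaceOutPlus are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; uses java.util.regex, not package pat.
class BackrefDemo {
    // A "+" with a non-white-space character directly on each side.
    private static final Pattern TIGHT_PLUS = Pattern.compile("(\\S)\\+(\\S)");

    // Re-emit both captured neighbours with spaces around the "+".
    static String spaceOutPlus(String text) {
        return TIGHT_PLUS.matcher(text).replaceAll("$1 + $2");
    }

    public static void main(String[] args) {
        System.out.println(spaceOutPlus("3+4=7, 2+5=7, 1 + 6=7"));
        // prints "3 + 4=7, 2 + 5=7, 1 + 6=7"
    }
}
```

One thing to watch for with this sketch: matches cannot overlap, so "1+2+3" comes out as "1 + 2+3", because the "2" is consumed by the first match and is no longer available as the left neighbour of the second "+".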
Probably less interesting, but still quite useful,
is the use of "$`" or "$PREMATCH"
to refer to the part of the String to the left of a match. Likewise,
the replacement rule "$'" or "$POSTMATCH" can be used to refer to
the portion of the String to the right of a match. In the next
example we use these rules to reverse the order of words in a String.
Regex r = new Regex("\\s+and\\s+","$POSTMATCH and $PREMATCH");
System.out.println(r.replaceAll("foo and bar"));
// prints "bar and foo"
As you will remember, "\\s" matches on a white space (i.e. space, tab,
carriage return, or line feed characters), and "\\s+" matches
on one or more white space characters.
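To make the notions of "left of the match" and "right of the match" concrete, here is a small sketch using java.util.regex. That library has no "$`" or "$'" in its replacement syntax, so the sketch computes the two pieces by hand from Matcher.start() and Matcher.end() and then assembles the swapped String directly. The names PrePostDemo and swapAroundAnd are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; uses java.util.regex, not package pat.
class PrePostDemo {
    private static final Pattern AND = Pattern.compile("\\s+and\\s+");

    // Swap the text on either side of the first white-space-delimited "and".
    static String swapAroundAnd(String text) {
        Matcher m = AND.matcher(text);
        if (!m.find()) {
            return text; // nothing to swap
        }
        String preMatch = text.substring(0, m.start()); // left of the match
        String postMatch = text.substring(m.end());     // right of the match
        return postMatch + " and " + preMatch;
    }

    public static void main(String[] args) {
        System.out.println(swapAroundAnd("foo and bar")); // prints "bar and foo"
    }
}
```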
Another point of interest concerns the sequences "\\U", "\\L",
"\\u", "\\l", "\\Q", and "\\E". All characters are upper case after
the \U, all are lower case after the \L, and all non-alphanumeric
characters are quoted after \Q. The \E flag puts everything back to
normal.
Here's an example of how you can make words 2 or more letters in length
upper case.
Regex r = new Regex("\\w{2,}","\\U$&");
System.out.println(r.replaceAll("a foo and a bar"));
// Prints a FOO AND a BAR
Here's a silly modification that uses \E
Regex r = new Regex("\\w{2,}","\\U$&\\E$&");
System.out.println(r.replaceAll("a foo and a bar"));
// Prints a FOOfoo ANDand a BARbar
Now, let's consider the effects of \u and \l. These cause
the next letter to be upper or lower case respectively, and they
override \U and \L. Thus:
Regex r = new Regex("\\w{2,}","\\L\\u$&");
System.out.println(r.replaceAll("a foo and a BAR"));
// Prints a Foo And a Bar
This last replacement rule capitalizes a word.
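The replacement syntax of java.util.regex has no \U, \L, \u, or \l, so a comparable effect there has to be coded by hand. The following is a sketch of one way to do it with Matcher.appendReplacement; it is an analog for comparison, not how package pat works internally, and the names CapitalizeDemo and capitalizeWords are hypothetical.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; uses java.util.regex, not package pat.
class CapitalizeDemo {
    private static final Pattern WORD = Pattern.compile("\\w{2,}");

    // Capitalize every word of two or more characters:
    // first character upper case, the rest lower case.
    static String capitalizeWords(String text) {
        Matcher m = WORD.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String word = m.group();
            String fixed = Character.toUpperCase(word.charAt(0))
                    + word.substring(1).toLowerCase(Locale.ROOT);
            // quoteReplacement keeps any '$' or '\' in the text literal.
            m.appendReplacement(out, Matcher.quoteReplacement(fixed));
        }
        m.appendTail(out); // copy whatever follows the last match
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(capitalizeWords("a foo and a BAR"));
        // prints "a Foo And a Bar"
    }
}
```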
Note that the patterns ^ and $ are affected by the m flag.
If the m flag is turned on (include "(?m)" at the start of the
pattern), then we are in "line mode" and ^ and $ will detect
the beginning/end of each line, not just of the entire string.
Regex r = null;
// m flag on
r = new Regex("(?m)^","[start]");
System.out.println(r.replaceAll("a\nb\nc"));
/* Prints:
[start]a
[start]b
[start]c
*/
// m flag off
r = new Regex("^","[start]");
System.out.println(r.replaceAll("a\nb\nc"));
/* Prints:
[start]a
b
c
*/
// m flag on
r = new Regex("(?m)$","[end]");
System.out.println(r.replaceAll("a\nb\nc"));
/* Prints:
a[end]
b[end]
c[end]
*/
// m flag off
r = new Regex("$","[end]");
System.out.println(r.replaceAll("a\nb\nc"));
/* Prints:
a
b
c[end]
*/
The patterns "\Z" and "\A" are unaffected. They will always
match the end and beginning of the string, respectively.
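The standard java.util.regex classes follow the same convention for "(?m)", "\A", and "\Z", so the difference can be checked with a short sketch. This uses String.replaceAll from the standard library rather than package pat; the names AnchorDemo and mark are hypothetical.

```java
// Hypothetical demo class; uses java.util.regex (via String.replaceAll),
// not package pat.
class AnchorDemo {
    // Insert `tag` at every position where `regex` matches in `text`.
    // The tags used below contain no '$' or '\', so they are safe as
    // replacement strings.
    static String mark(String regex, String tag, String text) {
        return text.replaceAll(regex, tag);
    }

    public static void main(String[] args) {
        String text = "a\nb\nc";
        // In line mode, ^ matches at the start of every line...
        System.out.println(mark("(?m)^", "[start]", text));
        // ...but \A still matches only at the start of the string.
        System.out.println(mark("(?m)\\A", "[start]", text));
        // Likewise, $ matches at the end of every line in line mode...
        System.out.println(mark("(?m)$", "[end]", text));
        // ...but \Z still matches only at the end of the string.
        System.out.println(mark("(?m)\\Z", "[end]", text));
    }
}
```

One caveat for the java.util.regex version: if the input ends with a line terminator, "\Z" there also matches just before that final terminator, so the input above deliberately has no trailing newline.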
One other sort of thing you can do in Perl 5 is to allow a
subroutine to process your substitutions. For those of you
who know Perl, I'm referring to code like the following:
$x = "Some numbers: 49 36 2";
$x =~ s/\d+/sqrt($&)/eg;
print $x,"\n";
The output from this Perl code is:
Some numbers: 7 6 1.4142135623731
The "e" flag allows you to use a function (in this case sqrt)
to perform the substitution rule. Package pat does not support
the "e" flag, for that would entail writing the entire Perl language
in Java and not just doing regular expression matching. However,
what it does do is allow you to have a Java subroutine handle
the substitution. This example file
fancy.java illustrates how this can be accomplished.
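For comparison only, here is a sketch of the same square-root substitution written against the standard java.util.regex classes, where ordinary Java code computes each replacement. This is not the contents of fancy.java and does not use package pat's own mechanism; the names SqrtReplaceDemo, formatRoot, and replaceWithRoots are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; uses java.util.regex, not package pat.
class SqrtReplaceDemo {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Print whole-number roots without a trailing ".0".
    static String formatRoot(long n) {
        double root = Math.sqrt(n);
        return root == Math.rint(root)
                ? String.valueOf((long) root)
                : String.valueOf(root);
    }

    // Replace every run of digits with its square root.
    // Assumes each run fits in a long; longer runs would make
    // Long.parseLong throw NumberFormatException.
    static String replaceWithRoots(String text) {
        Matcher m = DIGITS.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            long n = Long.parseLong(m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(formatRoot(n)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceWithRoots("Some numbers: 49 36 2"));
        // prints "Some numbers: 7 6 1.4142135623730951"
    }
}
```

The last number carries more digits than in the Perl output because Java's default double-to-String conversion prints more significant digits than Perl's default number formatting does.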