Java Regular Expressions

Java Regular Expressions


Regular expressions In Java
  • Regular expressions have long been integral to standard Unix utilities like sed and awk, and languages like Python and Perl (some would argue that they are the predominant reason for Perl’s success). String manipulation tools were previously delegated to the String, StringBuffer, and StringTokenizer classes in Java, which had relatively simple facilities compared to regular expressions. 
  • Regular expressions are powerful and flexible text-processing tools. They allow you to specify, programmatically, complex patterns of text that can be discovered in an input string. Once you discover these patterns, you can then react to them any way you want. Although the syntax of regular expressions can be intimidating at first, they provide a compact and dynamic language that can be employed to solve all sorts of string processing, matching and selection, editing, and verification problems in a completely general way. 
  • A regular expression is a way to describe strings in general terms, so that you can say, "If a string has these things in it, then it matches what I’m looking for." For example, to say that a number might or might not be preceded by a minus sign, you put in the minus sign followed by a question mark, like this - 
              -? 
  • To describe an integer, you say that it’s one or more digits. In regular expressions, a digit is described by saying ‘\d’. If you have any experience with regular expressions in other languages, you’ll immediately notice a difference in the way backslashes are handled. In other languages, ‘\\’ means "I want to insert a plain old (literal) backslash in the regular expression. Don’t give it any special meaning." In Java, ‘ \ \ ‘ means "I’m inserting a regular expression backslash so that the following character has special meaning." For example, if you want to indicate a digit, your regular expression string will be ‘\\d’. If you want to insert a literal backslash, you say ‘\\\\’- However, things like newlines and tabs just use a single backslash: ‘\n\t’.
Example -The simplest way to use regular expressions is to use the functionality built into the String class. For example, 
//: strings/IntegerMatch.java
public class IntegerMatch {
public static void main(String[] args) {
System.out.println("-1234".matches("-?\\d+"));
System.out.println("5678".matches("-?\\d+"));
System.out.println("+911".matches("-?\\d+"));
System.out.println("+911".matches("(-|\\+)?\\d+"));
}
}
Output - 
true
true
false
true

 
 
The first two expressions match, but the third one starts with a ‘+’, which is a legitimate sign but means the number doesn’t match the regular expression. So we need a way to say, "may start with a + or a -." In regular expressions, parentheses have the effect of grouping an expression, and the vertical bar ‘|’ means OR.
Example -A useful regular expression tool that’s built into String is split( ), which means, "Split this string around matches of the given regular expression
//: strings/Splitting.java
import java.util.*;
public class Splitting {
public static String knights =
"Then, when you have found the shrubbery, you must " +
"cut down the mightiest tree in the forest... " +
"with... a herring!";
public static void split(String regex) {
System.out.println(
Arrays.toString(knights.split(regex)));
}
public static void main(String[] args) {
split(" "); // Doesn't have to contain regex chars
split("\\W+"); // Non-word characters
split("n\\W+"); // 'n' followed by non-word characters
}
Output - 
[Then,, when, you, have, found, the, shrubbery,, you, must, cut, down,
the, mightiest, tree, in, the, forest..., with..., a, herring!]
[Then, when, you, have, found, the, shrubbery, you, must, cut, down,
the, mightiest, tree, in, the, forest, with, a, herring]
[The, whe, you have found the shrubbery, you must cut dow, the mightiest
tree i, the forest... with... a herring!]

 
 
  • You can begin learning regular expressions with a subset of the possible constructs. A complete list of constructs for building regular expressions can be found in the JDK documentation for the Pattern class for package java.util.regex.
  • The power of regular expressions appear when you are defining character classes. Here are some typical ways to create character classes, and some predefined classes - 
  • you’ll want to bookmark the JDK documentation page for java.util.regex.Pattern so you can easily access all the possible regular expression patterns. 
 
Example –
import java.util.regex.*;
import static net.mindview.util.Print.*;
public class TestRegularExpression {
public static void main(String[] args) {
if(args.length < 2) {
print("Usage:\njava TestRegularExpression " +
"characterSequence regularExpression+");
System.exit(0);
}
print("Input: \"" + args[0] + "\"");
for(String arg : args) {
print("Regular expression: \"" + arg + "\"");
Pattern p = Pattern.compile(arg);
Matcher m = p.matcher(args[0]);
while(m.find()) {
 
print("Match \"" + m.group() + "\" at positions " + 
 m.start() + "-" + (m.end() - 1)); 
 } 
 } 
 }
Output -
Input: "abcabcabcdefabc" Regular expression: "abcabcabcdefabc" Match "abcabcabcdefabc" at positions 0-14 Regular expression: "abc+" Match "abc" at positions 0-2 Match "abc" at positions 3-5 Match "abc" at positions 6-8 Match "abc" at positions 12-14 Regular expression: "(abc)+" Match "abcabcabc" at positions 0-8 Match "abc" at positions 12-14 Regular expression: "(abc){2,}" Match "abcabcabc" at positions 0-8