Regular Expressions

Introduction to Regular Expressions by MarsChelios
Posted: 3 Nov 02

  Hi everyone, this is my first FAQ. Regular Expressions recently became available in Java as of version 1.4, and I thought it would be good to let people know what they are and what they can do. My goal is to write a series of FAQs on Regular Expressions, increasing in complexity as I go. This FAQ introduces Regular Expressions to those who are new to them and to those who have not used them before in Java.
  For those who don't know, a Regular Expression is a way to express a pattern that represents a set of possible String sequences. Used correctly they can be very powerful. Here is an example of a Regular Expression and some of the Strings it matches.

Example 1: Regular Expression Sample
Regular Expression:
  a*b

Strings Possible:
  b
  ab
  aab
  aaab
  aaaa...b
  ...


  As you can see, the Regular Expression a*b means any String made up of a, zero or more times, followed by exactly one trailing b.
  Huh?
  The * behind the a can be thought of as a modifier that changes the overall pattern of the Regular Expression. The * is a modifier meaning zero or more times, so in this case, a's can appear zero or more times. The b has no modifier, meaning it appears only once and at the end of the pattern.
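  To try this from code rather than by eye, here is a minimal sketch using Pattern.matches from java.util.regex. The class name StarDemo and the helper fits are made up for illustration. Note that matches tests the whole String against the pattern, not just part of it.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names StarDemo and fits are not from the FAQ.
class StarDemo {

    // True only when the WHOLE input fits the pattern a*b.
    static boolean fits(String input) {
        return Pattern.matches("a*b", input);
    }

    public static void main(String[] args) {
        // The first three fit the pattern; the last three do not.
        String[] samples = { "b", "ab", "aaab", "a", "ba", "abb" };
        for (int i = 0; i < samples.length; i++) {
            System.out.println(samples[i] + " -> " + fits(samples[i]));
        }
    }
}
```

  "abb" is rejected because the pattern allows only one b, and "ba" is rejected because the b must come last.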

  There are a lot of modifiers, and I won't be covering all of them here. For a complete list, check out the Pattern class in the Java 1.4.x API documentation, from which the above example is taken. I'm going to go over some of the more common modifiers, but first I need to teach you about Groups, Classes and Predefined Characters.

Groups
  A group in a Regular Expression is a sequence of characters that must occur together, in that order. A group is surrounded by ()'s and can be modified in the same way a single character can. Here's an example that makes use of groups:

Example 2: Groups
Regular Expression:
  (ab)*b

Strings Possible:
  b
  abb
  ababb
  abababb
  abababab...b
  ...


  The Regular Expression (ab)*b means ab, zero or more times, followed by a trailing b.  A good way to think of groups is to picture them as an ink stamp, and each time you use it, it appears the same way.
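  The ink stamp picture can be sketched in code: stamp "ab" some number of times, add the trailing b, and the result should always fit (ab)*b. The class GroupDemo and its helpers stamp and fits are hypothetical names used only for this illustration.

```java
// Hypothetical demo class; names are illustrative only.
class GroupDemo {

    // Builds "ab" repeated `count` times, followed by one trailing b.
    static String stamp(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append("ab");
        }
        return sb.append('b').toString();
    }

    // True only when the whole input fits (ab)*b.
    static boolean fits(String input) {
        return input.matches("(ab)*b");
    }

    public static void main(String[] args) {
        for (int count = 0; count <= 3; count++) {
            String s = stamp(count);
            System.out.println(s + " -> " + fits(s));
        }
        // The group is all-or-nothing: a lone a is not a full "ab" stamp.
        System.out.println("aab -> " + fits("aab"));
    }
}
```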

Classes
  A class in a Regular Expression is a set of characters, any one of which can occur at the position where the class appears. A class is surrounded by []'s and can be modified in the same way a single character can. Here's an example that makes use of classes:

Example 3: Classes
Regular Expression:
  [ab]*b

Strings Possible:
  b
  ab
  bb
  aab
  abb
  bbb
  ...
  aabbaab
  ...

  The Regular Expression [ab]*b means a or b, zero or more times, followed by a trailing b. The best way to think of a class is as a bag you are allowed to pull one thing out of, and you can choose a different thing each time.
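  Since (ab)*b and [ab]*b look so similar, it may help to run the same Strings through both. This is only a sketch; ClassDemo, fitsGroup and fitsClass are names invented for the example.

```java
// Hypothetical demo class comparing a group with a class.
class ClassDemo {

    // Group: the pair "ab" repeats as a unit, then one b.
    static boolean fitsGroup(String input) {
        return input.matches("(ab)*b");
    }

    // Class: any mix of a's and b's, then one b.
    static boolean fitsClass(String input) {
        return input.matches("[ab]*b");
    }

    public static void main(String[] args) {
        String[] samples = { "abb", "ababb", "bb", "aabbaab", "ba" };
        for (int i = 0; i < samples.length; i++) {
            String s = samples[i];
            System.out.println(s + "  group: " + fitsGroup(s)
                    + "  class: " + fitsClass(s));
        }
    }
}
```

  "bb" and "aabbaab" fit only the class version, because the group insists on complete "ab" pairs. "ba" fits neither, since both patterns must end in b.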

  Sometimes when you are putting sequences of characters in a class, such as the alphabet, there are a lot of characters to enter.  To alleviate this, Java allows you to specify the start and end of the sequence, like so:
  [a-z]

This means characters a through z can be chosen.
You can also specify what you don't want in the class, like so:
  [^xyz]

This means all characters except x, y, and z can be chosen.
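A short sketch of ranges and negation, with a * added after each class so that Strings of any length can be tested. RangeDemo and its two helpers are hypothetical names.

```java
// Hypothetical demo class for ranges and negated classes.
class RangeDemo {

    // Zero or more lowercase letters a through z.
    static boolean allLowercase(String input) {
        return input.matches("[a-z]*");
    }

    // Zero or more characters, none of which is x, y or z.
    static boolean noXyz(String input) {
        return input.matches("[^xyz]*");
    }

    public static void main(String[] args) {
        String[] samples = { "hello", "Hello", "lazy", "ABC 123" };
        for (int i = 0; i < samples.length; i++) {
            String s = samples[i];
            System.out.println(s + "  [a-z]*: " + allLowercase(s)
                    + "  [^xyz]*: " + noXyz(s));
        }
    }
}
```

Keep in mind that [^xyz] accepts every other character, including digits, spaces and uppercase letters, which is why "ABC 123" fits it.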
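You can combine groups and classes to create even more patterns, with one caveat: a class may appear inside a group, but a group cannot appear inside a class. Inside []'s the characters ( and ) lose their special meaning, so [a(de)] means any one of the characters a, (, d, e or ). To say "a or de", put the choices in a group separated by |, which means "or".

Example 4: Combining Groups and Alternatives
Regular Expression:
  (a|de)*b

Strings Possible:
  b
  ab
  deb
  adeb
  deab
  deadeb
  ...

  The Regular Expression (a|de)*b means a or de, zero or more times, followed by a trailing b.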
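  Parentheses behave differently inside []'s than outside them, and this is easy to check in code. The sketch below runs the same Strings through [a(de)]*b, where ( and ) are ordinary characters in the class, and through (a|de)*b, where | separates two alternatives in a real group. The class and method names are invented for the example.

```java
// Hypothetical demo class: parentheses inside a class versus a real group.
class ClassVersusGroupDemo {

    // Inside []'s, ( and ) are literal: any of a ( d e ) may repeat.
    static boolean fitsClass(String input) {
        return input.matches("[a(de)]*b");
    }

    // A group with alternation: "a" or "de" may repeat.
    static boolean fitsGroup(String input) {
        return input.matches("(a|de)*b");
    }

    public static void main(String[] args) {
        String[] samples = { "deb", "adeb", "db", "edb", "(b" };
        for (int i = 0; i < samples.length; i++) {
            String s = samples[i];
            System.out.println(s + "  [a(de)]*b: " + fitsClass(s)
                    + "  (a|de)*b: " + fitsGroup(s));
        }
    }
}
```

  All five Strings fit the class version, but only "deb" and "adeb" fit the group version, because d and e on their own are not complete "de" units.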

Predefined Characters
  Predefined characters allow you to specify a set of characters using a single special character. Here is a list of the predefined characters available.

Predefined Character        What It Does
.                           Any character (may or may not match line terminators)
\d                          A digit: [0-9]  
\D                          A non-digit: [^0-9]  
\s                          A whitespace character: [ \t\n\x0B\f\r]  
\S                          A non-whitespace character: [^\s]  
\w                          A word character: [a-zA-Z_0-9]  
\W                          A non-word character: [^\w]
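  A few of these can be checked with one-character Strings. The sketch also shows what "may or may not match line terminators" means for the period: by default it does not match a newline, but it does when the pattern is compiled with the Pattern.DOTALL flag. PredefinedDemo and dotMatchesNewline are hypothetical names, and the backslashes are doubled because the patterns are written as Java String literals.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the predefined characters.
class PredefinedDemo {

    // Tests whether "." matches a newline under the given compile flags.
    static boolean dotMatchesNewline(int flags) {
        return Pattern.compile(".", flags).matcher("\n").matches();
    }

    public static void main(String[] args) {
        System.out.println("7".matches("\\d"));   // true: 7 is a digit
        System.out.println("7".matches("\\D"));   // false: \D excludes digits
        System.out.println("\t".matches("\\s"));  // true: tab is whitespace
        System.out.println("_".matches("\\w"));   // true: underscore is a word character
        System.out.println("-".matches("\\W"));   // true: hyphen is not a word character

        System.out.println(dotMatchesNewline(0));              // false by default
        System.out.println(dotMatchesNewline(Pattern.DOTALL)); // true with DOTALL
    }
}
```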

  Here is an example using predefined characters:

Example 5: Predefined Characters
Whole Numbers
  \d+

Floating-Point Numbers
  \d+\.\d+


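  Notice the '\' in front of the period for Floating-Point Numbers. As you can see from the Predefined Character listing, a lone period is a wildcard and can represent any character. To specify that we want an actual '.' and not a wildcard, we must precede it with a '\'. If you write the pattern as a String literal in Java source code, the backslash itself has to be doubled ("\\."), because '\' is also the escape character for Java Strings; a pattern typed into a text field at run time needs only the single '\'. (The + used above means one or more times; it is covered under Basic Modifiers below.)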
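  Here is a small sketch of what the escape buys you. It uses \d+\.\d+ (digits, a literal period, digits) as the illustrative pattern and compares it with the same pattern without the escape. EscapeDemo and its helpers are made-up names, and the backslashes are doubled because the patterns are Java String literals.

```java
// Hypothetical demo class: escaped versus unescaped period.
class EscapeDemo {

    // Escaped: the middle character must be an actual period.
    static boolean isDecimal(String input) {
        return input.matches("\\d+\\.\\d+");
    }

    // Unescaped: the period is a wildcard and accepts any character.
    static boolean isDecimalUnescaped(String input) {
        return input.matches("\\d+.\\d+");
    }

    public static void main(String[] args) {
        String[] samples = { "3.14", "3x14", "314" };
        for (int i = 0; i < samples.length; i++) {
            String s = samples[i];
            System.out.println(s + "  escaped: " + isDecimal(s)
                    + "  unescaped: " + isDecimalUnescaped(s));
        }
    }
}
```

  Only "3.14" fits the escaped pattern. The unescaped one also accepts "3x14" and even "314", where the wildcard simply swallows the middle digit.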

  On to modifiers! There are a lot of modifiers for Regular Expressions in Java, but right now I am going to cover just the basics.

Basic Modifiers
Modifier        What It Does
X?              X, Once or not at all
X*              X, Zero or more times
X+              X, One or more times
X{n}            X, n Times
X{n,}           X, at Least n Times
X{n,m}          X, at Least n Times, at most m Times
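  Each row of the table can be tried out with a one-line check. The sketch below is illustrative only; QuantifierDemo and fits are hypothetical names, and the sample Strings are my own.

```java
import java.util.regex.Pattern;

// Hypothetical demo class exercising each basic modifier.
class QuantifierDemo {

    // True only when the whole input fits the given pattern.
    static boolean fits(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // X?  once or not at all
        System.out.println(fits("colou?r", "color"));   // true
        System.out.println(fits("colou?r", "colour"));  // true
        // X*  zero or more times, X+  one or more times
        System.out.println(fits("a*", ""));             // true
        System.out.println(fits("a+", ""));             // false
        // X{n}  exactly n times
        System.out.println(fits("a{3}", "aaa"));        // true
        System.out.println(fits("a{3}", "aa"));         // false
        // X{n,}  at least n times
        System.out.println(fits("a{2,}", "aaaa"));      // true
        System.out.println(fits("a{2,}", "a"));         // false
        // X{n,m}  at least n and at most m times
        System.out.println(fits("a{2,3}", "aaa"));      // true
        System.out.println(fits("a{2,3}", "aaaa"));     // false
    }
}
```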

  Technically, the Java documentation calls these greedy quantifiers, because they match as much of the String as they can. I'm not going to get into that for now, but be aware that there are other kinds (reluctant and possessive) that work in different ways.

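  To finish off this FAQ, here is the source for a simple Java program that lets you create a Regular Expression and test it. Type a pattern or a test String and press Enter; the field turns green if the pattern is valid or the String matches, and red otherwise.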

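import java.awt.Color;
import java.awt.Container;
import java.awt.GridLayout;
import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;

import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

import javax.swing.JFrame;
import javax.swing.JLabel;
import javax.swing.JTextField;
import javax.swing.SwingUtilities;

public class RegularExpressionTester {

    public static void main (String [] args) {
        // Build the GUI on the event dispatch thread, as Swing expects
        SwingUtilities.invokeLater (new Runnable () {
            public void run () {
                createAndShowGui ();
            }
        });
    }

    private static void createAndShowGui () {
        final JTextField patternField = new JTextField (12);
        final JTextField testField = new JTextField (12);

        // Pressing Enter in the pattern field checks that the regex compiles
        patternField.addActionListener (new ActionListener () {
            public void actionPerformed (ActionEvent event) {
                try {
                    Pattern.compile (patternField.getText ());
                    patternField.setBackground (Color.GREEN);
                }
                catch (PatternSyntaxException exception) {
                    patternField.setBackground (Color.RED);
                }
            }
        });

        // Pressing Enter in the test field checks the whole String against the regex
        testField.addActionListener (new ActionListener () {
            public void actionPerformed (ActionEvent event) {
                String pattern = patternField.getText ();
                String string = testField.getText ();

                try {
                    // matches() is true only if the entire input fits the pattern
                    boolean matched = string.matches (pattern);
                    testField.setBackground (matched ? Color.GREEN : Color.RED);
                }
                catch (PatternSyntaxException exception) {
                    // An invalid pattern cannot match anything; flag both fields
                    patternField.setBackground (Color.RED);
                    testField.setBackground (Color.RED);
                }
            }
        });

        JFrame frame = new JFrame ("Regular Expression Tester");
        // Exit normally (status 0) when the window is closed
        frame.setDefaultCloseOperation (JFrame.EXIT_ON_CLOSE);

        Container contentPane = frame.getContentPane ();
        contentPane.setLayout (new GridLayout (2, 2, 0, 0));
        contentPane.add (new JLabel ("Pattern "));
        contentPane.add (patternField);
        contentPane.add (new JLabel ("Test String "));
        contentPane.add (testField);
        frame.pack ();
        frame.setVisible (true);
    }
}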

  The next FAQ will cover more of the options available in Regular Expressions, searching using Regular Expressions, and go over some real-world problems solved by Regular Expressions.

  As always, I hope this Helps,
  MarsChelios

  P.S. I'd like to hear some feedback on the FAQ and also how people are using Regular Expressions to solve real-world problems.
