regex help needed pertaining to url cleaning


(OP)
I wonder if you can help me with your regex expertise.
I need to write a Java method with the following signature:

String cleanUrl (String regex, String url)

The method itself will likely be easy, something like
return url.replaceAll(regex, "");

but it doesn't have to be that; feel free to suggest a different way of invoking the regex.

My question is more about the regexes I will pass into the method. They need to support 5 different URL-cleaning operations:

1)      remove specified query params
2)      keep specified query params
3)      remove all query params
4)      just leave the full host
5)      just leave the specified number of host components


Examples corresponding to the 5 operations above:
Let's say the URL is http://www.sub.blah.co.uk/whatever.html?var1=blah1&var2=blah2&var3=blah3
1)      remove var1 and var3 -> http://www.sub.blah.co.uk/whatever.html?var2=blah2
2)      keep var3 -> http://www.sub.blah.co.uk/whatever.html?var3=blah3
3)      remove all -> http://www.sub.blah.co.uk/whatever.html
4)      just leave the host -> http://www.sub.blah.co.uk
5)      just leave 3 host components -> http://blah.co.uk

Any idea how I would write the regexes (and the corresponding code) to handle all of these while keeping the Java code exactly the same for all 5 cases, so that only the regex differs? In other words, the client code should not specify which of the 5 operations it needs; that has to be implicit in the regex and in the code of the cleanUrl method.
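
To make the constraint concrete, here is a minimal sketch of how far the one-line replaceAll(regex, "") method gets on its own. It only covers the purely subtractive operations 3, 4 and 5; the class name and the three pattern constants are made up for illustration, and the patterns assume a plain http://host/path?query URL with no userinfo or fragment.

```java
class ReplaceAllSketch {
    // The one-line method from the question: delete whatever the regex matches.
    static String cleanUrl(String regex, String url) {
        return url.replaceAll(regex, "");
    }

    // Operation 3: drop the '?' and everything after it.
    static final String REMOVE_QUERY = "\\?.*$";

    // Operation 4: drop everything from the first '/' that is not part of "://".
    static final String HOST_ONLY = "(?<=[^:/])/.*$";

    // Operation 5 with three components: drop the leading host labels that are
    // followed by exactly three more labels, then drop the path as in operation 4.
    static final String LAST_THREE_LABELS =
        "(?<=://)(?:[^./]+\\.)+(?=[^./]+(?:\\.[^./]+){2}(?:/|$))|" + HOST_ONLY;

    public static void main(String[] args) {
        String url = "http://www.sub.blah.co.uk/whatever.html?var1=blah1&var2=blah2&var3=blah3";
        System.out.println(cleanUrl(REMOVE_QUERY, url));      // http://www.sub.blah.co.uk/whatever.html
        System.out.println(cleanUrl(HOST_ONLY, url));         // http://www.sub.blah.co.uk
        System.out.println(cleanUrl(LAST_THREE_LABELS, url)); // http://blah.co.uk
    }
}
```

Operations 1 and 2 are harder to express this way, because deleting a parameter also means deciding which neighbouring '?' or '&' goes with it.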

RE: regex help needed pertaining to url cleaning

I wouldn't use regex for this. I would write a method that parses the URL into its individual components (a String array or something like that) and then builds the result from them.
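
A minimal sketch of that idea, assuming java.net.URI does the parsing and Java 9+ is available for Set.of. The class and method names (UrlParts, filterParams, hostOnly) are hypothetical, and the code assumes a well-formed URL with a server-based host.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

class UrlParts {
    // Operations 1-3: keep (keep == true) or drop (keep == false) the named
    // query parameters; keeping an empty set removes the whole query.
    static String filterParams(String url, Set<String> names, boolean keep) {
        URI u = URI.create(url);
        String base = u.getScheme() + "://" + u.getRawAuthority() + u.getRawPath();
        String query = u.getRawQuery();
        if (query == null) {
            return base;
        }
        List<String> kept = new ArrayList<>();
        for (String pair : query.split("&")) {
            String name = pair.split("=", 2)[0];
            if (names.contains(name) == keep) {
                kept.add(pair);
            }
        }
        return kept.isEmpty() ? base : base + "?" + String.join("&", kept);
    }

    // Operations 4-5: keep the scheme plus the last n dot-separated host labels.
    // Pass Integer.MAX_VALUE to keep the full host.
    static String hostOnly(String url, int n) {
        URI u = URI.create(url);
        String[] labels = u.getHost().split("\\.");
        int from = Math.max(0, labels.length - n);
        return u.getScheme() + "://"
            + String.join(".", Arrays.copyOfRange(labels, from, labels.length));
    }
}
```

For the example URL in the question, filterParams(url, Set.of("var1", "var3"), false) would give case 1 and hostOnly(url, 3) case 5. The caller does have to say which operation it wants, which is the trade-off against the single-regex signature.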

Cheers,
Dian

RE: regex help needed pertaining to url cleaning

[0] Since a URL with a query string can vary quite a bit, I doubt that a single pattern (short of an overworked, hard-to-maintain one) could cover needs as varied as 1-5 and, why not, more. Besides, the result also depends materially on exactly how replaceAll is used, in particular on the replacement string. Hence, I would make the first parameter of cleanUrl an integer that indicates which clean-up is wanted. The specific pattern and the matching replaceAll call are then coded inside the method.

[1] This is a quick implementation of the idea.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

//returns url cleaned per operation n (1-5); any other n returns url unchanged
private String cleanUrl(int n, String url) {
  Pattern p;
  Matcher m;
  String r=url;
  switch (n) {
    case 1:  //interpreted as only keeping the 2nd query name/value pair
      p=Pattern.compile("^(http://[^?]*)(\\?)([^=]+=[^&]*)(&)([^=]+=[^&]*)(&.+)?$");
      m=p.matcher(url);
      r=m.replaceAll("$1$2$5");
      break;
    case 2:  //interpreted as keeping whatever follows the first two pairs
      p=Pattern.compile("^(http://[^?]*)(\\?)([^=]+=[^&]*){2}(&)(.+)$");
      m=p.matcher(url);
      r=m.replaceAll("$1$2$5");
      break;
    case 3:  //drop the whole query string
      p=Pattern.compile("^(http://[^?]*)(\\?.*)?$");
      m=p.matcher(url);
      r=m.replaceAll("$1");
      break;
    case 4:  //keep scheme and full host only
      p=Pattern.compile("^(http://[^/]*)(/.*)?$");
      m=p.matcher(url);
      r=m.replaceAll("$1");
      break;
    case 5:  //keep the last 3 host components; leading labels may not span '/', path optional as in case 4
      p=Pattern.compile("^(http://)([^./]+\\.)+(([^/]*?\\.){2}[^/]*)(/.*)?$");
      m=p.matcher(url);
      r=m.replaceAll("$1$3");
      break;
    default:
      //do nothing
  }
  return r;
}

[1.1] The patterns I put forward are quick ones and reflect one particular interpretation of the needs. You can refine them to match your exact interpretation.
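
As one possible refinement, cases 1 and 2 could select parameters by name rather than by position. The following is only a sketch under that interpretation: the class and method names are hypothetical, List.of needs Java 9+, and a second replaceAll is used to strip the '?' or '&' that may be left dangling at the end.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class NamedParamPatterns {
    // Builds name1|name2|... with each name quoted so it is matched literally.
    static String alternation(List<String> names) {
        return names.stream().map(Pattern::quote).collect(Collectors.joining("|"));
    }

    // Case 1 by name: delete every listed parameter, wherever it appears.
    static String remove(String url, List<String> names) {
        String regex = "(?<=[?&])(?:" + alternation(names) + ")=[^&]*&?";
        return url.replaceAll(regex, "").replaceAll("[?&]$", "");
    }

    // Case 2 by name: delete every name=value pair whose name is NOT listed.
    static String keep(String url, List<String> names) {
        String regex = "(?<=[?&])(?!(?:" + alternation(names) + ")=)[^=&]+=[^&]*&?";
        return url.replaceAll(regex, "").replaceAll("[?&]$", "");
    }
}
```

With the URL from the question, remove(url, List.of("var1", "var3")) leaves only var2 and keep(url, List.of("var3")) leaves only var3.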

[2] There is nothing special about using it. As an example, suppose x is an instance of the class in question.

  String surl="http://www.sub.blah.co.uk/whatever.html?n1=v1&n2=v2&n3=v3";
  String surl_cleaned;
  surl_cleaned=x.cleanUrl(1,surl);
  System.out.println("[1]\n"+surl+"\n"+surl_cleaned);
  surl_cleaned=x.cleanUrl(2,surl);
  System.out.println("[2]\n"+surl+"\n"+surl_cleaned);
  //etc ...
