
website URL validation in a text box

ravicoder

Programmer
Apr 29, 2006
43
IN
Hi all

I have a data entry form which has a text box for entry of a website URL. Does anyone have logic for validating whether the entered URL is in a valid format?
Maybe a regex the string can be checked against?

any help appreciated
Thanks
Ravi
 

This will check whether the URL begins with a protocol (http://, https://, or also ftp://) and not much more; the samples given show how little it does. You could also write your own such test of the string's beginning. There isn't much more to test, and the actual verification can only be to navigate to the URL, because any well-formed URL might still not exist. Especially take into account that browsers automatically extend www.something.something with http:// and, when that is not successful, https://, while a COM xmlhttprequest class would not automatically try these variations, so you would need to do that yourself if you want the same behavior as browsers. The status of the request goes to 200 if OK, and to others like 404 Not Found and many more if not OK, and you have to react to that.
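A minimal sketch of such a string-begin test in VFP (the function name is mine):

```
* Returns .T. if the string starts with a supported protocol prefix
FUNCTION HasUrlProtocol
LPARAMETERS tcUrl
LOCAL lcUrl
lcUrl = LOWER(ALLTRIM(tcUrl))
RETURN LEFT(lcUrl, 7) == "http://" ;
   OR LEFT(lcUrl, 8) == "https://" ;
   OR LEFT(lcUrl, 6) == "ftp://"
ENDFUNC
```

As said, this only proves the format starts plausibly; it says nothing about whether the URL actually exists.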
 
Quick answer: an option would be to ping the URL with a cmd command and look at the result in text.txt. If it contains "check the name", the URL is wrong or does not exist. If it contains something like "Packets Sent:" or "Received =", the URL is right and exists. So this is more than just checking whether the URL could exist in theory.
ping -n 1 amazon.com > text.txt
ping -n 1 abczon.com > text.txt
You can automate that in VFP.
I am not a ping specialist, so I don't know if this is always true; try it and see if it works.
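One way to automate that in VFP might look like this (a sketch: RUN opens a console window, and the exact ping output wording depends on the Windows version, so treat the string test as an assumption):

```
* Ping a host once, capture the output in text.txt, then inspect it
LPARAMETERS tcHost
LOCAL lcCommand, lcOutput
lcCommand = "ping -n 1 " + ALLTRIM(tcHost) + " > text.txt"
RUN &lcCommand
lcOutput = LOWER(FILETOSTR("text.txt"))
* "check the name" indicates the host could not be resolved
RETURN NOT ("check the name" $ lcOutput)
```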
 
Ping will work for only one thing, though: assuring the host exists. Ping does not ping a URL, it pings a host, so it just verifies whether the domain exists, not whether the URL exists. You can only check that by requesting the URL.
 
The basic structure of a URL is pretty simple, so I'd start with what Chriss said: checking whether http:// or https:// is at the start (you can add ftp:// if you want to support it).

Beyond that you can check to see if there is at least one period before the presence of either a slash or question mark and that there are no spaces.
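Those checks might be sketched in VFP like this (the helper name and exact rules are my reading of the above):

```
* Format-only test: no spaces, and at least one period before any "/" or "?"
FUNCTION LooksLikeUrl
LPARAMETERS tcUrl
LOCAL lcHost, lnCut
IF " " $ ALLTRIM(tcUrl)
   RETURN .F.   && no spaces allowed anywhere
ENDIF
* strip a protocol prefix, then isolate the part before the first "/" or "?"
lcHost = STRTRAN(STRTRAN(ALLTRIM(tcUrl), "https://", ""), "http://", "")
lnCut = MIN(AT("/", lcHost + "/"), AT("?", lcHost + "?"))
lcHost = LEFT(lcHost, lnCut - 1)
RETURN "." $ lcHost
ENDFUNC
```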

If the URL is supposed to point to some sort of content, you could write a function that actually reads the contents of the URL before you store the URL in your database.

Here's an example that does that. It reads the data and returns whatever is in it. If the request fails, it returns "* INVALID URL *" (or you can make it return whatever you prefer); otherwise it returns the contents of the URL (assuming the URL is supposed to return something).

Code:
FUNCTION GetHttp
LPARAMETERS lcUrl
LOCAL lcResult
LOCAL loHttp as WinHttp.WinHttpRequest.5.1
loHTTP = CREATEOBJECT("WinHttp.WinHttpRequest.5.1")    
loHTTP.Open("GET", lcUrl,.F.)
TRY 
    loHttp.Send()
    lcResult = loHttp.ResponseText
CATCH
    lcResult = "* INVALID URL *"
ENDTRY 
loHttp = NULL 
RETURN lcResult

Examples:

This will return the Tek-Tips home page:
x = gethttp("https://www.tek-tips.com")

And this will return "* INVALID URL *":
x = gethttp("https://jibberish.something")
 
That's a good take. If you expand it as follows, it'll give you further info, either from the StatusText of a request whose status is not 200 OK, or from an exception message from WinHttp.WinHttpRequest. Notice I added the test from the VFPX/Win32API sample and changed to a HEAD request, so the result is only about the status. If you're only interested in some headers, you can also be selective and get only the wanted ones, like loHttp.GetResponseHeader("Content-Length"). Some headers are in all responses, some are not, so GetAllResponseHeaders works generally and gives you all the info about the response besides the response body, which you only get with a GET request.

Code:
Clear

? '---------------https://github.com----------------------------------------------------------------'
x = getURLstatus("https://github.com")
? x
? '---------------www.microsoft.com-----------------------------------------------------------------'
x = getURLstatus("www.microsoft.com")
? x
? "---------------http://if protocol prefix included it's not valid with this function--------------"
x = getURLstatus("http://if protocol prefix included it's not valid with this function")
? x
? '---------------https://something.wild------------------------------------------------------------'
x = getURLstatus("https://something.wild")
? x
? '---------------https://google.com/notexistingurlonexistinghost-----------------------------------'
x = getURLstatus("https://google.com/notexistingurlonexistinghost")
? x
Function getURLstatus
   Lparameters tcUrl

   Local lcResult
   Local loHttp As WinHttp.WinHttpRequest.5.1
   loHttp = Createobject("WinHttp.WinHttpRequest.5.1")
   Try
      loHttp.Open("HEAD", tcUrl,.F.)
      loHttp.Send()
      lcResult = Transform(loHttp.Status)+" "+loHttp.StatusText
      *optional, if status is 200 OK, also return all response headers
      If loHttp.Status = 200
         lcResult = lcResult+Chr(13)+Chr(10)
         lcResult = lcResult+Left(loHttp.GetAllResponseHeaders(),300)+"..."+Chr(13)+Chr(10)
      Endif
   Catch To loException
      lcResult = loException.Message
   Endtry
   Return lcResult

By the way: for "www.microsoft.com" the exception is already thrown when calling the Open method, which is why that call is also inside the TRY block in my extended version.
 
Notice I changed to a HEAD request, so the result is only about the status.
Good idea. By reading just the HEAD, you have the potential to speed it up if the URL returns a lot of data.

My only suggestion is that if the main objective is knowing if the URL is good, I wouldn't return loException.Message, I'd make it a consistent value such as "* INVALID *", or simply return .T. for a 200, and .F. for any exception.
 
simply return .T. for a 200, and .F. for any exception.
Whatever suits you. You'd miss out on subtleties like a 301 redirect, for example, or a 503 Service Unavailable, which is usually temporary; even a 502 Bad Gateway can be only a temporary problem. By reducing everything to "200 OK or not", you don't take any other status or exception into consideration.

The sample shows quite exhaustively how WinHttp.WinHttpRequest actually raises several exceptions, from non-conformant URLs to (temporarily) non-resolvable ones. Besides, just as testing whether you can open a file exclusively gives you a yes or no answer that can be wrong the next time, so does this.

All in all, it depends on what you plan to do with the result. If you want to verify a link list (your bookmarked websites, for example), any status can in theory change the next moment; only some, like completely invalid URLs, would stay invalid, but a non-resolvable server can become reachable the next moment, and a 200 OK URL can fail the next moment. In the end, you can take the logic of putting your request in a TRY..CATCH block and just go for it: either you get the response you aim for, or you get the current reason for not getting it. Another problem with that approach is that GET is only one of the request methods; if what you want to get should be the response to submitting form data, or you are using API functions, you may want to use POST or PUT instead.
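For instance, a POST with the same TRY..CATCH logic could look like this (the URL and form fields are made up for illustration):

```
* Hypothetical sketch: submitting form data with POST instead of GET
LOCAL loHttp, lcResult
loHttp = CREATEOBJECT("WinHttp.WinHttpRequest.5.1")
TRY
   loHttp.Open("POST", "https://example.com/api", .F.)
   loHttp.SetRequestHeader("Content-Type", "application/x-www-form-urlencoded")
   loHttp.Send("name=Ravi&city=Example")
   lcResult = TRANSFORM(loHttp.Status) + " " + loHttp.StatusText
CATCH TO loException
   lcResult = loException.Message
ENDTRY
? lcResult
```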
 
It's not up to me, it's up to the OP.

Remember, all Ravi asked for is validating that it's a valid URL format. If that's the only criteria, even a redirected, unavailable, or Bad Gateway means the URL was formatted correctly.
 
Well, if Ravi wants it that way, it can be turned into that, too. It would mean ignoring the status, because you only get a status at all, even a 404, if the URL is valid.

Still, not all exceptions would mean an invalid URL in the sense that it can't be mended: you can fix the www.microsoft.com example by just prefixing it with https://, which is what the response error message "The URL does not use a recognized protocol" means.

Well, and if it's only about the format, you can just use IsValidURL in that case.
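For reference, IsValidURL lives in urlmon.dll and can be declared from VFP roughly like this (a sketch: the API expects a Unicode string, and NULL for the first and last parameters; verify the return values against the Windows documentation):

```
* IsValidURL returns S_OK (0) for a well-formed URL, S_FALSE (1) otherwise
DECLARE LONG IsValidURL IN urlmon.dll ;
   LONG pBC, STRING szURL, LONG dwReserved
LOCAL lnResult
* the API wants UTF-16, so convert with STRCONV type 5
lnResult = IsValidURL(0, STRCONV("https://www.tek-tips.com" + CHR(0), 5), 0)
? IIF(lnResult = 0, "valid format", "invalid format")
```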
 
Well, and if it's all only about that you can just use IsValidURL in that case.
That's probably all that's needed. If Ravi's goal is just knowing a URL is formatted correctly, the function should return true even if the domain doesn't exist.
 
the function should return true even if the domain doesn't even exist.
Exactly, and you should know that implication: there's no point in complaining that requesting a valid URL still fails in one aspect or another, for example.
So it's time for Ravi to give more details, or just pick out what's necessary.
 
Valid points.

I'd be more inclined to use your suggestion of IsValidURL to just confirm the formatting, but the bored developer inside of us finds it fun to build something that does more than it needs to.

It's like asking an engineer to make a flathead screwdriver and they build you a Swiss Army knife, just in case we also want to scale a fish or open a wine bottle.
 
asking an engineer to make a flathead screwdriver and they build you a Swiss Army knife
Validating a URL by making a request to it is still within the scope of validating it. It's clearly slower, but it gives you a concrete result about the status right now. And it depends on the use case: if this textbox Ravi talks about is just the input for the URL to visit, then just do it; if it fails, you have the error-handling mechanism and can inform the user what's wrong. You can choose to auto-expand or not, etc. For that use case, the "Swiss Army knife" is the better tool.

You see? If the question comes from the idea of preventing errors by first ensuring the URL is valid, that's not possible, and then it seems to me it's just a lack of knowledge about using TRY..CATCH.
 
The Swiss Army Knife is definitely the better tool, by a wide margin.

That said, since the original request was for a potential RegEx solution, it's potentially more than the OP needed.

I've used similar approaches to validate just email addresses, where I need not know if the mailbox exists, just that they don't have any typos that make it an invalid address.
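And since the original question asked for a regex, a format-only check can be done from VFP with the VBScript.RegExp COM class; the pattern below is deliberately simple and my own, not an exhaustive URL grammar:

```
* Regex format check via the VBScript.RegExp COM class
LOCAL loRegEx
loRegEx = CREATEOBJECT("VBScript.RegExp")
loRegEx.IgnoreCase = .T.
* protocol, a host with at least one dot, no whitespace anywhere
loRegEx.Pattern = "^(https?|ftp)://[^\s/?]+\.[^\s]+$"
? loRegEx.Test("https://www.tek-tips.com")   && .T.
? loRegEx.Test("not a url")                  && .F.
```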
 
