
website URL validation in a text box

ravicoder

Programmer
Apr 29, 2006
Hi all

I have a data entry form with a text box for entering a website URL. Does anyone have logic for validating whether the format of the entered URL is valid?
Maybe a regex the string can be checked against?

Any help appreciated.
Thanks
Ravi
 

This will check whether the URL begins with a protocol (http://, https:// or also ftp://) and not much more; the samples given show how little it does. You could also write such a test of the string's beginning yourself. There isn't much more to test: the only actual verification is to navigate to the URL, because any syntactically valid URL might still not exist. Especially keep in mind that browsers automatically extend www.something.something with http:// and, when that isn't successful, with https://, while a COM XMLHttpRequest class won't try these variations automatically, so you'll need to do that yourself if you want the same behavior as browsers. The request status will be 200 if OK, and something else (404 Not Found and many more) if not, and you have to react to that.
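A minimal sketch of that browser-like behavior, assuming a URL typed without a protocol should be tried with http:// first and https:// second, as described above. FirstReachableVariant is a hypothetical name; it treats any HTTP status as an answer and leaves interpreting 200, 404 and so on to you.

```foxpro
* Hypothetical helper: try http:// first, then https://, like the browser
* behavior described above. Returns the first prefixed URL that got any
* HTTP status back, or "" if neither variant could be requested.
FUNCTION FirstReachableVariant(tcHost)
    LOCAL ARRAY laPrefixes[2]
    LOCAL lnI, lcCandidate, lcFound, loHttp
    laPrefixes[1] = "http://"
    laPrefixes[2] = "https://"
    lcFound = ""
    FOR lnI = 1 TO ALEN(laPrefixes)
        lcCandidate = laPrefixes[lnI] + ALLTRIM(tcHost)
        loHttp = CREATEOBJECT("WinHttp.WinHttpRequest.5.1")
        TRY
            loHttp.Open("HEAD", lcCandidate, .F.)
            loHttp.Send()
            * Any status (200, 301, 404, ...) means this variant was answered
            lcFound = lcCandidate
        CATCH
            * Invalid or not resolvable with this prefix: try the next one
        ENDTRY
        loHttp = NULL
        IF NOT EMPTY(lcFound)
            EXIT
        ENDIF
    ENDFOR
    RETURN lcFound
ENDFUNC
```

Usage would be something like ? FirstReachableVariant("www.microsoft.com"); an empty result means neither variant could be requested at all.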
 
Quick answer: one option would be to ping the URL with a cmd command and look at the result in text.txt. If it contains "check the name", the URL is wrong or does not exist. If it contains something like "Packets Sent:" or "Received =", the URL is right and exists. So this does more than just checking whether the URL could exist in theory.
ping -n 1 amazon.com > text.txt
ping -n 1 abczon.com > text.txt
You can automate that in VFP.
I am not a ping specialist. I don't know if this is always true. Try whether it works.
 
Ping will work for only one thing, though: assuring the host exists. Ping does not ping a URL, it pings a host, so it just verifies whether the domain exists, not whether the URL exists. You can only check that by requesting the URL.
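To automate the ping idea while respecting that ping only knows hosts, a sketch could cut the host out of the URL first. HostAnswersPing is a hypothetical helper; it assumes English Windows ping output (the "check the name" text quoted above), a writable TEMP folder and URLs without a port number. The host is handed to cmd unescaped, so only use it on input you've already format-checked.

```foxpro
* Hypothetical helper: ping only the host part of a URL (sketch).
* Assumes English Windows ping output and no port number in the URL.
FUNCTION HostAnswersPing(tcUrl)
    LOCAL lcHost, lnPos, lcOutFile, lcCmd, lcOutput
    lcHost = ALLTRIM(tcUrl)

    * Drop a protocol prefix such as https://
    lnPos = AT("://", lcHost)
    IF lnPos > 0
        lcHost = SUBSTR(lcHost, lnPos + 3)
    ENDIF

    * Drop everything from the first slash on (the path)
    lnPos = AT("/", lcHost)
    IF lnPos > 0
        lcHost = LEFT(lcHost, lnPos - 1)
    ENDIF
    IF EMPTY(lcHost)
        RETURN .F.
    ENDIF

    * Unique temp file per call, so concurrent users don't collide
    lcOutFile = ADDBS(GETENV("TEMP")) + SYS(2015) + ".txt"
    lcCmd = [cmd /c ping -n 1 ] + lcHost + [ > "] + lcOutFile + ["]
    RUN &lcCmd    && without /N, RUN waits until ping has finished

    lcOutput = ""
    IF FILE(lcOutFile)
        lcOutput = FILETOSTR(lcOutFile)
        ERASE (lcOutFile)
    ENDIF

    * English ping prints "... Please check the name and try again." for unknown hosts
    RETURN NOT EMPTY(lcOutput) AND NOT ("check the name" $ lcOutput)
ENDFUNC
```

Keep in mind that a .T. only says the name resolved; as noted above, it says nothing about whether the path on that host exists.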
 
The structure of a URL is pretty basic, so I'd start with what Chriss said: check whether http:// or https:// is at the start (you can add ftp:// if you want to support it).

Beyond that you can check to see if there is at least one period before the presence of either a slash or question mark and that there are no spaces.
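The three rules above (protocol prefix, at least one period before the first slash or question mark, no spaces) translate into a rough string check. UrlLooksValid is a hypothetical name and only a sketch: it still accepts oddities like "https://a." and is no full URL parser.

```foxpro
* Hypothetical helper: rough format check following the rules above.
* Assumes http://, https:// and ftp:// are the only accepted protocols.
FUNCTION UrlLooksValid(tcUrl)
    LOCAL lcUrl, lcRest, lnSlash, lnQuery, lnEnd
    lcUrl = ALLTRIM(tcUrl)

    * Rule 1: no spaces anywhere
    IF " " $ lcUrl
        RETURN .F.
    ENDIF

    * Rule 2: must start with a supported protocol prefix
    DO CASE
        CASE LOWER(LEFT(lcUrl, 7)) == "http://"
            lcRest = SUBSTR(lcUrl, 8)
        CASE LOWER(LEFT(lcUrl, 8)) == "https://"
            lcRest = SUBSTR(lcUrl, 9)
        CASE LOWER(LEFT(lcUrl, 6)) == "ftp://"
            lcRest = SUBSTR(lcUrl, 7)
        OTHERWISE
            RETURN .F.
    ENDCASE

    * Rule 3: at least one period before the first slash or question mark
    lnSlash = AT("/", lcRest)
    lnQuery = AT("?", lcRest)
    DO CASE
        CASE lnSlash = 0 AND lnQuery = 0
            lnEnd = LEN(lcRest) + 1
        CASE lnSlash = 0
            lnEnd = lnQuery
        CASE lnQuery = 0
            lnEnd = lnSlash
        OTHERWISE
            lnEnd = MIN(lnSlash, lnQuery)
    ENDCASE
    RETURN "." $ LEFT(lcRest, lnEnd - 1)
ENDFUNC
```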

If the URL is supposed to point to some sort of content, you could write a function that actually reads the contents of the URL before you store the URL in your database.

Here's an example that definitely works. It reads the data and returns whatever's in it. If the request fails, for example because the host doesn't exist, it returns "* INVALID URL *" (or you can make it return whatever you prefer); otherwise it returns the contents of the URL (assuming the URL is supposed to return something).

Code:
FUNCTION GetHttp
LPARAMETERS lcUrl
LOCAL loHttp as WinHttp.WinHttpRequest.5.1, lcResult
loHttp = CREATEOBJECT("WinHttp.WinHttpRequest.5.1")
TRY
    * Open is inside TRY, too: URLs without a recognized protocol already fail here
    loHttp.Open("GET", lcUrl, .F.)
    loHttp.Send()
    lcResult = loHttp.ResponseText
CATCH
    lcResult = "* INVALID URL *"
ENDTRY
loHttp = NULL
RETURN lcResult
ENDFUNC

Examples:

This will return the Tek-Tips home page:
x = gethttp("https://www.tek-tips.com")

And this will return "* INVALID URL *"
x = gethttp("https://jibberish.something")
 
That's a good take. If you expand it as follows, it'll give you further info: either the StatusText of a request that didn't return 200 OK, or an exception message from WinHttp.WinHttpRequest. Notice I added the tests from the VFPX/Win32API sample and changed to a HEAD request, so the result is only about the status. If you're only interested in some headers, you can be selective and fetch just those, e.g. loHttp.GetResponseHeader("Content-Length"). Some headers are in all responses and some aren't, so GetAllResponseHeaders works in general and gives you all the info about the response except the body, which you only get with a GET request.

Code:
Clear

? '---------------https://github.com----------------------------------------------------------------'
x = getURLstatus("https://github.com")
? x
? '---------------www.microsoft.com-----------------------------------------------------------------'
x = getURLstatus("www.microsoft.com")
? x
? "---------------http://if protocol prefix included it's not valid with this function--------------"
x = getURLstatus("http://if protocol prefix included it's not valid with this function")
? x
? '---------------https://something.wild------------------------------------------------------------'
x = getURLstatus("https://something.wild")
? x
? '---------------https://google.com/notexistingurlonexistinghost-----------------------------------'
x = getURLstatus("https://google.com/notexistingurlonexistinghost")
? x
Function getURLstatus
   Lparameters tcUrl

   Local lcResult, lnStatus
   Local loHttp As WinHttp.WinHttpRequest.5.1
   loHttp = Createobject("WinHttp.WinHttpRequest.5.1")
   Try
      loHttp.Open("HEAD", tcUrl,.F.)
      loHttp.Send()
      lcResult = Transform(loHttp.Status)+" "+loHttp.StatusText
      *optional, if status is 200 OK, also return all response headers
      If loHttp.Status = 200
         lcResult = lcResult+Chr(13)+Chr(10)
         lcResult = lcResult+Left(loHttp.GetAllResponseHeaders(),300)+"..."+Chr(13)+Chr(10)
      Endif
   Catch To loException
      lcResult = loException.Message
   Endtry
   Return lcResult

By the way: for "www.microsoft.com" the exception is already thrown when calling the Open method, which is why Open is also inside the TRY block in my extended version.
 
Notice I changed to HEAD, so the result is only about the status.
Good idea. By reading just the HEAD, you have the potential to speed it up if the URL returns a lot of data.

My only suggestion is that if the main objective is knowing if the URL is good, I wouldn't return loException.Message, I'd make it a consistent value such as "* INVALID *", or simply return .T. for a 200, and .F. for any exception.
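A sketch of that simpler variant, returning .T. only for 200 OK and .F. for any other status or any exception. IsURLok is a hypothetical name; it uses the same WinHttp.WinHttpRequest.5.1 HEAD request as getURLstatus above.

```foxpro
* Hypothetical helper: .T. only for 200 OK, .F. for anything else
FUNCTION IsURLok(tcUrl)
    LOCAL loHttp, llOk
    llOk = .F.
    loHttp = CREATEOBJECT("WinHttp.WinHttpRequest.5.1")
    TRY
        loHttp.Open("HEAD", tcUrl, .F.)
        loHttp.Send()
        llOk = (loHttp.Status = 200)
    CATCH
        * Unrecognized protocol, unresolvable host, timeout, ...
        llOk = .F.
    ENDTRY
    loHttp = NULL
    RETURN llOk
ENDFUNC
```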
 
simply return .T. for a 200, and .F. for any exception.
Whatever suits you. But you'd miss subtleties like a 301 redirect, for example, or a 503 Service Unavailable, which is usually temporary; even a 502 Bad Gateway can be only a temporary problem. If you lump every status other than 200 OK together with exceptions, you don't take these things into account.

The sample shows quite exhaustively that WinHttp.WinHttpRequest raises several different exceptions, from non-conformant URLs to (temporarily) non-resolvable ones. Besides, just like checking whether you can open a file exclusively, you get a yes or no answer that can be wrong the next time.

All in all, it depends on what you plan to do with the result. If you want to verify a link list, your bookmarked websites for example, any status can in theory change the next moment. Only some, like completely invalid URLs, stay invalid; a non-resolvable server can become available the next moment, and a URL that returned 200 OK can fail the next moment. In the end you can take the approach of putting your request in a TRY..CATCH block and just going for it: either you get the response you're aiming for, or you get the current reason for not getting it. One problem with that approach is that GET is only one of the request methods: if what you want is the response to submitting form data or to calling API functions, you may want to use POST or PUT instead.
 
It's not up to me, it's up to the OP.

Remember, all Ravi asked for is validating that it's a valid URL format. If that's the only criterion, even a redirected, unavailable, or Bad Gateway response means the URL was formatted correctly.
 
Well, if Ravi wants it that way, it can be turned into that, too. It would mean ignoring the status, because you only get a status at all, even a 404, if the URL is valid.
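A sketch of that "ignore the status" interpretation: any status at all, even 404 or 503, counts as a valid URL, and only exceptions count as invalid. URLgetsStatus is a hypothetical name.

```foxpro
* Hypothetical helper: .T. if the request got any HTTP status back
FUNCTION URLgetsStatus(tcUrl)
    LOCAL loHttp, llAnswered
    llAnswered = .F.
    loHttp = CREATEOBJECT("WinHttp.WinHttpRequest.5.1")
    TRY
        loHttp.Open("HEAD", tcUrl, .F.)
        loHttp.Send()
        llAnswered = .T.   && the Status value itself is deliberately ignored
    CATCH
        llAnswered = .F.   && unrecognized protocol, host not found, ...
    ENDTRY
    loHttp = NULL
    RETURN llAnswered
ENDFUNC
```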

Still, not every exception means an invalid URL in that sense: you can mend the www.microsoft.com example by simply prefixing it with https://, which is what the error message "The URL does not use a recognized protocol" is telling you.
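Such input could be mended before sending the request. AddDefaultProtocol is a hypothetical helper; it assumes https:// is the wanted default and treats any "://" in the string as an already present protocol, which is a simplification.

```foxpro
* Hypothetical helper: prefix https:// when no protocol is present
FUNCTION AddDefaultProtocol(tcUrl)
    LOCAL lcUrl
    lcUrl = ALLTRIM(tcUrl)
    * Simplification: any "://" counts as "protocol already there"
    IF "://" $ lcUrl
        RETURN lcUrl
    ENDIF
    RETURN "https://" + lcUrl
ENDFUNC
```

With getURLstatus from above, ? getURLstatus(AddDefaultProtocol("www.microsoft.com")) would then request https://www.microsoft.com instead of failing in Open.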

Well, and if it's all only about that you can just use IsValidURL in that case.
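For the format-only route, here's a sketch of calling IsValidURL from urlmon.dll. IsValidURLFormat is a hypothetical wrapper name; the declaration assumes the documented signature IsValidURL(pBC, szURL, dwReserved), where pBC must be NULL, dwReserved must be 0 and szURL is a Unicode string, and that a return value of S_OK (0) means valid.

```foxpro
* Hypothetical wrapper: format-only check via IsValidURL in urlmon.dll.
* No request is made, so a non-existent domain still passes.
FUNCTION IsValidURLFormat(tcUrl)
    LOCAL lnResult
    DECLARE INTEGER IsValidURL IN urlmon.dll ;
        INTEGER pBC, STRING szURL, INTEGER dwReserved
    * STRCONV(..., 5) converts to Unicode; two CHR(0) give a wide terminator
    lnResult = IsValidURL(0, STRCONV(ALLTRIM(tcUrl), 5) + CHR(0) + CHR(0), 0)
    * S_OK (0) = valid; S_FALSE (1) or an error HRESULT = not valid
    RETURN lnResult = 0
ENDFUNC
```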
 
Well, and if it's all only about that you can just use IsValidURL in that case.
That's probably all that's needed. If Ravi's goal was just knowing a URL is formatted correctly, the function should return true even if the domain doesn't even exist.
 
the function should return true even if the domain doesn't even exist.
Exactly, and you should be aware of that implication. Then there's no point in complaining, for example, that requesting a valid URL still fails in one way or another.
So it's time for Ravi to go into more detail or just pick out what's necessary.
 
Valid points.

I'd be more inclined to use your suggestion of IsValidURL to just confirm the formatting, but the bored developer inside of us finds it fun to build something that does more than it needs to.

It's like asking an engineer to make a flathead screwdriver and they build you a Swiss Army knife, just in case we also want to scale a fish or open a wine bottle.
 
asking an engineer to make a flathead screwdriver and they build you a Swiss Army knife
Validating a URL by making a request to it is still within the scope of validating it. It's clearly slower, but it gives you a concrete result: the status right now. It depends on the use case. If the textbox Ravi talks about is just the input for a URL to visit, then just visit it; if that fails, you have the error handling mechanism and can inform the user what's wrong. You can choose to auto-expand or not, etc. For that use case the "Swiss Army knife" is the better tool.

You see? If the question comes from the idea of preventing errors by first ensuring the URL is valid, that's not possible, and then, it seems to me, it's just a lack of knowledge about using TRY..CATCH.
 
The Swiss Army Knife is definitely the better tool, by a wide margin.

That said, since the original request was for a potential RegEx solution, it's potentially more than the OP needed.

I've used similar approaches to validate just email addresses, where I need not know if the mailbox exists, just that they don't have any typos that make it an invalid address.
 
Hi all

I agree with the suggestion that we should always try to provide more than expected in software to make it very user-friendly, which I have done in most of my customised applications, where users got more time-saving features for data entry etc.

For now, I studied regular expressions and came up with the following:

Function websitevalid
Lparameters websitestr

* llValid is declared LOCAL so no PRIVATE variable leaks to callers
Local loRegExp, llValid
loRegExp = Createobject("VBScript.RegExp")
loRegExp.IgnoreCase = .T.
loRegExp.Pattern = "^(https?:\/\/)?(www\.)?([a-zA-Z0-9-]{1,256}\.)+[a-zA-Z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*)$"
llValid = loRegExp.Test(websitestr)
Release loRegExp
Return llValid
Endfunc

This works for my present requirements.

Thanks for all the help & Suggestions
Regards
Ravi
 
Be aware that VBScript will be deprecated: https://techcommunity.microsoft.com...-deprecation-timelines-and-next-steps/4148301
 
We're used to finding ways to work with deprecated products; after all, VFP was discontinued ages ago.

The funny thing is that Microsoft has a lot of products that use VBScript, including Excel, so it will probably take a while before they pull the plug entirely.

Microsoft suggests a few alternatives to VBScript, most of which are awkward from Excel or VFP, such as PowerShell or JavaScript.

Here's an example of how sloppy it can be to call a RegEx from PowerShell in VFP. Basically, PowerShell can't "Return" things to VFP, so you need to redirect the console to a file, then open the file and read what's in it. That's a mess for multi-user environments, so on top of this, you'll need to generate a unique filename each time.

Code:
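FUNCTION RunPowerShellRegex(tcInput, tcPattern)
    LOCAL lcCmd, lcOutputFile, lcResult
    * Unique output file per call (SYS(2015) returns a unique name),
    * so concurrent users don't overwrite each other's results
    lcOutputFile = ADDBS(GETENV("TEMP")) + SYS(2015) + ".txt"

    * Construct the PowerShell command string. VFP has no backslash escapes,
    * so [...] and "..." delimiters are mixed; single quotes in the values are
    * doubled, which is how PowerShell escapes ' inside '...' literals.
    * $inp is used because $input is a PowerShell automatic variable.
    lcCmd = [powershell -NoProfile -Command "$inp='] + STRTRAN(tcInput, "'", "''") + ;
        ['; $pat='] + STRTRAN(tcPattern, "'", "''") + ;
        "'; $(if ($inp -match $pat) { $matches[0] } else { 'No match' }) | Out-File -Encoding ascii '" + ;
        lcOutputFile + ['"]

    * Execute command (RUN without /N waits for PowerShell to finish)
    RUN &lcCmd

    * Read result (trimming Out-File's trailing line break), then clean up
    lcResult = ""
    IF FILE(lcOutputFile)
        lcResult = ALLTRIM(FILETOSTR(lcOutputFile), 1, CHR(13), CHR(10), " ")
        ERASE (lcOutputFile)
    ENDIF
    RETURN lcResult
ENDFUNC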
 
VBScript.RegExp is also used in the FFC sample: ...\Samples\Solution\FFC\Regexp.scx

I already pointed out VBScript can be disabled in Win11 here: https://www.tek-tips.com/threads/execute-dll-file-after-copying.1830277/post-7573028
In that thread I came to the conclusion that even when VBScript is disabled, you can still use VBScript.RegExp. A future deprecation step might remove that, too, and then you'd need to distribute the DLL yourself, which isn't impossible; we discussed using it with RegFree COM.

It's nice to have an alternative, and for customers you know allow PowerShell, you could go that route. There's also the .NET Regex class, which you can use via West Wind's .NET bridge (wwDotNetBridge): https://learn.microsoft.com/en-us/dotnet/api/system.text.regularexpressions.regex?view=net-9.0

A nifty solution would be to make use of VFP's editor syntax highlighting, which formats valid links blue, underlined and clickable, and which is also available in the runtime if you set _VFP.EditorOptions the right way within an EditBox. But indeed all it detects is http:// or https:// and a few more prefixes, and then it renders everything following as a link.

I also looked at regular expressions but wasn't satisfied with any I found, as they all restrict the allowed characters too much. For example, this is a valid link your pattern won't recognize as valid:

Your pattern would only accept the URL in its escaped (percent-encoded) form. Many browsers automatically convert a link this way, and there is also a Windows API function to do that, though by modern standards, as I already said, almost all UTF-8 characters are allowed. To demonstrate:
Code:
loHttp = Createobject("WinHttp.WinHttpRequest.5.1")
loHttp.Open("HEAD", "https://en.wikipedia.org/wiki/Blue_Öyster_Cult",.F.)
loHTTP.Send()
? loHTTP.StatusText

If you're fine with missing out on such things, you can stay with that regex. Otherwise it would be advisable to at least also check an escaped variant of the URL entered into the textbox, so such valid URLs pass, too.
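A sketch of checking such an escaped variant: percent-encode every non-ASCII character as its UTF-8 bytes (Ö becomes %C3%96) and test the result with the regex. EscapeNonAscii is a hypothetical helper; it assumes the textbox value is in the current ANSI code page and only covers path and query characters, not internationalized host names, which need Punycode instead.

```foxpro
* Hypothetical helper: percent-encode non-ASCII characters as UTF-8 bytes
FUNCTION EscapeNonAscii(tcUrl)
    LOCAL lcUtf8, lcResult, lnI, lnByte
    lcUtf8 = STRCONV(tcUrl, 9)      && current code page -> UTF-8
    lcResult = ""
    FOR lnI = 1 TO LEN(lcUtf8)
        lnByte = ASC(SUBSTR(lcUtf8, lnI, 1))
        IF lnByte > 127
            * TRANSFORM(n, "@0") gives e.g. "0x000000C3"; keep the last two hex digits
            lcResult = lcResult + "%" + RIGHT(TRANSFORM(lnByte, "@0"), 2)
        ELSE
            lcResult = lcResult + CHR(lnByte)
        ENDIF
    ENDFOR
    RETURN lcResult
ENDFUNC
```

Since Ravi's pattern allows % in the path part, websitevalid(EscapeNonAscii(lcUrl)) should then also accept the Blue_Öyster_Cult example.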
 
Two VFP alternatives to resorting to VBScript RegExp:
At least for the latter, the required caution that Chriss pointed out regarding non-ASCII characters still applies, though.
 
