









Validity of email addresses
Introduction
What follows is a brief description of how regular expressions can be used to verify the validity of an email address.
Most web programmers will some day work on a project where they have to verify information that a client has sent to the server. This article will help you verify the email addresses you receive using a simple algorithm.
Sadly, I am far from being a regular expressions expert. However, I hope this article will inform you and intrigue you enough to seek more information elsewhere.
Let's start with a brief explanation of what a regular expression is. If you have ever used an operating system such as DOS or Unix, you should be familiar with the ? and * characters. They greatly simplify search patterns. A search using the expression "test?.asp" will find files such as test1.asp, test2.asp, testx.asp, etc. Furthermore, a search using "test*.asp" will find these files and others such as testxyz.asp.
Hence, you can find multiple files using one search pattern. These wildcard patterns are a simplified form of what regular expressions do.
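The wildcard behaviour described above can be tried out with a small sketch. It is written in Python rather than the VBScript used in the rest of this article, and it assumes the standard fnmatch module as a stand-in for the DOS or Unix file search; the list of file names and the helper matching are made up for the illustration.

```python
from fnmatch import fnmatchcase

# File names taken from the examples in the text (illustrative list).
files = ["test1.asp", "test2.asp", "testx.asp", "testxyz.asp"]

def matching(pattern):
    """Return the file names that match a wildcard pattern."""
    return [name for name in files if fnmatchcase(name, pattern)]

single = matching("test?.asp")  # ? stands for exactly one character
many = matching("test*.asp")    # * stands for any run of characters

print(single)
print(many)
```

Here the ? pattern leaves out testxyz.asp, because three characters sit where only one is allowed, while the * pattern accepts all four names.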
Regular expressions can be applied to character strings to verify their validity or to select specific portions of text to be extracted or replaced. Several languages support regular expressions (I will not list them here because I do not know them all). They are particularly useful in languages that have few string manipulation functions, such as JavaScript, but they are just as valuable in other languages such as VBScript (which I use in my ASP pages).
The problem
You wish to analyze a string of characters to verify whether it forms a valid email address.
The solutions
a) traditional solution
A valid email address resembles the following:
jkealey@shade.ca
g.alain@videotron.ca
shutchison@qc.aira.com
darkangel-31@hotmail.com
For it to be valid, an email address must contain exactly one @ and at least one period after it. Preceding the @ character, there can only be letters, numbers and a few other characters: the period (.), the underscore (_) and the dash (-). What comes after the @ and precedes the final period follows the same rules.
Finally, only letters can follow the last period, and there can only be two or three of them (such as .ca, .fr, .com, .net, .org, etc). I am conscious that this rule will not hold in the future because there are new top-level domains such as .bizz, .church and others. However, they are not currently widespread. Furthermore, French characters (and even Chinese ones) will be accepted in domain names in the future. However, this article only takes current facts into account.
Following the traditional method, many lines of code are necessary. Please note that what follows is written in VBScript, not Visual Basic. The string comes from the web and will be stored in a database.
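<%@ Language=VBScript %>
<%
Option Explicit
Dim iInvalid, sString, i, iAt, iLastDot
iInvalid=0
sString="jkealey@shade.ca"
'placeholder: should return 1 only for letters, numbers, ".", "_" and "-"
function isValidChar(c)
isValidChar=1
end function
'placeholder: should return 1 only for letters
function isValidLetter(c)
isValidLetter=1
end function
'there must be an @ with at least one character before it
iAt=inStr(1,sString,"@")
if iAt < 2 then
iInvalid=1
end if
'verifying what precedes the @ (the @ itself is excluded)
For i=1 to iAt-1
If isValidChar(mid(sString,i,1)) <> 1 then
iInvalid=1
end if
next
'verifying what follows the @ (a second @ is rejected here)
for i=iAt+1 to len(sString)
if isValidChar(mid(sString,i,1)) <> 1 then
iInvalid=1
end if
next
'verifying what follows the last period
iLastDot=inStrRev(sString,".")
if iLastDot <= iAt then
'no period after the @
iInvalid=1
else
'the final part must be two or three characters long
if len(sString)-iLastDot < 2 Or len(sString)-iLastDot > 3 then
iInvalid=1
end if
'and it must contain letters only (the period itself is excluded)
for i=len(sString) to iLastDot+1 step -1
if isValidLetter(mid(sString,i,1)) <> 1 then
iInvalid=1
end if
next
end if
'we can't have two consecutive periods
for i=iAt+1 to len(sString)-1
if mid(sString,i,1) = "." And mid(sString,i+1,1)="." Then
iInvalid=1
end if
next
if iInvalid=1 then
response.write("invalid email address")
else
response.write("valid email address")
end if
%>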
What you have just read has not been tested and its accuracy is doubtful. One thing is certain though: you would need to code the two additional functions isValidLetter and isValidChar to verify whether a character is a letter (upper or lower case), a number or one of the accepted extra characters.
I did not program them because I feel I have already shown that this method is very long.
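For readers curious about what those two helper functions might look like, here is a minimal sketch. It is in Python rather than VBScript, and the names is_valid_char and is_valid_letter are simply hypothetical counterparts of the two functions mentioned above; the accepted characters are the ones listed earlier in this article.

```python
import string

# Characters accepted by the rules stated in this article.
LETTERS = string.ascii_letters                  # a-z and A-Z
VALID_CHARS = LETTERS + string.digits + "._-"   # plus digits, period, underscore, dash

def is_valid_letter(c):
    """True if c is a single unaccented letter, upper or lower case."""
    return len(c) == 1 and c in LETTERS

def is_valid_char(c):
    """True if c is a single letter, digit, period, underscore or dash."""
    return len(c) == 1 and c in VALID_CHARS

print(is_valid_char("-"), is_valid_char("@"))
print(is_valid_letter("a"), is_valid_letter("1"))
```

Even with these helpers written, the loops that walk through the string are still needed, which is the point of the comparison that follows.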
b) solution using regular expressions
Only a few lines of code are needed with regular expressions.
<%@ Language=VBScript %>
<%
Option Explicit
Dim sString
sString="jkealey@shade.ca"
dim objRegExpr
Set objRegExpr = New regexp
'the pattern our string is compared against
objRegExpr.Pattern = "^[a-zA-Z0-9\._-]+@([a-zA-Z0-9_-]+\.)+([a-zA-Z]{2,3})$"
'let's compare it (sString is the variable declared above)
if objRegExpr.Test(sString) then
response.write("valid email address")
else
response.write("invalid email address")
end if
%>
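The same pattern can be exercised outside an ASP page. The following is a rough Python equivalent, not the VBScript above: it assumes Python's re module, and the function name is_valid_email is made up for this sketch. fullmatch is used so that the whole string has to fit the pattern.

```python
import re

# Same pattern as in the VBScript listing.
EMAIL_PATTERN = re.compile(
    r"^[a-zA-Z0-9\._-]+@([a-zA-Z0-9_-]+\.)+([a-zA-Z]{2,3})$"
)

def is_valid_email(address):
    """True if the whole string fits the article's pattern."""
    return EMAIL_PATTERN.fullmatch(address) is not None

# The sample addresses from this article, plus a few that break the rules.
for candidate in [
    "jkealey@shade.ca",
    "shutchison@qc.aira.com",
    "jkealey@shade",        # no period after the @
    "jkealey@@shade.ca",    # two @ characters
    "jkealey@shade..ca",    # two consecutive periods
]:
    print(candidate, is_valid_email(candidate))
```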
Explanation
Here is our regular expression:
"^[a-zA-Z0-9\._-]+@([a-zA-Z0-9_-]+\.)+([a-zA-Z]{2,3})$"
It is a string of characters; let us analyze it piece by piece.
The ^ indicates that what follows must be at the beginning of the string.
[a-zA-Z0-9\._-] lists the valid characters. As you can see, three ranges (a-z, A-Z and 0-9) and a few other characters are used to represent them. Please note that the \ is not itself an accepted character: it is inserted before the period because the period is a special character in regular expressions. (Inside square brackets the period is already taken literally, so the escape is optional here, but it does no harm.) If you have ever programmed in a language such as C/C++ you should be familiar with this escape notation.
The plus sign (+) that follows indicates that what precedes it must occur one or more times.
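To see the character list and the + at work on their own, here is a small Python sketch (not VBScript). It uses only the first part of the pattern, and the helper name leading_run is made up for the illustration.

```python
import re

# First part of the article's pattern: one or more accepted characters
# at the very beginning of the string.
LOCAL_PART = re.compile(r"^[a-zA-Z0-9\._-]+")

def leading_run(text):
    """Return the accepted characters found at the start, or None."""
    match = LOCAL_PART.match(text)
    return match.group(0) if match else None

print(leading_run("g.alain@videotron.ca"))  # stops at the @
print(leading_run("@shade.ca"))             # nothing before the @
print(leading_run("a\\b"))                  # the backslash is not accepted
```

The second call finds nothing because + demands at least one accepted character, and the third stops at the backslash, which confirms that \ only serves as an escape inside the pattern.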
The following character is the @, because it is mandatory in our email address.
Next come parentheses followed by a plus sign (+). The parentheses group several expressions into one unit that is analyzed separately, and the + indicates that this unit must occur one or more times.
[a-zA-Z0-9_-]+ are the valid characters (note that the period is now absent). These must occur at least once.
Inside the parentheses, this expression is followed by an escaped period (\.). Hence, we are representing the sub-domains and domains, however many there are. In plain terms, we are linking together strings that each end with a period.
For example, the pieces "This.", "Is.", "An." and "Example." are linked to form "This.Is.An.Example."
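The linking of period-terminated pieces can be observed directly with a short Python sketch (again not VBScript). It isolates the repeated group from the pattern; the names single_pieces and linked_pieces are invented for this example.

```python
import re

# One piece: accepted characters followed by a period.
PIECE = re.compile(r"[a-zA-Z0-9_-]+\.")
# The repeated group from the article's pattern.
PIECES = re.compile(r"([a-zA-Z0-9_-]+\.)+")

def single_pieces(domain):
    """List every period-terminated piece found in the domain."""
    return PIECE.findall(domain)

def linked_pieces(domain):
    """Return the run of linked pieces at the start, or None."""
    match = PIECES.match(domain)
    return match.group(0) if match else None

print(single_pieces("qc.aira.com"))
print(linked_pieces("qc.aira.com"))
print(linked_pieces("com"))
```

The trailing com is left over in the first two calls because it does not end with a period; the last part of the pattern takes care of it.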
Finally, there are more parentheses followed by the dollar sign ($). This means that the contents of the parentheses must end the string.
Inside the parentheses, we see that the only valid characters are letters ([a-zA-Z]). The {2,3} that follows resembles the + character: it indicates that what precedes it must occur two to three times (no more, no less).
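The effect of {2,3} together with $ can be checked with one more Python sketch (not VBScript). It keeps only the end of the pattern, preceded by the period that closes the last domain piece; the helper name final_letters is made up.

```python
import re

# A period, then two or three letters, then the end of the string.
# (Python's $ also matches just before a trailing newline; the inputs
# used here contain none.)
ENDING = re.compile(r"\.([a-zA-Z]{2,3})$")

def final_letters(address):
    """Return the letters after the last period, or None if they do not fit."""
    match = ENDING.search(address)
    return match.group(1) if match else None

print(final_letters("jkealey@shade.ca"))
print(final_letters("darkangel-31@hotmail.com"))
print(final_letters("jkealey@shade.info"))  # four letters: too many
print(final_letters("jkealey@shade.c"))     # one letter: too few
```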
This was a brief look at the possibilities of regular expressions. I strongly invite you to visit the website listed below to learn more about regular expression syntax.
Conclusion
Here are the references on Microsoft's website that you have all been waiting for:
http://msdn.microsoft.com/scripting/VBScript/
Jason: jkealey@shade.ca
Jason (20/06/2001)