
Validity of email addresses

Introduction

    What follows is a brief description of how to use regular expressions to verify the validity of an email address. Most web programmers will some day work on a project where they have to validate information that a client has sent to the server. This article will help you verify the email addresses sent to you using a simple algorithm. Sadly, I am far from being a regular expressions expert. However, I feel that this article will inform you and may well intrigue you enough to seek more information elsewhere.

    Let's start with a brief explanation of what a regular expression is. If you have ever used an operating system such as DOS or Unix, you should be familiar with the ? and * characters. They greatly simplify search patterns. A search using the expression "test?.asp" will find files such as test1.asp, test2.asp, testx.asp, etc. Furthermore, a search using "test*.asp" will find these files and others such as testxyz.asp. Hence, you can find multiple files using one search pattern. These expressions are simplified examples of regular expressions.
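
    To make the comparison concrete, here is a minimal sketch of how the two wildcard searches could be written as regular expressions with the VBScript RegExp object. The helper name MatchesPattern is made up for this example, and translating ? into . and * into .* is only a rough equivalence, not an exact description of how DOS or Unix treat wildcards.

```vbscript
<%@ Language=VBScript %>
<%
Option Explicit

'hypothetical helper: True if sName matches sPattern
Function MatchesPattern(sName, sPattern)
    Dim objRegExpr
    Set objRegExpr = New RegExp
    objRegExpr.Pattern = sPattern
    objRegExpr.IgnoreCase = True
    MatchesPattern = objRegExpr.Test(sName)
End Function

'test?.asp : the ? becomes a dot (any single character)
Response.Write MatchesPattern("test1.asp", "^test.\.asp$") & "<br>"     'matches
Response.Write MatchesPattern("testxyz.asp", "^test.\.asp$") & "<br>"   'does not match

'test*.asp : the * becomes .* (any number of characters)
Response.Write MatchesPattern("testxyz.asp", "^test.*\.asp$") & "<br>"  'matches
%>
```

    Note that the period before "asp" is written \. because a bare period means "any character" in a regular expression.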

    Regular expressions can be used on character strings to verify their validity or to select specific portions of text to be extracted or replaced. Several languages support regular expressions. (I will not list them here because I do not know them all.) They are particularly useful in languages that do not have many string manipulation functions, such as JavaScript, but they are still very useful in other languages such as VBScript (which I use in my ASP pages).
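
    As an illustration of extracting and replacing text, the following sketch uses the Execute and Replace methods of the VBScript RegExp object. The sample sentence and the variable names are invented for this example.

```vbscript
<%@ Language=VBScript %>
<%
Option Explicit
Dim objRegExpr, objMatches, objMatch, sText

'sample text made up for this example
sText = "Order 66 shipped, order 1138 pending"

Set objRegExpr = New RegExp
objRegExpr.Pattern = "[0-9]+"   'one or more digits
objRegExpr.Global = True        'find every match, not only the first

'extracting: Execute returns a collection of Match objects
Set objMatches = objRegExpr.Execute(sText)
For Each objMatch In objMatches
    Response.Write objMatch.Value & "<br>"   '66, then 1138
Next

'replacing: every run of digits becomes a # sign
Response.Write objRegExpr.Replace(sText, "#")
%>
```

    Without Global set to True, Execute and Replace would stop at the first match.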



The problem

    You wish to analyze a string of characters to verify whether it forms a valid email address.



The solutions

a) traditional solution

A valid email address resembles the following:
[email protected]
[email protected]
[email protected]
[email protected]


    For it to be valid, an email address must contain exactly one @ and at least one period. Preceding the @ character, there can only be letters, numbers and a few other characters: the period (.), the underscore (_) and the dash (-). What comes after the @ and precedes the final period follows the same rules. Finally, only letters can follow the last period, and there can only be two or three of them (such as .ca, .fr, .com, .net, .org, etc). I am conscious that in the future this rule will not apply because there are new domain names such as .biz, .church and others. However, they are not currently widespread. Furthermore, French characters (and even Chinese ones) will be accepted in domain names in the future. However, this article only takes into account current facts.

     Following the traditional method, many lines of code are necessary. Please note that what follows has been written in VBScript and not in Visual Basic. The string comes from the web and will be stored in a database.

<%@ Language=VBScript %>
<%
Option Explicit
Dim iInvalid, sString, i
iInvalid=0
sString="[email protected]"


function isValidChar(c)
     isValidChar=1
end function


function isValidLetter(c)
     isValidLetter=1
end function


'verifying what precedes the @ (the @ itself is excluded)
For i=1 to inStr(1,sString, "@")-1
     If isValidChar(mid(sString,i,1)) <> 1 then
         iInvalid=1
     end if
next


'verifying what follows the @ (the @ itself is excluded)
for i=inStr(1,sString,"@")+1 to len(sString)
     if isValidChar(mid(sString,i,1)) <> 1 then
         iInvalid=1
     end if
next


'verifying what follows the last period
if inStrRev(sString,".") = 0 then
     'no period at all: the address is invalid
     iInvalid=1
else
    'the period itself is excluded
    for i=len(sString) to inStrRev(sString,".")+1 step -1
         if isValidLetter(mid(sString,i,1)) <> 1 then
             iInvalid=1
         end if
    next
end if


'we can't have two consecutive periods
for i=inStr(1,sString,"@") to len(sString)-1
     if mid(sString,i,1) = "." And mid(sString,i+1,1)="." Then
         iInvalid=1
     end if
next


if iInvalid=1 then
     response.write("invalid email address")
else
     response.write("valid email address")
end if


%>


    What you have just read has not been tested and its correctness is doubtful. One thing is certain though: you would need to code the two additional functions isValidLetter and isValidChar to verify whether a character is a letter (upper or lower case), a number or one of the accepted extra characters. I did not program them because I feel that I have proven that this method is very long.
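
    For readers who want to fill in that gap, here is one possible sketch of the two functions, based only on the rules stated earlier (letters, numbers, the period, the underscore and the dash). It is not part of the original listing, it has not been run against real data, and it assumes that c is always a single character as returned by Mid.

```vbscript
<%
'sketch only: c is assumed to be a single character

'returns 1 if c is a letter (upper or lower case), 0 otherwise
Function isValidLetter(c)
    Dim n
    n = Asc(c)
    If (n >= 65 And n <= 90) Or (n >= 97 And n <= 122) Then
        isValidLetter = 1
    Else
        isValidLetter = 0
    End If
End Function

'returns 1 if c is a letter, a digit, a period, an underscore or a dash
Function isValidChar(c)
    Dim n
    n = Asc(c)
    If isValidLetter(c) = 1 Or (n >= 48 And n <= 57) Or InStr(1, "._-", c) > 0 Then
        isValidChar = 1
    Else
        isValidChar = 0
    End If
End Function
%>
```

    Even with these two functions written out, the traditional method still needs all the loops shown earlier, which is the point of the comparison.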


b) solution using regular expressions

Only a few lines of code are used with regular expressions.

<%@ Language=VBScript %>
<%
Option Explicit
Dim sString
sString="[email protected]"

dim objRegExpr
Set objRegExpr = New regexp

'the pattern that our string will be compared with
objRegExpr.Pattern = "^[a-zA-Z0-9\._-]+@([a-zA-Z0-9_-]+\.)+([a-zA-Z]{2,3})$"

'let's compare it.
if objRegExpr.Test(sString) then
     response.write("valid email address")
else
    response.write("invalid email address")
end if

%>



Explanation

Here is our regular expression:
"^[a-zA-Z0-9\._-]+@([a-zA-Z0-9_-]+\.)+([a-zA-Z]{2,3})$"

It is a string of characters; let us analyze it piece by piece.
    The ^ indicates that what follows must be at the beginning of the string.

    [a-zA-Z0-9\._-] are the valid characters. As you can see, three intervals and a few other characters are used to represent them. Please note that the \ is not a valid character. It is inserted before the period because the period is a special character in regular expressions. If you have ever programmed in a language such as C/C++ you should be familiar with this escape notation.

    The plus sign (+) that follows indicates that what precedes it must be repeated at least once.

    The following character is the @ because it is mandatory in our email address.

    Parentheses followed by the plus sign (+) define a part of the string that we will analyze separately. As seen previously, the + indicates that what precedes it must be repeated at least once. The parentheses merge a group of expressions. Here this whole group will be repeated.

    [a-zA-Z0-9_-]+ are the valid characters (note that the period is now absent). These will be repeated at least once.

    This expression is followed by a period. Hence, we are representing the possible sub-domains and domains, whatever their length. In common terms, we are linking strings that each end with the period character.

This.    Is.    An.    Example.   becomes This.Is.An.Example.

    Finally, more parentheses followed by the dollar sign ($). This means that the contents of the parentheses must end the string. In the parentheses, we see that the only valid characters are letters ([a-zA-Z]). The {2,3} that follows resembles the + character. It indicates that what precedes it must be repeated two to three times (nothing more, nothing less).
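
    To see each part of the expression at work, the following sketch runs it against a few made-up addresses, one per rule. The pattern is assumed to read exactly as the explanation above describes it (character class, plus sign, @, repeated group, two to three final letters); the sample addresses and variable names are invented for this example.

```vbscript
<%@ Language=VBScript %>
<%
Option Explicit
Dim objRegExpr, aSamples, sSample

Set objRegExpr = New RegExp
'pattern assumed to read as the explanation describes it
objRegExpr.Pattern = "^[a-zA-Z0-9\._-]+@([a-zA-Z0-9_-]+\.)+([a-zA-Z]{2,3})$"

'made-up addresses, one per rule
aSamples = Array("jane.doe@example.com", _
                 "jane@mail.example.ca", _
                 "jane@example", _
                 "jane@@example.com", _
                 "jane@example.info")

For Each sSample In aSamples
    If objRegExpr.Test(sSample) Then
        Response.Write sSample & " : valid<br>"
    Else
        Response.Write sSample & " : invalid<br>"
    End If
Next
%>
```

    The first two addresses should be reported as valid. The third has no period after the @, the fourth has two @ characters, and the last ends with four letters, so the {2,3} rule rejects it.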

    In conclusion, this was a brief look at the possibilities of regular expressions. I strongly invite you to visit the website listed below to learn more about the syntax of regular expressions.



Conclusion

    Here are the references on Microsoft's website that you have all been waiting for:

http://msdn.microsoft.com/scripting/VBScript/

Jason: [email protected]




Jason  
( 2001-06-20 )  



