Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Wanet Telecoms Ltd on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

address split

Status
Not open for further replies.

s0777

Technical User
Nov 7, 2006
3
US
hi, i need to split address field in my database into street#, direction (N/S/E/W), Street NAME, designated field(apt/ste/etc), desig # into seperate fields. right now everything is in one field. data i have is not standard. can someone help me on that?
 
Can you not just use a combination of SUBSTRING, LEFT, RIGHT on the field. ie SELECT LEFT('128 Me Street West',3) would give you 128
 
no cause numbers are different in each field, like some address start with letter and some are with 4 or 5 numbers.
 
If theres no logic to how this information is stored then it suggests your in trouble.
 
I feel bad for you. I once wrote this functionality in VB, and it was pretty ugly. Probably still isn't bullet proof, because people tend to enter addresses very differently, making the problem even worse.

My suggestion is to tackle this problem slowly, 1 piece at a time working from both ends. Let me explain...

Suppose you have an address like: 101 N. Main St.

The hardest part is to extract the street name, so do it last. First, work on the house number, then work on the prefix (N/S/E/W). Then, work on the end of the address. Check for prefixes (NE/NW/SE/SW). Then the street type. The street type can be difficult too because there are many different street types, and people can missplell them easily.

To make this task easier, I suggest you create 'helper' tables that list acceptable values. For example, you should have a StreetTypes table that lists all of the acceptable street types (including their abbreviations). You should have a helper table for StreetTypes, Prefix, Suffix, and address2 types (like apt#, apt, etc...). If your address data also includes city, state and zip, then you should have helper tables for those also.

Code:
Create Table StreetTypes(StreetType VarChar(50))

Insert Into StreetTypes Values('Street')
Insert Into StreetTypes Values('St')
Insert Into StreetTypes Values('St.')
Insert Into StreetTypes Values('Road')
Insert Into StreetTypes Values('Rd')
Insert Into StreetTypes Values('RD.')

Then, when you are normalizing an address, you can use the helper tables to extract the data.

This is just an example. Ultimately, your final code will be more complex than this, but to get you started...

Code:
Declare @Temp Table(Address VarChar(100), HouseNumber VarChar(20), Prefix VarChar(20), StreetName VarChar(20), StreetType VarChar(50), Suffix VarChar(20))

Insert Into @Temp(Address) Values('101 N. Main St.')

Select 'Original' As Status, * From @Temp

Update @Temp
Set    HouseNumber = Left(Address, CharIndex(' ', Address)),
       Address = Right(Address, Len(Address) - CharIndex(' ', Address))
Where  IsNumeric(Left(Address, CharIndex(' ', Address))) = 1

Select 'After House Number' As Status, * From @Temp

Update T
Set    StreetType = StreetTypes.StreetType,
       Address = RTrim(Left(Address, Len(Address) - Len(StreetTypes.StreetType)))
From   @Temp T
       Left Join StreetTypes
         On LTrim(Right(Address, CharIndex(' ', Reverse(Address)))) = StreetTypes.StreetType

Select 'After Street Type' As Status, * From @Temp

-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom
 
Oh... 1 more thing.

The example I show above uses the IsNumeric function for the house number. IsNumeric can be problematic for various reasons. Instead, you should use the IsNumber or IsInt function shown in this FAQs: faq183-6423.

-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom
 
... this assumes that the house / or building doesn't have a name as well as or instead of a number, but in general I would agree with George's approach.

[vampire][bat]
 
And in the end, you may need to identify whicj records don't meet any of your rules and manually fix those.

Questions about posting. See faq183-874
 
If you are doing this for addresses in your local area where all addresses are of the same format, you can pretty much tackle the problem as listed above.

However, if you are dealing with a database of U.S. addresses on a national scale, you must consider all of the different formats in the U.S.

For example, all of the following addresses are valid addresses in various U.S. locations:

1) 101 Main St (quite common - no directionals)
2) 101 N Main St (quite common - directional between number and name)
3) N101 Main St (common in WI and Spokane County WA - directional prefixed to number with NO space)
4) N 101 Main St (rural Lewis County WA - directional before number)
5) 101 Main St N (quite common - directional after name and street type)
6) 101 W 1400 S (UT and southern ID - two directionals split by name)
7) 101 W Main St N (parts of OK - two directionals split by name and street type)

and other examples that escape me at the moment.

If you are dealing with a database containing several or all of these types of addresses, you should probably create templates of all the possible address combinations. Before you start tearing each address apart you would first determine which template to use and dismantle it accordingly.

Other things to consider are that the street type USUALLY comes after the street name, but NOT ALWAYS. In parts of CA and CO addresses like 101 Rd J2 are common, reversing the street name and street type.

Apartment addresses can be problematic also. Most apartment addresses are like 101 Main St Apt 1. However, sometimes you will see 101A Main St which is Apt A at 101 Main St.

If you have addresses in the Brooklyn or Queens boroughs of New York City, your numbers will contain a dash like 1-01 Main St. In other areas you will find numbers like 101 1/4 Main St, which just happens to be an address behind 101 Main St.

If you have Portland, OR addresses you could have 01000 Virginia St SW and 1000 Virginia St SW. They are NOT the same address, but two DIFFERENT addresses a mile apart on the SAME street. Needless to say you cannot use VAL() on addresses like that.

In Seattle, WA you can find type 1 above, type 2, and type 5 addresses all within the same zip code. So trying to tear them apart by zip is also tough. Many post office zip codes nationwide have two or more of the above types of addresses within a single zip code.

Of course, other address types include post office box addresses, and named buildings. They further complicate the issue.

Although getting rarer and rarer, you must consider the rural route and highway contract type of addresses. Typical addresses are Rt 1 Box 66, RR1 B ox 66, HCR Box 66, HCR1 Box 66. These types of addresses are slowly disappearing in favor of physical addresses as opposed to delivery order addresses.

As an aside, the rationale for using physical addresses is related to the adoption of 911 emergency services. Supposedly they enhance speed to the address in question during an emergency. However, the very first operational 911 system in the U.S. was in Wilson County, Kansas at Neodesha about 20 years ago. ALL rural addresses in Wilson County were Rural Route addresses then. Today, 20+ years later ALL rural addresses are STILL Rural Route addresses. IF physical addresses indeed were needed for emergency response, the question arises as to WHY Neodesha has NOT converted from Rural Address numbers to physical addresses decades ago. The flip side is, "Why have rural addresses nationwide been jetisoned using 911 as the excuse?"

As if your life isn't hard enough with all of the above considerations, you also have another problem. Some street names are the same as the directionals. For example, in Fresno, CA you have an address like 100 E North St, which happens to be SOUTH of Fresno, while 100 E South St is NORTH of Fresno. There are many many addresses nationwide like A) 100 East St, B) 100 East St E, and C) 100 E East St. These will make your life even more fun.

And finally, the letttered streets to cap off your day. Addresses like 100 E E St which is 100 East E (the letter E) St. Or like 100 W St S(outh), 100 Rd W E(ast), ad infinitum. Lots of fun and games there, too.

Anyhow, I wish you lots of luck trying to sort out the various types of information from addresses.


mmerlinn

"Political correctness is the BADGE of a COWARD!"

 

I forgot another problem you could run into.

Occasionally you will run into something like 2200 Allen Street Road or 2200 Harris Street Road, where it appears that you have two consecutive street types.

Kelso, WA has a street named Allen Street. When this street crosses the city limits line and enters Cowlitz County, the suffix 'Road' is added to it to become Allen Street Road. Ditto with Harris Street and Harris Street Road.

Within the city, 'Allen' is the name of the street and 'Street' is the street type. Outside of the city 'Allen Street' is the name of the street and 'Road' is the street type.

As you can see this can cause a computer program to have fits trying to correctly determine what is what.




mmerlinn

"Political correctness is the BADGE of a COWARD!"

 
I am also working on trying to get something like this to work. What I have now is similar to what George laid out. However just this mornign I ran into this problem... I have two different streets in the database, say for example

123 North Causeway
456 North Causeway

According to the post office, the street which the address 123 North Causeway is on is broken out with 'North' being the street name and 'Causeway' the suffix, or...
123 North Cswy

According to the post office, the street which 456 North Causeway is on is broken out with 'North' being the pre directional and 'Causeway' being the street name, or....
456 N Causeway

Obviously there is no way to know for sure how to split apart this address without having some sort of master list to check it against.... good luck!
 
I am also working on trying to get something like this to work. What I have now is similar to what George laid out. However just this mornign I ran into this problem... I have two different streets in the database, say for example

123 North Causeway
456 North Causeway

According to the post office, the street which the address 123 North Causeway is on is broken out with 'North' being the street name and 'Causeway' the suffix, or...
123 North Cswy

According to the post office, the street which 456 North Causeway is on is broken out with 'North' being the pre directional and 'Causeway' being the street name, or....
456 N Causeway

Obviously there is no way to know for sure how to split apart this address without having some sort of master list to check it against.... good luck!
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top