If there is any possibility that another substring will match the same pattern, you need to take some extra precautions to ensure that the correct substring is taken as the [supplier code].
A Regular Expression, or most any generic procedure, will probably be a necessary, but not sufficient, part of the overall process.
One additional step will be to check that each extracted substring is a valid [supplier code]. Where there is ONLY one such substring, it may be (probably is) reasonable to assume that the extracted substring is the [supplier code]. Where multiple matches occur, you will need additional processing.
A recordset w/ 10K records in the apparent free form would be expected (at least by me) to include a large variation in the individual field content. If, for example, a standard UPC bar code were entered, and a string of four consecutive numerics were the search criterion, the process would return numerous matches. Adding the leading and trailing 'white space' filters probably helps, but does not eliminate the potential for improper extraction, as there might be other instances of four consecutive numeric characters.
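To make the point concrete, here is a small sketch in Python (any regex-capable language would behave the same way). The sample text and the four-digit pattern are made up for illustration; I have not seen the actual data. It compares a bare "four digits" search with one that requires white space (or the start/end of the field) on both sides.

```python
import re

# Bare pattern: any run of four digits, even inside a longer number.
LOOSE = re.compile(r"\d{4}")

# Bounded pattern: four digits with white space (or start/end of the
# field) on both sides.
BOUNDED = re.compile(r"(?<!\S)\d{4}(?!\S)")

# Hypothetical free-form field: a UPC, a supplier code and a model year.
text = "Widget UPC 036000291452 supplier 4417 box 2003"

loose = LOOSE.findall(text)      # the UPC alone contributes three hits
bounded = BOUNDED.findall(text)  # the UPC is gone, but two candidates remain

print(loose)
print(bounded)
```

The bounded pattern drops the pieces of the UPC, but it still cannot tell which of the two remaining four-digit strings is the supplier code.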
Without considerable additional information (or actual review of the information available), it is difficult to impossible to be sure that there is any foolproof method which can be totally 'automated'. I would suggest that the goals of the process be thoroughly reviewed, with at least a thought to extracting the [supplier code] only for those instances which are totally unambiguous, and referring the remainder to clerical (i.e. human) processing.
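A rough sketch of that split, again in Python and again under assumptions of my own: that a list of known-good supplier codes exists somewhere (the `valid_codes` set here is invented), that the code is four digits set off by white space, and that each record is an (id, text) pair. Only records with exactly one valid candidate are extracted automatically; everything else goes to the manual pile.

```python
import re

# Assumed pattern: four digits bounded by white space or the field edges.
CODE_PATTERN = re.compile(r"(?<!\S)\d{4}(?!\S)")


def triage(records, valid_codes):
    """Split records into auto-extracted codes and ids needing human review."""
    extracted = {}
    needs_review = []
    for rec_id, text in records:
        # Keep only candidates that appear in the list of known codes.
        candidates = {c for c in CODE_PATTERN.findall(text) if c in valid_codes}
        if len(candidates) == 1:
            extracted[rec_id] = candidates.pop()
        else:
            # Zero valid candidates or more than one: do not guess.
            needs_review.append(rec_id)
    return extracted, needs_review


# Hypothetical sample data, for illustration only.
records = [
    (1, "Blue widget 4417 large"),
    (2, "UPC 036000291452 no code given"),
    (3, "Order 4417 shipped with 5520"),
    (4, "Model 2003 supplier 5520"),
]
valid_codes = {"4417", "5520"}

extracted, needs_review = triage(records, valid_codes)
print(extracted)
print(needs_review)
```

Even a record with a single valid candidate can be wrong if some unrelated number happens to coincide with a real code, so a spot check of the automatic pile is probably still worthwhile.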
MichaelRed
m.red@att.net
Searching for employment in all the wrong places