Reading HTML File and Parsing Tags

Status
Not open for further replies.

CassidyHunt

IS-IT--Management
Jan 7, 2004
688
US
I need to read in an HTML file that has image map coordinates in <AREA> tags. I know I could use brute force and parse the file one character at a time, looking for the tags and delimiters, but I wonder if there is a better method.

The end result is to put all the coordinates into a database. Here is an example of a file:

Code:
<HTML>
<HEAD>
<TITLE>mapus</TITLE>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
</HEAD>
<BODY BGCOLOR=#FFFFFF LEFTMARGIN=0 TOPMARGIN=0 MARGINWIDTH=0 MARGINHEIGHT=0>
<!-- ImageReady Slices (mapus.psd) -->
<IMG SRC="images/Argentina.gif" WIDTH=2438 HEIGHT=1705 BORDER=0 ALT="" USEMAP="#Argentina_Map">
<MAP NAME="Argentina_Map">
<AREA SHAPE="poly" ALT="Argentina" COORDS="783,1086, 784,1086, 785,1087, 785,1088, 786,1089, 787,1089, 788,1089, 789,1089, 790,1089, 791,1089, 792,1089, 793,1089, 794,1090, 794,1091, 795,1092, 796,1091, 796,1090, 797,1089, 797,1088, 798,1088, 799,1088, 800,1088, 801,1088, 802,1088, 803,1088, 804,1088, 805,1089, 806,1090,
807,1090, 808,1091, 809,1091, 809,1092, 810,1093, 811,1094, 812,1095, 813,1096, 814,1097, 815,1098, 816,1099, 817,1099, 818,1100, 818,1101, 819,1102, 820,1102, 821,1103, 822,1103, 823,1103, 824,1103, 825,1103, 826,1103, 827,1104, 828,1104, 829,1105, 830,1106, 831,1106, 832,1107, 833,1107, 834,1108, 835,1109, 836,1109, 837,1110,
838,1111, 839,1112, 839,1113, 839,1114, 839,1115, 838,1116, 838,1117, 837,1118, 836,1119, 836,1120, 835,1121, 835,1122, 835,1123, 834,1124, 835,1125, 836,1125, 837,1125, 838,1125, 839,1125, 840,1125, 841,1125, 842,1126, 843,1126, 844,1127, 845,1127, 846,1127, 847,1127, 848,1127, 849,1126, 850,1126, 851,1126, 852,1126, 853,1125,
854,1124, 855,1124, 856,1123, 857,1122, 858,1121, 859,1120, 859,1119, 859,1118, 860,1117, 860,1116, 860,1115, 860,1114, 860,1113, 861,1113, 862,1113, 863,1113, 864,1113, 864,1114, 865,1115, 865,1116, 865,1117, 865,1118, 866,1119, 866,1120, 866,1121, 866,1122, 866,1123, 865,1124, 865,1125, 864,1126, 863,1126, 862,1127, 861,1127,
860,1127, 859,1128, 858,1129, 857,1130, 856,1131, 855,1132, 854,1133, 853,1134, 852,1135, 851,1136, 850,1137, 850,1138, 849,1139, 848,1140, 847,1141, 847,1142, 846,1143, 845,1143, 844,1144, 843,1145, 842,1146, 841,1147, 840,1148, 839,1149, 838,1150, 838,1151, 838,1152, 838,1153, 838,1154, 838,1155, 838,1156, 838,1157, 838,1158,
838,1159, 837,1160, 837,1161, 836,1162, 836,1163, 836,1164, 836,1165, 836,1166, 836,1167, 836,1168, 836,1169, 836,1170, 835,1171, 835,1172, 835,1173, 834,1174, 834,1175, 834,1176, 834,1177, 834,1178, 834,1179, 834,1180, 834,1181, 835,1182, 836,1182, 837,1183, 838,1184, 839,1184, 840,1185, 841,1186, 842,1187, 842,1188, 842,1189,
842,1190, 842,1191, 841,1192, 841,1193, 842,1194, 843,1195, 844,1195, 845,1196, 846,1196, 846,1197, 846,1198, 846,1199, 846,1200, 846,1201, 845,1202, 845,1203, 844,1204, 843,1205, 842,1206, 842,1207, 841,1208, 840,1209, 840,1210, 840,1211, 839,1212, 838,1213, 837,1213, 836,1214, 835,1214, 834,1214, 833,1214, 832,1215, 831,1215,
830,1215, 829,1216, 828,1216, 827,1216, 826,1216, 825,1216, 824,1217, 823,1217, 822,1217, 821,1217, 820,1217, 819,1218, 818,1218, 817,1218, 816,1218, 815,1218, 814,1218, 813,1218, 812,1218, 811,1217, 810,1217, 809,1218, 809,1219, 809,1220, 810,1221, 810,1222, 810,1223, 810,1224, 810,1225, 809,1226, 809,1227, 808,1228, 808,1229,
809,1230, 809,1231, 809,1232, 809,1233, 809,1234, 808,1235, 807,1235, 806,1235, 805,1236, 804,1236, 803,1236, 802,1236, 801,1236, 800,1236, 799,1236, 798,1236, 797,1236, 796,1235, 795,1235, 794,1234, 793,1234, 792,1233, 791,1233, 790,1234, 790,1235, 790,1236, 791,1237, 791,1238, 791,1239, 791,1240, 791,1241, 791,1242, 791,1243,
791,1244, 792,1245, 793,1245, 794,1246, 794,1247, 795,1248, 796,1247, 796,1246, 797,1245, 798,1245, 799,1244, 800,1245, 800,1246, 800,1247, 801,1248, 801,1249, 801,1250, 800,1251, 799,1251, 798,1251, 797,1251, 796,1252, 796,1253, 795,1254, 794,1254, 793,1255, 792,1256, 791,1257, 790,1258, 790,1259, 790,1260, 790,1261, 790,1262,
790,1263, 790,1264, 790,1265, 790,1266, 789,1267, 788,1268, 787,1269, 788,1270, 788,1271, 788,1272, 787,1272, 786,1272, 785,1271, 784,1271, 783,1272, 782,1273, 781,1273, 780,1273, 779,1274, 778,1275, 777,1276, 777,1277, 776,1278, 776,1279, 775,1280, 775,1281, 774,1282, 775,1283, 775,1284, 776,1285, 777,1286, 778,1287, 779,1288,
779,1289, 780,1290, 781,1290, 782,1291, 783,1291, 784,1291, 785,1291, 786,1291, 787,1292, 787,1293, 787,1294, 787,1295, 787,1296, 786,1297, 786,1298, 786,1299, 786,1300, 785,1301, 784,1302, 783,1302, 782,1303, 782,1304, 781,1304, 780,1305, 779,1306, 778,1307, 777,1308, 776,1309, 775,1310, 774,1311, 774,1312, 774,1313, 774,1314,
774,1315, 774,1316, 773,1317, 773,1318, 773,1319, 772,1320, 771,1321, 770,1321, 769,1321, 768,1322, 767,1323, 766,1323, 765,1324, 765,1325, 764,1326, 764,1327, 764,1328, 763,1329, 764,1330, 764,1331, 764,1332, 765,1333, 765,1334, 765,1335, 765,1336, 765,1337, 766,1338, 766,1339, 767,1340, 767,1341, 768,1342, 769,1343, 769,1344,
768,1344, 767,1343, 766,1343, 765,1342, 764,1342, 763,1342, 762,1342, 761,1341, 760,1341, 759,1341, 758,1341, 757,1341, 756,1341, 755,1341, 754,1341, 753,1341, 752,1341, 751,1341, 750,1341, 749,1341, 748,1341, 747,1341, 746,1341, 745,1341, 744,1340, 743,1339, 743,1338, 743,1337, 743,1336, 743,1335, 743,1334, 743,1333, 743,1332,
743,1331, 743,1330, 743,1329, 743,1328, 742,1327, 741,1327, 740,1327, 739,1327, 738,1328, 737,1328, 736,1327, 736,1326, 736,1325, 736,1324, 736,1323, 736,1322, 736,1321, 736,1320, 736,1319, 736,1318, 737,1317, 737,1316, 737,1315, 737,1314, 738,1313, 739,1312, 739,1311, 740,1310, 740,1309, 741,1308, 742,1307, 742,1306, 742,1305,
742,1304, 742,1303, 743,1302, 743,1301, 742,1300, 742,1299, 742,1298, 742,1297, 742,1296, 743,1295, 744,1294, 745,1293, 745,1292, 746,1291, 746,1290, 746,1289, 747,1288, 747,1287, 747,1286, 747,1285, 747,1284, 747,1283, 747,1282, 747,1281, 747,1280, 747,1279, 747,1278, 747,1277, 747,1276, 748,1275, 748,1274, 749,1273, 748,1272,
747,1271, 746,1271, 746,1270, 745,1269, 746,1269, 747,1269, 748,1269, 749,1268, 749,1267, 748,1266, 747,1265, 746,1265, 746,1264, 746,1263, 746,1262, 746,1261, 746,1260, 746,1259, 745,1258, 745,1257, 745,1256, 745,1255, 745,1254, 745,1253, 745,1252, 745,1251, 745,1250, 745,1249, 745,1248, 745,1247, 745,1246, 746,1245, 746,1244,
747,1243, 747,1242, 747,1241, 746,1240, 746,1239, 745,1238, 745,1237, 746,1236, 746,1235, 746,1234, 746,1233, 746,1232, 746,1231, 747,1230, 747,1229, 747,1228, 747,1227, 747,1226, 747,1225, 747,1224, 748,1223, 748,1222, 748,1221, 749,1220, 749,1219, 749,1218, 749,1217, 750,1216, 751,1215, 751,1214, 751,1213, 751,1212, 751,1211,
751,1210, 750,1209, 750,1208, 750,1207, 750,1206, 750,1205, 750,1204, 750,1203, 751,1202, 751,1201, 751,1200, 751,1199, 751,1198, 751,1197, 752,1196, 753,1196, 754,1195, 754,1194, 755,1193, 755,1192, 755,1191, 755,1190, 755,1189, 755,1188, 755,1187, 756,1186, 756,1185, 756,1184, 757,1183, 757,1182, 757,1181, 758,1180, 758,1179,
759,1178, 759,1177, 759,1176, 759,1175, 759,1174, 759,1173, 758,1172, 757,1171, 757,1170, 757,1169, 757,1168, 757,1167, 756,1166, 756,1165, 756,1164, 756,1163, 756,1162, 756,1161, 755,1160, 755,1159, 755,1158, 755,1157, 755,1156, 755,1155, 755,1154, 756,1153, 756,1152, 757,1151, 757,1150, 758,1149, 758,1148, 759,1147, 759,1146,
759,1145, 759,1144, 759,1143, 759,1142, 759,1141, 759,1140, 759,1139, 760,1138, 760,1137, 760,1136, 760,1135, 761,1134, 761,1133, 761,1132, 762,1131, 763,1130, 764,1129, 764,1128, 765,1127, 765,1126, 766,1125, 767,1124, 768,1123, 769,1122, 769,1121, 769,1120, 768,1119, 768,1118, 768,1117, 768,1116, 768,1115, 768,1114, 768,1113,
768,1112, 768,1111, 768,1110, 768,1109, 768,1108, 768,1107, 768,1106, 769,1105, 770,1105, 771,1104, 772,1104, 773,1103, 774,1103, 775,1103, 776,1102, 776,1101, 777,1100, 777,1099, 777,1098, 777,1097, 777,1096, 777,1095, 777,1094, 777,1093, 778,1092, 779,1091, 779,1090, 780,1089, 781,1089, 782,1088, 782,1087" HREF="http://www.yahoo.com">
<AREA SHAPE="poly" ALT="Argentina" COORDS="767,1347, 768,1348, 769,1349, 769,1350, 770,1351, 770,1352, 771,1353, 770,1354, 770,1355, 771,1356, 771,1357, 772,1358, 773,1359, 774,1360, 775,1361, 776,1362, 777,1363, 778,1363, 779,1364, 780,1365, 781,1365, 781,1366, 782,1367, 783,1367, 784,1368, 785,1369, 786,1369, 787,1370,
788,1370, 789,1370, 790,1370, 790,1371, 790,1372, 789,1373, 788,1373, 787,1373, 786,1373, 785,1373, 784,1373, 783,1373, 782,1373, 781,1373, 780,1373, 779,1373, 778,1372, 777,1372, 776,1372, 775,1372, 774,1372, 773,1372, 772,1372, 771,1371, 770,1371, 769,1372, 768,1372, 768,1371, 767,1370, 767,1369, 767,1368, 767,1367, 767,1366,
767,1365, 767,1364, 767,1363, 767,1362, 767,1361, 767,1360, 767,1359, 767,1358, 767,1357, 767,1356, 767,1355, 767,1354, 767,1353, 767,1352, 767,1351, 767,1350, 767,1349, 767,1348" HREF="http://www.yahoo.com">
<AREA SHAPE="poly" ALT="Argentina" COORDS="799,1370, 800,1370, 800,1371, 799,1371, 798,1371, 797,1372, 796,1372, 795,1372, 794,1372, 794,1371, 795,1371, 796,1371, 797,1371, 798,1371" HREF="http://www.yahoo.com">
</MAP>
<!-- End ImageReady Slices -->
</BODY>
</HTML>

Thanks

Cassidy
 
Did you think about the split function?

I'm guessing this is all in one string variable.

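Something like:

Dim strhtml As String 'contains the html
Dim strareas() As String 'one element per <AREA tag; element 0 is the text before the first tag
Dim strcoords() As String 'coordinate values of one area

strareas = strhtml.Split(New String() {"<AREA"}, StringSplitOptions.None)
'take the text between COORDS=" and the next quote, then split it on the commas
strcoords = strareas(1).Split(New String() {"COORDS="""}, StringSplitOptions.None)(1).Split(""""c)(0).Split(","c)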

From here you have all the coordinates of one area in strcoords; now it is up to you to put them in the right place.

I would create a class and then a collection of that class. You would still need a loop to fill that collection with every area.
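The class-plus-collection idea above might look roughly like this minimal VB.NET sketch. `MapArea`, `AttributeValue` and `ParseAreas` are hypothetical names, and the code assumes double-quoted attribute values and upper-case `<AREA` tags, as in the sample file. It passes a `String()` array to `Split` because on the .NET Framework a plain string argument may be narrowed to its first character (with `Option Strict Off`), which would split on `<` alone.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Globalization

' Hypothetical class for one <AREA> tag: its ALT text, HREF and list of (x, y) points.
Public Class MapArea
    Public Property Alt As String
    Public Property Href As String
    Public Property Points As New List(Of Tuple(Of Integer, Integer))
End Class

Module AreaParser
    ' Hypothetical helper: returns the text between NAME=" and the next double quote, or Nothing.
    Function AttributeValue(tagText As String, name As String) As String
        Dim marker As String = name & "="""
        Dim start As Integer = tagText.IndexOf(marker, StringComparison.OrdinalIgnoreCase)
        If start < 0 Then Return Nothing
        start += marker.Length
        Dim finish As Integer = tagText.IndexOf(""""c, start)
        If finish < 0 Then Return Nothing
        Return tagText.Substring(start, finish - start)
    End Function

    ' Builds one MapArea per <AREA tag. The split on "<AREA" is case-sensitive,
    ' matching the upper-case tags in the sample file.
    Function ParseAreas(html As String) As List(Of MapArea)
        Dim areas As New List(Of MapArea)
        ' Element 0 is everything before the first <AREA, so start at 1.
        Dim pieces() As String = html.Split(New String() {"<AREA"}, StringSplitOptions.None)
        For i As Integer = 1 To pieces.Length - 1
            Dim area As New MapArea With {
                .Alt = AttributeValue(pieces(i), "ALT"),
                .Href = AttributeValue(pieces(i), "HREF")
            }
            Dim coords As String = AttributeValue(pieces(i), "COORDS")
            If coords IsNot Nothing Then
                Dim nums() As String = coords.Split(","c)
                ' Values alternate x, y; Trim() drops the spaces and line breaks after commas.
                ' An odd trailing value, if any, is ignored.
                For j As Integer = 0 To nums.Length - 2 Step 2
                    area.Points.Add(Tuple.Create(
                        Integer.Parse(nums(j).Trim(), CultureInfo.InvariantCulture),
                        Integer.Parse(nums(j + 1).Trim(), CultureInfo.InvariantCulture)))
                Next
            End If
            areas.Add(area)
        Next
        Return areas
    End Function

    Sub Main()
        Dim html As String = "<MAP NAME=""Argentina_Map"">" & vbCrLf &
            "<AREA SHAPE=""poly"" ALT=""Argentina"" COORDS=""799,1370, 800,1370," & vbCrLf &
            "800,1371"" HREF=""http://www.yahoo.com"">" & vbCrLf & "</MAP>"
        For Each a As MapArea In ParseAreas(html)
            Console.WriteLine(a.Alt & ": " & a.Points.Count & " points")
        Next
    End Sub
End Module
```

With the one-area sample in `Main` it prints `Argentina: 3 points`. Each `MapArea` in the list could then be written to the database, for example as one row per point.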

Christiaan Baes
Belgium

"My new site" - Me
 
That's how I have it now. Basically I use the InStr function to find <AREA, COORDS and HREF, then position myself at the start and end of the coordinate group and use the Split function to peel it off. I was wondering if I could handle it like an XML file instead. That way I could read the attribute directly and split it, which might be faster and a little easier to code.
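On the XML idea: the sample file is not well-formed XML (for instance `BGCOLOR=#FFFFFF` is unquoted, and the `<META>` and `<IMG>` tags are never closed), so an XML parser such as `XmlDocument` would most likely refuse to load it. A regular expression gets close to "reading the attribute directly" instead. This sketch swaps in `System.Text.RegularExpressions.Regex`; `GetAttribute` is a hypothetical helper, and the pattern assumes double-quoted attribute values with no `>` inside them.

```vb
Imports System
Imports System.Text.RegularExpressions

Module AreaRegex
    ' One match per <AREA ...> tag (case-insensitive); group 1 is the attribute text.
    ' [^>]* also spans line breaks, so multi-line COORDS values are captured whole.
    Private ReadOnly AreaTag As New Regex("<AREA\b([^>]*)>", RegexOptions.IgnoreCase)

    ' Hypothetical helper: returns the double-quoted value of the named attribute, or Nothing.
    Function GetAttribute(attrText As String, name As String) As String
        Dim pattern As String = "\b" & Regex.Escape(name) & "\s*=\s*""([^""]*)"""
        Dim m As Match = Regex.Match(attrText, pattern, RegexOptions.IgnoreCase)
        If m.Success Then Return m.Groups(1).Value
        Return Nothing
    End Function

    Sub Main()
        Dim html As String = "<MAP NAME=""Argentina_Map"">" & vbCrLf &
            "<AREA SHAPE=""poly"" ALT=""Argentina"" COORDS=""799,1370, 800,1370, 800,1371"" HREF=""http://www.yahoo.com"">" & vbCrLf &
            "</MAP>"

        For Each tag As Match In AreaTag.Matches(html)
            Dim attrs As String = tag.Groups(1).Value
            Console.WriteLine(GetAttribute(attrs, "ALT") & " | " & GetAttribute(attrs, "COORDS"))
        Next
    End Sub
End Module
```

This prints `Argentina | 799,1370, 800,1370, 800,1371`; the COORDS string can then be split on commas as before. If attribute values could ever contain `>`, a dedicated HTML parser would be the safer choice.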
 
It would have helped if I hadn't had an infection in both ears when I read that. That is much better than what I did.

Thanks
 
I've heard that one before.

I wonder why people think there's something wrong with me, I'm perfectly normal most of the time.

Christiaan Baes
Belgium

"My new site" - Me
 