array comparison & deduper vs. broken xml patch

Status
Not open for further replies.

californe

Technical User
Sep 3, 2007
4
US
i'm just starting out, and i've inherited a program someone else designed and am trying to fix a broken element within it. it is basically an xml highlighting link engine. the engine searches for specific words from a list, as well as non-lexicon phrases, so whenever one match falls inside another, the combination breaks the xml. i devised a rough patch to fix the broken xml, but due to the way the engine searches, if it finds one term first, it will interrupt the phrase or list of words that contains it.

specifically it's for reports, and within the program it searches, in order:
1.nonlexicon phrases
2.dates
3.(list of phrases to look for)
4.locations

is there any way i can create something to scan the arrays to find out if any terms could be contained within each other?

if i place non-lexicon last, it will not find a non-lexicon when a location/date/phrase resides within it

i was talking to a friend, who's currently on vacation, and he mentioned creating something to scan each word within an array to see if that term/phrase would match any other term/phrase (or a portion thereof) to kill duplicates

any ideas on the direction to go for this?

my current patch is ugly but it fixes a part of it:

(the way i have it broken down for editing)
Code:
$FileContents=~s/
<a href..name=
($term1)
      <a href..name=
      ($term2)
      \"\>
      ($term2)
($term3)
\"\>
($term1)
      <a href..name=
      ($term2)
      \"\>
      ($term2)
      \<\/a\>
($term3)
\<\/a\>
/
<a href..name=
$1
\"\>
$5
\<\/a\>
      <a href..name=
      $2
      \"\>
      $3
      \<\/a\>
<a href..name=
$4
\"\>
$8
\<\/a\>
/gi;
term1 and term3 being a non-lexicon and term2 being a word/phrase actually searched for.

this is actually one of my smallest sample strings.. i had one going to.. 50 some-odd temporary variables.

massive apologies for the large amount of text

my extreme gratitude for any help.
 
what does any of that have to do with:

array comparison & deduper vs. broken xml patch

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
well, i'm trying to figure out how to make an array comparison & deduper, and i currently only have a half-way usable patch...
 
sample data may be difficult.. i can post some of the program/patch if that would help

here's some of the patch..
Code:
	my($NLstart)="\<a href\=\"\#__top__\" style\=\"text\-decoration:none\;color:\#AD7C22\;\" name\=\"";

	my($CONstart)="\<a href\=\"\#__top__\" style\=\"background\-color:\#113377\;text\-decoration:none\;color:\#FFFFFF\;font\-weight:900\;\" name\=\"";
	my($CONstart2)="\<a href\=\"\#__top__\" style\=\"background\-color:\#113377\;text\-decoration:none\;color:\#AD7C22\;font\-weight:900\;\" name\=\"";

	my($GALstart)="<a href=\"\#__top__\" style\=\"background\-color:\#71B7B7\;text\-decoration:none\;color:\#000000\;font\-weight:900\;\" name\=\"";
	my($GALstart2)="<a href=\"\#__top__\" style\=\"background\-color:\#71B7B7\;text\-decoration:none\;color:\#AD7C22\;font\-weight:900\;\" name\=\"";
	my($GALstart3)="<a href=\"\#__top__\" style\=\"background\-color:\#71B7B7\;text\-decoration:none\;color:\#FFFFFF\;font\-weight:900\;\" name\=\"";
		
	#Yes, i know i could use the same term, but for editing purposes, it keeps me sane.
	my($tt_1)="[a-zA-Z 0-9':.,?;\(\)-]*";
	my($tt_2)="[a-zA-Z 0-9':.,?;\(\)-]*";
	my($tt_3)="[a-zA-Z 0-9':.,?;\(\)-]*";
	my($tt_4)="[a-zA-Z 0-9':.,?;\(\)-]*";
	my($tt_5)="[a-zA-Z 0-9':.,?;\(\)-]*";
	my($tt_6)="[a-zA-Z 0-9':.,?;\(\)-]*";
	my($tt_7)="[a-zA-Z 0-9':.,?;\(\)-]*";
	my($tt_8)="[a-zA-Z 0-9':.,?;\(\)-]*";
	my($tt_9)="[a-zA-Z 0-9':.,?;\(\)-]*";
	
#step one: kill outer layer
	#s///g returns the number of replacements made, so each "1 while" repeats its substitution until no nested pair is left
	1 while $FileContents=~s/$CONstart($tt_1)\"\>$CONstart($tt_1)\"\>($tt_1)\<\/a\>\<\/a\>/$CONstart$1\"\>$3\<\/a\>\<\*\>/gi;
	1 while $FileContents=~s/$CONstart($tt_1)\"\>$CONstart($tt_1)\"\>($tt_1)\<\/a\>\<\*\>\<\/a\>/$CONstart$1\"\>$3\<\/a\>\<\*\>/gi;

	#same two passes for the GAL links
	1 while $FileContents=~s/$GALstart($tt_1)\"\>$GALstart($tt_1)\"\>($tt_1)\<\/a\>\<\/a\>/$GALstart$1\"\>$3\<\/a\>\<\*\>/gi;
	1 while $FileContents=~s/$GALstart($tt_1)\"\>$GALstart($tt_1)\"\>($tt_1)\<\/a\>\<\*\>\<\/a\>/$GALstart$1\"\>$3\<\/a\>\<\*\>/gi;
 
my goodness mate, give yourself a break and rewrite your strings using the q{} operator instead of those hard-to-read, double-quoted, full-of-escapes strings:

Code:
 my $NLstart = q{<a href="#__top__" style="text-decoration:none;color:#AD7C22;" name="};

there is no need to put parentheses around your scalars. Parentheses are for making lists and other things, but not for when you assign a string to a single scalar.
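
to make both points concrete, here is a small sketch. it checks that the q{} form is the same string as the escaped one, and it shows the one common case where parentheses around a single scalar do matter: pulling a capture out of a match. the sample $html value is made up for illustration, and \Q...\E is used so the link prefix is matched literally inside the pattern.

```perl
use strict;
use warnings;

# Double-quoted version from the patch: none of these backslashes are needed.
my $escaped = "\<a href\=\"\#__top__\" style\=\"text\-decoration:none\;color:\#AD7C22\;\" name\=\"";

# Same string with q{}: no interpolation, no escapes.
my $plain = q{<a href="#__top__" style="text-decoration:none;color:#AD7C22;" name="};

print $escaped eq $plain ? "same string\n" : "different strings\n";

# Hypothetical sample: one link built from the prefix above.
my $html = $plain . q{report">report</a>};

# With parentheses the match runs in list context and returns the capture.
my ($name) = $html =~ /\Q$plain\E([^"]*)"/;

# Without them it runs in scalar context and only reports success.
my $matched = $html =~ /\Q$plain\E([^"]*)"/;

print "captured: $name\n";   # captured: report
print "matched:  $matched\n"; # matched:  1
```

so plain string assignments can drop the parentheses, while captures from a match still want them.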

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
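
on the original question (scanning the arrays for terms contained within each other), one possible direction is sketched below. this is a rough sketch under assumptions, not the engine's actual code: the sample lists, the class attribute and the sub names (find_contained, make_link, link_terms) are all hypothetical. the first part reports every term that sits inside another term. the second part sidesteps the nesting problem by putting every term into one alternation, longest first, so the text is linked in a single pass and no link is ever placed inside another.

```perl
use strict;
use warnings;

# Hypothetical sample lists; the real program's arrays are not shown in the thread.
my %lists = (
    nonlexicon => ['red river valley report'],
    phrase     => ['river valley'],
    location   => ['red river', 'valley'],
);

# Flatten every list into one array and remember each term's category.
my (@terms, %category_of);
for my $category (sort keys %lists) {
    for my $term (@{ $lists{$category} }) {
        push @terms, lc($term);
        $category_of{ lc($term) } = $category;
    }
}

# Return "inner|outer" for every term that occurs inside a different term.
sub find_contained {
    my @pairs;
    for my $inner (@_) {
        for my $outer (@_) {
            next if $inner eq $outer;
            push @pairs, "$inner|$outer" if index($outer, $inner) >= 0;
        }
    }
    return @pairs;
}

# One alternation, longest term first: at any position the longest phrase
# wins, and text consumed by a match is not scanned again.
# \b assumes every term starts and ends with a word character.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } keys %category_of;

# Simplified link markup; the real engine uses per-category style attributes.
sub make_link {
    my ($match) = @_;
    my $class = $category_of{ lc($match) };
    return qq{<a href="#__top__" class="$class" name="$match">$match</a>};
}

sub link_terms {
    my ($text) = @_;
    $text =~ s/\b($alternation)\b/make_link($1)/gie;
    return $text;
}

print "$_\n" for find_contained(@terms);
print link_terms('The Red River Valley report mentions Red River and the valley.'), "\n";
```

the trade-off is that a shorter term inside a longer match is not highlighted separately; if the inner terms need their own links too, the overlap report from find_contained at least shows which combinations to expect.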
 