array comparison & deduper vs. broken xml patch

californe · Sep 3, 2007

i'm just starting out and i've currently got a program someone else designed and am trying to fix a broken element within it... so that aside, it is basically an xml highlighting link engine. The engine searches for specific words from a list, as well as non-lexicon phrases, so any overlapping combinations break the xml. i devised a rough patch to fix the broken xml, but due to the way it searches, if it finds one term inside another, it will interrupt the phrase or list of words.

specifically it's for reports, and within the program it searches, in order:
1. non-lexicon phrases
2. dates
3. (list of phrases to look for)
4. locations

is there any way i can create something to scan the arrays to find out if any terms could be contained within each other?

if i place non-lexicon last, it will not find a non-lexicon when a location/date/phrase resides within it
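One way around the ordering problem, as a rough sketch and not a fix for the actual engine: put every term from all four lists into a single alternation, longest first, and link in one pass. The regex engine moves past each match, so a shorter term sitting inside a longer phrase never gets linked on its own and no nested anchors are produced. The term lists, the `link_terms` name and the bare `<a name="...">` markup are made up for illustration; the real engine uses its own lists and styled links.

```perl
use strict;
use warnings;

# Hypothetical term lists; the real engine loads its own.
my %terms_by_type = (
    nonlexicon => ['new york stock exchange'],
    date       => ['Sep 3, 2007'],
    phrase     => ['stock exchange'],
    location   => ['new york'],
);

# Wrap every term in a link during a single pass over the text.
sub link_terms {
    my ($text, $by_type) = @_;

    # Flatten all lists and sort longest first, so a long phrase
    # wins over any shorter term contained inside it.
    my @all = sort { length($b) <=> length($a) || $a cmp $b }
              map  { @$_ } values %$by_type;
    return $text unless @all;

    # quotemeta keeps punctuation in a term from acting as regex syntax.
    my $alternation = join '|', map { quotemeta($_) } @all;
    $text =~ s{\b($alternation)\b}{<a name="$1">$1</a>}gi;
    return $text;
}

print link_terms('Trading on the New York Stock Exchange resumed in New York.',
                 \%terms_by_type), "\n";
```

In the sample sentence the full phrase gets one link and the standalone "New York" gets its own; nothing is linked inside an existing link. The `\b` anchors assume every term starts and ends with a letter or digit.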

i was talking to a friend, who's currently on vacation, and he mentioned creating something to scan each word within an array to see if that term/phrase would match any other term/phrase (or portion therein) to kill duplicates
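The scan the friend described could look roughly like this. It is only a sketch: `contained_pairs` and the sample terms are invented here, and it compares every term against every other one, which is fine for a few hundred terms but slow for very large lists.

```perl
use strict;
use warnings;

# Return [inner, outer] pairs where one term occurs inside another.
# Comparison ignores case, matching the /i flag used by the patch.
sub contained_pairs {
    my @terms = @_;
    my @pairs;
    for my $inner (@terms) {
        for my $outer (@terms) {
            next if lc($inner) eq lc($outer);            # same term, skip
            next if length($inner) > length($outer);     # cannot fit inside
            push @pairs, [$inner, $outer]
                if index(lc($outer), lc($inner)) >= 0;
        }
    }
    return @pairs;
}

# Hypothetical sample terms.
my @terms = ('new york', 'new york stock exchange', 'stock exchange', 'Sep 3, 2007');
printf "'%s' is inside '%s'\n", @$_ for contained_pairs(@terms);
```

Each pair it reports is a combination that can produce a link inside a link, so those are the terms to handle specially or to search for in a fixed order.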

any ideas on the direction to go for this?

my current patch is ugly but it fixes a part of it:

(the way i have it broken down for editing)

Code:

$FileContents=~s/
<a href..name=
($term1)
      <a href..name=
      ($term2)
      \"\>
      ($term2)
($term3)
\"\>
($term1)
      <a href..name=
      ($term2)
      \"\>
      ($term2)
      \<\/a\>
($term3)
\<\/a\>
/
<a href..name=
$1
\"\>
$5
\<\/a\>
      <a href..name=
      $2
      \"\>
      $3
      \<\/a\>
<a href..name=
$4
\"\>
$8
\<\/a\>
/gi;

term1 and term3 being a non-lexicon and term2 being a word/phrase actually searched for.

this is actually one of my smallest sample strings.. i had one going to.. 50 some-odd temporary variables.

massive apologies for the large amount of text

my extreme gratitude for any help.

KevinADC · Sep 3, 2007

what does any of that have to do with:

array comparison & deduper vs. broken xml patch

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]

californe · Sep 3, 2007

well, i'm trying to figure out how to make an array comparison & deduper, and i currently only have a half-way usable patch...

KevinADC · Sep 3, 2007

Maybe after you get your patch to work you can use one of the deduper modules:

http://search.cpan.org/search?query=deduper&mode=all

posting some sample data might assist anyone that wants to try and help you with your patch.
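For exact duplicates a module may be more than is needed; the usual Perl idiom is a hash of terms already seen. A minimal sketch, with `dedupe_terms` and the sample list made up for illustration:

```perl
use strict;
use warnings;

# Drop repeated terms, ignoring case, keeping the first spelling seen.
# Call it in list context; in scalar context grep returns a count.
sub dedupe_terms {
    my %seen;
    return grep { !$seen{ lc($_) }++ } @_;
}

my @terms = ('New York', 'new york', 'Stock Exchange', 'New York');   # made-up sample
print join(', ', dedupe_terms(@terms)), "\n";
```

This only removes whole-term repeats. Terms that sit inside other terms are a separate problem and need a substring comparison.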


californe · Sep 4, 2007

sample data may be difficult.. i can post some of the program/patch if that would help

here's some of the patch..

Code:

	my($NLstart)="\<a href\=\"\#__top__\" style\=\"text\-decoration:none\;color:\#AD7C22\;\" name\=\"";

	my($CONstart)="\<a href\=\"\#__top__\" style\=\"background\-color:\#113377\;text\-decoration:none\;color:\#FFFFFF\;font\-weight:900\;\" name\=\"";
	my($CONstart2)="\<a href\=\"\#__top__\" style\=\"background\-color:\#113377\;text\-decoration:none\;color:\#AD7C22\;font\-weight:900\;\" name\=\"";

	my($GALstart)="<a href=\"\#__top__\" style\=\"background\-color:\#71B7B7\;text\-decoration:none\;color:\#000000\;font\-weight:900\;\" name\=\"";
	my($GALstart2)="<a href=\"\#__top__\" style\=\"background\-color:\#71B7B7\;text\-decoration:none\;color:\#AD7C22\;font\-weight:900\;\" name\=\"";
	my($GALstart3)="<a href=\"\#__top__\" style\=\"background\-color:\#71B7B7\;text\-decoration:none\;color:\#FFFFFF\;font\-weight:900\;\" name\=\"";
		
	#Yes, i know i could use the same term, but for editing purposes, it keeps me sane.
	my($tt_1)="[a-zA-Z 0-9':.,?;\(\)-]*";
	my($tt_2)="[a-zA-Z 0-9':.,?;\(\)-]*";
	my($tt_3)="[a-zA-Z 0-9':.,?;\(\)-]*";
	my($tt_4)="[a-zA-Z 0-9':.,?;\(\)-]*";
	my($tt_5)="[a-zA-Z 0-9':.,?;\(\)-]*";
	my($tt_6)="[a-zA-Z 0-9':.,?;\(\)-]*";
	my($tt_7)="[a-zA-Z 0-9':.,?;\(\)-]*";
	my($tt_8)="[a-zA-Z 0-9':.,?;\(\)-]*";
	my($tt_9)="[a-zA-Z 0-9':.,?;\(\)-]*";
	
#step one: kill outer layer
	# "1 while s///" repeats each substitution until there is nothing left to collapse
	1 while $FileContents=~s/$CONstart($tt_1)\"\>$CONstart($tt_1)\"\>($tt_1)\<\/a\>\<\/a\>/$CONstart$1\"\>$3\<\/a\>\<\*\>/gi;

	# same collapse, for inner links already followed by the <*> marker
	1 while $FileContents=~s/$CONstart($tt_1)\"\>$CONstart($tt_1)\"\>($tt_1)\<\/a\>\<\*\>\<\/a\>/$CONstart$1\"\>$3\<\/a\>\<\*\>/gi;

	# and again for the GAL links
	1 while $FileContents=~s/$GALstart($tt_1)\"\>$GALstart($tt_1)\"\>($tt_1)\<\/a\>\<\/a\>/$GALstart$1\"\>$3\<\/a\>\<\*\>/gi;

	1 while $FileContents=~s/$GALstart($tt_1)\"\>$GALstart($tt_1)\"\>($tt_1)\<\/a\>\<\*\>\<\/a\>/$GALstart$1\"\>$3\<\/a\>\<\*\>/gi;

KevinADC · Sep 4, 2007

my goodness mate, give yourself a break and rewrite your strings using the q{} operator instead of those hard-to-read-double-quoted-full-of-escapes stuff:

Code:

 my $NLstart = q{<a href="#__top__" style="text-decoration:none;color:#AD7C22;" name="};

there is no need to put parentheses around your scalars. Parentheses are for making lists and other things, but not for when you assign a string to a single scalar.
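To show how a q{} string then fits into a pattern, here is a small sketch. The `$term` name and the sample `$html` are made up, and the replacement is simplified (it leaves out the `<*>` marker the patch adds). `\Q...\E` quotes the interpolated prefix so nothing in it is read as regex syntax, and the s{}{} delimiters avoid having to escape the slashes in `</a>`.

```perl
use strict;
use warnings;

my $NLstart = q{<a href="#__top__" style="text-decoration:none;color:#AD7C22;" name="};
my $term    = qr{[a-zA-Z 0-9':.,?;()-]*};   # same character class as $tt_1

# Made-up input: the same link wrapped around itself.
my $html = $NLstart . q{foo">} . $NLstart . q{foo">foo</a></a>};

# Collapse the doubled link, repeating until no nested pair is left.
1 while $html =~ s{\Q$NLstart\E($term)">\Q$NLstart\E$term">($term)</a></a>}
                  {$NLstart$1">$2</a>}g;

print "$html\n";
```

After the loop `$html` holds a single link around "foo".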


KevinADC · Sep 4, 2007

You may want to look into using an HTML parser too. There are a number of them on CPAN:

http://search.cpan.org/search?query=html::Parser&mode=all


californe · Sep 4, 2007

thanks
