Tek-Tips is the largest IT community on the Internet today!

Thesaurus for php

Status
Not open for further replies.

pcs800

IS-IT--Management
Apr 9, 2002
339
US
I have a site that uses PHP and MySQL. We would like to incorporate some type of thesaurus, or a way to have the script run and change words to other words with similar meaning. Is there such a script already written, or maybe a downloadable thesaurus DB or a huge txt file or something?

Eric VanLandingham
The Bargain Monkey
 
OK, maybe I need to elaborate a bit.
We use product catalogs from multiple stores to fill our site with content. These catalogs come as txt files, and our script reads them and creates a page for each product.
The problem is that anyone else who uses the same store's catalog probably does about the same thing, therefore we have duplicate content. Google, I mean God, has decided to crack down on duplicate content, taking our site from 15,000 page views per day to less than 500, so we need to alter the output of the script to make the content unique.
Any help at all would be great.

Eric VanLandingham
The Bargain Monkey
 
It would seem to me that automagically doing this through the use of a digital thesaurus is not the way to go. Have you ever read an essay written by a third-grade student who has just discovered the thesaurus? That's what your content will sound like.

I don't know how much content you're talking about, but nothing will beat a human being's paraphrasing.


Want the best answers? Ask the best questions!

TANSTAAFL!!
 
Well, let me give you some quick info.
Over 1 million products = 1 million+ pages = 5 million+ keywords.
Keywords are generated from relevant content in the body of the product page.

For example:
A product page on our site for Buster Brown socks has this title:
Buster Brown 6-Pack Infant Boys Crew Sock

Which could be changed to this:
Buster Brown 6-Pack baby Boys Crew Sock

This would change Infant to baby in the title, description, keywords and body, which would eliminate our duplicate content problem.
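
A minimal sketch of that kind of whole-word swap, written in Python for brevity rather than the site's PHP. The `SYNONYMS` map and the `swap_words` and `match_case` names are hypothetical, made up for illustration; a real map would be loaded from whatever thesaurus source ends up being used. It only replaces whole words, so "Infants" is left alone.

```python
import re

# Hypothetical synonym map (assumption): a real one would come from a thesaurus source.
SYNONYMS = {"infant": "baby"}


def match_case(original, replacement):
    """Copy the capitalization style of the original word onto the replacement."""
    if original.isupper():
        return replacement.upper()
    if original[0].isupper():
        return replacement.capitalize()
    return replacement


def swap_words(text, synonyms=SYNONYMS):
    """Replace whole words found in the synonym map, ignoring case when matching."""
    if not synonyms:
        return text
    # \b anchors keep the match to whole words only.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(word) for word in synonyms) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(
        lambda m: match_case(m.group(0), synonyms[m.group(0).lower()]),
        text,
    )


title = "Buster Brown 6-Pack Infant Boys Crew Sock"
print(swap_words(title))  # Buster Brown 6-Pack Baby Boys Crew Sock
```

Because the sketch copies the capitalization of the word it replaces, it gives "Baby" here rather than the lowercase "baby" in the example title. The same function could be run over the description, keywords and body text.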

I agree that human paraphrasing is the best route, but not in a case like ours with so many pages and products. Not to mention we get a new set of product catalogs every week, so it has to be part of the script; paraphrasing by hand would be impossible.

Eric VanLandingham
The Bargain Monkey
 
If you're doing keywords, perhaps a third-grade thesaurus lookup is all you need.

There are lots of data sets and libraries out there -- I've never had a need for this, but a Google search pulled up lots of vendors.


Want the best answers? Ask the best questions!

TANSTAAFL!!
 
Yes, a search engine pulls up lots of vendors selling thesaurus makers or thesaurus software. But one where you input a word and it comes up with the synonyms automatically is hard to find. It also needs to export the finished database to a text file of some sort.

Eric VanLandingham
The Bargain Monkey
 
I don't know if I'm off beam here but...
If you go to you can download the text of Roget's. You will probably have to dig about a bit, but you should be able to extract each word and its synonyms (or whatever they are) and load them into a MySQL database with an appropriate structure to support it. Something like: insert the key word into one table, then insert into a second table, which allows duplicates, an entry for each word and its synonym. You might not even need the first table except to do a quick lookup.
You could code a function which takes a word as input and returns an array of possible words, and you take your pick.
Am I on the correct lines here?
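
A rough sketch of that two-table layout and lookup function, under some loud assumptions: it is written in Python with the standard `sqlite3` module standing in for MySQL, and the table names (`words`, `synonyms`), column names and function names are all hypothetical. Extracting the entries from the Roget's text is not covered here; `add_entry` just takes a word and a list of synonyms however they were obtained.

```python
import sqlite3


def create_schema(conn):
    """One table of key words, one table of synonyms that allows duplicates."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS words (
            id   INTEGER PRIMARY KEY,
            word TEXT NOT NULL UNIQUE
        );
        CREATE TABLE IF NOT EXISTS synonyms (
            word_id INTEGER NOT NULL REFERENCES words(id),
            synonym TEXT NOT NULL
        );
    """)


def add_entry(conn, word, synonyms):
    """Insert the key word once, then one row per synonym."""
    word = word.lower()
    conn.execute("INSERT OR IGNORE INTO words (word) VALUES (?)", (word,))
    word_id = conn.execute(
        "SELECT id FROM words WHERE word = ?", (word,)
    ).fetchone()[0]
    conn.executemany(
        "INSERT INTO synonyms (word_id, synonym) VALUES (?, ?)",
        [(word_id, s.lower()) for s in synonyms],
    )
    conn.commit()


def get_synonyms(conn, word):
    """Take a word as input and return a list of possible replacements."""
    rows = conn.execute(
        "SELECT s.synonym FROM synonyms s "
        "JOIN words w ON w.id = s.word_id "
        "WHERE w.word = ? ORDER BY s.rowid",
        (word.lower(),),
    ).fetchall()
    return [row[0] for row in rows]


# In-memory database with made-up sample data, for demonstration only.
conn = sqlite3.connect(":memory:")
create_schema(conn)
add_entry(conn, "infant", ["baby", "newborn"])
print(get_synonyms(conn, "Infant"))  # ['baby', 'newborn']
```

A word with no entry simply returns an empty list, so the calling script can leave that word unchanged.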
 
You are on the correct lines... almost.
Most of what you posted is right on the money. I will look at the file and let you know if it will work. Thanks!

Eric VanLandingham
The Bargain Monkey
 