Renaming the Microsoft Internet Control for weblogs.

Status
Not open for further replies.

Homersim2600

Technical User
Dec 15, 2004
253
US
Hello,
I am in the process of writing a personal spider-bot using the Microsoft Internet Control. I have been practicing the spider technique by crawling my own website. I have noticed in my weblogs that the browser identifies itself as "Microsoft+URL+Control+Version...". I have seen the identifiers of other bots, such as googlebot, and I would like to change my bot's identifying string to something much cooler than the aforementioned. However, I have had no success locating any information on how to go about this. Thank you for your help.

LF

p.s. if you would like to try this code out, then let me know. As of now, it is not really that fast, but I plan on tweaking it a bit more; the program is not yet complete.
 
Hello,
I have done some further research and I have found that if I could change the ProductName/Version on the MsiNet.OCX file, then I would be able to effectively change the name that is written into weblogs when my program visits the website. Does anybody know how I would go about doing this??? Thank you.

LF
 
Ok, if I keep answering myself then perhaps I will eventually figure out the answer to my problem...

I have found some more information regarding my problem. I learned how to change the "User-Agent" string for Internet Explorer via the registry, but I was unable to find a similar setting for the Inet control. By the way, the proper name for what I am trying to change is "User-Agent", just to clear that up...
Any Ideas????

LF
 
"Ok, if I keep answering myself then perhaps I will eventually figure out the answer to my problem..."

heh, works for me half the time :p

Personally, I don't think I have run across what you are dealing with... so I don't have anything else to post... Sorry [sad]

Good Luck! ;-)

Have Fun, Be Young... Code BASIC
-Josh


PROGRAMMER: (n) Red-eyed, mumbling mammal capable of conversing with inanimate objects.
 
Hey, no problem! I am seriously considering rewriting the code for this. I simply cannot stand the fact that I can't change this one little string! It's funny, I have CommView installed on my computer and whenever I run the bot I watch the header information being sent out. It is so disappointing to think that my bot will be put into the same category as all of the other bots that use the same control. Many websites are banning this User-Agent because people write these spiders without conforming to the robots.txt protocol. My bot is different, and so I need to make it different by changing this little teeny-weeny string! lol. There has got to be a better way. I am thinking that there is a way to do it through the API, but I am not knowledgeable enough about it to write the code. I am really disappointed that M-soft did not make a provision for this, at least as far as I can tell now. Oh, well, perhaps there is another wizard out there with the answer. :)

LF
 
BUT... most sites depend on money from the ads on their site and want REAL people looking at them, and not just bots collecting information... so I can see the justification for the ban... [neutral]

If Google is an exception, it may be because it is the most widely used international search engine and can bring in income (as in people wanting to visit their site, and possibly buy something or click ads), so Google might be viewed, from a business viewpoint, as a gain rather than a loss...

I hope you find what you are looking for, and I am sure there is a way around it... though it may require some hardcore hex hacking ;-)

But good luck anyway :)

Have Fun, Be Young... Code BASIC
-Josh

 
Yeah, I see what you mean, but in a sense, there will be real people looking at the pages once they are indexed. I really don't *have* to live by the robots.txt standard, but I thought I would just to be nice :). From the forums which I have stumbled upon whilst searching for "Microsoft URL Control", I can see that people are more worried about their e-mail addresses being sniffed from the web page. My bot can do that, but I have not really put much effort into it, and so it won't do it really well. I am more interested in providing myself with webpages that are of interest to me, which is something that Google used to do pretty well, but unfortunately, lately, it is not really performing like it used to. I really dislike all of the other search engines that are available because they are just so darn frustrating to deal with.

I have pretty much given up on finding a resolution to this problem... There is a way to do it, but it doesn't involve the Inet control, rather, it involves either Java, XML, or C++, and I do not know any of these languages. Now, I am working on the database aspect of the program which is really taking a long time because this seems to be a whole other language in itself! Anyway, Thanks for your input, and if you do see anything regarding this, then please feel free to let me know. Thanks.

LF
 
I have found the answer!!!!!!! I direct you to Thread763-257763.

with special attention to the following line...

' create an internet connection
hOpen = InternetOpen("userAgent", INTERNET_OPEN_TYPE_DIRECT, vbNullString, vbNullString, 0)


The first argument, "userAgent", is the actual User-Agent string that is written to the weblog. I was able to verify it with my website. I am the happiest person alive right now!

Hats off to zubla8! (I think that was his handle)
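
For anyone landing here later, this is roughly how that line might sit in a complete VB6 module. It is only a sketch, assuming the usual WinINet declares (wininet.dll, ANSI versions). The function name FetchPage, the agent string "MyCoolBot/1.0", and the example URL are made up for illustration and are not taken from the referenced thread.

```vb
Option Explicit

' WinINet constants and declares (ANSI versions of the API).
Private Const INTERNET_OPEN_TYPE_DIRECT As Long = 1
Private Const INTERNET_FLAG_RELOAD As Long = &H80000000

Private Declare Function InternetOpen Lib "wininet.dll" Alias "InternetOpenA" ( _
    ByVal sAgent As String, ByVal lAccessType As Long, _
    ByVal sProxyName As String, ByVal sProxyBypass As String, _
    ByVal lFlags As Long) As Long

Private Declare Function InternetOpenUrl Lib "wininet.dll" Alias "InternetOpenUrlA" ( _
    ByVal hInternet As Long, ByVal sUrl As String, _
    ByVal sHeaders As String, ByVal lHeadersLength As Long, _
    ByVal lFlags As Long, ByVal lContext As Long) As Long

Private Declare Function InternetReadFile Lib "wininet.dll" ( _
    ByVal hFile As Long, ByVal sBuffer As String, _
    ByVal lNumberOfBytesToRead As Long, _
    ByRef lNumberOfBytesRead As Long) As Long

Private Declare Function InternetCloseHandle Lib "wininet.dll" ( _
    ByVal hInternet As Long) As Long

' Hypothetical helper: downloads sUrl while identifying as sUserAgent.
' Returns an empty string if the session or the URL cannot be opened.
Public Function FetchPage(ByVal sUrl As String, ByVal sUserAgent As String) As String
    Dim hOpen As Long
    Dim hUrl As Long
    Dim sBuffer As String * 2048
    Dim lRead As Long
    Dim sResult As String

    ' The first argument is what the server logs as the User-Agent.
    hOpen = InternetOpen(sUserAgent, INTERNET_OPEN_TYPE_DIRECT, vbNullString, vbNullString, 0)
    If hOpen = 0 Then Exit Function

    hUrl = InternetOpenUrl(hOpen, sUrl, vbNullString, 0, INTERNET_FLAG_RELOAD, 0)
    If hUrl <> 0 Then
        Do
            ' Stop on failure or when no more bytes are returned.
            If InternetReadFile(hUrl, sBuffer, Len(sBuffer), lRead) = 0 Then Exit Do
            If lRead = 0 Then Exit Do
            sResult = sResult & Left$(sBuffer, lRead)
        Loop
        InternetCloseHandle hUrl
    End If

    InternetCloseHandle hOpen
    FetchPage = sResult
End Function
```

A call such as `Debug.Print Len(FetchPage("http://www.example.com/", "MyCoolBot/1.0"))` should then show up in the server log under the chosen agent string rather than the control's default one.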
 
Have a star for finding your answer...

It might be helpful someday ;-)

Have Fun, Be Young... Code BASIC
-Josh

 
Seriously... there are many, many posts where people find their answer and nobody goes and posts back what they found... then someone else comes along with the same question, can't find an answer, and starts another thread...

So thanks for posting back what you found, you never know who is going to run into the same problem, and need the solution you posted... that's what this forum is all about ;-)

Have Fun, Be Young... Code BASIC
-Josh

 