Seemingly Unsolvable Error...

Status
Not open for further replies.

IanGlinka (IS-IT--Management), joined Feb 28, 2002, 215 messages, location: US
This thread is for any network geniuses out there looking for a real challenge. It's probably going to be a long post, so go make some popcorn and put on your thinking caps. I guarantee this problem will pick your brain.

------

I was first notified of the problem when users pointed out they hadn't received any new internet mail for several days. Everything internally was working fine, but anything from anyone with an internet-based email address simply wasn't showing up in their inboxes. Users would press the Send/Receive button, only to have the Send/Receive window come up for about 30 seconds, then disappear. There were no error messages, and there was no new mail.

I first thought that maybe our ISP was having problems, but we still had internet connectivity. Besides, if there was no connection to the internet, wouldn't Outlook spit back an error saying it couldn't find our POP3 server? Web pages would load at their normal speed, and we could still send outbound emails. (I verified this with a Hotmail account.)

I then thought I'd check whether SpamButcher had anything to do with this. (SpamButcher is anti-spam software that runs on our server and checks our internet email accounts every 5 minutes, downloading messages locally if they are deemed spam, or leaving them on the POP server if they are deemed legitimate.) I thought maybe it was the problem... perhaps filtering out everyone's email as spam. However, SpamButcher's last recorded spam email was received on Saturday night at 10:38pm. It's as if, all of a sudden, the SpamButcher program simply couldn't connect to our POP3 boxes anymore after 10:38pm.

The first step was to call the mail hosting company (web2010, a.k.a. HostCentric) and ask whether they were having any problems, or whether any other users had called in complaining that mail was unretrievable. They said their systems were all running, there were no known problems, and nobody had called in about not being able to receive internet email. They asked us to run a tracert to their mail server. The tracert completed successfully with one row of "* * * * request timed out." As I understand it, that doesn't matter: it usually just means that one router along the path doesn't reply to the probes, not that traffic is being blocked there. Regardless, the tracert did make it to their mail server. Pinging the mail server also worked fine, with replies averaging 61ms. So... it didn't seem like there was a connectivity problem.

The bottom line, though, was the fact that we still couldn't get any new internet email.

I sat there and thought... okay... maybe it's some sort of authentication problem. (Even though I know that if the password were incorrect, we should have received some sort of error.) So I went into the Internet E-Mail settings inside my Outlook profile and intentionally set an incorrect password. I wanted to see if I was even getting to the point of authentication. I clicked send/receive, and after another 30 seconds, the window just went away and I had no new mail.

OK - This means authentication isn't taking place, because if it were, I would be getting a message telling me to reenter my password, which I didn't.

So... at the mail hosting company's request, I tried connecting to the POP3 server through an alternate ISP (our main connection is a 350k DSL line). We had a laptop with MSN8 dial-up access installed on it. I plugged it into a phone jack, dialed up, configured Outlook with my mailbox settings, and bam, it worked... just like it always had. I really didn't expect it to work, so I was pretty much speechless at this point.

The tech on the line said it looked like it was a problem with our ISP (and that's sure what it looked like to me, too), so I thanked him for his time and called up Steel City Telecom to begin the round of questions. "Are all your servers and routers up and working? Are you experiencing any network outages or backbone failures? Have any users called in and complained about being unable to get their POP3 email?" Naturally, everything was up and running fine and we were the only people calling in with an issue like this.

I even went as far as giving the tech from our ISP my POP3 username/password and had him set up my internet email box on his machine... I gave him permission to do a send/receive, and the mail started coming in instantly on his machine.

Then I tried upping the "Server Timeout" on our POP3 server settings inside Microsoft Outlook. I set it to 5 minutes (the maximum), and after changing that, mail started coming in. The send/receives take FOREVER, but the mail DOES come in eventually. We sat there and counted the seconds between the time we clicked send/receive and the time the first mail arrived. It was approximately 80 seconds. Taking 80 seconds to log into the mail server is unacceptable. Not only are all the users complaining, but our SpamButcher program no longer catches spam mail (because it wasn't programmed to wait that long to log in to the POP3 box).
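Raising the client timeout only hides the delay; it doesn't remove it. As a minimal Python sketch (the function name `fetch_welcome` and its 30-second default are illustrative assumptions, not what Outlook or SpamButcher actually use), this shows how a POP3 client's timeout interacts with a slow greeting: Python's `poplib` waits for the "+OK" banner while constructing the connection, so any timeout shorter than the server's delay fails before login even starts.

```python
import poplib


def fetch_welcome(host: str, port: int = 110, timeout: float = 30.0) -> str:
    """Connect to a POP3 server and return its greeting line.

    poplib.POP3 reads the server's "+OK ..." greeting inside its constructor,
    so `timeout` bounds the wait for that banner as well as the TCP connect.
    """
    try:
        conn = poplib.POP3(host, port, timeout=timeout)
    except OSError as exc:  # socket timeouts are OSError subclasses
        raise RuntimeError(
            f"no POP3 greeting from {host}:{port} within {timeout} s"
        ) from exc
    try:
        return conn.getwelcome().decode("ascii", errors="replace")
    finally:
        conn.close()  # drop the connection without logging in
```

With a placeholder host, `fetch_welcome("mail.example.com", timeout=90)` would get past an 80-second greeting, while the 30-second default would raise instead.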

At this point, I'm looking for an answer and I'm quite lost. I fired up Ethereal (a packet sniffer) to see if I could find anything in the captured frames that would lead me to an answer... what I found was quite interesting. Right when I click the Send/Receive button, the three-way TCP handshake between my computer and the mail server takes place, and it is successful. It is then another 81 seconds before the mail server finally responds with the "+OK QPOP (version 3.0.2) starting." message. Once this message is received, the login procedure happens immediately, and all mail is downloaded at the normal rate.

So... what is happening between the end of the three-way handshake and the mail server's greeting, which is what prompts Outlook to send the username? Whatever it is, it's taking forever.

I called the ISP and the mail hosting company back so many times, I think I know all the techs' schedules and favorite movies. I've tried resetting our DSL modem to no avail, but I REALLY don't think it has anything to do with any hardware/software on our end because, again, the problem seems to have started Saturday night at 10:38pm, and there wasn't anyone here at that time to change anything.

In my search for the truth, I found out that you can telnet into mail servers. I tried telnetting into our mail server, only to be greeted with an 81-second delay before the "+OK QPOP (version 3.0.2) starting." message appeared. I'm really starting to wonder why it consistently takes 80-82 seconds for the mail server to send its greeting after the three-way handshake completes.
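The telnet test can be made repeatable with a few lines of Python that time the two phases separately, the same split Ethereal showed. This is a sketch: `time_banner` is a hypothetical helper, and the host you pass it is whatever POP3 server you are testing.

```python
import socket
import time


def time_banner(host: str, port: int = 110,
                timeout: float = 120.0) -> tuple[float, float, bytes]:
    """Return (connect_seconds, banner_seconds, banner) for a TCP service.

    connect_seconds covers the three-way handshake (create_connection returns
    once it completes); banner_seconds is the further wait until the server
    sends its first line, such as a POP3 "+OK ..." greeting.
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        connected = time.monotonic()
        banner = b""
        # Keep reading until the first line (POP3 lines end in CRLF) arrives.
        while b"\n" not in banner:
            chunk = sock.recv(512)
            if not chunk:  # server closed the connection without a greeting
                break
            banner += chunk
        done = time.monotonic()
    return connected - start, done - connected, banner
```

Running it from the affected network and from the dial-up connection would show whether it is the connect time or the banner time that differs.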

I have a personal website with a few POP3 accounts on it, and when I tried telnetting into that mail server, it worked fine and quickly.

I called the ISP back and had them try to telnet into our mail server. The tech said the QPOP banner came up instantly on his screen.

This is where the issue currently stands. I have no idea what to possibly do or where to possibly turn next. Kudos if you actually read all the way to the bottom of this post.

Any and all help would be greatly appreciated!

Thanks!
Ian Glinka <ian@mleco.com>
 
Sounds like maybe the mail server is trying to authenticate your connecting IP address via DNS before it lets you in. For example, we have a Linux server in the office which I SSH to for command line access. Its /etc/hosts file has an entry for my laptop, so when I connect to it the authentication box pops up straight away. However, the other day I changed my IP address and then tried to SSH to the server. It just hung for about a minute, and while I was testing connectivity from my laptop, the authentication box suddenly popped up and I was able to log in. The auth log files showed that the server was trying to reverse-map my IP address but couldn't, which caused the delay in responding to the connection request.
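To make the suggested mechanism concrete, here is a hypothetical server-side handler in Python. It is not QPOP's actual code; it just assumes, as in the SSH example, that the server reverse-resolves the client's address before sending its greeting. The resolver is passed in as a parameter so a slow or failing DNS lookup can be simulated.

```python
import socket
from typing import Callable


def greet_after_lookup(conn: socket.socket, peer_ip: str,
                       resolve: Callable[[str], tuple] = socket.gethostbyaddr,
                       banner: bytes = b"+OK ready\r\n") -> str:
    """Hypothetical handler: reverse-resolve the client, then send the greeting.

    The greeting goes out only after the PTR lookup returns or fails, so a
    slow or unreachable DNS server delays the banner by however long the
    resolver waits before giving up.
    """
    try:
        hostname = resolve(peer_ip)[0]
    except OSError:  # socket.herror / gaierror: no PTR record or lookup failed
        hostname = peer_ip  # fall back to the raw address
    conn.sendall(banner)
    return hostname
```

From the client's side, all of the lookup time shows up as a gap between the handshake and the banner, which matches the 80-second pause seen in the capture.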

This is the only thing that I can think of. Sorry.

Chris.


**********************
Chris Andrew, CCNA, CCSA
chris@iproute.co.uk
**********************
 
to IFRs- I haven't necessarily tried checking it with another program, but I DID try to telnet to the mail server, which takes exactly as long as Outlook does. At this point, I have ruled out a mail client software malfunction.

Ian
 
to ipconfig- Based upon your thoughts on the matter, what would I be able to do about it? I'm not sure if that's the problem, really, though, because the IP address of our DSL modem hasn't changed, and I don't have the authorization to play around with any hosts files between us and the mail server. Whatever happened, it happened Saturday at 10:38pm and hasn't been right since. What irks me the most is that I just know something somewhere HAD to have gone down or have been changed at that moment... and I KNOW it wasn't here.

Ian
 
Speak to your ISP and ask them to check the authentication logs on the firewall. If they can't or won't help, change ISP!!

Chris.


 
Haha, believe me, I have every intention of changing my ISP!!!

Thanks for the help. I'll let you know what they say.

Ian
 
From Sam Spade:

mleco.com resolves to 216.157.10.222

216.157.10.222 has no reverse DNS configured.

Your ISP may have a corrupt reverse DNS cache.
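A reverse lookup like the one Sam Spade performed can be sketched in Python with the standard `socket` module. `reverse_dns` returns `None` when there is no PTR record; `forward_confirmed` (a name chosen for this sketch) additionally checks that the PTR name resolves back to the same address, the kind of check some mail servers apply. Both take the resolver functions as parameters, defaulting to `socket.gethostbyaddr` and `socket.gethostbyname_ex`.

```python
import socket
from typing import Callable, Optional


def reverse_dns(ip: str,
                resolve: Callable[[str], tuple] = socket.gethostbyaddr) -> Optional[str]:
    """Return the PTR hostname for ip, or None if no reverse record is found."""
    try:
        return resolve(ip)[0]
    except OSError:  # socket.herror / gaierror: no PTR record or lookup failed
        return None


def forward_confirmed(ip: str,
                      resolve_ptr: Callable[[str], tuple] = socket.gethostbyaddr,
                      resolve_a: Callable[[str], tuple] = socket.gethostbyname_ex) -> bool:
    """True if ip has a PTR name that resolves back to ip."""
    name = reverse_dns(ip, resolve_ptr)
    if name is None:
        return False
    try:
        addresses = resolve_a(name)[2]  # (hostname, aliases, address list)
    except OSError:
        return False
    return ip in addresses
```

Against live DNS, `reverse_dns("216.157.10.222")` returns whatever PTR record exists at the time you run it; at the time of this thread, Sam Spade found none.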
 
Okay, who are mleco.com and what has their web site address got to do with this POP3 problem? Did I miss something?

Chris.


 
Also from Sam Spade:

Mail for mleco.com is handled by mleco.com (100) 216.157.10.222

Most other DNS lookups list port 10 for the mail service rather than port 100.

 
iproute:

mleco.com is from the poster's signature. Here is the same lookup for yours:

Mail for iproute.co.uk is handled by mail-in.iproute.co.uk (10) 213.249.145.35

213.249.145.35 has valid reverse DNS of mail-in.isg.kcom.com
 
I don't think that the priority of the MX record is going to affect anything, be it 10, 100 or 1000. It's not the port number.
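To illustrate the point about MX records: the number is a preference that tells sending SMTP servers which host to try first, and only its order relative to the domain's other MX records matters. POP3 clients never consult MX records at all; they connect straight to the POP server's host name. A minimal Python sketch, with hypothetical host names:

```python
from typing import NamedTuple


class MX(NamedTuple):
    preference: int  # lower value is tried first; it is NOT a port number
    exchange: str    # host that accepts inbound SMTP for the domain


def delivery_order(records: list[MX]) -> list[str]:
    """Order MX hosts the way a sending mail server tries them.

    Only the relative order matters, so a lone MX at preference 100 behaves
    exactly like one at 10. (RFC 5321 asks senders to pick randomly among
    equal preferences; sorted() here just keeps their input order.)
    """
    return [r.exchange for r in sorted(records, key=lambda r: r.preference)]
```

For a domain with a single MX record, the list always has one entry, whatever its preference value.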

Anyway, you might be on to something with the reverse DNS thing. I doubt the POP server does a reverse DNS lookup on incoming connections, but I guess it's a possibility. Reverse DNS checks are usually just for SMTP, but hey... who knows?

Chris.


 
UPDATE: About 10 minutes ago, the timeout problem disappeared, and all problems are cleared. I called our ISP and asked if they had just done anything to their systems. The tech confirmed that they had just brought back a DNS Server that had crashed late Sunday night. (Gee... thanks for telling me about that, guys)

Though we initially started having problems Saturday night, I'm thinking that particular DNS server started to die on Saturday, or perhaps even did die on Saturday, but wasn't actually recorded by the ISP as being down until Sunday night.

Now... could someone explain to me how this one DNS server going down would cause this kind of strange issue? I'm thinking this particular DNS Server housed some sort of reverse lookup information pertaining to either our DSL modem or our POP3 server, and once it was out of the picture, the mail server was timing out when trying to do a reverse lookup on our IP address.
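One way to see why a dead DNS server produces a silent delay rather than an error: a resolver that gets no answer from a server waits out its timeout, retries, and only then moves on or gives up. The Python model below is a deliberate simplification (real resolvers use implementation-specific, often increasing, timeouts), and the timeout and attempt counts are hypothetical parameters, not values from the ISP's or the mail host's configuration.

```python
from typing import NamedTuple


class LookupResult(NamedTuple):
    seconds: float   # time spent before an answer or a final failure
    answered: bool   # False means the caller sees a lookup failure


def simulate_lookup(server_up: list[bool], timeout: float, attempts: int) -> LookupResult:
    """Simplified model of a resolver working through its server list.

    Each unresponsive server costs one full timeout per round; a responsive
    server is assumed to answer instantly.
    """
    waited = 0.0
    for _ in range(attempts):
        for up in server_up:
            if up:
                return LookupResult(waited, True)
            waited += timeout
    return LookupResult(waited, False)
```

With one dead server listed ahead of a working one, the lookup still succeeds, just one timeout late; with only the dead server available, every connection pays the whole retry budget before the lookup fails, and the client sees nothing but the wait.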

I'm not entirely sure about all of this, though, because I'm kind of new to DNS... if one of you guys or someone else could step in and maybe explain how this is possible???

I mean, the problem IS solved... but why did it happen in the first place? Is there something I can say to my ISP so that it won't be a problem in the future?

Everything's working but I feel so unsatisfied!!! Grrr...

Thanks again for all the help!!!

Ian Glinka
 
Some mail servers use reverse DNS lookups for security or spam filtering, but if that were the case here, connections would have been refused. Other servers do a reverse DNS lookup and then just log the result or use it in headers. That is probably what was going on: the server did a reverse DNS lookup, waited until it timed out with no response, and then carried on. Log entries somewhere probably recorded that reverse DNS wasn't available for these connections.
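The distinction drawn above, between servers that require reverse DNS and servers that merely record it, can be sketched as a small policy function in Python. It is hypothetical (`admit_connection` and its log wording are made up for illustration) and assumes the reverse lookup has already finished or timed out, with `None` meaning no name was found.

```python
from typing import Optional


def admit_connection(peer_ip: str, ptr_name: Optional[str],
                     require_ptr: bool) -> tuple[bool, str]:
    """Decide whether to accept a client once its reverse lookup is over.

    A server that requires reverse DNS refuses clients without a PTR name;
    one that only records it lets them in and logs the missing name, so a
    failed lookup shows up to the user as delay rather than as an error.
    """
    if ptr_name is None:
        if require_ptr:
            return False, f"refused {peer_ip}: no reverse DNS"
        return True, f"connect from {peer_ip} (reverse DNS unavailable)"
    return True, f"connect from {ptr_name} [{peer_ip}]"
```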
 