This thread is for any network geniuses out there looking for a real challenge. It's probably going to be a long post, so go make some popcorn and put on your thinking caps. I guarantee this problem will stump you.
------
I was first notified of the problem when users pointed out they hadn't received any new internet mail for several days. Everything internally was working fine, but mail from anyone with an internet-based email address simply wasn't showing up in their inboxes. Users would press the Send/Receive button, only to have the Send/Receive window come up for about 30 seconds and then disappear. There were no error messages, and there was no new mail.
I first thought that maybe our ISP was having problems, but we still had internet connectivity. Besides, if there were no connection to the internet, wouldn't Outlook spit back an error saying it couldn't find our POP3 server? Web pages loaded at their normal speed, and we could still send outbound email (I verified this with a Hotmail account).
I then checked whether SpamButcher had anything to do with this. (SpamButcher is anti-spam software that runs on our server and checks our internet email accounts every 5 minutes, downloading messages locally if they are deemed spam and leaving them on the POP server if they are deemed legitimate.) I thought maybe it was the problem, perhaps by filtering out everyone's email as spam. However, the last spam email SpamButcher recorded was received on Saturday night at 10:38pm. It's as if, all of a sudden, SpamButcher simply couldn't connect to our POP3 boxes anymore after 10:38pm.
My first step was to call the Mail Hosting company (web2010, a.k.a. HostCentric) and ask whether they were having any problems, or whether other customers had called in complaining that their mail was unretrievable. They said their systems were all running, there were no known problems, and nobody had called in about being unable to receive internet email. They asked us to run a tracert to their mail server. The tracert completed, with one row of "* * * Request timed out." As far as I understand, that usually just means one router along the path doesn't answer traceroute probes; it doesn't stop traffic from passing through it. Either way, the tracert did make it all the way to their mail server. Pinging the mail server also worked fine, with an average round trip of 61ms. So it didn't seem like there was a connectivity problem.
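Ping only exercises ICMP, so it says nothing about whether the POP3 service port itself answers. Here is a minimal Python sketch of a check that does, assuming the server listens on the standard POP3 port 110. The function name `tcp_connect_time` is made up for illustration. It times only the TCP three-way handshake and deliberately does not wait for the server's greeting.

```python
import socket
import time


def tcp_connect_time(host: str, port: int = 110, timeout: float = 10.0) -> float:
    """Return the seconds needed to complete a TCP handshake with host:port.

    Unlike ping (ICMP), this goes to the actual service port. It returns as
    soon as the connection is established and never waits for the server's
    POP3 greeting.
    """
    start = time.monotonic()
    # create_connection() returns once SYN, SYN/ACK, ACK have been exchanged.
    with socket.create_connection((host, port), timeout=timeout):
        elapsed = time.monotonic() - start
    return elapsed
```

Called as `tcp_connect_time("pop.example.com")` (a hypothetical host name), a healthy path should return in roughly one round trip, i.e. close to the ping time.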
The bottom line, though, was the fact that we still couldn't get any new internet email.
I sat there and thought... okay... maybe it's some sort of authentication problem (even though I knew that if the passwords were incorrect, we should have received some sort of error). So I went into the Internet E-Mail settings in my Outlook profile and intentionally set an incorrect password, to see whether I was even getting as far as authentication. I clicked Send/Receive, and after another 30 seconds the window just went away and I had no new mail.
OK, so authentication wasn't even taking place: if it were, I would have gotten a message asking me to re-enter my password, and I didn't.
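That reasoning follows from how POP3 works: the server sends its greeting first, and the client only sends USER/PASS after that greeting arrives, so a client that gives up while waiting for the greeting never submits a password at all. Below is a minimal sketch using Python's standard `poplib` that tells those two failure modes apart. `diagnose_login` and its messages are hypothetical, and a timeout during the TCP connect itself would land in the same bucket as a missing greeting.

```python
import poplib
import socket


def diagnose_login(host: str, user: str, password: str,
                   port: int = 110, timeout: float = 30.0) -> str:
    """Report whether a POP3 login failed before or after credentials were sent."""
    try:
        # poplib.POP3() connects and then reads the server greeting;
        # USER and PASS cannot be sent until that greeting has arrived.
        conn = poplib.POP3(host, port, timeout=timeout)
    except socket.timeout:
        return "no greeting within timeout (password never sent)"
    try:
        conn.user(user)
        conn.pass_(password)
        return "logged in"
    except poplib.error_proto as exc:
        # The server answered with -ERR, so it did see the credentials.
        return f"server rejected login: {exc}"
    finally:
        conn.close()
```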
So, at the Mail Hosting company's request, I tried connecting to the POP3 server through an alternate ISP (our main connection is a 350k DSL line). We had a laptop with MSN8 dial-up access installed on it. I plugged it into a phone jack, dialed up, configured Outlook with my mailbox settings, and bam, it worked... just like it always had. I really didn't expect it to work, so I was pretty much speechless at this point.
The tech on the line said it looked like a problem with our ISP (and that's sure what it looked like to me, too), so I thanked him for his time and called up Steel City Telecom to begin the round of questions: "Are all your servers and routers up and working? Are you experiencing any network outages or backbone failures? Have any users called in and complained about being unable to get their POP3 email?" Naturally, everything was up and running fine, and we were the only people calling in with an issue like this.
I even went as far as giving the tech from our ISP my POP3 username/password and having him set up my internet email box on his machine. I gave him permission to do a Send/Receive, and the mail started coming in instantly on his machine.
Then I tried raising the "Server Timeout" in our POP3 account settings inside Microsoft Outlook. I set it to 5 minutes (the maximum), and after that change, mail started coming in. The Send/Receives take FOREVER, but the mail DOES come in eventually. We sat there and counted the seconds between clicking Send/Receive and the arrival of the first message: approximately 80 seconds. Taking 80 seconds to log into the mail server is unacceptable. Not only are all the users complaining, but our SpamButcher program also no longer catches spam (because it wasn't programmed to wait that long to log in to the POP3 box).
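Raising the timeout helps because the timeout has to cover the wait for the greeting, not just the mail transfer. Here is a rough sketch of a polling client like SpamButcher, written with Python's `poplib`. `poll_mailbox` is a made-up name, and the assumption (not something confirmed about Outlook or SpamButcher) is that their timeout behaves like a per-operation socket timeout. With a greeting delay of about 80 seconds, any `timeout` below that fails before login, and anything above it succeeds, just slowly.

```python
import poplib


def poll_mailbox(host: str, user: str, password: str,
                 port: int = 110, timeout: float = 300.0) -> int:
    """Log in, return the number of waiting messages, then QUIT.

    poplib applies `timeout` to every blocking socket operation, including
    the wait for the initial greeting. Assumption: a client setting such as
    Outlook's "Server Timeout" behaves similarly, so it has to exceed the
    greeting delay (about 80 s here) or the poll fails before login.
    """
    conn = poplib.POP3(host, port, timeout=timeout)  # raises socket.timeout if no greeting
    try:
        conn.user(user)
        conn.pass_(password)
        count, _total_bytes = conn.stat()
        return count
    finally:
        conn.quit()
```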
At this point, I'm looking for an answer and I'm quite lost. I fired up Ethereal (a packet sniffer) to see if I could find anything in the captured frames that would lead me to an answer... and what I found was quite interesting. Right when I click the Send/Receive button, the TCP three-way handshake between my computer and the mail server takes place, and it is successful. It is then another 81 seconds before the mail server finally responds with the "+OK QPOP (version 3.0.2) starting." message. Once this message is received, the login happens immediately, and all mail is downloaded at the normal rate.
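The same measurement Ethereal showed can be scripted, which makes it easy to repeat from different connections. Here is a minimal Python sketch (the function name is hypothetical) that times the handshake and the wait for the first line separately. In the situation described above, you would expect a handshake time near the ping time and a greeting time of around 81 seconds.

```python
import socket
import time


def measure_greeting_delay(host: str, port: int = 110,
                           timeout: float = 300.0) -> tuple[float, float, str]:
    """Return (handshake_s, greeting_s, greeting_line) for a POP3 server.

    handshake_s: time for the TCP three-way handshake to complete.
    greeting_s:  time from the end of the handshake until the server's first
                 line (e.g. "+OK QPOP (version 3.0.2) starting.") arrives.
    """
    t0 = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        t1 = time.monotonic()  # handshake done
        with sock.makefile("rb") as f:
            line = f.readline()  # blocks until the greeting (or the timeout)
        t2 = time.monotonic()
    return t1 - t0, t2 - t1, line.decode("ascii", "replace").rstrip("\r\n")
```

Running it once over the DSL line and once over the dial-up laptop would put numbers on the difference the two connections showed.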
So... what is happening between the three-way handshake and the mail server's greeting banner? Whatever it is, it's taking forever to happen.
I called back the ISP and the Mail Hosting company so many times that I think I know all the techs' schedules and favorite movies. I've tried resetting our DSL modem, to no avail, but I REALLY don't think it has anything to do with any hardware/software on our end because, again, the problem seems to have started Saturday night at 10:38pm, and nobody was here at that time to change anything.
In my search for the truth, I found out that you can telnet into mail servers. I tried telnetting into our mail server, only to be greeted by an 81-second delay before the "+OK QPOP (version 3.0.2) starting." message appeared. I'm really starting to wonder why it consistently takes 80-82 seconds for the mail server to send its banner after the three-way handshake completes.
I have a personal website with a few POP3 accounts on it, and when I telnetted into that mail server, the banner came up right away.
I called the ISP back and had them try to telnet into our mail server. The tech said the QPOP banner came up instantly on his screen.
This is where the issue currently stands. I have no idea what to do or where to turn next. Kudos if you actually read all the way to the bottom of this post.
Any and all help would be greatly appreciated!
Thanks!
Ian Glinka <ian@mleco.com>
------