Problem description:
Lookups for certain hosts/records result, at best, in no response to the client (timeouts) and, at worst, in a SERVFAIL (Server Failure) message from our internal DNS servers.
If a client queries an external DNS server, the response is fast and correct. For certain domains, though, it's like they just don't exist...
Example domains:
(lb.msnbc.com)
We are seeing a LOT of "Badly formed DNS" and "Illegal query format" errors. After we disabled strict DNS/UDP protocol enforcement, the errors stopped being logged, BUT the DNS SERVER FAILURE problems remained.
Setup:
Client PCs (various OSs) --->2 x DNS Server (NT 4.0 SP6a) ----> FW (NAT) --->Internet
The DNS servers are not forwarding queries; they resolve them themselves.
The internal domain checks out OK. Both internal DNS servers produce identical results (Server Failure code back to the requesting PC).
Things I've tried to test/diagnose issues:
We have checked that these sites are functioning and up from several different outside DNS servers. This eliminates the sites being down. We are able to resolve 99% of all other domain queries without any problems (and FAST).
AddressAnswerLimit (MS KB Q164300) does not appear to be the cause, since all clients (W2K, NT, Win9x, Red Hat Linux 7.3) exhibit the same problem.
MS KB Q295933 describes such an error when the server receives a non-authoritative response.
MS KB Q159310 "Updated version of DNS fixes several problems." Installing the patch did not fix the problem.
Sent queries from an internal PC to an internal BIND DNS server running on Red Hat Linux 7.3 (bind-9.2.1-9). The same server failures were observed.
Consulted MS KB Q186820, Q295611, Q251384.
In all cases Root hints were updated.
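The internal-versus-external comparison described above could be repeated with a small script along these lines. This is only a rough sketch, not what was actually run: the function names and the two server addresses are placeholders, it sends the query over UDP (the trace in this post shows a TCP exchange), and it reads nothing but the reply code from the answer.

```python
import socket
import struct


def build_query(name, qtype=255, txid=0x090F):
    """Build a minimal DNS query with RD set; qtype 255 means ANY."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE followed by QCLASS 1 (IN).
    return header + qname + struct.pack("!HH", qtype, 1)


def rcode_of(response):
    """Return the 4-bit reply code from a raw DNS response."""
    flags = struct.unpack("!H", response[2:4])[0]
    return flags & 0x000F


def ask(server, name, timeout=3.0):
    """Send the query over UDP; return the reply code, or None on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, 53))
        try:
            data, _ = sock.recvfrom(4096)
        except socket.timeout:
            return None
    return rcode_of(data)


if __name__ == "__main__":
    # Placeholder addresses: substitute one internal and one external server.
    for server in ("192.0.2.1", "198.51.100.1"):
        print(server, ask(server, "lb.msnbc.com"))
```

A reply code of 2 is Server Failure, 0 is success, and None here means the query timed out.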
Example trace of such a failure (cleared of private info):
(Ethereal trace)
Internet Protocol, Src Addr: 192.168.X.YYY (192.168.X.YYY), Dst Addr: 192.168.Z.
Version: 4
Header length: 20 bytes
Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00)
0000 00.. = Differentiated Services Codepoint: Default (0x00)
.... ..0. = ECN-Capable Transport (ECT): 0
.... ...0 = ECN-CE: 0
Total Length: 72
Identification: 0x8b2a
Flags: 0x04
.1.. = Don't fragment: Set
..0. = More fragments: Not set
Fragment offset: 0
Time to live: 128
Protocol: TCP (0x06)
Header checksum: 0xeb15 (correct)
Source: 192.168.X.YYY (192.168.X.YYY)
Destination: 192.168.Z.
Transmission Control Protocol, Src Port: domain (53), Dst Port: 1440 (1440), Seq: 39785541, Ack: 694240, Len: 32
Source port: domain (53)
Destination port: 1440 (1440)
Sequence number: 39785541
Next sequence number: 39785573
Acknowledgement number: 694240
Header length: 20 bytes
Flags: 0x0018 (PSH, ACK)
0... .... = Congestion Window Reduced (CWR): Not set
.0.. .... = ECN-Echo: Not set
..0. .... = Urgent: Not set
...1 .... = Acknowledgment: Set
.... 1... = Push: Set
.... .0.. = Reset: Not set
.... ..0. = Syn: Not set
.... ...0 = Fin: Not set
Window size: 8728
Checksum: 0x5556 (correct)
Domain Name System (response)
Length: 30
Transaction ID: 0x090f
Flags: 0x8182 (Standard query response, Server failure)
1... .... .... .... = Response: Message is a response
.000 0... .... .... = Opcode: Standard query (0)
.... .0.. .... .... = Authoritative: Server is not an authority for domain
.... ..0. .... .... = Truncated: Message is not truncated
.... ...1 .... .... = Recursion desired: Do query recursively
.... .... 1... .... = Recursion available: Server can do recursive queries
.... .... ..0. .... = Answer authenticated: Answer/authority portion was not authenticated by the server
.... .... .... 0010 = Reply code: Server failure (2)
Questions: 1
Answer RRs: 0
Authority RRs: 0
Additional RRs: 0
Queries
lb.msnbc.com: type ANY, class inet
Name: lb.msnbc.com
Type: Request for all records
Class: inet
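For anyone who wants to check the flags word by hand, here is a small sketch that splits 0x8182 into the fields Ethereal shows above. The function and dictionary names are just illustrative; the bit positions follow the standard DNS header layout.

```python
# Standard reply codes 0 through 5 from the DNS header definition.
RCODE_NAMES = {
    0: "No error",
    1: "Format error",
    2: "Server failure",
    3: "Name error",
    4: "Not implemented",
    5: "Refused",
}


def decode_dns_flags(flags):
    """Split the 16-bit DNS flags word into its individual fields."""
    return {
        "response": bool(flags & 0x8000),             # QR
        "opcode": (flags >> 11) & 0xF,                # 0 = standard query
        "authoritative": bool(flags & 0x0400),        # AA
        "truncated": bool(flags & 0x0200),            # TC
        "recursion_desired": bool(flags & 0x0100),    # RD
        "recursion_available": bool(flags & 0x0080),  # RA
        "authenticated": bool(flags & 0x0020),        # AD
        "rcode": flags & 0x000F,
    }


if __name__ == "__main__":
    fields = decode_dns_flags(0x8182)
    for key, value in fields.items():
        print(f"{key}: {value}")
    print("reply code:", RCODE_NAMES.get(fields["rcode"], "unknown"))
```

For 0x8182 this gives a response with recursion desired and available, not authoritative, not truncated, and reply code 2, which agrees with the decoded trace.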
David Nemeth
Systems Administrator
dnemeth@connectpositronic.com
Positronic Industries, Inc.