DNS SERVFAIL and Incorrect Flag only via TCP: Broken DNS Servers?
Yes, it's poor configuration and/or implementation - there's no reason for an authoritative server to return root referrals in an otherwise valid response.
Furthermore, I'm seeing other errors that simply shouldn't happen from those two Worldnic servers:
- Sometimes it gives the right answer, but with a SERVFAIL error code and without the AA bit set.
- UDP replies are always truncated at 512 bytes, even with EDNS0 (RFC 2671) specified. This means that DNSSEC won't work with this name server.
- It's not just the ADDITIONAL section that's a problem, it's putting the root name servers in the AUTHORITY section of an authoritative (AA bit set) answer.
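To make the EDNS0 point concrete, here is a minimal sketch of how a client advertises a larger UDP payload size: it appends an OPT pseudo-record to the additional section of the query. The function names, the default transaction ID, and the 4096-byte size are illustrative assumptions, not anything taken from the servers in question.

```python
import struct

def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            out += struct.pack("!B", len(raw)) + raw
    return out + b"\x00"

def build_query(name, qtype=1, txid=0x27EF, recurse=False,
                edns_size=4096, dnssec_ok=False) -> bytes:
    """Build a DNS query; hypothetical helper, edns_size=0 disables EDNS0."""
    flags = 0x0100 if recurse else 0x0000       # only the RD bit is ever set
    arcount = 1 if edns_size else 0             # the OPT record counts as additional
    header = struct.pack("!HHHHHH", txid, flags, 1, 0, 0, arcount)
    question = encode_name(name) + struct.pack("!HH", qtype, 1)  # class IN
    if not edns_size:
        return header + question
    # OPT pseudo-RR: root name, TYPE 41, CLASS carries the UDP payload size,
    # TTL carries extended rcode, version and flags (DO bit is 0x8000), RDLEN 0.
    ttl = 0x00008000 if dnssec_ok else 0
    opt = b"\x00" + struct.pack("!HHIH", 41, edns_size, ttl, 0)
    return header + question + opt

plain = build_query("www.example.org", edns_size=0)
with_edns = build_query("www.example.org", dnssec_ok=True)
print(len(plain), len(with_edns))
```

A server that honours the OPT record may send UDP replies up to the advertised size. A server that ignores it and still truncates at 512 bytes cannot deliver typical DNSSEC responses over UDP, which is the problem described in the second bullet.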
Comments
-
Rob Olmos almost 2 years
Is it poor configuration to return the root name servers in the additional section for a CNAME lookup that points to another domain? Particularly the one I'm seeing this with is a CNAME hosted by Network Solutions with the CNAME pointing to a different domain & TLD.
I ask whether this is poor configuration because all these additional records push the response past the UDP size limit, forcing the query to be retried over TCP.
dig www.unitedstatesartists.org +trace
A name server response:
example.org. 86400 IN NS ns15.worldnic.com.
example.org. 86400 IN NS ns16.worldnic.com.
;; Received 95 bytes from 199.249.120.1#53(b2.org.afilias-nst.org) in 79 ms
;; Warning: Message parser reports malformed message packet.
;; Truncated, retrying in TCP mode.
www.example.org. 7200 IN CNAME load-01-123.us-west-1.elb.amazonaws.com.
. 518400 IN NS a.root-servers.net.
. 518400 IN NS b.root-servers.net.
. 518400 IN NS c.root-servers.net.
. 518400 IN NS d.root-servers.net.
. 518400 IN NS e.root-servers.net.
. 518400 IN NS f.root-servers.net.
. 518400 IN NS g.root-servers.net.
. 518400 IN NS h.root-servers.net.
. 518400 IN NS i.root-servers.net.
. 518400 IN NS j.root-servers.net.
. 518400 IN NS k.root-servers.net.
. 518400 IN NS l.root-servers.net.
. 518400 IN NS m.root-servers.net.
;; Received 526 bytes from 205.178.190.8#53(ns15.worldnic.com) in 173 ms
Whether the additional records are returned appears to be random. Sometimes, even when they are not returned, the response is still truncated and dig retries over TCP.
example.org. 86400 IN NS ns15.worldnic.com.
example.org. 86400 IN NS ns16.worldnic.com.
;; Received 95 bytes from 199.19.56.1#53(a0.org.afilias-nst.info) in 82 ms
;; Warning: Message parser reports malformed message packet.
;; Truncated, retrying in TCP mode.
www.example.org. 7200 IN CNAME load-01-123.us-west-1.elb.amazonaws.com.
;; Received 107 bytes from 205.178.190.8#53(ns15.worldnic.com) in 164 ms
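As a rough way to compare the two traces, the sketch below pulls the reply sizes out of dig's ";; Received" lines and flags any that exceed the classic 512-byte UDP limit. The regular expression and the names `SAMPLE` and `oversized_replies` are assumptions made for illustration; only the sample lines come from the output above.

```python
import re

CLASSIC_UDP_LIMIT = 512  # bytes; DNS-over-UDP limit when EDNS0 is not in use

# Matches lines such as:
# ;; Received 526 bytes from 205.178.190.8#53(ns15.worldnic.com) in 173 ms
RECEIVED = re.compile(r";; Received (\d+) bytes from (\S+?)#(\d+)\((\S+?)\)")

SAMPLE = """\
;; Received 95 bytes from 199.249.120.1#53(b2.org.afilias-nst.org) in 79 ms
;; Received 526 bytes from 205.178.190.8#53(ns15.worldnic.com) in 173 ms
;; Received 107 bytes from 205.178.190.8#53(ns15.worldnic.com) in 164 ms
"""

def oversized_replies(dig_output, limit=CLASSIC_UDP_LIMIT):
    """Return (server name, size) pairs for replies larger than `limit` bytes."""
    hits = []
    for match in RECEIVED.finditer(dig_output):
        size = int(match.group(1))
        if size > limit:
            hits.append((match.group(4), size))
    return hits

print(oversized_replies(SAMPLE))
```

Only the 526-byte reply is over the limit. The 107-byte reply in the second trace was also fetched over TCP, so size alone does not explain why the UDP response was marked truncated.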
Update 2010-12-08
With more testing I found:
- Network Solutions responds with a SERVFAIL (server failure) to a recursive query (dig's default unless +trace is used), yet still gives the correct answer.
- Setting dig's +norecurse usually works, but not always. Sometimes a SERVFAIL is still returned, which is not good. Details of a possible cause follow below.
- Network Solutions' inclusion of the root servers in the authority and additional sections causes the UDP truncation and requires TCP to complete.
Overview of the following capture:
- Non-recursive request for the record from ns15
- ns15 answer includes root servers in auth and additional and marks reply as truncated
- Non-recursive request is retried in TCP due to truncated UDP
- Similar answer from ns15 using TCP except "recursion desired" is incorrectly set and "server failure" code is also set
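The sequence in this overview is the standard client behaviour: send over UDP, check the TC (truncated) bit, and retry the same query over TCP if it is set. A minimal sketch of that logic follows; the function names are hypothetical, it handles IPv4 only, and it does not retry on timeouts.

```python
import socket
import struct

def is_truncated(message: bytes) -> bool:
    """True if the TC bit (0x0200 in the 16-bit flags word) is set."""
    (flags,) = struct.unpack("!H", message[2:4])
    return bool(flags & 0x0200)

def frame_for_tcp(message: bytes) -> bytes:
    """DNS over TCP prefixes each message with a two-byte length field."""
    return struct.pack("!H", len(message)) + message

def recv_exact(sock, count: int) -> bytes:
    """Read exactly `count` bytes from a stream socket."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("connection closed mid-message")
        data += chunk
    return data

def query_with_fallback(query: bytes, server: str, port: int = 53,
                        timeout: float = 5.0):
    """Send `query` over UDP and retry over TCP if the reply has TC set.

    Returns (reply bytes, "udp" or "tcp"). Illustrative sketch only.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp:
        udp.settimeout(timeout)
        udp.sendto(query, (server, port))
        reply, _ = udp.recvfrom(4096)
    if not is_truncated(reply):
        return reply, "udp"
    with socket.create_connection((server, port), timeout=timeout) as tcp:
        tcp.sendall(frame_for_tcp(query))
        (length,) = struct.unpack("!H", recv_exact(tcp, 2))
        return recv_exact(tcp, length), "tcp"
```

The two-byte prefix added by `frame_for_tcp` is what tshark reports as the "Length" field on the TCP packets. `query_with_fallback` needs network access to a real name server, so only the helper functions can be exercised offline.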
We've already created a ticket with them, but we'll see if it goes anywhere. The tshark details of the DNS packets described above follow:
First question (via UDP):
Domain Name System (query)
    Transaction ID: 0x27ef
    Flags: 0x0000 (Standard query)
        0... .... .... .... = Response: Message is a query
        .000 0... .... .... = Opcode: Standard query (0)
        .... ..0. .... .... = Truncated: Message is not truncated
        .... ...0 .... .... = Recursion desired: Don't do query recursively
        .... .... .0.. .... = Z: reserved (0)
        .... .... ...0 .... = Non-authenticated data OK: Non-authenticated data is unacceptable
First answer (via UDP):
Domain Name System (response)
    [Request In: 1]
    [Time: 0.078623000 seconds]
    Transaction ID: 0x27ef
    Flags: 0x8600 (Standard query response, No error)
        1... .... .... .... = Response: Message is a response
        .000 0... .... .... = Opcode: Standard query (0)
        .... .1.. .... .... = Authoritative: Server is an authority for domain
        .... ..1. .... .... = Truncated: Message is truncated
        .... ...0 .... .... = Recursion desired: Don't do query recursively
        .... .... 0... .... = Recursion available: Server can't do recursive queries
        .... .... .0.. .... = Z: reserved (0)
        .... .... ..0. .... = Answer authenticated: Answer/authority portion was not authenticated by the server
        .... .... .... 0000 = Reply code: No error (0)
Second question (via TCP):
Domain Name System (query)
    Length: 56
    Transaction ID: 0xbc37
    Flags: 0x0000 (Standard query)
        0... .... .... .... = Response: Message is a query
        .000 0... .... .... = Opcode: Standard query (0)
        .... ..0. .... .... = Truncated: Message is not truncated
        .... ...0 .... .... = Recursion desired: Don't do query recursively
        .... .... .0.. .... = Z: reserved (0)
        .... .... ...0 .... = Non-authenticated data OK: Non-authenticated data is unacceptable
Second answer (via TCP, notice "recursion desired"):
Domain Name System (response)
    [Request In: 6]
    [Time: 0.147357000 seconds]
    Length: 107
    Transaction ID: 0xbc37
    Flags: 0x8102 (Standard query response, Server failure)
        1... .... .... .... = Response: Message is a response
        .000 0... .... .... = Opcode: Standard query (0)
        .... .0.. .... .... = Authoritative: Server is not an authority for domain
        .... ..0. .... .... = Truncated: Message is not truncated
        .... ...1 .... .... = Recursion desired: Do query recursively
        .... .... 0... .... = Recursion available: Server can't do recursive queries
        .... .... .0.. .... = Z: reserved (0)
        .... .... ..0. .... = Answer authenticated: Answer/authority portion was not authenticated by the server
        .... .... .... 0010 = Reply code: Server failure (2)
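The flag words in the two answers (0x8600 over UDP, 0x8102 over TCP) can be decoded by hand to confirm what tshark reports. The sketch below does that and lists the inconsistencies relative to the query flags (0x0000). The names `decode_flags` and `anomalies` are hypothetical, and the "AA not set" check assumes the queried server is expected to be authoritative for the name.

```python
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
          3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}

def decode_flags(flags: int) -> dict:
    """Split the 16-bit DNS header flags word into its main fields."""
    return {
        "qr": bool(flags & 0x8000),      # 1 = response
        "opcode": (flags >> 11) & 0xF,
        "aa": bool(flags & 0x0400),      # authoritative answer
        "tc": bool(flags & 0x0200),      # truncated
        "rd": bool(flags & 0x0100),      # recursion desired
        "ra": bool(flags & 0x0080),      # recursion available
        "rcode": RCODES.get(flags & 0xF, str(flags & 0xF)),
    }

def anomalies(query_flags: int, response_flags: int) -> list:
    """List oddities in a response from a server assumed to be authoritative."""
    q, r = decode_flags(query_flags), decode_flags(response_flags)
    found = []
    if r["rd"] != q["rd"]:
        found.append("RD not copied from query")  # RD should be echoed back
    if not r["aa"]:
        found.append("AA not set")                # assumption: server is authoritative
    if r["rcode"] != "NOERROR":
        found.append("rcode " + r["rcode"])
    return found

print(anomalies(0x0000, 0x8600))  # UDP answer
print(anomalies(0x0000, 0x8102))  # TCP answer
```

The UDP answer decodes cleanly apart from the TC bit, while the TCP answer shows all three problems at once: RD set although the query cleared it, AA missing, and a SERVFAIL reply code.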
-
Alnitak over 13 years
Please show the lookup details. It's unusual, but not necessarily poor configuration.
-
BestPractices over 11 years
I realize you posted this question about 2 years ago, but did you ever get a resolution to your problem? I'm having the same issue with Network Solutions.
-
Rob Olmos over 11 years
@BestPractices Sorry for the late reply, and no, unfortunately there was no resolution. We decided instead to run our own load balancer that doesn't need a CNAME or apex redirect, since we couldn't move the DNS to Route 53.
-
Efren over 6 years
Please mark the answer if it answers the question.
-
Rob Olmos over 13 years
Thanks. Let me investigate this further and try bringing these issues to their attention (I doubt they'll care).
-
Rob Olmos over 13 years
I did some more testing and detailed my findings in the updated question.