Why is BIND giving me a SERVFAIL in this case? (Notes inside)

You probably ran foul of the "EXPIRE" field in the SOA record - from §3.3.13 of RFC 1035:

EXPIRE          A 32 bit time value that specifies the upper limit on
                the time interval that can elapse before the zone is no
                longer authoritative.

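This field tells a secondary server how long it may keep serving a zone after its last successful contact with the master. Once that interval passes without a successful refresh, the secondary stops treating its copy of the zone as authoritative.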

When you changed the zone file on "ns3", did you also reconfigure BIND so that the zone was listed as a "master" rather than a "slave"? If so, it's that change, rather than the change to the first SOA field, that actually fixed it.
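To make the EXPIRE arithmetic concrete, here is a minimal sketch that works out when a secondary would give up on a zone. The SOA values and the last-refresh timestamp are illustrative assumptions, not taken from the asker's zone, and the helper names `parse_soa_rdata` and `expiry_time` are hypothetical. The field order follows the SOA RDATA layout in RFC 1035 (MNAME, RNAME, SERIAL, REFRESH, RETRY, EXPIRE, MINIMUM).

```python
from datetime import datetime, timedelta


def parse_soa_rdata(rdata: str) -> dict:
    """Split SOA RDATA into named fields (RFC 1035 order)."""
    mname, rname, serial, refresh, retry, expire, minimum = rdata.split()
    return {
        "mname": mname,
        "rname": rname,
        "serial": int(serial),
        "refresh": int(refresh),
        "retry": int(retry),
        "expire": int(expire),
        "minimum": int(minimum),
    }


def expiry_time(last_refresh: datetime, soa: dict) -> datetime:
    """Time at which a secondary stops serving the zone, assuming no
    refresh from the master succeeds after last_refresh."""
    return last_refresh + timedelta(seconds=soa["expire"])


# Assumed values for illustration only: EXPIRE of 604800 s (7 days).
soa = parse_soa_rdata(
    "ns1.example.com. hostmaster.mail.example.com. "
    "2010032501 3600 900 604800 86400"
)
last_ok = datetime(2010, 3, 25, 9, 2, 53)  # assumed last successful refresh
print(expiry_time(last_ok, soa))  # 2010-04-01 09:02:53
```

With a seven-day EXPIRE, a secondary that last reached its master on 25 March would stop answering for the zone on 1 April, which is the kind of delay between an outage and a SERVFAIL described in the question.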

Author: imaginative

Updated on September 17, 2022

Comments

  • imaginative
    imaginative almost 2 years

    Woke up this morning to a bunch of the following:

    root@foo:/etc/bind# dig @1.2.3.4 foo.example.com
    
    ; <<>> DiG 9.6.1-P2 <<>> @1.2.3.4 foo.example.com
    ; (1 server found)
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 36121
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
    
    ;; QUESTION SECTION:
    ;;foo.example.com.   IN A
    
    ;; Query time: 0 msec
    ;; SERVER: 1.2.3.4#53(1.2.3.4)
    ;; WHEN: Thu Apr  1 09:57:59 2010
    ;; MSG SIZE  rcvd: 31
    

    Some background on the fictitious "1.2.3.4": it's a slave name server in my nameserver "farm". Technically I have ns1 (the master) and ns2/ns3. Currently ns1 and ns2 are down for maintenance, so I left ns3 serving live traffic. That's the point: DNS is supposed to be resilient.

    Now the odd part is, "1.2.3.4" was serving requests for example.com just fine for the last 4-5 days. This morning I get a phone call that it's non-responsive. After investigation I see the message you see above, SERVFAIL.

    I looked into the zone file and saw the following:

    example.com               IN SOA  ns1.example.com. hostmaster.mail.example.com. (
    

    I wondered whether the nameserver now thought it was not authoritative for example.com, so I adjusted it to the following:

    example.com               IN SOA  ns3.example.com. hostmaster.mail.example.com. (
    

    After that, it started responding again for all authoritative queries for example.com. I have no idea why. I thought these things were supposed to be normalized upon zone transfer from ns1 -> ns3?

    Can someone please explain why this happened and how to prevent it from happening in the future? I've never had a similar problem, and because I don't understand it well, I might be missing some critical information in this question. Please let me know if I can add any detail to make things clearer.

    One more thing to note: I have other domains that I'm authoritative for that have their SOA still saying ns1.example.com. and not ns3.example.com. Those domains are serving requests just fine! Is it a matter of time before they stop also and I have to change SOA to ns3.example.com? Is this also only required because ns1 and ns2 are currently offline?

    • bortzmeyer
      bortzmeyer over 14 years
      Whether it is authoritative or not has nothing to do with being master or slave. Both are authoritative.
  • imaginative
    imaginative over 14 years
    Expire makes sense; I'm seeing the following in my syslog:

        Apr 1 09:02:53 dr0 named[6831]: zone example.com/IN: expired

    However, I didn't change it to become 'master'. It's still a slave and started responding again, so that is leaving me confused.
  • Alnitak
    Alnitak over 14 years
    I think that what happened is that when you restarted BIND it forgot that the "master" wasn't there any more, and continued serving the zone from the local copy. Chances are that in seven days it'll stop again...
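Since the syslog line quoted in the comments is what confirmed the diagnosis, a small script can watch for it on the other zones. This is a minimal sketch that assumes the log wording matches the single line quoted above (other BIND versions or logging configurations may differ); the function name `expired_zones` is hypothetical.

```python
import re

# Pattern modelled on the one syslog line quoted in the comments; the exact
# wording in other BIND versions is an assumption.
EXPIRED_RE = re.compile(r"named\[\d+\]: zone (?P<zone>\S+)/IN: expired")


def expired_zones(lines):
    """Return zone names reported as expired, in first-seen order."""
    zones = []
    for line in lines:
        match = EXPIRED_RE.search(line)
        if match and match.group("zone") not in zones:
            zones.append(match.group("zone"))
    return zones


sample = [
    "Apr 1 09:02:53 dr0 named[6831]: zone example.com/IN: expired",
]
print(expired_zones(sample))  # ['example.com']
```

In practice the lines would come from the syslog file rather than a hard-coded list, for example by passing an open file object to `expired_zones`.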