BIND/DNS - dig +trace = Bad Referral and Bad Horizontal Referral

Per @andrew-b's comment, this is usually due to a mismatch in delegation.

I came across this same error when a developer attempted a +trace lookup of a record along the lines of host.subdomain.example.org. The exact cause in your case will likely differ, but it will probably follow a similar theme.

The cause in our case was a firewall rule that captures and redirects* DNS lookups sent out to "unauthorised" servers. The request would instead reach our own DNS server, which would then perform a recursive lookup. The client would think it was sending each successive lookup out to the Internet, but these requests were actually being answered by our internal server.
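
One quick way to confirm that kind of interception (a sketch: 192.0.2.1 is a placeholder that should not be a real resolver, and not every server answers CHAOS-class queries) is to send a query somewhere that could not legitimately answer it:

    # A CHAOS-class query aimed at an arbitrary non-resolver address.
    # Receiving any answer at all proves that something on-path
    # intercepted and answered the query.
    dig @192.0.2.1 hostname.bind TXT CH +short

    # Likewise, identical version strings from unrelated public resolvers
    # point at a single interceptor answering for all of them.
    dig @8.8.8.8 version.bind TXT CH +short
    dig @1.1.1.1 version.bind TXT CH +short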

The fix was to remind the developer that DNS requests would be intercepted, and that they could do their testing from a server that was whitelisted to bypass the DNS redirect rule.

See the redacted error as the developer received it below:

tricky-desktop:~ tricky$ dig +trace host.subdomain.example.org

; <<>> DiG 9.8.3-P1 <<>> +trace host.subdomain.example.org
;; global options: +cmd
.           3600    IN  NS  g.root-servers.net.
.           3600    IN  NS  l.root-servers.net.
.           3600    IN  NS  j.root-servers.net.
.           3600    IN  NS  k.root-servers.net.
.           3600    IN  NS  b.root-servers.net.
.           3600    IN  NS  m.root-servers.net.
.           3600    IN  NS  d.root-servers.net.
.           3600    IN  NS  i.root-servers.net.
.           3600    IN  NS  e.root-servers.net.
.           3600    IN  NS  c.root-servers.net.
.           3600    IN  NS  h.root-servers.net.
.           3600    IN  NS  a.root-servers.net.
.           3600    IN  NS  f.root-servers.net.
;; Received 477 bytes from 192.168.1.2#53(192.168.1.2) in 87 ms

subdomain.example.org.  0   IN  NS  ns-outside-1.example.org.
subdomain.example.org.  0   IN  NS  ns-outside-2.example.org.
subdomain.example.org.  0   IN  NS  ns-outside-3.example.org.
subdomain.example.org.  0   IN  NS  ns-outside-4.example.org.
;; Received 295 bytes from 199.43.133.53#53(199.43.133.53) in 14 ms

subdomain.example.org.  0   IN  NS  ns-outside-2.example.org.
subdomain.example.org.  0   IN  NS  ns-outside-3.example.org.
subdomain.example.org.  0   IN  NS  ns-outside-4.example.org.
subdomain.example.org.  0   IN  NS  ns-outside-1.example.org.
;; BAD (HORIZONTAL) REFERRAL
;; Received 295 bytes from 199.43.135.53#53(199.43.135.53) in 5 ms

... 29 REPEATS REDACTED ...

subdomain.example.org.  0   IN  NS  ns-outside-4.example.org.
subdomain.example.org.  0   IN  NS  ns-outside-1.example.org.
subdomain.example.org.  0   IN  NS  ns-outside-2.example.org.
subdomain.example.org.  0   IN  NS  ns-outside-3.example.org.
;; BAD (HORIZONTAL) REFERRAL
dig: too many lookups
tricky-desktop:~ tricky$

* The firewall rule was originally necessitated by BYOD staff being unable to look up private internal services because "Smart DNS" services had changed their devices' DNS configuration.
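
For illustration, a minimal Linux/iptables sketch of such a capture-and-redirect rule (our actual firewall differs; the interface and 192.0.2.10, standing in for the internal resolver, are assumptions):

    # Redirect outbound DNS (UDP and TCP) that is not already aimed at the
    # internal resolver to that resolver instead; clients still believe the
    # answer came from the server they originally addressed.
    iptables -t nat -A PREROUTING -i eth0 -p udp --dport 53 ! -d 192.0.2.10 \
        -j DNAT --to-destination 192.0.2.10:53
    iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 53 ! -d 192.0.2.10 \
        -j DNAT --to-destination 192.0.2.10:53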


Comments

  • user53029, almost 2 years

    I have an interesting problem. I started noticing that when I do a dig +trace against one of the domains we are authoritative for, we get "Bad Referral" errors from our nameservers, and you can see where the request was forwarded back up the namespace tree instead of being answered. Unfortunately I am unable to reproduce that problem at the moment. However, I can reproduce the Bad (HORIZONTAL) referral. Basically, once the query is referred to our name server I see this:

    ;; BAD (HORIZONTAL) REFERRAL
    ;; Received 187 bytes from x.x.x.x#53(ns.example.com) in 2 ms
    

    Sometimes this loops until I reach the "too many lookups" error and it just gives up, or it will stop, try our other server, and then get an answer. Here's the interesting part: if I perform a simple dig A lookup against the server that continuously failed the trace, I get an answer. If I then turn around and run the same dig +trace again, it never fails again. It's almost like records are being cached somewhere, and once they expire you start to see the message again. Can anyone help me figure out what is going on here? Here is info about our environment (a direct-query sketch follows the list):

    1) RHEL 6 running BIND 9.8.2

    2) Public-facing authoritative master and slave server.

    3) Servers are set up in a "view" configuration (dual-view: one "internal", one external).

    4) It seems like we only started to see these oddities after implementing views.

    5) Queries that hit the internal view are forwarded to the external view for zones not contained in the internal view; we use the loopback IP to achieve this.

    6) These authoritative servers are also set up to answer non-authoritative queries with recursion, via the recursion statement and a root "hint" zone.
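
    As a sketch of that direct-query check (names are placeholders for our real servers): ask the failing server directly with recursion disabled, and look at whether the reply carries the aa flag or merely another NS referral for the same zone:

        # Non-recursive query aimed straight at the server that fails the trace.
        # An authoritative answer sets the aa flag in the header; a horizontal
        # referral instead shows NS records for the same zone in the AUTHORITY
        # section of the reply.
        dig +norecurse @ns1.example.org example.org NS
        dig +norecurse @ns1.example.org host.example.org A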

    Here is our setup simplified.

    Master server:

    acl ext_trusted {
    x.x.x.x; // trusted net external
    }; // end of "ext_trusted" ACL definition.
    
    acl int_trusted {
    x.x.x.x; // trusted internal
    }; // end of ACL for "int_trusted"  
    
    
    options {
        directory "/var/named";
        recursive-clients 30000;
        tcp-clients 2000;
        check-names master ignore;
        dump-file       "/var/named/data/cache_dump.db";
        statistics-file "/var/named/data/named.stats.txt";
        memstatistics-file "/var/named/data/named_mem/stats.txt";
        zone-statistics yes;
        cleaning-interval 30;
    
    
    // Listen on ipv4: // Adding localhost for view forwarding
    listen-on port 53 { x.x.x.x; 127.0.0.1; };
    
    // And, also listen on ipv6:
    // loopback is required for view forwarding; do not remove
    listen-on-v6 { ::1; x.x.x.x; };
    
    // Enforce the Customer DNS Query ACL Defined Above:
    allow-query { ext_trusted; int_trusted; };
    
    };
    
    
    key "internal" {
    algorithm HMAC-SHA512;
    secret "xxxxxxxxx";
    };
    
    key "external" {
    algorithm HMAC-SHA512;
    secret "xxxxxxxx";
    };
    
    view "internal" {
        match-clients { !key external; int_trusted; key internal; };
    
        //IP of slave server IPv4
        server x.x.x.x {
        keys { internal; };
    };
        //IP of slave server IPv6
        server x.x.x.x {
        keys { internal; };
    };
    
        also-notify { //slave servers go here
        x.x.x.x; x.x.x.x; 
    
    };
    
        allow-transfer { key internal; local_ns; int_ns; };
        empty-zones-enable no;
        server fe80::/16 { bogus yes; };
        server 0.0.0.0/8 {bogus yes; };
    
        zone "example.org" {
        type master;
        file "db.eamplein.org";
        allow-query { int_trusted; };
    };
    
        forwarders {
        // forward to external view //
        127.0.0.1; ::1; 
    };
    
        forward only;
    
    };
    
    view "external" {
      match-clients { any; };
    
     //IP of slave server IPv4
      server x.x.x.x {
      keys { external; };
    };
      //IP of slave IPv6
      server x.x.x.x {
      keys { external; };
    };
    
    also-notify { //IP address of slave server
       x.x.x.x; x.x.x.x;
    };
    
    allow-transfer { key external; ext_ns; local_ns; };
    server fe80::/16 { bogus yes; };
    server 0.0.0.0/8 {bogus yes; };
    empty-zones-enable yes;
    recursion yes;
    allow-recursion { any; };
    
    zone "." {
         type hint;
         file "/var/named/named.ca";
    };
    
    
    zone "example.org" {
        type master;
        file "db.eampleout.org";
        allow-query { any; };
    };
    
    zone "example.com" {
        type master;
        file "db.eample.com";
        allow-query { any; };
    };
    
    };
    

    Slave server config:

    acl ext_trusted {
    x.x.x.x; // trusted net external
    }; // end of "ext_trusted" ACL definition.
    
    acl int_trusted {
    x.x.x.x; // trusted internal
    }; // end of ACL for "int_trusted"  
    
    options {
        directory "/var/named/slaves";
        recursive-clients 30000;
        tcp-clients 2000;
        check-names master ignore;
        dump-file       "/var/named/data/cache_dump.db";
        statistics-file "/var/named/data/named_stats.txt";
        memstatistics-file "/var/named/data/named_mem_stats.txt"; 
        cleaning-interval 30;
    
    // Listen on ipv4:
    // Change this to the proper IP address if you ever switch back!
    // loopback is required for view forwarding; do not remove
    listen-on port 53 { 127.0.0.1; x.x.x.x; };
    
    // And, also listen on ipv6:
    
    // Configure ipv6 before uncommenting this:
    // loopback is required for view forwarding; do not remove
    listen-on-v6 port 53 { ::1; x.x.x.x; };
    
    // Enforce the "trusted" ACL defined at the begining of this config file: 
    allow-query { ext_trusted; int_trusted; };
    
    };
    
    
    key "internal" {
    algorithm HMAC-SHA512;
    secret "xxxxxxxxx";
    };
    
    key "external" {
    algorithm HMAC-SHA512;
    secret "xxxxxxxx";
    };
    
    view "internal" {
        match-clients { !key external; int_trusted; key internal; };
    
        //IPv4 of master server
        server x.x.x.x {
        keys { internal; };
    };
        // IPv6
        server x.x.x.x {
        keys { internal; };
    };
        allow-transfer { key internal; local_ns; int_ns; };
        transfer-source x.x.x.x;
        empty-zones-enable no;
        server fe80::/16 { bogus yes; };
        server 0.0.0.0/8 {bogus yes; };
    
        zone "example.org" {
        type slave;
        file "db.example.org";
        masters { x.x.x.x; x.x.x.x; };
        allow-query { int_trusted; };
    };
    
        forwarders {
        // forward to external view // 
        127.0.0.1; ::1; 
    };
    
        forward only;
    };
    
    view "external" {
      match-clients { any; };
    
     //IP of master server
     server x.x.x.x {
     keys { external; };
    };
     //IPv6
     server x.x.x.x {
     keys { external; }; 
    };
    
    allow-transfer { key external; ext_ns; local_ns; };
    transfer-source x.x.x.x; 
    server fe80::/16 { bogus yes; };
    server 0.0.0.0/8 {bogus yes; };
    empty-zones-enable yes;
    
    recursion yes;
    allow-recursion { any; };
    
    zone "." {
        type hint;
        file "/var/named/named.ca";
    };
    
    zone "example.org" {
        type slave;
        file "db.exampleout.org";
        masters { x.x.x.x; x.x.x.x; };
        allow-query { any; };
    };
    
    zone "example.com" {
        type master;
        file "db.example.com";
        allow-query { any; }; 
    };
    
    };
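
    To rule out plain syntax problems in the configs and zone files above, something along these lines can be run on each server (file paths are assumptions based on the directory statements above):

        # Verify named.conf syntax, including the view and key clauses:
        named-checkconf /etc/named.conf

        # Verify the zone data each view loads on the master:
        named-checkzone example.org /var/named/db.examplein.org
        named-checkzone example.org /var/named/db.exampleout.org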
    

    UPDATE: just a quick note that a dig +trace coming from an IP in the internal view's ACL never fails against a zone within the internal view. This only seems to fail when you dig +trace zones in the external view from an IP that matches the internal view.
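
    Since the views match on TSIG keys as well as on source address, the two views can be compared from the same client by signing the query (a sketch; the secret is the one defined in named.conf):

        # Signed with the "external" key, the query is excluded from the
        # internal view (!key external) and lands in the external view,
        # regardless of the source IP:
        dig -y hmac-sha512:external:xxxxxxxx @x.x.x.x example.org NS

        # Unsigned, from an int_trusted address, the same query hits the
        # internal view:
        dig @x.x.x.x example.org NS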

    • HBruijn, almost 8 years
      I think I have seen BAD (HORIZONTAL) REFERRAL errors when a subdomain uk.example.com was delegated to different name servers with NS records, but the example.com zone still had other records for uk.example.com (A or MX records, for instance) that are only supposed to be in the uk.example.com zone.
    • Andrew B, almost 8 years
      You will need to share the name of your domain. The inconsistent results are often due to glue record mismatches in the NS or A(AAA) records. A horizontal referral typically indicates that a nameserver is attempting to re-delegate a name that it is receiving a referral for (which is illegal). (A comparison sketch is included at the end of this thread.)
    • user53029, almost 8 years
      @HBruijn that's a good point, but wouldn't we have seen that without DNS views? Our zones have not changed in that regard since we started using views, and this worked without issue before we implemented them. But I will look back over some of the zone files that are failing, just for good measure.
    • user53029, almost 8 years
      @AndrewB it would not do any good. This problem only manifests itself from within our internal network. Externally, everything works fine.
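
      For reference, the parent/child comparison @AndrewB describes can be sketched like this (a0.org.afilias-nst.info is one of the public .org servers; x.x.x.x stands in for one of ours):

          # Delegation NS set as published by the parent (.org) zone:
          dig +norecurse @a0.org.afilias-nst.info example.org NS

          # NS set as served by one of our own (child) authoritative servers:
          dig +norecurse @x.x.x.x example.org NS

      A mismatch between the two sets, or stray non-NS records left at the zone cut (per @HBruijn), is the classic source of inconsistent referrals.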