comp.protocols.tcp-ip.domains FAQ - Section 6
PROBLEMS


Question 6.1. No address for root server

Date: Wed Jan 14 12:15:54 EST 1998
Q: I've been getting the following messages lately from bind-4.9.2:
        ns_req: no address for root server
 
We are behind a firewall and have the following for our named.cache file -
 
        ; list of servers
        .               99999999    IN  NS  POBOX.FOOBAR.COM.
                        99999999    IN  NS  FOOHOST.FOOBAR.COM.
        foobar.com.     99999999    IN  NS  pobox.foobar.com.

You can't do that. Your nameserver contacts POBOX.FOOBAR.COM, gets the correct list of root servers from it, then tries again and fails because of your firewall.

You will need a 'forwarder' definition, to ensure that all requests are forwarded to a host which can penetrate the firewall. And it is unwise to put phony data into 'named.cache'.

Q: We are getting logging information in the form:

Apr  8 08:05:22 gute named[107]: sysquery: no addrs found for root NS
                (A.ROOT-SERVERS.NET)
Apr  8 08:05:22 gute named[107]: sysquery: no addrs found for root NS
                (B.ROOT-SERVERS.NET)
Apr  8 08:05:22 gute named[107]: sysquery: no addrs found for root NS
                (C.ROOT-SERVERS.NET)  
...

We are running bind 4.9.5PL1. Our system IS NOT behind a firewall.  Any ideas ?

This was discussed on the mailing list in November of 1996. The short answer was to ignore it as it was not a problem. That being said, you should upgrade to a newer version at this time if you are running a non-current version :-)

Question 6.2. Error - No Root Nameservers for Class XX

Date: Sun Nov 27 23:32:41 EST 1994
Q: I've received errors before about "No root nameservers for class XX"
   but they've been because of network connectivity problems.
   I believe that Class 1 is Internet Class data.
   And I think I heard someone say that Class 4 is Hesiod??
   Does anyone know what the various Class numbers are?

From RFC 1700:
       DOMAIN NAME SYSTEM PARAMETERS
       The Internet Domain Naming System (DOMAIN) includes several
       parameters.  These are documented in [RFC1034] and [RFC1035].  The
       CLASS parameter is listed here.  The per CLASS parameters are 
       defined in separate RFCs as indicated.

       Domain System Parameters:

       Decimal   Name                                          References
       --------  ----                                          ----------
              0  Reserved                                           [PM1]
              1  Internet (IN)                              [RFC1034,PM1]
              2  Unassigned                                         [PM1]
              3  Chaos (CH)                                         [PM1]
              4  Hesiod (HS)                                        [PM1]
        5-65534  Unassigned                                         [PM1]
          65535  Reserved                                           [PM1]

DNS information for RFC 1700 was taken from

ftp.isi.edu : /in-notes/iana/assignments/dns-parameters

Hesiod is class 4, and there are no official root nameservers for class 4, so you can safely declare yourself one if you like. You might want to put up a packet filter so that no one outside your network is capable of making Hesiod queries of your machines, if you define yourself to be a root nameserver for class 4.
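
As a quick reference, the table above can be turned into a small lookup. This is only an illustrative sketch in Python; the function name class_name is made up for this example and is not part of BIND.

```python
# DNS CLASS values taken from the RFC 1700 table quoted above.
DNS_CLASSES = {1: "IN", 3: "CH", 4: "HS"}


def class_name(value):
    """Return the mnemonic for a DNS CLASS number, or a descriptive label.

    Hypothetical helper for illustration only.
    """
    if value in DNS_CLASSES:
        return DNS_CLASSES[value]
    if value in (0, 65535):
        return "Reserved"
    if 0 < value < 65535:
        return "Unassigned"
    raise ValueError("CLASS must fit in 16 bits: %r" % (value,))


if __name__ == "__main__":
    for number in (1, 2, 3, 4):
        print(number, class_name(number))
```

So a "No root nameservers for class 4" message refers to Hesiod (HS) data.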

Question 6.3. Bind 4.9.x and MX querying?

Date: Sun Nov 27 23:32:41 EST 1994

If you query a 4.9.x DNS server for MX records, a list of the MX records as well as a list of the authoritative nameservers is returned. This happens because bind 4.9.2 returns the list of nameservers that are authoritative for a domain in the response packet, along with their IP addresses in the additional section.

Question 6.4. Do I need to define an A record for localhost ?

Date: Sat Sep 9 00:36:01 EDT 1995

Somewhere deep in the BOG (BIND Operations Guide) that came with 4.9.3 (section 5.4.3), it says that you define this yourself (if need be) in the same zone files as your "real" IP addresses for your domain. Quoting the BOG:


                                 ... As  implied by this PTR
         record, there should be a  ``localhost.my.dom.ain''
         A  record  (with address 127.0.0.1) in every domain
         that contains hosts.  ``localhost.'' will lose  its
         trailing dot when 1.0.0.127.in-addr.arpa is queried
         for;...

The sample files in the BIND distribution show you what needs to be done (see the BOG).

Some HP boxen (especially those running HP OpenView) will also need "loopback" defined with this IP address. You may set it as a CNAME record pointing to the "localhost." record.

Question 6.5. MX records, CNAMES and A records for MX targets

Date: Wed Jun 16 22:09:03 EDT 1999

The O'Reilly "DNS and BIND" book warns against using non-canonical names in MX records; however, this warning is given in the context of mail hubs that MX to each other for backup purposes. How does this apply to mail spokes ? RFC 974 has a similar warning, but where is it specifically prohibited to use an alias in an MX record ?

Without the restrictions in the RFC, an MTA must request the A records for every MX target listed to determine whether it is itself in the MX list, and then reduce the list. This introduces many more lookups than would otherwise be required. If you are behind a 1200 bps link YOU DON'T WANT TO DO THIS. The addresses associated with CNAMEs are not passed as additional data, so you force additional traffic even if you are running a caching server locally.

There is also the problem of how the MTA finds all of its IP addresses. This is not straightforward, and you have to be able to do it if you allow CNAMEs (or extra A's) as MX targets.

The letter of the law is that an MX record should point to an A record.

There is no "real" reason to use CNAMEs for MX targets or separate As for nameservers any more. CNAMEs for services other than mail should be used because there is no specified method for locating the desired server yet.

People don't care what the names of MX targets are. They're invisible to the process anyway. Whether you have mail for "mary" redirected to "sue" is totally irrelevant. Having CNAMEs as the targets of MX's just needlessly complicates things, and is more work for the resolver.

Having separate A's for nameservers like "ns.your.domain" is pointless too, since again nobody cares what the name of your nameserver is, since that too is invisible to the process. If you move your nameserver from "mary.your.domain" to "sue.your.domain" nobody need care except you and your parent domain administrator (and the InterNIC). Even less so for mail servers, since only you are affected.
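
The MX list reduction mentioned above (from RFC 974) can be sketched roughly as follows. This is a simplified Python illustration, not any MTA's actual code; the function name reduce_mx_list and the host "mailhost.your.domain" are hypothetical, and the sketch assumes the host recognises itself by name only.

```python
def reduce_mx_list(mx_records, local_names):
    """Drop MX entries that are not strictly preferred over the local host.

    mx_records  -- list of (preference, target) tuples
    local_names -- names this host knows itself by (assumption: by name only)
    """
    def norm(name):
        return name.lower().rstrip(".")

    known = {norm(name) for name in local_names}
    local_prefs = [pref for pref, target in mx_records if norm(target) in known]
    if not local_prefs:
        # The host does not find itself in the list, so nothing is pruned.
        return sorted(mx_records)
    cutoff = min(local_prefs)
    return sorted((pref, target) for pref, target in mx_records if pref < cutoff)


if __name__ == "__main__":
    # "sue" is listed under its canonical name: it prunes itself correctly.
    canonical = [(10, "mary.your.domain."), (20, "sue.your.domain.")]
    print(reduce_mx_list(canonical, {"sue.your.domain"}))

    # "sue" is listed only under an alias: it cannot recognise itself,
    # keeps the alias in the list, and may try to deliver to itself.
    aliased = [(10, "mary.your.domain."), (20, "mailhost.your.domain.")]
    print(reduce_mx_list(aliased, {"sue.your.domain"}))
```

In the second call nothing is pruned, which is the failure RFC 974 warns about when an alias appears in the data section of an MX record.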

Q: Given the example - 

     hello in cname     realname
     mailx in mx        0 hello

   Now, while reading the operating manual of bind it clearly states
   that this is *not* valid.  These two statements clearly contradict
   each other.  Is there some later RFC than 974 that overrides what is
   said in there with respect to MX and CNAMEs?  Anyone have the
   reference handy?

A: This isn't what the BOG says at all.  See below.  You can have a CNAME 
   that points to some other RR type; in fact, all CNAMEs have to point
   to other names (Canonical ones, hence the C in CNAME).  What you
   can't have is an MX that points to a CNAME.  MX RR's that point to
   names which have only CNAME RR's will not work in many cases, and
   RFC 974 intimates that it's a bad idea:

      Note that the algorithm to delete irrelevant RRs breaks if LOCAL has
      a alias and the alias is listed in the MX records for REMOTE.  (E.g.
      REMOTE has an MX of ALIAS, where ALIAS has a CNAME of LOCAL).  This
      can be avoided if aliases are never used in the data section of MX
      RRs.

   Here's the relevant BOG snippet:

         aliases    {ttl   addr-class   CNAME   Canonical name
         ucbmonet           IN           CNAME   monet

         The  Canonical  Name resource record, CNAME, speci-
         fies an alias or  nickname  for  the  official,  or
         canonical,  host  name.   This record should be the
         only one associated with the alias name.  All other
         resource  records  should  be associated  with the
         canonical  name,  not  with  the   nickname.  Any
         resource  records  that  include  a  domain name as
         their value (e.g., NS or MX) must list the  canoni-
         cal name, not the nickname.

This issue comes up from time to time and does not seem to be stated as a clear-cut rule in any single RFC. John Navas contributed the following discussion of the relevant RFCs:

>it is a bug in their setup.. MX or domains cannot point to cnames but to
>the real name..

Are you sure?  RFC 974 states ("Issuing a Query"):

   There is one other special case.  If the response contains an answer
   which is a CNAME RR, it indicates that REMOTE is actually an alias
   for some other domain name. The query should be repeated with the
   canonical domain name.

That seems to clearly indicate that MX records can point to CNAME records.

RFC 1034 (3.6.2) suggests avoiding MX indirection:

   Domain names in RRs which point at another name should always point at
   the primary name and not the alias.  This avoids extra indirections in
   accessing information.

but does not prohibit it:

   Of course, by the robustness principle, domain software should
   not fail when presented with CNAME chains or loops; CNAME chains
   should be followed ...

Then there's RFC 1912, which states (2.4):

   Don't use CNAMEs in combination with RRs which point to other names
   like MX, CNAME, PTR and NS.  (PTR is an exception if you want to
   implement classless in-addr delegation.)  For example, this is
   strongly discouraged:

           podunk.xx.      IN      MX      mailhost
           mailhost        IN      CNAME   mary
           mary            IN      A       1.2.3.4

   [RFC 1034] in section 3.6.2 says this should not be done, and

That's "should not" not "cannot".

   [RFC 974] explicitly states that MX records shall not point to an alias
   defined by a CNAME.

But it doesn't, as noted above; in fact, just the opposite.

Finally, there's RFC 2181 (10.3):

   The domain name used as the value of a NS resource record, or part of
   the value of a MX resource record must not be an alias.  Not only is
   the specification clear on this point, but using an alias in either
   of these positions neither works as well as might be hoped, nor well
   fulfills the ambition that may have led to this approach.  This
   domain name must have as its value one or more address records.
   Currently those will be A records, however in the future other record
   types giving addressing information may be acceptable.  It can also
   have other RRs, but never a CNAME RR.

The problem is that the "specification" is NOT "clear on this point" as
noted above.  RFC 2181 goes on to state:

   Searching for either NS or MX records causes "additional section
   processing" in which address records associated with the value of the
   record sought are appended to the answer.  This helps avoid needless
   extra queries that are easily anticipated when the first was made.

   Additional section processing does not include CNAME records, let
   alone the address records that may be associated with the canonical
   name derived from the alias.  Thus, if an alias is used as the value
   of an NS or MX record, no address will be returned with the NS or MX
   value.  This can cause extra queries, and extra network burden, on
   every query.  It is trivial for the DNS administrator to avoid this
   by resolving the alias and placing the canonical name directly in the
   affected record just once when it is updated or installed.

This suggests that this is an issue of "goodness" (avoiding the extra
lookup) rather than a real error. Continuing:

   In some particular hard cases the lack of the additional section
   address records in the results of a NS lookup can cause the
   request to fail.

That would not seem to be the case here.
 
To be clear, even though this does not appear to be an error per se, I
would still recommend not using CNAME in an MX record given that many
(most?) people will probably take RFC 2181 at face value with the result
that some implementations may fail to properly resolve MX queries that
return a CNAME.

Question 6.6. Can an NS record point to a CNAME ?

Date: Wed Mar 1 11:14:10 EST 1995

Can I do this ? Is it legal ?


   @                       SOA     (.........)
                           NS      ns.host.this.domain.
                           NS      second.host.another.domain.
   ns                      CNAME   third
   third           IN      A       xxx.xxx.xxx.xxx

No. Only one RR type is allowed to refer, in its data field, to a CNAME, and that's CNAME itself. So CNAMEs can refer to CNAMEs but NSs and MXs cannot.

BIND 4.9.3 (Beta11 and later) explicitly syslogs this case rather than simply failing as pre-4.9 servers did. Here's a current example:

      Dec  7 00:52:18 gw named[17561]: "foobar.com IN NS" \
             points to a CNAME (foobar.foobar.com)

Here is the reason why:

Nameservers are not required to include CNAME records in the Additional Info section returned after a query. It's partly an implementation decision and partly a part of the spec. The algorithm described in RFC 1034 (pp24,25; info also in RFC 1035, section 3.3.11, p 18) says 'Put whatever addresses are available into the additional section, using glue RRs [if necessary]'. Since NS records are specified to contain only primary names of hosts, not CNAMEs, there is no reason for the algorithm to mention them. If, on the other hand, it were decided to allow CNAMEs in NS records (and indeed in other records), there would be no reason why CNAME records could not be included along with A records. The Additional Info section is intended for any information that might be useful but which isn't strictly the answer to the DNS query processed. It's an implementation decision inasmuch as some servers used to follow CNAMEs in NS references.

Question 6.7. Nameserver forgets own A record

Date: Fri Dec 2 16:17:31 EST 1994
Q: Lately, I've been having trouble with named 4.9.2 and 4.9.3.  
   Periodically, the nameserver will seem to "forget" its own A record,
   although the other information stays intact.  One theory I had was
   that somehow a site that the nameserver was secondary for was
   "corrupting" the A record somehow.
 
A: This is invariably due to not removing ALL of the cached zones
   when you moved to 4.9.X. Remove ALL cached zones and restart
   your nameservers.
 
   You get "ignoreds" because the primaries for the relevant zones are
   running old versions of BIND which pass out more glue than is
   required. named-xfer trims off this extra glue.

Question 6.8. General problems (core dumps !)

Date: Sun Dec 4 22:21:22 EST 1994

Paul Vixie says:

   I'm always interested in hearing about cases where BIND dumps core.
   However, I need a stack trace.   Compile with -g and not -O (unless
   you are using gcc and know what you are doing) and then when it
   dumps core, get into dbx or gdb using the executable and the core
   file and use "bt" to get a stack trace.   Send it to me
   <paul@vix.com> along with specific circumstances leading to or
   surrounding the crash (test data, tail of the debug log, tail of the
   syslog... whatever matters) and ideally you should save your core
   dump for a day or so in case I have questions you can answer via
   gdb/dbx.

Question 6.9. malloc and DECstations

Date: Mon Jan 2 14:19:22 EST 1995

We have replaced malloc on our DECstations with a malloc that is more compact in memory usage, and this helped the operation of bind a lot. The source is now available for anonymous ftp from

ftp.cs.wisc.edu : /pub/misc/malloc.tar.gz

Question 6.10. Can't resolve names without a "."

(Answer written by Mark Andrews) You are not using an RFC 1535-aware resolver. Depending upon the age of your resolver, you could try adding a search directive to resolv.conf, e.g.

        domain <domain>
        search <domain> [<domain2> ...]

If that doesn't work you can configure your server to serve the parent and grandparent domains, as this is the default search list.

"domain langley.af.mil" has an implicit "search langley.af.mil af.mil mil" in the old resolvers, and you are timing out trying to resolve the address with one of these domains tacked on.

When resolving internic.net the following will be tried in order:

        internic.net.langley.af.mil
        internic.net.af.mil
        internic.net.mil
        internic.net.

RFC 1535-aware resolvers try the qualified name first:
        internic.net.
        internic.net.langley.af.mil
        internic.net.af.mil
        internic.net.mil

RFC 1535 documents the problems associated with the old search algorithm, including security issues, and how to alleviate some of the problems.
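
The two lookup orders above can be reproduced with a short Python sketch. The function names are invented for this illustration, and the rule used for the RFC 1535-aware case (try the name as given first when it already contains a dot) is a simplification of what real resolvers do.

```python
def implicit_search_list(domain):
    """Old-resolver default: the local domain followed by each parent."""
    labels = domain.strip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]


def candidates(name, domain, rfc1535_aware):
    """Return the names a resolver would try, in order (simplified model)."""
    if name.endswith("."):
        return [name]  # already fully qualified, no search list applied
    absolute = name + "."
    searched = [name + "." + suffix for suffix in implicit_search_list(domain)]
    if rfc1535_aware and "." in name:
        return [absolute] + searched
    return searched + [absolute]


if __name__ == "__main__":
    for aware in (False, True):
        print("RFC 1535 aware:", aware)
        for candidate in candidates("internic.net", "langley.af.mil", aware):
            print("   ", candidate)
```

With the old order, three lookups must fail or time out before internic.net. is even tried, which is where the delay comes from.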

Question 6.11. Why does swapping kill BIND ?

Date: Thu Jul 4 23:20:20 EDT 1996

The question was:

   I've been diagnosing a problem with BIND 4.9.x (where x is usually 3BETA9 
   or 3REL) for several months now.  I finally tracked it down to swap space
   utilization on the unix boxes.

   This happens under (at least) Linux 1.2.9 & 1.2.13, SunOS 4.1.3U1,
   4.1.1, and Solaris 2.5.  The symptom is that if these machines get into 
   swap at all bind quits resolving most, if not all queries.  Mind you that 
   these machines are not "swapping hard", but rather we're talking about a 
   several hundred K TEMPORARY deficiency.   I have noticed while digging 
   through various archives that there is some referral to "bind thrashing
   itself to death".   Is this what is happening ?

And the answer is:
   Yes it is. Bind can't tolerate having even a few pages swapped out.  
   The time required to send responses climbs to several seconds/request,
   and the request queue fills and overflows.

   It's possible to shrink memory consumption a lot by undefining STATS
   and XSTATS, and recompiling.  You could nuke DEBUG too, which will
   cut the code size down some, but probably not the data size.  If that
   doesn't do the job then it sounds like you'll need to move DNS onto a
   separate box.

   BIND tends to touch all of its resident pages all of the time with
   normal activity... if you look at the RSS versus the total process
   size, you will always see the RSS within, usually, 90% of the total
   size of the process.  This means that *any* paging of named-owned
   pages will stall named.  Thus, a machine running a heavily accessed
   named process cannot afford to swap *at all*.

   (Paul Vixie continues on this subject):
   I plan to try to get BIND to exhibit slightly better locality of
   reference in some future release.  Of course, I can only do this if
   the query names also exhibit some kind of hot spots.  If someone
   queries all your names often, BIND will have to touch all of its VM
   pool that often.  (Right now, BIND touches everything pretty often
   even if you're just hammering on some hot spots -- that's the part
   I'd like to fix.  Malloc isn't cooperating.)

Question 6.12. Resource limits warning in system

Date: Sun Feb 15 22:04:43 EST 1998

When bind-8.1.1 is started the following informational message appears in the syslog...

   Feb 13 14:19:35 ns1 named[1986]:
       "cannot set resource limits on this system"

What does this mean ?

A: It means that BIND doesn't know how to implement the "coresize", "datasize", "stacksize", or "files" process limits on your OS.

If you're not using these options, you may ignore the message.

Question 6.13. ERROR:ns_forw: query...learnt

Date: Sun Feb 15 23:08:06 EST 1998

The following message appears in syslog:

   Jan 22 21:59:55 server1 named[21386]: ns_forw: query(testval) contains
        our address (dns1.foobar.org:1.2.3.4) learnt (A=:NS=)

what does it mean ?

A: This means that when it was looking up the NS records for the domain
containing "testval" (i.e. the root domain), it found an NS record
pointing to dns1.foobar.org, and the A record for this is 1.2.3.4.
This is server1's own IP address, but it's not authoritative for the
root domain.  The (A=:NS=) part of the message means that it didn't
learn these NS records from any other machine.

You may have listed dns1.foobar.org in your root server cache
file, even though it's not configured as a root server.  


Question: ERROR:recvfrom: Connection refused

Date: Wed Jul  9 21:57:40 EDT 1997

DNS on my Linux system is reporting the error:

   Mar 26 12:11:20 idg named[45]: recvfrom: Connection refused

When I start or restart the named program I get no errors. What could be causing this ?

A: Are you running the BETA9 version of bind 4.9.3 ? It is a bug that does no harm and the error reporting was corrected in later releases. You should upgrade to a newer version of bind.

Question 6.14. ERROR:zone has trailing dot

Date: Wed Jul 9 22:11:51 EDT 1997

If syslog reports "zone has trailing dot", the zone information contains a trailing dot in the named.boot file where it does not belong.


   example:
   secondary  domain.com.         xxx.xxx.xxx.xxx    S-domain.com
                        ^

Question 6.15. ERROR:Zone declared more then once

Date: Wed Jul 9 22:12:45 EDT 1997

If syslog reports "Zone declared more then once", a zone is specified multiple times in the named.boot file.

   example:
   secondary  domain.com         198.247.225.251    S-domain.com
   secondary  zone.com           198.247.225.251    S-zone.com
   primary    domain.com         P-domain.com

   domain.com is declared twice, once as primary and once as secondary.
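
A boot file can be checked for this before named is restarted. The following Python sketch is only an illustration: the function name duplicate_zones is made up, and it assumes the simple "primary"/"secondary" line layout shown in the example, with ";" starting a comment.

```python
from collections import Counter


def duplicate_zones(boot_text):
    """Return zone names declared more than once in named.boot-style text."""
    counts = Counter()
    for line in boot_text.splitlines():
        fields = line.split(";", 1)[0].split()  # drop comments, then tokenise
        if len(fields) >= 2 and fields[0].lower() in ("primary", "secondary"):
            zone = fields[1].lower().rstrip(".") or "."
            counts[zone] += 1
    return sorted(zone for zone, seen in counts.items() if seen > 1)


if __name__ == "__main__":
    sample = """\
secondary  domain.com         198.247.225.251    S-domain.com
secondary  zone.com           198.247.225.251    S-zone.com
primary    domain.com         P-domain.com
"""
    print(duplicate_zones(sample))
```

Run against the example above, it reports domain.com as the zone that is declared twice.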

Question 6.16. ERROR:response from unexpected source

Date: Wed Jul 9 22:12:45 EDT 1997

If syslog reports "response from unexpected source", BIND (pre 4.9.3) has a bug when implemented on a multi-homed server. This error indicates that the response to a query came from an address other than the one the query was sent to. So, if a host named ace gets a response from an unexpected source, ace will ignore the response.

Question 6.17. ERROR:record too short from [zone name]

Date: Mon Jun 15 21:34:49 EDT 1998

If syslog reports "record too short from [zone name]", the secondary server is trying to pull a zone from the primary server and, for some reason, the primary sent an incomplete zone. This is usually a problem at the primary server.

   To troubleshoot, try this:

   dig [zonename] axfr @[primary IP address]

   Often, this is caused by a line broken in the middle.

When the primary server's "named.boot" file contains "xfrnets" entries for other servers and the secondary is not listed, this error can occur. Creating an "xfrnets" entry for the secondary will solve the error.

Question 6.18. ERROR:sysquery: findns error (3)

Date: Wed Jul 9 22:17:09 EDT 1997

If syslog reports "sysquery: findns error (3)" or "qserial_query(zonename): sysquery FAILED", either there is no NS record for the zone or the NS record is not defined correctly.

Question 6.19. ERROR:Err/TO getting serial# for XXX

Date: Wed Jul 9 22:18:41 EDT 1997

If syslog reports "Err/TO getting serial# for XXX", there could be a number of possible errors:

   - An incorrect IP address in named.boot,
   - A network reachability problem,
   - The primary is lame for the zone.

An external check to see if you can retrieve the SOA is the best way to work out which it is.
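
One way to make such an external check without extra tools is to send the SOA query by hand. The Python sketch below builds a standard DNS query as described in RFC 1035 and reports the reply's RCODE, AA bit and answer count; the function names are invented for this example, and the interpretation in the comments is a rough guide rather than a complete diagnosis.

```python
import socket
import struct

SOA = 6  # RR type SOA (RFC 1035)
IN = 1   # class IN


def build_query(name, qtype=SOA, qclass=IN, query_id=0x1234):
    """Build a minimal, non-recursive DNS query packet for `name`."""
    # Header: ID, flags (all zero = standard query), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 0)
    qname = b""
    for label in (part for part in name.rstrip(".").split(".") if part):
        raw = label.encode("ascii")
        qname += bytes([len(raw)]) + raw
    qname += b"\x00"  # root label terminates the name
    return header + qname + struct.pack("!HH", qtype, qclass)


def check_soa(server_ip, zone, timeout=5.0):
    """Ask `server_ip` for the SOA of `zone`; return (rcode, authoritative, answers).

    A socket timeout suggests a wrong address or a reachability problem;
    a reply without the AA bit or with no answers suggests a lame primary.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(zone), (server_ip, 53))
        reply, _ = sock.recvfrom(512)
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", reply[:12])
    return flags & 0x000F, bool(flags & 0x0400), ancount
```

Calling check_soa with the primary's address from named.boot and the zone name then separates the three cases listed above.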

Question 6.20. ERROR:zonename IN NS points to a CNAME

Date: Wed Jul 9 22:20:29 EDT 1997

If syslog reports "zonename IN NS points to a CNAME" or "zonename IN MX points to a CNAME", named is 'reminding' you that due to various RFCs, an NS or MX record cannot point to a CNAME.

   EXAMPLE 1
   ---------
   domain.com    IN SOA      (...stuff...)
                 IN NS       ns.domain.com.
   ns            IN CNAME    machine.domain.com.
   machine       IN A        1.2.3.4

   The IN NS record points to ns, which is a CNAME for machine.  This
   is what results in the above error

   EXAMPLE 2
   ---------
   domain.com    IN SOA      (...stuff...)
                 IN MX       10 mail.domain.com.
   mail          IN CNAME    machine.domain.com.
   machine       IN A        1.2.3.4

   This would cause the MX variety of the error.

   The fix is to point MX and NS records to a machine that is defined explicitly
   by an IN A record.
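
The condition named is complaining about can be detected ahead of time. This Python sketch is an illustration only: it assumes the zone data has already been parsed into (owner, type, target) tuples with fully qualified names, and the function name is hypothetical.

```python
def targets_pointing_at_cnames(records):
    """Return the NS and MX records whose target name is owned by a CNAME.

    records -- list of (owner, rtype, target) tuples, names fully qualified
    """
    def norm(name):
        return name.lower().rstrip(".")

    aliases = {norm(owner) for owner, rtype, _ in records
               if rtype.upper() == "CNAME"}
    return [(owner, rtype, target) for owner, rtype, target in records
            if rtype.upper() in ("NS", "MX") and norm(target) in aliases]


if __name__ == "__main__":
    example_1 = [
        ("domain.com.", "NS", "ns.domain.com."),
        ("ns.domain.com.", "CNAME", "machine.domain.com."),
        ("machine.domain.com.", "A", "1.2.3.4"),
    ]
    for owner, rtype, target in targets_pointing_at_cnames(example_1):
        print("%s IN %s points to a CNAME (%s)" % (owner, rtype, target))
```

For EXAMPLE 1 it flags the NS record; after repointing the NS record at machine.domain.com. the list comes back empty.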

Question 6.21. ERROR:Masters for secondary zone [XX] unreachable

Date: Wed Jul 9 22:24:27 EDT 1997

If syslog reports "Masters for secondary zone [XX] unreachable", the initial attempts to load a zone failed, and the name server is still trying. If this occurs multiple times, a problem exists, likely on the primary server. This is a fairly generic error, and could indicate a vast number of problems. It might be that named is not running on the primary server, or that the primary does not have the correct zone file. If this keeps up long enough, the zone might expire.

Question 6.22. ERROR:secondary zone [XX] expired

Date: Wed Jul 9 22:25:53 EDT 1997

If syslog reports "secondary zone [XX] expired", a secondary zone on this server has expired.

An expired zone is one in which a transfer hasn't successfully been completed in the amount of time specified before a zone expires.

This problem could be anything which prevents a zone transfer: The primary server is down, named isn't running on the primary, named.boot has the wrong IP address, etc.

Question 6.23. ERROR:bad response to SOA query from [address]

Date: Wed Jan 14 12:15:11 EST 1998

If syslog reports "bad response to SOA query from [address], zone [name]", a syntax error may exist in the SOA record of the zone your server is attempting to pull.

It may also indicate that the primary server is lame, possibly due to a syntax error somewhere in the zone file.

Question 6.24. ERROR:premature EOF, fetching [zone]

Date: Wed Jul 9 22:28:26 EDT 1997

If syslog reports "premature EOF, fetching [zone]", a syntax error exists on the zone at the primary location, likely towards the End of File (EOF) location.

Question 6.25. ERROR:Zone [XX] SOA serial# rcvd from [Y] is < ours

Date: Wed Jul 9 22:30:03 EDT 1997

If syslog reports "Zone [name] SOA serial# rcvd from [address] is < ours", the zone transfer failed because the primary machine has a lower serial number in the SOA record than the one on file on this server.
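
To see which way a pair of serial numbers compares, the sequence-space arithmetic of RFC 1982 can be sketched in Python as below. Whether a particular BIND release compares serials exactly this way is an assumption here; the function name and the sample serial numbers are made up for illustration.

```python
def serial_gt(a, b):
    """True if 32-bit serial `a` is newer than `b` under RFC 1982 arithmetic."""
    a &= 0xFFFFFFFF
    b &= 0xFFFFFFFF
    # Equal serials, or serials exactly 2**31 apart, are not "greater".
    return a != b and ((a - b) & 0xFFFFFFFF) < 0x80000000


if __name__ == "__main__":
    ours, theirs = 1999061602, 1999061601  # hypothetical serial numbers
    if serial_gt(ours, theirs):
        print("serial received from the primary is < ours; no transfer")
```

The usual cause is a serial number that was lowered or mistyped on the primary.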

Question 6.26. ERROR:connect(IP/address) for zone [XX] failed

Date: Wed Jan 14 12:21:40 EST 1998

If syslog reports "connect(address) for zone [name] failed: No route to host" or "connect(address) for zone [name] failed: Connection timed out", it could be that there is no route to the specified host or a slow primary system. Try a traceroute to the address specified to isolate the problem. The problem may be a mistyped IP address in named.boot.

The primary machine may be very slow, or a connection may have been initiated and connectivity then lost for some reason. Try network troubleshooting tools like ping and traceroute, then try connecting to port 53 using nslookup or dig.

If syslog reports "connect(address) for zone [name] failed: Connection refused", the destination address is not allowing the connection. Either the destination is not running DNS (port 53), or it is filtering connections from you. It is also possible that named.boot is pointing to the wrong address.

Question 6.27. ERROR:sysquery: no addrs found for NS

Date: Wed Jul 9 22:37:01 EDT 1997

If syslog reports "sysquery: no addrs found for NS", the IN NS record may be pointing to a host with no IN A record.

Question 6.28. ERROR:zone [name] rejected due to errors

Date: Wed Jul 9 22:37:51 EDT 1997

If syslog reports "primary zone [name] rejected due to errors", there will likely be another more descriptive error along with this, like "zonefile: line 17: database format error". That zone file should be investigated for errors.



Chris Peckham - 16 June 1999

Extracted from comp.protocols.tcp-ip.domains Frequently Asked Questions, Copyright 1999.