
DNS debugging and monitoring

(based on the Damas & Kerr lecture at RIPE64)

DNS has only a small number of error codes, and in most failure cases a server simply returns SERVFAIL.

Do not use nslookup for debugging, because it doesn’t report what it is trying to do. Use host if all you want is a simple query.

Dig is part of BIND 9 and is packaged for most distros (dnsutils in Debian or Ubuntu). It is quite configurable: you can turn individual sections of the answer on or off. Packet size is important to check, particularly if you're working with DNSSEC. The larger DNSSEC responses often don't fit into a single classic UDP packet, so resolvers fall back to TCP or to large, fragmented UDP. Both collide with legacy firewall settings that often only open UDP for DNS.

  • With +trace, dig simulates a recursive resolver, following the delegations down from the root servers itself.
  • With +multiline, long records (particularly SOA) are printed across several lines with explanatory comments.
  • +vc is equivalent to +tcp (a legacy name: "vc" stands for "virtual circuit"). But watch out: +notcp does not stop dig from using TCP when TCP is needed to get an answer.
  • +nssearch looks up the authoritative nameservers for a zone and shows the SOA record each one returns. This is a good way to find out whether they are out of sync.
  • +short is useful for scripting, because it prints just the answer.
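
As a rough illustration of how +short lends itself to scripting, here is a minimal Python sketch. It asks each nameserver for the zone's SOA record and compares the serials, similar in spirit to +nssearch. It assumes dig is on the PATH, and the function names are my own. Unlike +nssearch, it does not discover the nameservers itself, so you have to pass them in.

```python
"""Compare SOA serials across nameservers, a scripted cousin of dig +nssearch."""
import subprocess
from typing import Dict, Iterable


def query_soa_short(zone: str, server: str, timeout: float = 10.0) -> str:
    """Return the raw output of `dig +short SOA <zone> @<server>`."""
    result = subprocess.run(
        ["dig", "+short", "SOA", zone, "@" + server],
        capture_output=True, text=True, check=True, timeout=timeout,
    )
    return result.stdout


def soa_serial(short_output: str) -> int:
    """Pull the serial out of +short SOA output.

    +short prints the SOA data on one line:
    mname rname serial refresh retry expire minimum
    """
    fields = short_output.split()
    if len(fields) != 7:
        raise ValueError(f"unexpected SOA output: {short_output!r}")
    return int(fields[2])


def zone_serials(zone: str, servers: Iterable[str]) -> Dict[str, int]:
    """Ask each server directly for the zone's SOA serial."""
    return {server: soa_serial(query_soa_short(zone, server)) for server in servers}


def out_of_sync(serials: Dict[str, int]) -> bool:
    """True if the servers do not all report the same serial."""
    return len(set(serials.values())) > 1
```

For example, pass the output of zone_serials("example.com", ["ns1.example.com", "ns2.example.com"]) to out_of_sync(). The server names here are placeholders; one way to get the real list is dig +short NS for the zone.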

Drill is a lot like dig and is now part of ldns. The big difference is that drill supports DNSSEC and can check signatures. A common problem is that many firewalls drop fragmented UDP packets. If queries work when you limit the buffer size to fit within the MTU, but stop working when you increase the packet size, then something in your network, probably your firewall, is interfering.
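
To see the buffer-size effect without a dedicated tool, here is a minimal Python sketch. It hand-builds a query carrying an EDNS0 OPT record (RFC 6891) with a chosen UDP buffer size, sends it over UDP, and reports the response size and whether the TC (truncated) bit came back set. This is an illustration under stated assumptions, not how drill or dig are implemented: it handles IPv4 only, sends a single query, and uses function names of my own choosing.

```python
"""Probe whether large UDP DNS responses survive the network path.

Builds a query with an EDNS0 OPT record advertising a chosen UDP buffer
size (roughly what dig +bufsize=N does) and reports the result.
"""
import secrets
import socket
import struct
from typing import Optional, Tuple

QTYPES = {"A": 1, "NS": 2, "SOA": 6, "MX": 15, "TXT": 16, "AAAA": 28, "DNSKEY": 48}


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        if not label:
            continue
        raw = label.encode("ascii")
        if len(raw) > 63:
            raise ValueError(f"label too long: {label!r}")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name: str, qtype: str = "A", bufsize: int = 1232,
                dnssec_ok: bool = True, qid: Optional[int] = None) -> bytes:
    if qid is None:
        qid = secrets.randbelow(0x10000)
    # ID, flags (RD set), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=1 (the OPT record)
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 1)
    question = encode_name(name) + struct.pack("!HH", QTYPES[qtype], 1)  # class IN
    # OPT pseudo-RR: root owner name, TYPE 41, CLASS = UDP payload size,
    # TTL = extended RCODE (0), version (0), flags (DO bit 0x8000), RDLENGTH 0
    opt = b"\x00" + struct.pack("!HHIH", 41, bufsize, 0x8000 if dnssec_ok else 0, 0)
    return header + question + opt


def probe(server: str, name: str, qtype: str, bufsize: int,
          timeout: float = 3.0) -> Optional[Tuple[int, bool]]:
    """Send one UDP query to an IPv4 server; return (size, truncated) or None on timeout."""
    query = build_query(name, qtype, bufsize)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (server, 53))
        while True:
            try:
                data, _ = sock.recvfrom(65535)
            except socket.timeout:
                return None  # nothing arrived: the response may have been dropped
            if data[:2] == query[:2]:  # ignore replies to other queries
                break
    flags = struct.unpack("!H", data[2:4])[0]
    return len(data), bool(flags & 0x0200)  # 0x0200 is the TC (truncated) bit
```

For example, call probe("192.0.2.53", "example.com", "DNSKEY", 1232), then make the same call with 4096. The address is a placeholder for the server you are testing. If the small buffer gets an answer but the large one times out, large fragmented responses are probably being dropped somewhere on the path. The 1232-byte default follows the DNS Flag Day 2020 recommendation.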

named-checkzone takes a zone file and checks it, which prevents a lot of silly mistakes, such as errors introduced by your zone generation process. The advantage is that you don't have to reload the server in order to check a zone. Output can be rather verbose, but you can turn individual checks on and off.

is an online service, but it is also downloadable. It runs a large number of very strict checks on your DNS zone.

another online checker. It is not downloadable, and it has a different set of checks than
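
A zone generation pipeline can run named-checkzone as a gate, so that a broken file never reaches the server. The Python sketch below assumes named-checkzone is installed and relies only on its exit status, which is zero when the zone loads cleanly. The staging-then-rename workflow and the function names are hypothetical.

```python
"""Gate a generated zone file on named-checkzone before it goes live.

Hypothetical pipeline step: write the new zone to a staging path, check it,
and only then move it over the live file.
"""
import os
import shutil
import subprocess
from typing import List, Tuple


def checkzone_cmd(zone: str, path: str) -> List[str]:
    # named-checkzone's positional arguments are the zone name and the file.
    return ["named-checkzone", zone, path]


def check_zone(zone: str, path: str) -> Tuple[bool, str]:
    """Return (ok, report); a zero exit status means the zone loaded cleanly."""
    if shutil.which("named-checkzone") is None:
        raise FileNotFoundError("named-checkzone is not on PATH")
    result = subprocess.run(checkzone_cmd(zone, path), capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr


def publish_if_valid(zone: str, staged_path: str, live_path: str) -> None:
    ok, report = check_zone(zone, staged_path)
    if not ok:
        raise RuntimeError(f"refusing to publish {zone}:\n{report}")
    # os.replace is atomic when both paths are on the same filesystem.
    os.replace(staged_path, live_path)
```

Reload the server (for example with rndc reload) only after publish_if_valid returns without raising.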

Instant analysis

dnstop (ready-made): like top, but for DNS queries instead of processes. It lets you determine what is loading your nameserver. It captures traffic through libpcap, the same library tcpdump uses, so you can specify filters for it, such as ignoring specific servers or specific domains. It may require root (or a setuid binary), and it will put your network interface into promiscuous mode.
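
The core of what dnstop shows can be sketched in a few lines of Python: ranked counts, with some sources and domains ignored. This is not dnstop's code. It assumes the queries have already been decoded into (source address, query name) pairs. It also uses a naive "last two labels" rule for second-level domains.

```python
"""A dnstop-style summary over already-decoded queries."""
from collections import Counter
from typing import Iterable, Tuple


def _matches(name: str, domain: str) -> bool:
    return name == domain or name.endswith("." + domain)


def summarize(records: Iterable[Tuple[str, str]], n: int = 10,
              ignore_sources: Iterable[str] = (),
              ignore_domains: Iterable[str] = ()):
    """records: (source_ip, qname) pairs. Returns the top sources and the top
    second-level domains, skipping ignored sources and domains."""
    skip_src = set(ignore_sources)
    skip_dom = [d.lower().rstrip(".") for d in ignore_domains]
    by_source: Counter = Counter()
    by_domain: Counter = Counter()
    for src, qname in records:
        name = qname.lower().rstrip(".")
        if src in skip_src or any(_matches(name, d) for d in skip_dom):
            continue
        by_source[src] += 1
        # Naive "second-level domain": the last two labels (wrong for e.g. co.uk).
        by_domain[".".join(name.split(".")[-2:])] += 1
    return by_source.most_common(n), by_domain.most_common(n)
```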

libpcap/tcpdump/wireshark: libpcap is the underlying capture library, so you can use it, together with all your usual tools, to write your own analysis code.
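
As an example of writing your own analysis on top of libpcap output, here is a minimal Python sketch. It reads a classic pcap file (as written by tcpdump -w) using only the standard library and extracts each DNS question. Its limitations are deliberate assumptions: it handles Ethernet captures only, with no VLAN tags or IPv6 extension headers. It also skips fragments rather than reassembling them, so it is no substitute for dnscap.

```python
"""Pull DNS questions out of a classic pcap file with only the standard library."""
import ipaddress
import struct
from typing import BinaryIO, Iterator, NamedTuple, Optional, Tuple

LE_MAGICS = {b"\xd4\xc3\xb2\xa1": 1e6, b"\x4d\x3c\xb2\xa1": 1e9}  # usec, nsec
BE_MAGICS = {b"\xa1\xb2\xc3\xd4": 1e6, b"\xa1\xb2\x3c\x4d": 1e9}


class DnsQuestion(NamedTuple):
    ts: float
    src: str
    dst: str
    is_response: bool
    qname: str
    qtype: int


def read_pcap(f: BinaryIO) -> Iterator[Tuple[float, bytes]]:
    """Yield (timestamp, frame) for each record in a classic pcap stream."""
    header = f.read(24)
    magic = header[:4]
    if magic in LE_MAGICS:
        endian, scale = "<", LE_MAGICS[magic]
    elif magic in BE_MAGICS:
        endian, scale = ">", BE_MAGICS[magic]
    else:
        raise ValueError("not a classic pcap file")
    linktype = struct.unpack(endian + "I", header[20:24])[0] & 0xFFFF
    if linktype != 1:
        raise ValueError(f"only Ethernet (link type 1) is handled, got {linktype}")
    while True:
        rec = f.read(16)
        if len(rec) < 16:
            return
        sec, frac, incl_len, _orig_len = struct.unpack(endian + "IIII", rec)
        yield sec + frac / scale, f.read(incl_len)


def parse_frame(ts: float, frame: bytes) -> Optional[DnsQuestion]:
    """Decode an Ethernet frame; return its DNS question, or None if not DNS."""
    ethertype = struct.unpack("!H", frame[12:14])[0]
    ip = frame[14:]
    if ethertype == 0x0800:  # IPv4
        frag = struct.unpack("!H", ip[6:8])[0]
        if ip[9] != 17 or frag & 0x3FFF:  # not UDP, or part of a fragmented datagram
            return None
        src, dst, udp = ip[12:16], ip[16:20], ip[(ip[0] & 0x0F) * 4:]
    elif ethertype == 0x86DD:  # IPv6, next header must be UDP directly
        if ip[6] != 17:
            return None
        src, dst, udp = ip[8:24], ip[24:40], ip[40:]
    else:
        return None
    sport, dport = struct.unpack("!HH", udp[:4])
    if 53 not in (sport, dport):
        return None
    dns = udp[8:]
    flags, qdcount = struct.unpack("!HH", dns[2:6])
    if qdcount == 0:
        return None
    # The question name is the first name in the message, so it is not
    # compressed in practice; this sketch simply refuses pointers.
    labels, off = [], 12
    while dns[off] != 0:
        length = dns[off]
        if length & 0xC0:
            raise ValueError("unexpected compression pointer in question")
        labels.append(dns[off + 1:off + 1 + length].decode("ascii", "replace"))
        off += 1 + length
    qtype = struct.unpack("!H", dns[off + 1:off + 3])[0]
    return DnsQuestion(ts, str(ipaddress.ip_address(src)), str(ipaddress.ip_address(dst)),
                       bool(flags & 0x8000), ".".join(labels) + ".", qtype)


def dns_questions(f: BinaryIO) -> Iterator[DnsQuestion]:
    """Yield DNS questions, skipping frames that are truncated or malformed."""
    for ts, frame in read_pcap(f):
        try:
            q = parse_frame(ts, frame)
        except (IndexError, struct.error, ValueError):
            continue
        if q is not None:
            yield q
```

Open the capture in binary mode and iterate over dns_questions(f) to get one record per DNS packet, with the source, destination, query name and type, and whether it was a response.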

dnscap is a bit like tcpdump, but focused on DNS. It understands both IPv4 and IPv6 and captures UDP, TCP, and IP fragments. It can collect only queries, only responses, or both. It can rotate its output file and trigger an upload script when it has finished writing each one.

ncap is a substitute for pcap; it is a library, not a tool. Its file format is not compatible with pcap, and it has hooks for your own modules.

Passive DNS

Passive DNS collects DNS information as it enters or leaves a nameserver, without taking any active DNS role. The tool is called dnslogger (ISC SIE is also built on top of it).
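
To make the passive DNS idea concrete, here is a toy Python store. It keeps one row per distinct (name, type, data) answer, with first-seen and last-seen times and a sighting count. That summary shape is a common way to present passive DNS data. This is still only an illustration with hypothetical names, not dnslogger's or SIE's format.

```python
"""A toy passive DNS store: one row per distinct answer, with sighting times."""
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Sighting:
    first_seen: float
    last_seen: float
    count: int = 1


def _norm(name: str) -> str:
    return name.lower().rstrip(".") + "."


class PassiveDnsStore:
    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str, str], Sighting] = {}

    def observe(self, ts: float, rrname: str, rrtype: str, rdata: str) -> None:
        """Record one answer RR seen on the wire at time ts."""
        key = (_norm(rrname), rrtype.upper(), rdata)
        row = self._rows.get(key)
        if row is None:
            self._rows[key] = Sighting(ts, ts)
        else:
            row.first_seen = min(row.first_seen, ts)
            row.last_seen = max(row.last_seen, ts)
            row.count += 1

    def lookup(self, rrname: str) -> List[Tuple[str, str, float, float, int]]:
        """All (rrtype, rdata, first_seen, last_seen, count) rows for a name."""
        name = _norm(rrname)
        return sorted(
            (rrtype, rdata, s.first_seen, s.last_seen, s.count)
            for (owner, rrtype, rdata), s in self._rows.items()
            if owner == name
        )
```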

For long-term analysis, use DSC or DNS2DB.

DSC, the DNS Statistics Collector, does a really good job of collecting and presenting statistics for your DNS, based on packet capture. It is available for free download from OARC. If load is a concern, capture traffic using a passive tap. You can also capture the traffic on one machine and send it to another server for publication. Use logical grouping when you have multiple nodes.
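
The basic idea behind this kind of statistics collection can be sketched in Python: count events per fixed time interval, and roll nodes up into logical groups. DSC itself collects many more datasets. The (timestamp, node, qtype) input and the groups mapping are assumptions made for illustration.

```python
"""Interval bucketing with logical node groups, in the spirit of DSC's datasets."""
from collections import Counter, defaultdict
from typing import Dict, Iterable, Mapping, Optional, Tuple


def bucket_stats(events: Iterable[Tuple[float, str, str]],
                 interval: int = 60,
                 groups: Optional[Mapping[str, str]] = None,
                 ) -> Dict[Tuple[str, int], Counter]:
    """events: (timestamp, node, qtype) triples.

    Returns {(group, bucket_start): Counter of qtypes}. Nodes missing from
    `groups` form a group of their own.
    """
    groups = groups or {}
    stats: Dict[Tuple[str, int], Counter] = defaultdict(Counter)
    for ts, node, qtype in events:
        bucket_start = int(ts // interval) * interval
        stats[(groups.get(node, node), bucket_start)][qtype] += 1
    return dict(stats)
```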

DNS2DB takes capture data and puts it into an SQL database. It is produced by IIS (.se), includes a basic GUI for looking at your data, and uses Adobe Flex for the front end.
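
Once capture data is in SQL, ad hoc questions become simple queries. This Python sketch uses SQLite from the standard library. The queries table is a hypothetical schema for illustration, not the one DNS2DB actually uses.

```python
"""Load decoded queries into SQLite and ask simple questions of them.

The schema is a hypothetical illustration, not DNS2DB's actual schema.
"""
import sqlite3
from typing import Iterable, List, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS queries (
    ts    REAL    NOT NULL,  -- capture timestamp, seconds since the epoch
    src   TEXT    NOT NULL,  -- client address
    qname TEXT    NOT NULL,
    qtype INTEGER NOT NULL
);
CREATE INDEX IF NOT EXISTS queries_ts ON queries (ts);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def load(conn: sqlite3.Connection, rows: Iterable[Tuple[float, str, str, int]]) -> None:
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO queries (ts, src, qname, qtype) VALUES (?, ?, ?, ?)", rows)


def top_names(conn: sqlite3.Connection, since: float, limit: int = 10) -> List[Tuple[str, int]]:
    """Most-queried names since a given timestamp."""
    return conn.execute(
        "SELECT qname, COUNT(*) AS n FROM queries WHERE ts >= ? "
        "GROUP BY qname ORDER BY n DESC, qname LIMIT ?",
        (since, limit),
    ).fetchall()
```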


Online service available at

It shows measurements of DNS servers (generated by RIPE Test Traffic). The data is made public with a slight delay to avoid helping black-hat attackers; subscribers can pay for immediate access. DNSmon is mostly a tool for discovering whether you have any problems at all, but you can get the raw data to help diagnose what your problems were.

fpdns: fingerprint DNS servers. Currently hosted on GitHub.

On the Internet at large, about 0.3% of your DNS queries will simply disappear, regardless of how good your service is.

Presentation slides (with full links in the last slide) are in the RIPE64 archive.
