Issues with PXEBoot using dhcpd

Date November 29, 2010

I've only been working on this literally all day, so I thought I'd open it up for discussion. I posted it to the LOPSA tech list, but I thought I'd try here, too.

I'm building a machine that can be reinstalled automagically using a combination of PXEboot, kickstart, and magic. I've already done this once, and it worked great. I copied the exact configuration files, installed the same services, and basically tried my best to port the process to another network, and I'm failing miserably.

I've got 2 VMware ESXi guests on the same vSwitch. One is the server, running CentOS 5.5 and ISC dhcpd 3.0.5-RedHat, with tftpd started from xinetd. The other is the client, which has no OS installed and is configured to boot with PXE.

My dhcpd config file is as follows:

ddns-update-style interim;

subnet 10.x.1.0 netmask 255.255.255.0 {

      option routers          10.x.1.1;
      option subnet-mask      255.255.255.0;
      option domain-name      "mydomain";
      option domain-name-servers      10.x.1.43;
      option time-offset      -18000;
      range dynamic-bootp     10.x.1.95 10.x.1.96;
      default-lease-time      21600;
      max-lease-time          43200;

      group {
              next-server 10.x.1.91;
              filename "pxelinux.0";

              host ops1tp {
                      hardware ethernet 00:0c:29:2d:ea:5a;
                      fixed-address   10.x.1.94;
              }
      }
}

When the client boots, I immediately get:

Network boot from Intel E1000
Copyright (C) 2003-2008 VMware, Inc.
Copyright (C) 1997-2000 Intel Corporation

CLIENT MAC ADDR: 00 0C 29 2D EA 5A GUID: 564D48EF-5B1F-A4A3-C0A6-2493F02DEA5A
DHCP...|

The pipe at the end of the DHCP line is a spinner, and the dots slowly
increase in number while the spinner goes.

At the same time, on the server, I get the following log entries in /var/log/messages:

Nov 29 16:29:55 kickstart-host dhcpd: DHCPDISCOVER from 00:0c:29:2d:ea:5a via eth0
Nov 29 16:29:55 kickstart-host dhcpd: DHCPOFFER on 10.x.1.94 to 00:0c:29:2d:ea:5a via eth0

Then 2 seconds later, I get these entries:

Nov 29 16:29:57 kickstart-host dhcpd: DHCPREQUEST for 10.x.1.94 (10.x.1.91) from 00:0c:29:2d:ea:5a via eth0
Nov 29 16:29:57 kickstart-host dhcpd: DHCPACK on 10.x.1.94 to 00:0c:29:2d:ea:5a via eth0

Those 4 lines cycle a total of 4 times, after which, the client
console replaces the last "DHCP..." line with:

CLIENT IP: 10.x.1.94 MASK: 255.255.255.0 DHCP IP: 10.x.1.91
PXE-E55: ProxyDHCP service did not reply to request on port 4011.

PXE-M0F: Exiting Intel PXE ROM.
Operating System not found

Obviously, the server is seeing the request. Since the client eventually knows which IP it's supposed to have, it's receiving the DHCPOFFER. The problem appears to be that something in my DHCP configuration is making it expect a PXE server (listening on UDP port 4011) on the server (presumably 10.x.1.91, which is indeed the kickstart server).

The oddity is that the configuration is identical to the configuration that I had at the other site.

I'm pretty stuck at this point. Any advice you'd be willing to offer would be welcome.

  • Derek

    The magic beans I have in my dhcpd.conf are:

    allow booting;
    allow bootp;

    class "pxeclients" {
        match if substring(option vendor-class-identifier, 0, 9) = "PXEClient";
        next-server 10.16.XX.YY;
        filename "linux-install/pxelinux.0";
    }

    and then (as separate clauses) I have all the host declarations with their static IP addresses.
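To illustrate what the `match if substring(...)` clause above tests, here is a minimal Python sketch of the same comparison. The function name and the sample identifier strings are illustrative assumptions; they are not taken from anyone's configuration in this thread.

```python
def matches_pxeclients(vendor_class_identifier: bytes) -> bool:
    """Mimic: substring(option vendor-class-identifier, 0, 9) = "PXEClient".

    dhcpd's substring(data, offset, length) takes `length` bytes starting at
    `offset`, so this compares the first nine bytes of DHCP option 60.
    """
    return vendor_class_identifier[0:9] == b"PXEClient"


if __name__ == "__main__":
    # Hypothetical option-60 values: a PXE ROM and a non-PXE DHCP client.
    for sample in (b"PXEClient:Arch:00000:UNDI:002001", b"MSFT 5.0"):
        print(sample.decode("ascii"), "->", matches_pxeclients(sample))
```

Only clients whose option 60 begins with "PXEClient" fall into the class, so the next-server and filename inside it are handed to PXE ROMs and not to ordinary DHCP clients.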

  • katre

    Have you tried running tcpdump/wireshark/what-have-you on the dhcp server? Make sure it's sending what you think. If you can afford to re-install the previous working system, maybe redo that with tcpdump, too, just to see what a good system looks like.
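Once you have a capture, the fields worth checking in the DHCPOFFER/DHCPACK are the BOOTP `siaddr` (what `next-server` sets), the BOOTP `file` field (what `filename` sets), and the option list. The following is a minimal Python sketch of a parser for those fields, assuming you have already extracted the raw UDP payload of one packet from the capture (that step is not shown). The function names are hypothetical. As I understand Intel's description of PXE-E55, the error is associated with an offer that carries option 60 set to "PXEClient" when nothing is listening on port 4011, so the sketch flags that option; treat that as a lead to verify, not a diagnosis.

```python
import socket

MAGIC_COOKIE = b"\x63\x82\x53\x63"  # RFC 2131 marker that precedes the options


def parse_dhcp_payload(payload: bytes) -> dict:
    """Summarise the UDP payload of one BOOTP/DHCP packet."""
    if len(payload) < 240 or payload[236:240] != MAGIC_COOKIE:
        raise ValueError("not a DHCP payload (too short or no magic cookie)")

    summary = {
        "yiaddr": socket.inet_ntoa(payload[16:20]),  # address offered to the client
        "siaddr": socket.inet_ntoa(payload[20:24]),  # next-server
        "file": payload[108:236].split(b"\x00", 1)[0].decode("ascii", "replace"),
        "options": {},
    }

    i = 240
    while i < len(payload):
        code = payload[i]
        if code == 255:          # end option
            break
        if code == 0:            # pad option, no length byte
            i += 1
            continue
        if i + 1 >= len(payload):
            break                # truncated option, stop parsing
        length = payload[i + 1]
        summary["options"][code] = payload[i + 2 : i + 2 + length]
        i += 2 + length
    return summary


def offers_pxeclient_class(summary: dict) -> bool:
    """True if the server's reply itself carries option 60 starting with 'PXEClient'."""
    return summary["options"].get(60, b"").startswith(b"PXEClient")
```

Running the same parser over an offer captured at the working site and one from the failing site, then comparing the two summaries, would show whether the servers really are sending the same thing.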

  • http://saintaardvarkthecarpeted.com/blog Saint Aardvark

    Any chance you've got different cards/PXE ROM versions that are behaving differently?

  • http://www.paulparadise.com/ Paul Paradise

    Try adding the following statements to the group block:


    option bootfile-name "pxelinux.0";
    server-name "10.x.1.91";

  • http://www.vanginderachter.be Serge van Ginderachter

    I think something is missing that makes your client believe the setup uses a separate proxy DHCP service to hand out the boot file and boot server IP, running on the same machine as the regular DHCP service (in which case the proxy DHCP server would listen on port 4011 to avoid a conflict on port 67).

    I can't seem to figure out why though, your config seems just fine as far as I can tell.

    Tcpdump all needed packets and try tracing the whole conversation with full dhcp protocol analysis...

    - Can you try putting next-server and filename options outside of that group?
    - Check your config to see if there is something you are not telling us :-)

  • Ben C

    Nothing stands out immediately, but if you catch me online tomorrow, I can send you snippets from our config. We do this for all ~2500 cluster nodes plus our infrastructure machines. Maybe with a little back and forth we can figure it out.

  • Claire

    The magic I have that you don't is


    allow bootp;
    allow booting;

    and then, later


    group tftpboot {
    filename "/pxelinux.0";
    next-server tftp.server.tld;

    host sample {
    hardware ethernet XX:XX:XX:XX:XX:XX;
    fixed-address 123.123.123.123;
    }
    }

  • ivar

    I have seen PXE boot failures that appeared after an Ethernet switch was changed and disappeared when the old switch was put back. If you have another make/model handy, give it a try.

  • natxo

    Can you fetch pxelinux.0 with a tftp client from your workstation? If yes, try some of the suggestions given before; otherwise, you may have a firewall issue. I know I have to allow tftp traffic to my Fedora laptop when I want to use a tftp client from the laptop, or use it as a tftp server to get router/switch configs.
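If no tftp client is handy, the same check can be sketched in a few lines of Python using the packet layout from RFC 1350. This is a minimal probe, not a full client: it sends one read request and reports the first reply without acknowledging it. The function names are hypothetical, and the default file name is the one from the dhcpd config in the post.

```python
import socket
import struct


def build_rrq(filename: str, mode: str = "octet") -> bytes:
    """Build a TFTP read request: opcode 1, filename, NUL, mode, NUL."""
    return (struct.pack("!H", 1) + filename.encode("ascii") + b"\x00"
            + mode.encode("ascii") + b"\x00")


def parse_reply(packet: bytes):
    """Return ("data", block, payload) or ("error", code, message)."""
    if len(packet) < 4:
        raise ValueError("TFTP reply too short")
    opcode, value = struct.unpack("!HH", packet[:4])
    if opcode == 3:   # DATA
        return ("data", value, packet[4:])
    if opcode == 5:   # ERROR
        message = packet[4:].split(b"\x00", 1)[0].decode("ascii", "replace")
        return ("error", value, message)
    raise ValueError(f"unexpected TFTP opcode {opcode}")


def probe_tftp(server: str, filename: str = "pxelinux.0", timeout: float = 3.0):
    """Send one RRQ to UDP port 69 and parse the first reply.

    Raises socket.timeout if nothing answers within `timeout` seconds.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_rrq(filename), (server, 69))
        # The server answers from a new ephemeral port, so accept any source.
        packet, _addr = sock.recvfrom(4 + 512)
    return parse_reply(packet)
```

A `("data", 1, ...)` result means the server found the file and started sending it. An `("error", ...)` result means tftpd is reachable but refused the request. A timeout points at the service not running or at something dropping the traffic in between.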

  • http://blogs.ncl.ac.uk/paul.haldane Paul Haldane

    Following up on the tcpdump/wireshark idea ... my instinct in situations where a client machine isn't behaving as I expect is to monitor/analyse the network traffic at the client end. That way you get to see what the client sees even if it's not coming from the server that you expect. There might be reasons that you're sure another server couldn't be involved but it's always a good idea to verify this assumption.

  • Andrew

    I would simplify first and set the following:

    ddns-update-style none;
    authoritative;

    Verify your config:

    /usr/sbin/dhcpd3 -t -cf /etc/dhcp3/dhcpd.conf

    For debugging DHCP problems I find dhcpdump is useful (it's just a wrapper around tcpdump).

    Hope this helps.

  • http://www.happysysadm.com/ Carlo

    These are the parameters I have in my dhcpd.conf:

    ddns-update-style interim;
    authoritative;

    subnet x.x.x.x netmask 255.255.255.0 {
    option broadcast-address x.x.x.254;
    option routers x.x.x.1;
    option domain-name-servers x.x.x.10;
    option domain-name "domain.com";
    allow bootp;
    allow booting;
    }

    and then in the group block :

    # PXE boot
    next-server tftpserver.com; # name of your TFTP server
    filename "pxelinux.0"; # name of the bootloader program

    ... and it works perfectly. Your syntax looks OK (apparently), so try adding the missing parameters we suggested, and if it still does not work, then you probably have a firewall dropping some of the traffic. What protocols/ports did you allow?

    Good luck!

    Carlo

  • http://www.happysysadm.com/ Carlo

    I misread your post. Your server and clients are on the same local vswitch so discard my suggestion of a firewall blocking...
    Just try the other suggestions and keep us informed.
    Carlo

  • Anthony

    Your dhcpd configuration looks fine, assuming that the numbers are all correct, the paths to files are accurate, etc.

    I would focus on what IS working and then start from there. The dhcp server is seeing the request from the client and responding with the appropriate IP address. So the systems are seeing and talking to each other.

    In my experience the error messages you get on the screen don't always match what is actually going wrong. I'm inclined to think that the ProxyDHCP error is a red herring, or at least misleading.

    I think the best bet at this point is as others have said to dig out wireshark and look at what exactly is getting sent to the client - and what the client is actually seeing. It's entirely possible that both sides are doing what they are supposed to and the problem is either a 3rd party interfering or something incorrect or failing later in the process. Wireshark is going to be the easiest way to test that.