CSUS supposedly holds games nights every so often, at which some labs are commandeered and the PCs converted into games machines. This involves putting Windows 2000 on them, along with a bunch of network games.
The machines don't have CD drives (being lab machines), and since it's easier anyway, they are installed over the network. Since all the machines are identical and will contain the same software, instead of doing an actual install, an image of a previous install is simply copied to each hard drive.
This takes a lot of time. The current games image is about 6GB. Say a lab has 20 machines: that makes 120GB to be transferred (unless multicast or broadcast is used). Over 100Mb Ethernet that takes an awfully long time; even at the full 100 Mbit/s from a single server, it is well over two and a half hours.
Chris came up with a scheme where machines are booted with a Linux boot floppy that prompts for the location of an image server, from which they retrieve a massive great tar file. They are then rebooted with another Linux boot floppy, which turns them into image servers. The idea is that the switched network allows two machines to transfer data without affecting transfers between other machines, which gets around the problem of sending 120GB from one server.
I instead tried using multicast TFTP, but, as you would expect around these parts, was told not to, because the switches (or something) here don't like multicast.
There is a broadcast image transfer system on the web somewhere, which hopefully someone will look into at some point. I didn't, because I hadn't realised that broadcast would be OK even though multicast isn't.
So I took Chris' idea, converted it to work with image slices instead of tar files, and came up with:
Sam's CSUS Imager
It consists of four components:
- The master server - a Perl script
- The client (image retriever) - C code
- The server (image sender) - C code
- The fake server (image sender) - C code
The client and server are written in C because they end up on a Linux boot disk, and squeezing Perl onto it wasn't worth the effort. The other two components run elsewhere, so fitting them on a floppy isn't an issue.
The Process
The master server waits for connections.
Servers connect to the master server and tell it which port they are listening on and how many clients they wish to serve at once.
Clients connect to the master server and wait for it to tell them the address and port of a server to use.
The fake server acts like a server to bootstrap the process, supplying the image before any PC has been imaged.
In our setup, the network and disk speeds mean that the fake server serves two clients at a time and the servers serve three clients at a time. This is because the disks are slower than the network, so a single server's link can feed several clients at once.
Once a server has finished serving the image to a set of clients it reconnects to the master server and starts all over again.
Once a client has finished receiving the image from a server, it checks some MD5 sums; if they are correct it becomes a server, otherwise it reconnects to the master server as a client to try again.
The master server maintains a server queue and a client queue, both FIFO. When there are enough clients in the client queue for the first server, the master server tells those clients to use that server and removes the server and clients from the queues.
It's quite simple, and means that a single boot floppy can be used to boot a large number of PCs, which become clients and wait for images and, after retrieving an image, act as servers.
Retrieving an image takes around 20 minutes. With the fake server continually serving 2 PCs every 20 minutes, and each installed PC serving 3 PCs every 20 minutes, we get the following progression of installed PCs:
- Time 0 : 0 PCs imaged
- Time 20 : 2 PCs imaged
- Time 40 : 10 PCs imaged
- Time 60 : 42 PCs imaged
- Time 80 : 170 PCs imaged
- Time 100 : 682 PCs imaged
In practical terms, this means it takes about an hour to install a lab or two. If, however, some PCs can be preinstalled, the time can be greatly reduced.
Anyway, that's all boring enough. If you've read through all of that, you might want to see the code...