Pogertt's suggestion works if you control the music yourself. You would use the source music as a time code, and program robot movements to fire at particular time-code markers. One kind of time code is simply the number of samples played; others are MIDI Time Code (MTC) and SMPTE time code.
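A minimal sketch of the "samples played as time code" idea: a sorted cue table keyed by sample position, checked once per audio buffer. The cue times, action names, and 44.1 kHz rate here are illustrative assumptions, not anything from the original suggestion.

```c
#include <stdio.h>

/* Hypothetical cue table: sample positions (assuming 44.1 kHz playback)
 * paired with robot actions. Values are made up for illustration. */
typedef struct {
    unsigned long sample_pos;   /* when to fire, in samples played */
    const char   *action;       /* what the robot should do */
} Cue;

static const Cue cues[] = {
    {     0, "raise arm" },
    { 44100, "turn head" },     /* 1.0 s into the track */
    { 88200, "lower arm" },     /* 2.0 s into the track */
};
static const int n_cues = sizeof cues / sizeof cues[0];

/* Call once per audio buffer with the running sample count; fires every
 * cue whose time has arrived and returns the index of the next pending cue. */
int run_cues(unsigned long samples_played, int next_cue)
{
    while (next_cue < n_cues && cues[next_cue].sample_pos <= samples_played) {
        printf("t=%lu samples: %s\n", samples_played, cues[next_cue].action);
        next_cue++;
    }
    return next_cue;
}
```

In a real player you would call `run_cues()` from the same loop that hands buffers to the audio output, so the cue clock can never drift from the music.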
If the music is not under your control, you need a microphone and a beat-detection filter. Beat detection can be done with fairly low computing power, and you don't need super-high-fidelity sound input (though it helps). An ATmega running at 16 or 20 MHz could do this on one of its analog inputs, if you set the ADC to auto-trigger (free-running) sampling and code your filter in 16-bit fixed point. (The Arduino libraries are too slow to support this.) The ATmega wouldn't really have the horsepower to do anything else, though, so you'd send the detected beat out through some I/O pins or the serial port to whatever motion sequencer you have.
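To show what "16-bit fixed point" means in practice, here is a one-pole low-pass step in Q15 format, the kind of building block such a filter would be made of. The coefficient value is an arbitrary assumption for illustration; the point is that the only arithmetic needed is a 16x16-bit multiply with a 32-bit product and a shift, which an ATmega handles easily.

```c
#include <stdint.h>

/* One-pole low-pass in Q15 fixed point: y += a * (x - y).
 * A_Q15 is an assumed coefficient, ~0.05 (1638/32768). The multiply
 * widens to 32 bits, then the >>15 rescales back to Q15. */
#define A_Q15 1638

int16_t lp_step(int16_t x, int16_t *state)
{
    int32_t diff = (int32_t)x - *state;   /* fits in 17 bits, needs int32 */
    *state += (int16_t)((diff * A_Q15) >> 15);
    return *state;
}
```

Fed a constant input, the state converges toward it geometrically; chaining a few of these (or a biquad built the same way) gives you the band filtering described below, all without floating point.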
There is a lot of research (and many links) on beat detection and tempo detection from audio, because it turns out to be a tricky problem. The easiest approach is to build a fairly narrow band-pass filter centered somewhere around 100 to 140 Hz (a notch filter would do the opposite and reject that band). This will pick out the bass drum of most music, although you'll also get crosstalk from the bass instrument. Then run a peak detector or envelope follower on the filtered signal, and trigger a "beat" when it shows a sufficiently sharp rise.