So assuming you are just sending voice, you can get away with 8-bit sampling at 8 kHz or so. Better quality will likely require an external ADC running at 16 bits and 16-20 kHz or something similar. Assuming the low figures, you need a processor capable of encrypting/decrypting data at roughly 64 kilobits/second (8 bits x 8,000 samples/s) and sending/receiving this data, on top of handling the ADC/DAC process.
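To make the data-rate arithmetic concrete, here is a minimal sketch in C. The function name `pcm_bitrate` and its parameters are made up for illustration; it just multiplies sample width, sample rate and channel count to get the raw, uncompressed PCM rate the cipher and the link have to keep up with.

```c
#include <stdint.h>

/* Raw PCM bit rate in bits per second:
 * bits per sample x samples per second x number of channels.
 * (Hypothetical helper, not from any library.) */
uint32_t pcm_bitrate(uint32_t bits_per_sample,
                     uint32_t sample_rate_hz,
                     uint32_t channels)
{
    return bits_per_sample * sample_rate_hz * channels;
}
```

With the low figures, `pcm_bitrate(8, 8000, 1)` gives 64,000 bits/s. The better-quality case, `pcm_bitrate(16, 20000, 1)`, gives 320,000 bits/s, and CD-quality stereo, `pcm_bitrate(16, 44100, 2)`, gives 1,411,200 bits/s.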
The big question is: why is this being encrypted? If it's basically just for the hell of it, you could use a fast-ish AVR or PIC type processor and run a simplified stream cipher on it. I'm not sure you could run full TEA or RC4 on something like that, but you could do an approximation.
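For a feel of what a byte-oriented stream cipher costs, here is a sketch of textbook RC4 in plain C. The names `rc4_state`, `rc4_init` and `rc4_crypt` are my own, not from any library. The state is 256 bytes plus two index bytes, so whether it fits depends on how much RAM the particular part has. Keep in mind RC4 has known weaknesses, so treat this as a "for the hell of it" option only, not something to protect real secrets with.

```c
#include <stddef.h>
#include <stdint.h>

/* RC4 state: a 256-byte permutation plus two running indices. */
typedef struct {
    uint8_t s[256];
    uint8_t i;
    uint8_t j;
} rc4_state;

/* Key scheduling: start from the identity permutation and
 * shuffle it using the key bytes. key_len must be 1..256. */
void rc4_init(rc4_state *st, const uint8_t *key, size_t key_len)
{
    unsigned n;
    uint8_t j = 0;

    for (n = 0; n < 256; n++)
        st->s[n] = (uint8_t)n;

    for (n = 0; n < 256; n++) {
        uint8_t tmp;
        j = (uint8_t)(j + st->s[n] + key[n % key_len]);
        tmp = st->s[n];
        st->s[n] = st->s[j];
        st->s[j] = tmp;
    }
    st->i = 0;
    st->j = 0;
}

/* XOR the buffer with the keystream, in place. Since it is a plain
 * XOR, the same call encrypts and decrypts, provided both ends
 * start from the same state. */
void rc4_crypt(rc4_state *st, uint8_t *buf, size_t len)
{
    size_t n;

    for (n = 0; n < len; n++) {
        uint8_t tmp;
        st->i = (uint8_t)(st->i + 1);
        st->j = (uint8_t)(st->j + st->s[st->i]);
        tmp = st->s[st->i];
        st->s[st->i] = st->s[st->j];
        st->s[st->j] = tmp;
        buf[n] ^= st->s[(uint8_t)(st->s[st->i] + st->s[st->j])];
    }
}
```

You would call `rc4_init` once per session on both ends with a shared key, then run `rc4_crypt` over each block of samples before sending and after receiving. Each sample byte costs a handful of adds, two table reads and a swap. One catch for a voice link: a dropped or reordered packet desynchronizes the keystream, so you would need some way to resync.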
The simplest microcontroller I'm aware of that has hardware encryption support is the Atmel SAM7XC. It's quite a step up from 8-bit controllers, but not impossible for the casual user to deal with, since it doesn't need external memory etc. It has hardware AES and Triple DES blocks and supports DMA, so you could probably encrypt CD-quality audio and still manage to do some other stuff at the same time on one of those.