Real-time(ish) voice cloning for fun and profit


Nowadays, it’s relatively simple to clone a voice using an online TTS service like ElevenLabs. Find some high-quality audio of a celebrity, crop out the bad parts, upload it, and with the magic of the internet, you’re done. Using this for vishing is great and all, but it obviously has its limitations. How do you handle push-back from a target? What if you need to go off script? What if their refrigerator is running?

My brother in Christ, it’s the future! Luckily for us, a few different locally-run tools exist that we can chain together to change our voice in real(ish)-time during vishing engagements.

Obviously, this blog post is for educational purposes only. Only clone the voices of people who have given you explicit permission, such as during a contracted social-engineering engagement.

The requirements

There are two (three, kinda?) tools needed for this to work: one to train the AI model on your target company CEO’s voice, and one to change your own voice using that model. This post will not cover installation, primarily because I’m lazy. Sorry dudes.

Training:

https://github.com/Mangio621/Mangio-RVC-Fork/releases

Voice changing stuffs:

https://huggingface.co/wok000/vcclient000/tree/main

The third tool provides a usable audio interface for sending the voice changer’s output into another program, like Zoiper or Zoom or whatever. For this, I used Voicemeeter.

Training the model