GenAquarius Careers: AI Engineer, Text-to-Speech (TTS) and Voice Cloning
Location: Noida, India
Remote Type: Hybrid
Employment Type: Permanent, Full-Time
An ideal candidate would be someone who has:
- Developed and optimized text-to-speech models that achieve human-like voice synthesis while preserving each voice actor's unique style across multiple languages.
- Implemented real-time processing solutions that reduced inference time to under 1 second, improving responsiveness and the overall user experience.
- Managed large-scale datasets for voice cloning projects, ensuring high performance and reliability while supporting multilingual transcriptions.
Key Responsibilities
- Design, develop, and fine-tune deep learning models for voice synthesis (e.g., text-to-speech (TTS), voice cloning).
- Implement and optimize neural network architectures such as Tacotron, FastSpeech, WaveNet, or similar.
- Collect, preprocess, and augment speech datasets.
- Collaborate with product and engineering teams to integrate voice models into production systems.
- Perform evaluation and quality assurance of voice model outputs.
- Research and stay current on advances in speech processing, audio generation, and machine learning.
Required Qualifications
- Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, or related field.
- Strong experience with Python and machine learning libraries (e.g., PyTorch, TensorFlow).
- Hands-on experience with speech/audio processing and relevant toolkits (e.g., Librosa, ESPnet, Kaldi).
- Familiarity with speech model architectures (TTS, ASR, vocoders).
- Understanding of deep learning concepts and model training processes.
Preferred Qualifications
- Experience with deploying models to real-time applications or mobile devices.
- Knowledge of data labeling, voice dataset creation, and noise handling techniques.
- Experience with cloud-based AI/ML infrastructure (e.g., AWS, GCP).
- Contributions to open-source projects or published papers in speech/voice-related domains.
Text-to-speech (TTS) experience is very important.