AI Engineer - Text To Speech (TTS), Voice Cloning

Location:

Noida, India

Remote Type:

Hybrid

Employment Type:

Permanent Full-Time

An ideal candidate would be someone who has:

Developed and optimized text-to-speech models that achieved human-like voice synthesis, maintaining the unique style of voice actors across multiple languages.
Implemented real-time processing solutions that reduced inference time to under 1 second, enhancing user interaction and experience.
Managed large-scale datasets for voice cloning projects, ensuring high performance and reliability while supporting multilingual transcriptions.

Key Responsibilities

Design, develop, and fine-tune deep learning models for voice synthesis (e.g., Text To Speech (TTS), voice cloning).
Implement and optimize neural network architectures such as Tacotron, FastSpeech, WaveNet, or similar.
Collect, preprocess, and augment speech datasets.
Collaborate with product and engineering teams to integrate voice models into production systems.
Perform evaluation and quality assurance of voice model outputs.
Research and stay current on advancements in speech processing, audio generation, and machine learning.

Qualifications

Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, or a related field.
Strong experience with Python and machine learning libraries (e.g., PyTorch, TensorFlow).
Hands-on experience with speech/audio processing and relevant toolkits (e.g., Librosa, ESPnet, Kaldi).
Familiarity with voice model architectures (TTS, ASR, vocoders).
Understanding of deep learning concepts and model training processes.

Preferred Qualifications

Experience deploying models to real-time applications or mobile devices.
Knowledge of data labeling, voice dataset creation, and noise handling techniques.
Experience with cloud-based AI/ML infrastructure (e.g., AWS, GCP).
Contributions to open-source projects or published papers in speech/voice-related domains.

Text To Speech (TTS) experience is very important.