
AI Engineer - Text-to-Speech (TTS), Voice Cloning

Location:

Noida, India

Remote Type:

Hybrid

Employment Type:

Permanent Full-Time

An ideal candidate would be someone who has:

  • Developed and optimized text-to-speech models that achieved human-like voice synthesis, maintaining the unique style of voice actors across multiple languages.
  • Implemented real-time processing solutions that reduced inference time to under 1 second, enhancing user interaction and experience.
  • Managed large-scale datasets for voice cloning projects, ensuring high performance and reliability while supporting multilingual transcriptions.

Key Responsibilities

  • Design, develop, and fine-tune deep learning models for voice synthesis (e.g., text-to-speech (TTS), voice cloning).
  • Implement and optimize neural network architectures such as Tacotron, FastSpeech, WaveNet, or similar.
  • Collect, preprocess, and augment speech datasets.
  • Collaborate with product and engineering teams to integrate voice models into production systems.
  • Perform evaluation and quality assurance of voice model outputs.
  • Research and stay current on advancements in speech processing, audio generation, and machine learning.

Required Qualifications

  • Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, or related field.
  • Strong experience with Python and machine learning libraries (e.g., PyTorch, TensorFlow).
  • Hands-on experience with speech/audio processing and relevant toolkits (e.g., Librosa, ESPnet, Kaldi).
  • Familiarity with voice model architectures (TTS, ASR, vocoders).
  • Understanding of deep learning concepts and model training processes.

Preferred Qualifications

  • Experience deploying models to real-time applications or mobile devices.
  • Knowledge of data labeling, voice dataset creation, and noise handling techniques.
  • Experience with cloud-based AI/ML infrastructure (e.g., AWS, GCP).
  • Contributions to open-source projects or published papers in speech/voice-related domains.

Text-to-speech (TTS) experience is very important.
