• About
  • Author
  • Journal
  • Mood Board
  • Contact
June 16, 2020 by Patrick Wheeler

Voice: Deep Faking It

Voice: Deep Faking It
June 16, 2020 by Patrick Wheeler

Bring in the clones

 

Two central and related questions I will be addressing as part of this VR Experience are:

1) How one can move a typically non-diegetic element (the narrator) to a diegetic element in a VR.
2) Whether one can effectively employ mechanisms such as voice cloning / text to speech APIs for the voice of the Storyteller (and perhaps other characters).

The advantage of such an approach (Voice Cloning / Text to Speech) would be that the story script can be adapted, and new audio generated dynamically without the need for re-recording voice talent. Together with AI elements (voice interaction by the user and responses / actions / intents) decoupling these elements from the core environment built (in the case of this project in Unity means the story and AI can enhanced over time without having to rebuild and rerelease the experience as a whole.

With this in mind, I have evaluated a number of AI based voice Cloning solutions and Text to Speech services. Assessing their support for Irish accents through existing “standard” voice models, Standard “Irish” voice model Support for SSML, support for custom voice creation, and ease of custom voice creation, and SDK / API access.

Why would you use standard voice models?

Whilst building custom voices is desirable, support for Irish voice models (even though there is of course no single “Irish” accent) via standard voice models is very important given multiple characters (beyond the narrator) that need to be included in the VR Experience I am building.

 

SSML

Another important factor is support for SSML (Speech Synthesis Markup Language). SSML allows for customization of a voice model including emphasis on specific words and sentences, pauses, pitch etc. Without this ability, the results are rather monotone.

 

Ease-of-Use

Whilst ease-of-use may not seem like the most significant concern, I believe it is when examining the practicalities of using cloned voices in a VR game or Experience. Furthermore the focus of this project is not primarily concerned with programmatic voice model generation of investigating voice-cloning systems in-depth. Where any voice cloning / text to speech service is targeting game and multimedia design and development agencies, the lack of a simple interface to allow voice talent (the voices to be cloned) to record samples is a major flaw in the process. Therefore, I think IBM, Amazon, and MS have work to do here.

Resemble.ai provide a simple interface for custom voice creation online, albeit rather buggy in its current state. Batches of audio samples were not saved by the system, and the system kept demanding more and more audio samples (over 200 were recorded via the system) where 50 should suffice. Additionally, the generated custom voice needed some clean-up and the results are a little disappointing.

 

SAMPLE RESULTS

The generated voice, when compared to the original, sounds like a slightly depressed drunk robot. This may be due to the lack of support for SSML with custom voices. Or at least with the generated custom voice in question. This appears to be another issue in the system as SSML is supported in the editor interface, but the voice cannot be generated where emphasis tags are used.

Here is a sample of the REAL VOICE ACTOR:

https://remscela.com/wp-content/uploads/2020/06/realian.ogg

Here is the RESEMBLE.AI CLONED VOICE / system generated sample
(no SSML due to issues):

https://remscela.com/wp-content/uploads/2020/06/FakeIan.mp3

I hope the above results can be improved upon and the SSML issues fixed. Resemble.ai have a nice system but it does need improvement, and from my perspective, the pricing model needs rethinking quite drastically.


Based on my assessment to date, the Edinburgh base CercProc (members of their team are University of Edinburgh alumni) offer the best solution in terms of standard Irish voice support. I have yet to fully test voice creation. Furthermore, they offer access under an academic license.


Given the need to support a range of characters, I may use both MS Azure, Resemble.ai and CercProc generated voices, as well as voice actor recordings in the final project. This will also be useful for comparative evaluation.

Previous articleA river runs through itNext article The Morrígan

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

About The Blog

Nulla laoreet vestibulum turpis non finibus. Proin interdum a tortor sit amet mollis. Maecenas sollicitudin accumsan enim, ut aliquet risus.

Recent Posts

The Mórrigan 2.0August 9, 2020
A Senak PeakAugust 7, 2020
A tale of two magical bullsJuly 6, 2020

Categories

  • Background
  • Character Design
  • Environments
  • Implementation
  • Methodology
  • Motivations
  • People
  • Research
  • Storytelling
  • Uncategorized
  • Voice
  • VR Narrative

Meta

  • Log in
  • Entries feed
  • Comments feed
  • WordPress.org

Tags

3D AI Audio Background Character Design Characters Critique Diegetics Environment Design Folklore Game Design History Implementation Indoor Irish Myths & Legends Lighting Magic Maya Methodology Morrigan Motivations Mythology Narrative Particle Effects Phantom Queen Q&A Recordings Reflection Research Ronnie Drew Sorceress Speech Synthasis Storytelling Textures The Troubles Unity Voice Voice Acting Voice Cloning VR

 

Patrick Wheeler
Copyright 2020

 

patrick.wheeler @ setfire.to

About This Sidebar

You can quickly hide this sidebar by removing widgets from the Hidden Sidebar Settings.

Recent Posts

The Mórrigan 2.0August 9, 2020
A Senak PeakAugust 7, 2020
A tale of two magical bullsJuly 6, 2020

Categories

  • Background
  • Character Design
  • Environments
  • Implementation
  • Methodology
  • Motivations
  • People
  • Research
  • Storytelling
  • Uncategorized
  • Voice
  • VR Narrative

Meta

  • Log in
  • Entries feed
  • Comments feed
  • WordPress.org