Javascript - text to speech with pitch and duration control

I've been looking at making my JavaScript program sing.

I first looked at the Web Speech API, but its pitch control seems very limited, so I thought there might be a way to send the result to a Web Audio node and apply effects from there, but that doesn't seem possible.

I found the mespeak.js library: http://www.masswerk.at/mespeak/

It can return an audio buffer that I can use as the source for my audio nodes, which allows for more control.

My input is a sequence of notes, each with a frequency and a duration. Something like:

var seq = [[440, 1000], [880, 500] /* , ... */]; // [frequency in Hz, duration in ms]

Starting from this sequence and a series of words, I managed to get my program to say those words in rhythm at different frequencies.

But I'm having a few problems.

  • The Web Audio detune is linked to the playbackRate, so changing the pitch also changes the word duration.
  • I only seem to be able to play with the detune or playbackRate values, which does not let me input a frequency (e.g. 440) and get a result at that frequency. The only thing I can do is approximate the base frequency of the speech, compute the difference to the expected note and shift the pitch accordingly, but the result is not very good.
  • I am clueless about how to deal with duration. Is playbackRate the only way of artificially changing the duration of a word? How can I force a word to stretch to a specific duration?
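
For reference, the coupling described in the bullets above can be written down as plain arithmetic. In the Web Audio API the effective rate of an AudioBufferSourceNode is playbackRate * 2^(detune / 1200), so any detune also rescales the played duration. The sketch below is only an illustration: the helper names are made up for this example, and the base frequency of the synthesized voice (220 Hz in the usage note) is an assumed value, not something mespeak.js reports.

```javascript
// Hypothetical helpers, not part of Web Audio or mespeak.js.

// Cents between a target note and an ASSUMED base frequency of the voice.
function centsBetween(targetHz, baseHz) {
  return 1200 * Math.log2(targetHz / baseHz);
}

// Effective rate of a buffer source: playbackRate * 2^(detune / 1200).
function effectiveRate(playbackRate, detuneCents) {
  return playbackRate * Math.pow(2, detuneCents / 1200);
}

// Seconds a buffer takes to play at a given effective rate.
function playedDuration(bufferSeconds, rate) {
  return bufferSeconds / rate;
}

// Effective rate needed to make a buffer last targetSeconds.
// Note: this also shifts the pitch by the same factor.
function rateForDuration(bufferSeconds, targetSeconds) {
  return bufferSeconds / targetSeconds;
}
```

With an assumed base of 220 Hz, centsBetween(440, 220) is one octave (1200 cents), which doubles the effective rate and halves the played duration. This is why pitch and duration cannot be set independently with detune and playbackRate alone.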

If any of you have experience with this sort of thing, I'd appreciate any input.

Thanks a lot

EDIT: added some code

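function sing(text, note, duration) {
  // Ask meSpeak for the raw audio data instead of playing it directly
  var buffer = meSpeak.speak(text, {rawdata: 'default'});
  playSound(buffer, freqToCents(note), duration);
}

function freqToCents(freq) {
  var root = 440; // no idea what the base frequency of the speech generator is
  // 1200 cents per octave (the original constant 3986 is roughly 1200 / log10(2))
  return 1200 * Math.log2(freq / root);
}

function playSound(streamBuffer, cents, duration, callback) {
  var source = context.createBufferSource();
  source.connect(compressor);

  context.decodeAudioData(streamBuffer, function (audioData) {
    // Length of the decoded speech; the requested `duration` is not used yet
    var bufferDuration = audioData.duration;
    var delay = bufferDuration ? Math.ceil(bufferDuration * 1000) : 1000;
    if (typeof callback === 'function') {
      setTimeout(callback, delay);
    }
    source.buffer = audioData;
    source.detune.value = cents;

    source.start(0);
  }, function (error) {
    console.error('decodeAudioData failed', error);
  });
}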

My sequencer is working, and at each step it calls the sing function if necessary, for example like this:

sing('test', 440, 1000)

As I was saying, I'd like the duration parameter to affect the result.


Source: https://stackoverflow.com/questions/36891851
Updated: 2019-12-07 11:12

Best answer

eSpeak supports an SSML mode; you need to use it to modify the speech parameters instead of trying to post-process the results.

You need to experiment with espeak first and then try to reproduce the same results in the JavaScript port. SSML is not supported there yet, but look at this part of mespeak.js:

  '-w', 'wav.wav',
      '-a', (typeof args.amplitude !== 'undefined')? String(args.amplitude) : (typeof args.a !== 'undefined')? String(args.a) : '1
      '-g', (typeof args.wordgap !== 'undefined')? String(args.wordgap) : (typeof args.g !== 'undefined')? String(args.g) : '0',
      '-p', (typeof args.pitch !== 'undefined')? String(args.pitch) : (typeof args.p !== 'undefined')? String(args.p) : '50',
      '-s', (typeof args.speed !== 'undefined')? String(args.speed) : (typeof args.s !== 'undefined')? String(args.s) : '175',

You need to add the -m option there to enable SSML.
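
As a rough sketch of what the SSML input could look like, the helper below turns a list of words with pitch and rate values into a string of `<prosody>` elements, which is the standard SSML way to set these parameters. The function names and the item shape (text, pitchHz, ratePercent) are made up for this example. Whether eSpeak honors absolute pitch values in Hz, and how it maps rate percentages to word duration, is an assumption that should be checked against the espeak command line first.

```javascript
// Hypothetical helpers, not part of espeak or mespeak.js.

// Escape characters that are special in XML text.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build an SSML document with one <prosody> element per word.
// items: [{ text: "la", pitchHz: 440, ratePercent: 100 }, ...]
function buildSsml(items) {
  var body = items.map(function (item) {
    return '<prosody pitch="' + item.pitchHz + 'Hz" rate="' +
      item.ratePercent + '%">' + escapeXml(item.text) + "</prosody>";
  }).join(" ");
  return "<speak>" + body + "</speak>";
}
```

The resulting string can be tried with the command-line tool by passing it as the text argument together with the -m option, before attempting the same thing in a patched mespeak.js.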

2016-04-28
