Transformation
==============

.. currentmodule:: pyvoimooo

.. function:: pvoccombo(in, fs, [ps], [ps_max], [fw], [pe], [wm_gain], [efx_smile_alpha], [efx_smile_alpha_high_shelf], [efx_inteligibility_scaling], [efx_inteligibility_e0db], [efx_inteligibility_e0db_autobias], [mode], [specs], [ampenvs])

    The Phase Vocoder with effect Combinations.
    This is the transformation engine we recommend using first.
    It uses a Pitch adaptive Overlap-Add (POLA) process.

    :arg in: The input waveform to extract the frames from.
    :type in: array
    :arg fs: The sampling rate.
    :type fs: int, Hz
    :arg ps: Pitch scaling coefficient (def. 1.0).
        WARNING: This should not go above :attr:`ps_max` (see below). Expect artefacts otherwise.
    :type ps: float, optional
    :arg ps_max: Maximum pitch scaling value.
        WARNING: The higher the value, the bigger the internal processing windows. Depending on the sound you process, if these windows are too big, a reverberation effect might be heard. (def. 2.0)
    :type ps_max: float, optional
    :arg dpss: Time stamped pitch scaling coefficients. This has priority over :attr:`ps`.
        It must have two dimensions, with shape [2,N], where the first row [0,:] holds the time instants [s] of the pitch scaling coefficients in the second row [1,:].
        WARNING: This cannot be a list of arrays.
        The time instants must be in ascending order.
        The pitch scaling coefficient used at each frame is then linearly interpolated between the given neighbor values (constant extrapolation is used for any time instant outside of the given time range).
    :type dpss: 2D array, optional
    :arg psmv_mean_cent: Pitch scaling of the mean value (def. none applied). The scaling is made on a scale in cents.
        This takes priority over :attr:`ps` and :attr:`dpss`. Please consider the warnings of :attr:`ps`.
    :type psmv_mean_cent: float, cent, optional
    :arg psmv_var_coef: Set the pitch variance scaling (def. none applied).
        The scaling is made using a median value on a linear scale in cents.
        This takes priority over :attr:`ps` and :attr:`dpss`. Please consider the warnings of :attr:`ps`.
    :type psmv_var_coef: float, coefficient>=0, optional
    :arg psmv_forcemean: Force the mean value for the variance scaling. See :attr:`psmv_var_coef`.
    :type psmv_forcemean: float, Hz, optional
    :arg pst: Set the pitch target (def. none).
        This takes priority over :attr:`ps` and :attr:`dpss`. Please consider the warnings of :attr:`ps`.
    :type pst: float, Hz, optional
    :arg fw: Frequency warping coefficient (def. 1.0).
        This warps the amplitude spectral envelope of the spectrum (it pushes the smooth shapes higher with fw>1.0, or lower with fw<1.0).
    :type fw: float, optional
    :arg pe: When a pitch scaling is applied, preserve the spectral envelope as is (def. true).
    :type pe: bool, optional
    :arg wm_gain: Gain of the visual watermarking (def. -12dB).
    :type wm_gain: float, dB, optional
    :arg efx_smile_alpha: Alpha parameter of the Smile effect. The bigger the value, the more smiley the effect should be (def. 1.0, in [1.0,+inf) ).
    :type efx_smile_alpha: float, optional
    :arg efx_smile_alpha_high_shelf: Activate or deactivate the extra high-shelf of the Smile effect (def. true).
    :type efx_smile_alpha_high_shelf: bool, optional
    :arg efx_inteligibility_scaling: Strength of the Intelligibility effect; 0.0 means no effect, 1.0 is the maximum effect (def. 0.0, in [0.0,1.0] ).
    :type efx_inteligibility_scaling: float, optional
    :arg efx_inteligibility_e0db_autobias: Bias for the automatic audio level correction of the Intelligibility effect (should not be changed) (def. +10).
    :type efx_inteligibility_e0db_autobias: float, dB, optional
    :arg efx_inteligibility_e0db: Audio level correction of the Intelligibility effect (should not be changed) (def. not used, overwritten by :attr:`efx_inteligibility_e0db_autobias`).
    :type efx_inteligibility_e0db: float, dB, optional
    :arg efx_denoiser_gate_coef_autobias: Bias for the automatic audio level of the Denoiser effect. The higher the value, the more denoised the sound. (def.
        -128.0)
    :type efx_denoiser_gate_coef_autobias: float, dB, optional
    :arg efx_denoiser_gate_coef: Audio level for the Denoiser effect (should not be changed) (def. not used, overwritten by :attr:`efx_denoiser_gate_coef_autobias`).
    :type efx_denoiser_gate_coef: float, dB, optional
    :arg mode: :attr:`onestep` or :attr:`analysis` or :attr:`synthesis` or :attr:`denoisenn` (def. :attr:`onestep`).
    :type mode: string, optional
    :arg denoisenn_modelpath: Used when :attr:`mode` is :attr:`denoisenn` (see above). The path to the neural network model.
        It has to be the root filename, which is extended with `.json` and `.norm`.
        Ex. `../mymodelpath`, which points to `../mymodelpath.json` and `../mymodelpath.norm`.
    :type denoisenn_modelpath: string, optional
    :arg specs: The complex spectrum, at each frame (as provided by :attr:`analysis` mode), which can be modified.
    :type specs: list>>
    :arg ampenvs: The amplitude envelope, at each frame (as provided by :attr:`analysis` mode), which can be modified.
    :type ampenvs: list>

    :return:
        - **syn** (`array`) - The synthesized waveform.
        - **tts** (`array`,seconds) - The time instant of the center of the analysis window at each frame (needs :attr:`mode='analysis'`).
        - **f0ss** (`array`,Hz) - The :math:`f_0` values at each frame (needs :attr:`mode='analysis'`).
        - **f0confs** (`array`) - A confidence factor for :math:`f_0` (needs :attr:`mode='analysis'`).
        - **specs** (`list>>`) - The complex spectrum at each frame, which can be modified (needs :attr:`mode='analysis'`).
        - **ampenvs** (`list>`) - The amplitude envelope at each frame, which can be modified (needs :attr:`mode='analysis'`).

    Examples:

    .. code-block:: python

        syn, tts, f0s, f0confs, specs, envs = vmo.pvoccombo(wav, fs, mode='analysis')

    An example of analysis and synthesis steps:

    .. code-block:: python

        # Analysis step: extract the per-frame features
        features = vmo.pvoccombo(wav, fs, mode='analysis')
        tts = features[1]
        specs = features[4]
        ampenvs = features[5]

        # Scale a band of each amplitude envelope by 0.125
        for fi in range(len(ampenvs)):
            dftlen = (len(ampenvs[fi])-1)/2
            fleft = int(2000*dftlen/float(fs))
            fright = int(6000*dftlen/float(fs))
            ampenvs[fi][fleft:fright] *= 0.125

        # Synthesis step: resynthesize from the modified features
        syn = vmo.pvoccombo(wav, fs, mode='synthesis', specs=specs, ampenvs=ampenvs)

    Complete example for scaling the f0 variance:

    .. literalinclude:: ../../scripts/test_pvoccombo_f0_variance_scaling.py
        :language: python
        :linenos:

    Complete example for making the voice more "afraid":

    .. literalinclude:: ../../scripts/test_pvoccombo_afraid.py
        :language: python
        :linenos:

    .. versionadded:: 0.17.7

.. function:: pola_ana(in, fs, [timestep_seconds], [winlen_seconds], [frame_type])

    The analysis step of a `Pitch and OverLap-Add (POLA)` transformation.
    It extracts frames that can be modified and then merged using **pola_syn**.
    It uses a Pitch adaptive Overlap-Add (POLA) process.

    :arg in: The input waveform to extract the frames from.
    :type in: array
    :arg fs: The sampling rate.
    :type fs: int, Hz
    :arg timestep_seconds: (def. 0.005s)
    :type timestep_seconds: float, optional, seconds
    :arg winlen_seconds: (def. 3 periods of the fundamental frequency)
    :type winlen_seconds: float, optional, seconds
    :arg frame_type: :attr:`time` or :attr:`spec` (def. :attr:`time`)
    :type frame_type: string, optional

    The length of the DFT is always the next power of two above the winlen.

    .. note::
        A good practice is to gather all the optional arguments into a :func:`dict()` and pass it as argument to both :func:`pola_ana` and :func:`pola_syn`, since they have to be common (see the example below).

    :return:
        - **tss** (`float`,second) - Time instants of analysis (the center time of each frame).
        - **frames** (`list>`) - The frames.
        - **f0ss** (`array`,Hz) - The :math:`f_0` values at each frame.

    See :func:`pola_syn` below for a complete example.

.. function:: pola_syn(in, fs, kwargs)

    The synthesis step of a `Pitch and OverLap-Add (POLA)` transformation.
    It resynthesizes frames that have been extracted using :func:`pola_ana` and modified as desired.
    It uses a Pitch adaptive Overlap-Add (POLA) process.

    :arg tss: Times of analysis, as provided by :func:`pola_ana` (non-modified).
    :type tss: array, second
    :arg frames: The frames, as provided by :func:`pola_ana` and modified as desired.
    :type frames: list>
    :arg f0s: The :math:`f_0` values at each frame, as provided by :func:`pola_ana` (non-modified).
    :type f0s: `array`,Hz
    :arg wavlen: The number of samples in the synthesized waveform.
    :type wavlen: int, number of samples
    :arg fs: The sampling rate.
    :type fs: int, Hz
    :arg kwargs: Extra arguments to choose the type of frame extraction, as given to :func:`pola_ana` (non-modified).

    :return:
        - **syn** (`array`) - The synthesized waveform.

    Complete example:

    .. literalinclude:: ../../scripts/test_pola.py
        :language: python
        :linenos:

    .. versionadded:: 0.8.9

.. function:: pitch_scaling_snm(in, fs, [sps], [dpss], [spvs], [ep], [ses])

    Pitch scale the given waveform using a sinusoidal model.
    It uses a Pitch adaptive Overlap-Add (POLA) process.

    :arg in: The input waveform to transform.
    :type in: array
    :arg fs: The sampling rate.
    :type fs: int, Hz
    :arg sps: Static pitch scaling factor (def. 1.0).
        E.g. If :attr:`sps` is 2.0, the pitch curve of the whole waveform will be twice as high as the original.
    :type sps: float, optional
    :arg dpss: Time stamped pitch scaling coefficients.
        It must have two dimensions, with shape [2,N], where the first row [0,:] holds the time instants [s] of the pitch scaling coefficients in the second row [1,:].
        WARNING: This cannot be a list of arrays.
        The time instants must be in ascending order.
        The pitch scaling coefficient used at each frame is then linearly interpolated between the given neighbor values (constant extrapolation is used for any time instant outside of the given time range).
    :type dpss: 2D array, optional
    :arg spvs: Static pitch variance scaling factor (def. 1.0).
        E.g. If :attr:`spvs` is 2.0, the variance of the pitch curve of the whole waveform will be twice as wide as in the original.
    :type spvs: float, optional
    :arg ep: Preserve the amplitude spectral envelope (def. true).
    :type ep: boolean, optional
    :arg ses: Static envelope scaling factor (def. 1.0).
        E.g. If :attr:`ses` is 2.0, the envelope of the whole spectrum will be stretched by a factor of 2.
    :type ses: float, optional

    :return:
        - **syn** (`array`) - The transformed waveform.
        - **tss** (`array`) - The time stamps of the f0 values.
        - **f0s** (`array`) - The :math:`f_0` values estimated during processing.

    Example:

    .. code-block:: python

        syn, tts, f0s = vmo.pitch_scaling_snm(wav, fs, sps=2.0)

    .. code-block:: python

        syn, tts, f0s = vmo.pitch_scaling_snm(wav, fs, dpss=[[0.0, 1.5, 2.0], [1.0, 0.8, 1.5]])

    .. versionadded:: 0.10.1

.. function:: freqwarp_pola(in, fs, [gfs], [static_freqs], [dynamic_times], [dynamic_freqs], [f0min], [f0max])

    Warp the spectral envelope in the frequency domain.
    It uses a Pitch adaptive Overlap-Add (POLA) process.

    :arg in: The input waveform to transform.
    :type in: array
    :arg fs: The sampling rate.
    :type fs: int, Hz
    :arg gfs: Global frequency scaling parameter (def. 1.0).
        E.g. If `gfs` is 2.0, the envelope bin at 1kHz will be warped/shifted to 2kHz.
        This can be combined with the SFW and DFW options below.
    :type gfs: float, optional
    :arg static_freqs: Static Frequency Warping (SFW) parameters.
        It should have two dimensions, with shape [2,N], where the first row [0,:] holds the input frequencies and the second row [1,:] holds the corresponding output frequencies.
        The frequencies must be in ascending order.
        The warping function follows the same principle as for :attr:`gfs`.
        The warped frequencies between two given points in :attr:`static_freqs` are linearly interpolated between the given neighbor values.
        For a traditional use of frequency warping, two points should be part of :attr:`static_freqs`, one at zero frequency and one at Nyquist (please see the example below).
        This option is mutually exclusive with the :attr:`dynamic_freqs` option.
    :type static_freqs: 2D array, Hz, optional
    :arg dynamic_times: Dynamic Frequency Warping (DFW) parameters time stamps.
        Time stamps of the :attr:`dynamic_freqs` values (see below).
        This parameter has to be used jointly with the :attr:`dynamic_freqs` parameter.
        This option is mutually exclusive with the :attr:`static_freqs` option.
    :type dynamic_times: array, seconds, optional
    :arg dynamic_freqs: Dynamic Frequency Warping (DFW) parameters.
        A list of 2D arrays as in :attr:`static_freqs`.
        The time stamps of each element of this list are in :attr:`dynamic_times`.
        This parameter has to be used jointly with the :attr:`dynamic_times` parameter.
        This option is mutually exclusive with the :attr:`static_freqs` option.
    :type dynamic_freqs: list< 2D array >, Hz, optional
    :arg f0min: The minimal :math:`f_0` value.
    :type f0min: float, Hz, optional
    :arg f0max: The maximal :math:`f_0` value.
    :type f0max: float, Hz, optional

    :return:
        - **syn** (`array`) - The transformed waveform.
        - **tss** (`array`) - The time stamps of the f0 values.
        - **f0s** (`array`) - The :math:`f_0` values estimated during processing.

    Examples:

    .. code-block:: python

        syn, tts, f0s = vmo.freqwarp_pola(wav, fs, gfs=0.5, static_freqs=[[0,2000,fs/2],[0,4000,fs/2]])

    .. code-block:: python

        syn, tss, f0s = vmo.freqwarp_pola(wav, fs, dynamic_times=[0.0, 2.0], dynamic_freqs=[[[0,3000,fs/2],[0,2500,fs/2]],[[0,2000,fs/2],[0,4000,fs/2]]])

    .. versionadded:: 0.10.1

.. function:: smile(in, fs, [alpha], [alphas], [f0min], [f0max], [gender], [anchorfreqs], [shelf])

    Transform a waveform using the SMILE algorithm.
    It uses a Pitch adaptive Overlap-Add (POLA) process.

    :arg in: The input waveform to transform.
    :type in: array
    :arg fs: The sampling rate.
    :type fs: int, Hz
    :arg alpha: The :attr:`alpha` value for scaling the effect (in [0.8,1.4], def. 1.0).
    :type alpha: float, optional
    :arg alphas: Time stamped :attr:`alpha` values.
        It must have two dimensions, with shape [2,N], where the first row [0,:] holds the time instants [s] of the :attr:`alpha` values in the second row [1,:].
        The time instants must be in ascending order.
        The :attr:`alpha` value used at each frame is then linearly interpolated between the given neighbor values (constant extrapolation is used for any time instant outside of the given range in :attr:`alphas`).
    :type alphas: 2D array, optional
    :arg f0min: The minimal :math:`f_0` value.
    :type f0min: float, Hz, optional
    :arg f0max: The maximal :math:`f_0` value.
    :type f0max: float, Hz, optional
    :arg gender: 'male' or 'female', specifies the gender (def. `None`, an average value).
    :type gender: str, optional
    :arg anchorfreqs: Custom attachment frequencies.
        A size 4 vector that defines the two values where the frequencies are preserved (values at index 0 and 3) and the two values defining the interval of the frequencies that are warped (values at index 1 and 2).
        Given FN the N-th formant frequency, set them to [F1/2, F2, F3, F5] in order to fall back on the alternative.
        Set the first value to -1 in order to disable these custom values and re-use the internal values.
    :type anchorfreqs: 1D array, optional
    :arg shelf: Custom shelf parameters (frequency[Hz], gain[dB]).
        A size 2 vector with the custom frequency [Hz] and gain [dB].
        Set the custom frequency to -1 in order to disable these custom values and re-use the internal values.
    :type shelf: 1D array, optional

    :return:
        - **syn** (`array`) - The transformed waveform.
        - **tss** (`array`) - The time stamps of the f0 values.
        - **f0s** (`array`) - The :math:`f_0` values estimated during processing.
        - **envs_ori** (`list>`) - The spectral envelopes of the analysed file.
        - **envs_new** (`list>`) - The transformed spectral envelopes applied to obtain the resulting file.

    Examples:

    .. code-block:: python

        syn, tts, f0s, envs_ori, envs_new = vmo.smile(wav, fs, alpha=1.2)

    .. code-block:: python

        syn, tts, f0s, envs_ori, envs_new = vmo.smile(wav, fs, alphas=[[0.0, 1.5, 2.0], [1.0, 1.0, 1.2]])

    .. versionadded:: 0.10.1

.. function:: intelligibility(in, fs, [IOEC0dB], [scaling])

    Transform a waveform using the Intelligibility algorithm.
    It uses a Pitch adaptive Overlap-Add (POLA) process.

    :arg in: The input waveform to transform.
    :type in: array
    :arg fs: The sampling rate.
    :type fs: int, Hz
    :arg IOEC0dB: The :attr:`IOEC0dB` compression reference (def. -10.0 dB).
    :type IOEC0dB: float, dB, optional
    :arg scaling: The :attr:`scaling` value for scaling the effect (in [0.0,1.0], def. 1.0).
    :type scaling: float, optional

    :return:
        - **syn** (`array`) - The transformed waveform.

    Examples:

    .. code-block:: python

        syn = vmo.intelligibility(wav, fs, IOEC0dB=-15.0, scaling=1.0)

    .. versionadded:: 0.16.6

.. function:: denoise(in, fs, [gate_coef_db], [gate_coef_auto_bias_db])

    Denoise a waveform using a spectral noise gate.
    It uses a Pitch adaptive Overlap-Add (POLA) process.

    :arg in: The input waveform to transform.
    :type in: array
    :arg fs: The sampling rate.
    :type fs: int, Hz
    :arg gate_coef_db: The noise threshold (def. -46.0 dB).
    :type gate_coef_db: float, dB, optional
    :arg gate_coef_auto_bias_db: The threshold bias used with the semi-automatic gate threshold (def. -12.0 dB).
    :type gate_coef_auto_bias_db: float, dB, optional

    :return:
        - **syn** (`array`) - The transformed waveform.

    Examples:

    .. code-block:: python

        syn = vmo.denoise(wav, fs, gate_coef_db=-15.0)

    .. versionadded:: 0.17.8

.. function:: pitch_scaling_doubledelay(in, fs, [sps], [dpss], [spvs])

    .. deprecated:: 0.20.1
        Please use :func:`pvoccombo` or :func:`pitch_scaling_snm`.
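
Several parameters documented on this page (:attr:`dpss`, :attr:`alphas` and :attr:`static_freqs`) share a [2,N] layout: ascending positions in the first row, values in the second row, and linear interpolation between neighbors. The following is a minimal sketch of that layout using NumPy only; it does not call pyvoimooo. The helpers ``make_breakpoints`` and ``value_at`` are hypothetical names introduced for illustration, and ``numpy.interp`` is an assumed stand-in for the library's internal interpolation, which may differ in detail. Constant extrapolation is documented above only for the time stamped parameters.

.. code-block:: python

    import numpy as np


    def make_breakpoints(positions, values):
        """Stack positions (first row) and values (second row) into a [2, N] array.

        Hypothetical helper: it only checks the layout described in this page.
        """
        points = np.array([positions, values], dtype=float)
        if points.ndim != 2 or points.shape[0] != 2:
            raise ValueError("expected two rows of equal length")
        if np.any(np.diff(points[0]) < 0):
            raise ValueError("the first row must be in ascending order")
        return points


    def value_at(points, x):
        """Linear interpolation inside the given range, constant value outside.

        Assumption: numpy.interp approximates the documented behaviour.
        """
        return np.interp(x, points[0], points[1])


    # Time stamped pitch scaling coefficients, as in the dpss examples.
    dpss = make_breakpoints([0.0, 1.5, 2.0], [1.0, 0.8, 1.5])
    print(dpss.shape)             # (2, 3)
    print(value_at(dpss, 0.75))   # approximately 0.9, between 1.0 and 0.8
    print(value_at(dpss, 3.0))    # 1.5, held constant after the last instant

    # Static frequency warping points, as in the static_freqs example.
    fs = 16000
    static_freqs = make_breakpoints([0, 2000, fs / 2], [0, 4000, fs / 2])
    print(value_at(static_freqs, 1000.0))   # 2000.0

Checking the shape and the ordering before calling a transformation function can make layout mistakes easier to spot. Whether the functions accept the resulting NumPy array exactly as they accept the nested lists used in the examples is an assumption to verify.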