Psychoacoustic filtering for noisy speech enhancement

Regular paper

Sana Alaya


Wednesday 3 June 2015, 16:00 - 16:20

0.2 Berlin (90)

Noise reduction techniques must trade off strong noise suppression against low speech distortion and low musical noise. Modeling the human auditory system helps with speech enhancement, and with noise reduction in particular, by yielding a signal of good quality and intelligibility. This paper introduces a new denoising approach based on the idea that denoising can be performed by mimicking the function of the human ear in order to improve the psychoacoustic quality of the speech signal. The proposed method processes the signal as follows. First, the speech signal is decomposed with a gammatone filterbank spaced on the equivalent rectangular bandwidth (ERB) scale, whose nonlinear frequency resolution imitates that of the human ear. Second, a spectral attenuation filter based on continuous noise estimation is applied in each sub-band. Third, a masking threshold is computed with the Johnston model at the output of the spectral attenuation filter. Fourth, the computed threshold is inserted into the psychoacoustic gain filter. Finally, the psychoacoustic filter is applied to each output sub-band of the spectral attenuation filter. The method is evaluated on speech signals from the TIMIT database corrupted by real environmental noise (car and street) at signal-to-noise ratios from 0 dB to 15 dB. Evaluation uses objective and subjective criteria: Perceptual Evaluation of Speech Quality (PESQ) for the objective scores, and mean ratings of signal distortion (SIG), noise distortion (BAK) and overall quality (OVRL) for the subjective scores. For example, at 10 dB with car noise we obtain SIG=3.34, BAK=2.78 and OVRL=3.14 for our method versus SIG=3.45, BAK=2.54 and OVRL=2.87 for the classic spectral attenuation method.
The results show that our method gives the best overall quality of the enhanced speech signal while maximizing noise removal and minimizing distortion.
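
As an illustration of the first two processing steps, the following minimal sketch places filterbank center frequencies uniformly on the ERB-rate scale and computes a per-band attenuation gain. It is not the authors' implementation. The ERB formulas are the standard Glasberg and Moore approximations; the band count (24), the frequency range (50 Hz to 8000 Hz), the function names, and the power-subtraction gain rule with a spectral floor are all assumptions chosen for the example, since the abstract does not specify them.

```python
import math


def hz_to_erb_rate(f_hz):
    """ERB-rate (number of ERBs below f_hz), Glasberg and Moore approximation."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def erb_rate_to_hz(erb_rate):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb_rate / 21.4) - 1.0) / 0.00437


def erb_bandwidth(f_hz):
    """Equivalent rectangular bandwidth in Hz of the auditory filter at f_hz."""
    return 24.7 * (0.00437 * f_hz + 1.0)


def erb_center_frequencies(f_low, f_high, n_bands):
    """Center frequencies spaced uniformly on the ERB-rate scale."""
    e_low = hz_to_erb_rate(f_low)
    e_high = hz_to_erb_rate(f_high)
    step = (e_high - e_low) / (n_bands - 1)
    return [erb_rate_to_hz(e_low + i * step) for i in range(n_bands)]


def attenuation_gain(noisy_power, noise_power, floor=0.1):
    """Assumed gain rule: power spectral subtraction with a spectral floor.

    The abstract only says 'spectral attenuation'; this rule is a generic
    stand-in, not the rule used in the paper.
    """
    if noisy_power <= 0.0:
        return floor
    ratio = max(1.0 - noise_power / noisy_power, 0.0)
    return max(math.sqrt(ratio), floor)


if __name__ == "__main__":
    # Hypothetical settings: 24 bands between 50 Hz and 8 kHz.
    centers = erb_center_frequencies(50.0, 8000.0, 24)
    for fc in centers[:3]:
        print(f"fc = {fc:7.1f} Hz, ERB = {erb_bandwidth(fc):6.1f} Hz")
```

The spacing between neighbouring center frequencies grows with frequency, which is the nonlinear resolution the abstract refers to: bands are narrow at low frequencies and wide at high frequencies. In a full system each band's gain would be further shaped by the masking threshold, which this sketch does not attempt.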

