Neural Encoding Detection is Not All You Need for Synthetic Speech Detection

1Fraunhofer Institute for Digital Media Technology, 2National Institute of Informatics

Abstract

This paper reviews the current state and emerging trends in synthetic speech detection. It outlines the main data-driven approaches, discusses the advantages and drawbacks of focusing future research solely on neural encoding detection, and offers recommendations for promising research directions.

The observations in this paper aim to guide future state-of-the-art research in the field and to highlight the risk of overcommitting to approaches that may not stand the test of time.

This page complements the paper with a full evaluation of a few off-the-shelf synthetic speech detection models, tested first on the pristine ASVspoof 2019 LA eval dataset and then on variants created by neurally encoding its bona fide trials.

Neural Encoders

The neural encoders used in the experiments are listed below. Each GitHub link points to the specific commit used to generate the evaluation data.

# Model name

Synthetic Speech Detection Algorithms

The off-the-shelf synthetic speech detection algorithms used in the experiments are listed below. Each GitHub link points to the specific commit used to analyse the evaluation data.

# Model name

Bona Fide Examples

Example bona fide trials used for the performance evaluation. The encoders were configured to compress the input utterances, and the output was then decoded as WAV.

Dataset Variant Example ID

Performance Evaluation

Summary of the balanced accuracy (BAC) and equal error rate (EER) achieved by the latest self-supervised-learning-based methods on the ASVspoof 2019 LA eval dataset and its neurally encoded variants.

The results were obtained by using the off-the-shelf weights provided by the authors of the respective detection models.
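For readers who want to recompute these metrics from raw detector outputs, the following is a minimal sketch, not the evaluation code used for this page. It assumes that each trial has a label (1 = spoof, 0 = bona fide) and a score where higher means "more likely synthetic"; the function names and the toy scores are illustrative only.

```python
import numpy as np

def roc_points(labels, scores):
    """Return (fpr, tpr) over all distinct thresholds, starting at (0, 0)."""
    labels = np.asarray(labels, dtype=bool)    # True = spoof (positive class)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable") # highest score first
    labels, scores = labels[order], scores[order]
    # Keep only the last index of each run of tied scores.
    last = np.r_[np.nonzero(np.diff(scores))[0], labels.size - 1]
    tp = np.cumsum(labels)[last]
    fp = np.cumsum(~labels)[last]
    tpr = np.r_[0.0, tp / labels.sum()]
    fpr = np.r_[0.0, fp / (~labels).sum()]
    return fpr, tpr

def auc(labels, scores):
    """Area under the ROC curve (trapezoidal rule)."""
    fpr, tpr = roc_points(labels, scores)
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

def eer(labels, scores):
    """Equal error rate: point where false alarm and miss rates are closest."""
    fpr, tpr = roc_points(labels, scores)
    fnr = 1.0 - tpr
    i = np.argmin(np.abs(fpr - fnr))
    return float((fpr[i] + fnr[i]) / 2)

def balanced_accuracy(labels, scores, threshold=0.5):
    """Mean of spoof recall and bona fide recall at a fixed threshold."""
    labels = np.asarray(labels, dtype=bool)
    pred_spoof = np.asarray(scores, dtype=float) > threshold
    tpr = np.mean(pred_spoof[labels])
    tnr = np.mean(~pred_spoof[~labels])
    return float((tpr + tnr) / 2)

# Toy data, made up for illustration only.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]
print(auc(labels, scores), eer(labels, scores), balanced_accuracy(labels, scores))
```

Note that AUC and EER depend only on how the scores rank the trials, whereas BAC depends on the chosen decision threshold (0.5 here).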

Balanced Accuracy

Equal Error Rate

Detection performance, which is near-perfect on the original ASVspoof 2019 LA eval dataset, degrades dramatically when the bona fide trials are neurally encoded, with the sole exception of the Descript Audio Codec.

Results breakdown in terms of area under the curve (AUC), equal error rate (EER) and balanced accuracy (BAC).

The sampling frequency at which each neural encoder operates is noted whenever it differs from 16 kHz.

                                 XLSR-AASIST             XLSR-SLS                XLSR-MAMBA
                                 AUC     EER     BAC     AUC     EER     BAC     AUC     EER     BAC
Baseline
ASVspoof19 LA                    0.9999  0.0015  0.9983  0.9999  0.0023  0.9977  0.9999  0.0020  0.9964
ASVspoof19 LA (24 kHz)           0.9999  0.0026  0.9975  0.9999  0.0026  0.9970  0.9999  0.0020  0.9969
ASVspoof19 LA (44.1 kHz)         0.9999  0.0025  0.9974  0.9999  0.0026  0.9970  0.9999  0.0020  0.9970
Neural Encoders
EnCodec (24 kHz)                 0.6276  0.3955  0.5043  0.8737  0.1992  0.5070  0.6973  0.329   0.5079
Lyra-V2                          0.4695  0.5050  0.5145  0.7897  0.2755  0.5153  0.4629  0.5101  0.5009
Descript Audio Codec (44.1 kHz)  0.9989  0.0151  0.9737  0.9984  0.0189  0.9484  0.9988  0.0166  0.9802
L3AC                             0.8823  0.1948  0.5584  0.9158  0.1589  0.5301  0.8792  0.1859  0.5863
SpeechTokenizer                  0.9803  0.0677  0.7537  0.9721  0.0838  0.6675  0.9719  0.0877  0.7737

Since detection performance is not affected by the sampling rate and the spoofed trials were not modified, the decrease is attributable entirely to false alarms on bona fide trials.
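A short worked calculation makes this concrete. It is a sketch under one stated assumption: that the spoof detection rate (TPR) at the p=0.5 threshold stays close to 1, as the baseline BAC values suggest. FAR denotes the false alarm rate on bona fide trials, a symbol introduced here for illustration.

```latex
\mathrm{BAC} = \tfrac{1}{2}\left(\mathrm{TPR} + \mathrm{TNR}\right),
\qquad \mathrm{TNR} = 1 - \mathrm{FAR}

\mathrm{TPR} \approx 1
\;\Rightarrow\;
\mathrm{FAR} \approx 2\,(1 - \mathrm{BAC})

\text{EnCodec, XLSR-AASIST: } \mathrm{BAC} = 0.5043
\;\Rightarrow\;
\mathrm{FAR} \approx 2\,(1 - 0.5043) = 0.9914
```

Under this assumption, a BAC near 0.5 means that almost every neurally encoded bona fide trial is flagged as synthetic.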

ROC curves on the ASVspoof 2019 LA eval dataset and its neurally encoded variants. The operating point for an output probability of 0.5 (where, by convention, p>0.5 implies that the content is synthetic) is marked by a circle.

Neural encoding drastically shifts the p=0.5 operating point, resulting in insufficient balanced accuracy.
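The effect of a shifted operating point can be reproduced with a toy example. The sketch below uses made-up scores, not outputs of the evaluated detectors, and assumes for illustration that encoding adds a constant offset of 0.40 to the bona fide scores. The ranking of trials stays intact, yet the balanced accuracy at the fixed p=0.5 threshold collapses.

```python
import numpy as np

def balanced_accuracy(is_spoof, p_spoof, threshold=0.5):
    """Mean of spoof recall and bona fide recall at a fixed threshold."""
    pred_spoof = p_spoof > threshold
    tpr = pred_spoof[is_spoof].mean()
    tnr = (~pred_spoof[~is_spoof]).mean()
    return float((tpr + tnr) / 2)

# Made-up detector outputs: probability that a trial is synthetic.
bona_fide = np.linspace(0.05, 0.35, 50)   # clean bona fide trials
spoof = np.linspace(0.80, 0.99, 50)       # spoofed trials (never modified)

# Hypothetical effect of neural encoding: bona fide scores move upwards
# but remain below every spoof score, so the ranking is unchanged.
encoded_bona_fide = bona_fide + 0.40

is_spoof = np.r_[np.zeros(50, bool), np.ones(50, bool)]
clean = np.r_[bona_fide, spoof]
encoded = np.r_[encoded_bona_fide, spoof]

bac_clean = balanced_accuracy(is_spoof, clean)
bac_encoded = balanced_accuracy(is_spoof, encoded)
# Same encoded scores, evaluated at a threshold that sits between the classes.
bac_encoded_moved = balanced_accuracy(is_spoof, encoded, threshold=0.78)

print(bac_clean, bac_encoded, bac_encoded_moved)
```

In this toy setting the two classes remain perfectly separable after the shift, so ranking-based metrics are unaffected, while the balanced accuracy at p=0.5 drops from 1.0 to about 0.59 because most bona fide trials now cross the fixed threshold.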

BibTeX

@inproceedings{cuccovillo2026iwbf,
  author = {Cuccovillo, Luca and Wang, Xin and Gerhardt, Milica and Aichroth, Patrick},
  title = {Neural Encoding Detection is Not All You Need for Synthetic Speech Detection},
  booktitle = {IEEE International Workshop on Biometrics and Forensics (IWBF)},
  location = {Sophia Antipolis, France},
  year = {2026},
  note = {in press},
}