AUTHOR=Khanjani Zahra, Watson Gabrielle, Janeja Vandana P.
TITLE=Audio deepfakes: A survey
JOURNAL=Frontiers in Big Data
VOLUME=Volume 5 - 2022
YEAR=2023
URL=https://www.frontiersin.org/journals/big-data/articles/10.3389/fdata.2022.1001063
DOI=10.3389/fdata.2022.1001063
ISSN=2624-909X
ABSTRACT=A deepfake is content or material that is synthetically generated or manipulated using artificial intelligence (AI) methods to be passed off as real, and can include audio, video, image, and text synthesis. The key differentiator between manual editing and deepfakes is that deepfakes are AI-generated or AI-manipulated and closely resemble authentic artifacts. In some cases, deepfakes can be fabricated entirely from AI-generated content. Deepfakes have started to have a major impact on society, with more generation mechanisms emerging every day. Our paper focuses particularly on audio deepfakes. The purpose of this survey is to provide readers with a deeper understanding of (1) the different deepfake categories, (2) how they are generally created and detected, (3) the most recent trends in this domain and the shortcomings of detection methods, and (4) audio deepfakes, the main focus of this paper, and how they are created and detected in more detail. We found that Generative Adversarial Networks (GAN), Convolutional Neural Networks (CNN), and Deep Neural Networks (DNN) are common ways of creating and detecting deepfakes. In our evaluation of over 140 methods, we found that the majority of the focus is on video deepfakes, and in particular on the generation of video deepfakes. We found that for text deepfakes there are more generation methods but very few robust methods for detection, including fake news detection, which has become a controversial area of research because of the potential for heavy overlap with human-generated fake content. Our study reveals a clear need for research on audio deepfakes, and particularly on the detection of audio deepfakes. This survey not only evaluates generation and detection methods across the different deepfake categories, but mainly focuses on audio deepfakes, which are overlooked in most existing surveys. This paper's most important contribution is to critically analyze and provide a unique source of audio deepfake research, mostly ranging from 2016 to 2020. To the best of our knowledge, this is the first survey focusing on audio deepfakes in English.