AI, Human, or Hybrid? Reliability of AI Detection Tools in Multi-Authored Texts
This article presents the first results of the CorpIdentIA project (Corpus Identity & Authorship Intelligence Analysis), which analyzes texts generated wholly or partially by artificial intelligence. Based on an experimental Spanish corpus (n = 180) that includes human, artificial, and mixed texts, the study analyzes the performance of three detectors (Originality.ai, GPTZero, and Copyleaks) against texts produced with different generative models (ChatGPT, Gemini, and Grok). The main objective is to evaluate how effectively these detection tools classify the texts according to their origin (AI, human, or hybrid). The results reveal significant differences among tools: Originality.ai shows the best overall performance, while GPTZero stands out for its low rate of false positives. However, none of the tools demonstrates acceptable reliability in detecting hybrid texts. Recurrent biases are observed depending on the generative model, along with high-confidence misclassifications, which makes deploying these tools without expert human review risky. This work contributes to the current debate on the trustworthiness of detectors, the risk of false accusations in forensic contexts, and the need for explainable approaches grounded in applied linguistics. These findings also underline the importance of interdisciplinary collaboration among linguists, computer scientists, and legal experts.