Machine learning methods for stylometry : authorship attribution and author profiling / Jacques Savoy.

This book presents methods and approaches used to identify the true author of a doubtful document or text excerpt. It provides a broad introduction to all text categorization problems (like authorship attribution, psychological traits of the author, detecting fake news, etc.) grounded in stylistic f...

Bibliographic Details
Main Author: Savoy, Jacques, 1958- (Author)
Format: eBook
Language: English
Published: Cham, Switzerland : Springer, [2020]
Table of Contents:
  • Intro
  • Preface
  • Book Structure
  • Hands-On Exercises and Examples
  • Acknowledgements
  • Contents
  • Acronyms
  • List of Symbols
  • Part I Fundamental Concepts and Models
  • 1 Introduction to Stylistic Models and Applications
  • 1.1 Overview and Definitions
  • 1.2 Style and Its Explaining Factors
  • 1.3 Authorship Attribution
  • 1.4 Author Profiling
  • 1.5 Forensic Issues
  • 1.6 Author Clustering
  • 1.7 Other Related Problems
  • 2 Basic Lexical Concepts and Measurements
  • 2.1 Stylometric Model
  • 2.2 Our Running Example: The Federalist Papers
  • 2.3 The Zipf's Law
  • 2.4 Vocabulary Richness Measures
  • 2.5 Overall Stylistic Measures
  • 2.6 And the Letters?
  • 3 Distance-Based Approaches
  • 3.1 Burrows' Delta
  • 3.2 Kullback-Leibler Divergence Method
  • 3.3 Labbé's Intertextual Distance
  • 3.4 Other Distance Functions
  • 3.5 Principal Component Analysis (PCA)
  • Part II Advanced Models and Evaluation
  • 4 Evaluation Methodology and Test Corpora
  • 4.1 Preliminary Remarks
  • 4.2 Text Quality and Preprocessing
  • 4.3 Performance Measures
  • 4.4 Precision, Recall, and F1 Measurements
  • 4.5 Confidence Interval
  • 4.6 Statistical Assessment
  • 4.7 Training and Test Sample
  • 4.8 Classical Problems
  • 4.9 CLEF PAN Test Collections
  • 4.10 Evaluation Examples
  • 5 Features Identification and Selection
  • 5.1 Word-Based Stylistic Features
  • 5.2 Other Stylistic Feature Extraction Strategies
  • 5.3 Frequency-Based Feature Selection
  • 5.4 Filter-Based Feature Selection
  • 5.5 Wrapper Feature Selection
  • 5.6 Characteristic Vocabulary
  • 6 Machine Learning Models
  • 6.1 k-Nearest Neighbors (k-NN)
  • 6.2 Naïve Bayes
  • 6.3 Support Vector Machines (SVMs)
  • 6.4 Logistic Regression
  • 6.5 Examples with R
  • 6.5.1 K-Nearest Neighbors (k-NN)
  • 6.5.2 Naïve Bayes
  • 6.5.3 Support Vector Machines (SVMs)
  • 6.5.4 Logistic Regression
  • 7 Advanced Models for Stylometric Applications
  • 7.1 Zeta Method
  • 7.2 Compression Methods
  • 7.3 Latent Dirichlet Allocation (LDA)
  • 7.4 Verification Problem
  • 7.5 Collaborative Authorship
  • 7.6 Neural Network and Authorship Attribution
  • 7.7 Distributed Language Representation
  • 7.8 Deep Learning and Long Short-Term Memory (LSTM)
  • 7.9 Adversarial Stylometry and Obfuscation
  • Part III Cases Studies
  • 8 Elena Ferrante: A Case Study in Authorship Attribution
  • 8.1 Corpus and Objectives
  • 8.2 Stylistic Mapping of the Contemporary Italian Literature
  • 8.3 Delta Model
  • 8.4 Labbé's Intertextual Distance
  • 8.5 Zeta Test
  • 8.6 Qualitative Analysis
  • 8.7 Conclusion
  • 9 Author Profiling of Tweets
  • 9.1 Corpus and Research Questions
  • 9.2 Bots versus Humans
  • 9.3 Man vs. Woman
  • 9.4 Conclusion
  • 10 Applications to Political Speeches
  • 10.1 Corpus Selection and Description
  • 10.2 Overall Measurements
  • 10.3 Stylistic Similarities Between Presidencies
  • 10.4 Characteristics Words and Sentences
  • 10.5 Rhetoric and Style Analysis by Wordlists
  • 10.6 Conclusion