Statistics, Machine Learning, and Classical Japanese Orthography

April 28, 2025

What if statistics could solve a literary mystery centuries in the making? That’s exactly what Department of Asian Languages and Literature Professor Paul Atkins, along with mathematics Ph.D. students Herman Chau and Michael Zeng, set out to do.

Excerpt from a copy of “Kokin wakashū (Collection of Ancient and Modern Songs of Japan)” from the Reizei Family Shiguretei Library Foundation. Image by the Japanese Agency for Cultural Affairs.

During the April 14 lecture “Statistics, Machine Learning, and Classical Japanese Orthography,” the three discussed how they used statistical analysis of hiragana usage to uncover the true authorship of disputed medieval Japanese manuscripts. Atkins also explained how his team is using machine learning to automatically extract writing data from images of old manuscripts, making it easier to study large collections of classical Japanese literature.

The talk was held in person in the Tateuchi East Asia Library (TEAL) seminar room as part of the 2025 TEAL Digital Scholarship Series, which highlights innovative digital tools and cutting-edge research in China, Japan, Korea, and Taiwan.