During her 2024–2025 research leave, Japan Studies Librarian Azusa Tanaka took on a longstanding challenge in Japanese book history: dating premodern books (those produced before 1868) that carry no publication date. Traditional dating methods are often subjective or invasive, so Tanaka instead used a novel, non-invasive approach that combines microscopic imaging with machine learning.
Using a digital microscope, Tanaka captured more than 13,000 images from two Edo-period sources: Ise calendars and Bukan directories. From these images, she extracted five features of the paper fibers, such as density and thickness, and used them to train two machine learning models, one based on Random Forest and one on XGBoost. The Random Forest model proved the more accurate of the two, estimating publication years with errors of under 20 years.
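To make the modeling step more concrete, here is a minimal sketch of random-forest regression written from scratch in plain Python. Each tree is grown on a bootstrap resample of the training rows by greedy squared-error splits over a random subset of features, and the forest's year estimate is the average of the trees' estimates. This is not Tanaka's pipeline. The data are fabricated, the three features beyond density and thickness are unnamed placeholders, the hyperparameters (n_trees, max_depth, min_leaf, n_try) are arbitrary assumptions, and mean absolute error is only one plausible way to read "errors of under 20 years."

```python
import random


def build_tree(X, y, idx, depth, max_depth, min_leaf, n_try, rng):
    """Grow a regression tree on rows `idx`, choosing splits that minimize squared error.

    A leaf is a float (mean year of its rows); an internal node is
    (feature, threshold, left_subtree, right_subtree).
    """
    ys = [y[i] for i in idx]
    n = len(ys)
    total_s, total_sq = sum(ys), sum(v * v for v in ys)
    if depth >= max_depth or n < 2 * min_leaf:
        return total_s / n
    best = None  # (cost, feature, threshold)
    # Random-forest twist: only a random subset of features is tried at each node.
    for f in rng.sample(range(len(X[0])), n_try):
        order = sorted(idx, key=lambda i: X[i][f])
        s = sq = 0.0
        for k in range(n - 1):
            v = y[order[k]]
            s += v
            sq += v * v
            n_left = k + 1
            if n_left < min_leaf or n - n_left < min_leaf:
                continue
            a, b = X[order[k]][f], X[order[k + 1]][f]
            if a == b:  # cannot separate rows with equal feature values
                continue
            # Sum of squared errors inside both children, from running sums.
            cost = (sq - s * s / n_left) + (
                (total_sq - sq) - (total_s - s) ** 2 / (n - n_left)
            )
            if best is None or cost < best[0]:
                best = (cost, f, a)
    if best is None:
        return total_s / n
    _, f, t = best
    left = [i for i in idx if X[i][f] <= t]
    right = [i for i in idx if X[i][f] > t]
    return (
        f,
        t,
        build_tree(X, y, left, depth + 1, max_depth, min_leaf, n_try, rng),
        build_tree(X, y, right, depth + 1, max_depth, min_leaf, n_try, rng),
    )


def predict_tree(node, row):
    """Walk from the root to a leaf and return that leaf's mean year."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(X, y, n_trees=25, max_depth=6, min_leaf=3, n_try=3, seed=0):
    """Bagging: each tree is grown on a bootstrap resample of the training rows."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        sample = [rng.randrange(len(X)) for _ in range(len(X))]
        trees.append(build_tree(X, y, sample, 0, max_depth, min_leaf, n_try, rng))
    return trees


def predict(trees, row):
    """Forest estimate = average of the individual trees' estimates."""
    return sum(predict_tree(t, row) for t in trees) / len(trees)


def synthetic_books(n, rng):
    """Fabricated toy data: two features drift with the year, three are pure noise."""
    X, y = [], []
    for _ in range(n):
        u = rng.random()  # position within a hypothetical 1700-1850 span
        X.append([
            0.5 + 0.3 * u + rng.gauss(0, 0.015),  # hypothetical "density"
            80 - 20 * u + rng.gauss(0, 1.0),  # hypothetical "thickness"
            rng.random(), rng.random(), rng.random(),  # placeholder features
        ])
        y.append(1700 + 150 * u)
    return X, y


rng = random.Random(42)
X_train, y_train = synthetic_books(200, rng)
X_test, y_test = synthetic_books(50, rng)
forest = fit_forest(X_train, y_train)
preds = [predict(forest, row) for row in X_test]
mae = sum(abs(p - t) for p, t in zip(preds, y_test)) / len(y_test)
print(f"mean absolute error: {mae:.1f} years")
```

In practice, an established implementation such as scikit-learn's RandomForestRegressor would be the more likely choice. XGBoost, the other model mentioned, also uses decision trees but builds them sequentially by gradient boosting, each new tree correcting the errors of the ones before it, rather than averaging independently bootstrapped trees.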
Supported by cloud research grants from Microsoft Azure and Amazon Web Services, Tanaka's work offers a scalable, data-driven way to improve the dating and cataloging of historical Japanese books. She plans to expand the dataset and to experiment with deep learning models to further improve prediction accuracy.
