This study explores the relationship between manual text classifications recorded in corpus metadata and computationally derived text groupings in corpora of Early New High German. We employ vector space models, which arrange texts in a multidimensional space based on semantic similarity, and then cluster 463 texts from the ReF and GerManC corpora (1350–1800) into lexically and semantically defined groups. This operationalization of text types makes it possible to observe that some texts are more prototypical representatives of a text type than others and that such types overlap and diverge in their development. We evaluate the results of our quantitative analysis with a LASSO regression model that predicts the relative frequency of wh-relative pronouns, a linguistic variable known for genre-sensitive and historical variation. Our results show that data-driven clusterings are at least complementary to traditional classifications in capturing semantic distinctions and diachronic variation in textual traditions. The findings contribute to historical text linguistics by proposing a bottom-up methodology for identifying text types and by revealing how genre evolution correlates with linguistic change.
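The idea of grouping texts by their position in a vector space can be illustrated with a minimal sketch. It is not the model used in the study: it assumes plain bag-of-words count vectors, cosine similarity, and a simple single-link grouping with a hypothetical similarity threshold of 0.3, and the four toy "texts" are invented for illustration and not drawn from ReF or GerManC.

```python
import math
from collections import Counter

def vectorize(text):
    """Represent a text as a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def cluster(texts, threshold=0.3):
    """Single-link grouping: two texts share a cluster if their
    similarity reaches the threshold (an assumed, illustrative value)."""
    vectors = [vectorize(t) for t in texts]
    parent = list(range(len(texts)))

    def find(i):
        # Follow parent links to the cluster representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)
    return [find(i) for i in range(len(texts))]

# Invented toy texts: two devotional, two news-like.
texts = [
    "gott herr gnade suende gebet",
    "herr gnade gebet seele gott",
    "stadt krieg kaiser nachricht schiff",
    "kaiser krieg stadt armee nachricht",
]
labels = cluster(texts)
print(labels)  # [0, 0, 2, 2]
```

Texts that share most of their vocabulary end up with the same label, independently of any metadata category. A realistic pipeline would replace the count vectors with a trained semantic model and the threshold rule with a proper clustering algorithm.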