Most of the world's knowledge is contained in digitally available texts, but how can this knowledge be extracted? In this updated and expanded edition of the first German textbook on the topic, you will learn how digital text can be prepared, processed and used in applications with the help of text mining.
The glossary for the book is available here: Download (German)
Here you will find various resources that are used or referenced in the book. These include the text data used and the ASV Online Toolbox in which you can try out procedures explained in the book directly in your browser.
The corpora used in the book can be downloaded here in various sizes (measured in number of sentences). The format of the downloads is explained here.
Some of the procedures described in the book can be tested directly via our toolbox. We recommend using the Online Toolbox; however, the older ASV Toolbox is still available for download.
The Online Toolbox is a modular collection of different tools for analysing written language and allows you to test many of the methods presented directly in the browser.
The ASV Toolbox is a collection of different tools for analysing written language. It was developed at the Department of Automatic Language Processing (Automatische Sprachverarbeitung, ASV) at the University of Leipzig and is no longer being developed further. It can be downloaded from the Language Technology Group, Universität Hamburg.