The Daily Corpora

This is the web interface of The Daily Corpora, a platform for evaluating and exploring linguistically annotated text corpora. It currently contains an annotated version of the German Wikipedia; more corpora may be added in the future, depending on available hardware.

The documentation can be found here.

You can easily explore the functionality by clicking through the entries below:

Articles: Displays the plain text of a Wikipedia article. You can apply sentiment analysis, color-code the part-of-speech tags, or analyze the named entities in the article.

Search: As with Google and similar search engines, you can enter a single word or a phrase in a simple search field to search a corpus.

Concordances: Search for a word and display it with the context surrounding it.

Words: Display details and statistics about a certain lemma in a corpus.

Topic modeling: "What is this person's profession?" Enter the name of a famous person, and machine learning models running in the background will predict that person's profession.

Wordcloud: Draws a word cloud for a given lemma, taking the surrounding tokens and their part-of-speech tags into account.
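
To give a rough idea of what the Concordances entry does, here is a minimal keyword-in-context sketch in Python. It is only an illustration and not the platform's actual implementation: the function name concordance, the window parameter, and the whitespace tokenization are all assumptions made for this example.

```python
def concordance(tokens, keyword, window=3):
    """Return (left context, match, right context) for each hit.

    Hypothetical helper for illustration only; matching is
    case-insensitive on the surface form, not on the lemma.
    """
    hits = []
    target = keyword.lower()
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            # Up to `window` tokens on each side of the match
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


# Naive whitespace tokenization, assumed here for brevity
text = "Die Stadt liegt am Fluss und die Stadt ist alt"
for left, match, right in concordance(text.split(), "Stadt", window=2):
    # Right-align the left context so the matches line up in one column
    print(f"{left:>20} [{match}] {right}")
```

Aligning the matched word in a single column is what makes a concordance easy to scan; a real system would probably look up hits in an index rather than scanning every token.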