evidence

doi:10.5281/zenodo.3954885

Description

Provides AI/machine-learning support for close-reading-based research
Intuitive example-based search across large corpora
Browser-based user interface
Concept-based search using abstract doc2vec representations
Context-based search using word-frequency/TF-IDF representations
Automated processing of user-supplied corpora

Machine-supported research in humanities

Research in the humanities has benefited from the digitization of text corpora and the development of computer-based text analysis tools. However, the interface that current systems offer the user is incompatible with the proven method of scholarly close reading of texts, which is key in many scenarios that pursue complex research questions.

In practice, it is often restrictive and difficult, if not impossible, to formulate adequate selection criteria within a keyword-based search, which is the standard entry point to digitized text collections. This holds in particular for more complex or abstract concepts.

Querying by example - close reading with tailored suggestions

evidence provides an alternative, intuitive entry point into collections by leveraging the doc2vec framework. Using doc2vec, evidence learns abstract representations of the theme and content of the elements of the user's corpus. Then, instead of trying to translate the scientific query into keywords, the user compiles a set of relevant elements as starting points, i.e. examples of the concept they are interested in, and queries the corpus based on these examples. Specifically, evidence retrieves elements with similar abstract representations and presents them to the user, using the user's feedback to refine its retrieval.
Furthermore, this concept-based query mode is complemented by the ability to perform additional retrieval using the more-like-this, context-based retrieval function provided by Elasticsearch.
Together, this enables a user to combine the power of a close-reading approach with that of a large digitized corpus, selecting elements from the entire corpus which are likely to be of interest, but leaving the decision up to the user as to what evidence they deem useful.
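
The query-by-example loop described above can be illustrated with a minimal sketch. It is not the evidence implementation: the three-dimensional vectors are hand-written stand-ins for learned doc2vec representations, the function names (`build_query`, `suggest`) and the weights `beta` and `gamma` are hypothetical, and the refinement step uses a Rocchio-style update as an assumed, commonly used way of folding user feedback into a query vector. The description does not state how evidence itself refines its retrieval.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def build_query(vectors, positives, negatives=(), beta=1.0, gamma=0.5):
    """Rocchio-style query (an assumption, not the documented method):
    move toward accepted examples and away from rejected ones."""
    query = [beta * q for q in centroid([vectors[i] for i in positives])]
    if negatives:
        rejected = centroid([vectors[i] for i in negatives])
        query = [q - gamma * r for q, r in zip(query, rejected)]
    return query


def suggest(vectors, positives, negatives=(), top_k=3):
    """Rank elements the user has not judged yet by similarity to the query."""
    query = build_query(vectors, positives, negatives)
    judged = set(positives) | set(negatives)
    scored = [
        (cosine(query, vec), doc_id)
        for doc_id, vec in vectors.items()
        if doc_id not in judged
    ]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:top_k]]


# Toy stand-ins for learned document vectors (hypothetical data).
vectors = {
    "a": [0.9, 0.1, 0.0],
    "b": [0.8, 0.2, 0.1],
    "c": [0.1, 0.9, 0.0],
    "d": [0.7, 0.0, 0.3],
    "e": [0.0, 0.2, 0.9],
}

# Round 1: the user starts from a single example of their concept.
first = suggest(vectors, positives=["a"], top_k=2)

# Round 2: the user accepts "b" and rejects "d"; the query is refined.
second = suggest(vectors, positives=["a", "b"], negatives=["d"], top_k=2)
```

In this sketch the system only proposes candidates. Which suggestions count as relevant remains the user's decision, and each judgement feeds the next round.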

Keywords

Text analysis & natural language processing

Programming languages