evidence

doc2vec-based assisted close reading with support for abstract concept-based search and context-based search

4 mentions, 9 contributors
479 commits, last commit ≈ 34 months ago, 3 stars, 1 fork

What evidence can do for you

  • Provides AI/machine-learning support for close-reading-based research
  • Intuitive example-based search across large corpora
  • Browser-based user interface
  • Concept-based search using abstract doc2vec representations
  • Context-based search using word-frequency/TF-IDF representations
  • Automated processing of user-supplied corpora

Machine-supported research in humanities

While research in the humanities has benefited from the digitization of text corpora and the development of computer-based text analysis tools, the interfaces that current systems offer are incompatible with the proven method of scholarly close reading, which is key in many scenarios that pursue complex research questions.

In practice, it is often restrictive and difficult, if not impossible, to formulate adequate selection criteria within a keyword-based search, which is the standard entry point to digitized text collections. This holds in particular for more complex or abstract concepts.

Querying by example - close reading with tailored suggestions

evidence provides an alternative, intuitive entry point into collections by leveraging the doc2vec framework. Using doc2vec, evidence learns abstract representations of the theme and content of the elements of the user's corpus. Instead of translating the research question into keywords, the user compiles a set of relevant elements as starting points, i.e. examples of the concept of interest, and queries the corpus based on these examples. Specifically, evidence retrieves elements with similar abstract representations and presents them to the user, using the user's feedback to refine its retrieval.
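The query-by-example idea can be illustrated with a minimal, standard-library-only sketch. It is not the project's implementation: the toy two-dimensional vectors stand in for learned doc2vec representations, and the function names, the centroid-based query, and the way feedback is handled (accepted elements become new examples, rejected ones are skipped) are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def query_by_example(corpus, positive_ids, rejected_ids=(), top_n=3):
    """Rank corpus elements by similarity to the mean of the example vectors.

    corpus: dict mapping element id -> vector (hypothetical stand-in for
    doc2vec output). Examples and rejected elements are never suggested.
    """
    query = centroid([corpus[i] for i in positive_ids])
    skip = set(positive_ids) | set(rejected_ids)
    scored = [
        (doc_id, cosine(query, vec))
        for doc_id, vec in corpus.items()
        if doc_id not in skip
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy corpus: ids and vectors are invented for this sketch.
corpus = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [0.7, 0.7],
}

first_round = query_by_example(corpus, positive_ids=["a"])
# Assumed feedback step: the user accepts "b" and rejects "d".
second_round = query_by_example(corpus, positive_ids=["a", "b"], rejected_ids=["d"])
```

In this sketch the user stays in control of the outcome: the ranking only proposes candidates, and each round of accepted or rejected suggestions changes what is proposed next.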
Furthermore, this concept-based query mode is complemented by additional context-based retrieval using the more-like-this function provided by Elasticsearch.
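As a rough illustration of the context-based mode, the sketch below builds a request body for the Elasticsearch `more_like_this` query. The field name `text`, the helper name, and the parameter values are assumptions for this example; how evidence actually configures the query is not stated here.

```python
import json


def more_like_this_query(example_texts, fields=("text",), size=10):
    """Build a search body that asks for documents similar to the examples.

    The field name "text" is a hypothetical placeholder for wherever the
    corpus elements are indexed.
    """
    return {
        "size": size,
        "query": {
            "more_like_this": {
                "fields": list(fields),
                "like": list(example_texts),
                # Consider terms that occur at least once in the examples.
                "min_term_freq": 1,
                # Upper bound on the number of selected query terms.
                "max_query_terms": 25,
            }
        },
    }


body = more_like_this_query(["an example passage chosen by the user"])
print(json.dumps(body, indent=2))
```

The resulting dictionary can be sent as the body of a search request against the index that holds the corpus.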
Together, this enables a user to combine the power of a close-reading approach with that of a large digitized corpus: evidence selects elements from the entire corpus that are likely to be of interest, but leaves the decision as to which evidence is useful up to the user.

Programming languages
  • TypeScript 43%
  • Go 29%
  • Jupyter Notebook 19%
  • Shell 6%
  • CSS 1%
  • Dockerfile 1%
  • HTML 1%
  • Python 1%

Participating organisations

Social Sciences & Humanities
KNAW Humanities Cluster
Netherlands eScience Center

Mentions

Digital technologies to analyze eyewitness accounts of mass violence

Author(s): Netherlands eScience Center
Published in 2017

Contributors

Bas Leenknegt
Faruk Diblen
Hayco de Jong
Jurriaan H. Spaaks
Lars Buitinck, KNAW Humanities Cluster
Meiert Willem Grootes
Willem van Hage

Related projects

EviDENce

Ego Documents Events modelling – how individuals recall mass violence

Updated 20 months ago
Finished