The goal of Impact and Fiction was to relate types of reader response to properties of the novels that readers had read. That is: would we be able to relate an expression like “This novel was all exciting twists and turns”, a reader response of the type “narrative”, to, for instance, the pace of a Dan Brown novel? Given enough data we would expect to infer more general trends, for instance that readers who often write about the style of novels particularly appreciate more “highbrow” literature. With more than 600,000 online reviews and more than 16,000 full texts of novels at hand, we expected to be able to compute precise and insightful trends.
The reality of our project proved to be less straightforward. We do have a validated model that is able to identify the terms readers use to describe the impact works of fiction had on them, in four categories (“narrative”, “aesthetic”, “stylistic”, and “other”). We were also able to show that, in general, readers behave in much the same way across platforms when describing works of fiction they read, and that the length of a review is a telltale sign of the type of response it will contain: shorter reviews generally report only aesthetic evaluations like “great read”, while longer reviews include a more detailed description of the plot. However, it turned out to be much harder to relate such reader terms insightfully to the full texts of the novels themselves. We were able to show that topics from topic models cluster strongly by genre. Using a manual categorization of the topics, we were able to relate certain impact terms used by readers to certain genres: for instance, as one would expect, a term such as “suspenseful” is used significantly more in relation to thrillers than to other fiction genres, as is “funny” in the case of children's literature. We also applied sentiment analysis to see whether the volatility of sentiment in a novel could be correlated with the impact terms readers use. The results were inconclusive, however, and mostly reinforce our doubts about the robustness and interpretability of current sentiment mining techniques.
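To illustrate what such a term-genre association amounts to computationally, consider the minimal sketch below. It simply compares the share of reviews per genre in which an impact term occurs; the toy data and function name are hypothetical placeholders rather than our actual pipeline, and in practice such differences would be tested for statistical significance over the full review corpus.

```python
# Hypothetical toy data: a few reviews, grouped by the genre of the
# reviewed novel. The real corpus holds over 600,000 reviews.
reviews_by_genre = {
    "thriller": ["suspenseful from start to finish",
                 "a suspenseful and fast-paced read"],
    "children": ["funny and warm", "my kids found it very funny"],
    "literary": ["beautifully written", "a remarkable style"],
}

def term_rate(reviews, term):
    """Share of reviews in which the given impact term occurs."""
    hits = sum(term in review.lower() for review in reviews)
    return hits / len(reviews)

for term in ("suspenseful", "funny"):
    for genre, reviews in reviews_by_genre.items():
        print(f"{term!r} in {genre}: {term_rate(reviews, term):.2f}")
```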
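The sentiment-volatility experiment can be sketched in a similarly reduced form: score the sentiment of consecutive segments of a novel and take the dispersion of the resulting arc as a crude volatility measure. The word lists in score_sentiment below are a stand-in for whatever sentiment model is actually used; our reservations about the robustness and interpretability of such scores apply to any concrete choice.

```python
import statistics

def score_sentiment(text: str) -> float:
    """Stand-in for a real sentiment model; returns a score in [-1, 1]."""
    positive = {"love", "great", "happy"}   # placeholder lexicon
    negative = {"fear", "dead", "loss"}     # placeholder lexicon
    words = text.lower().split()
    raw = sum((w in positive) - (w in negative) for w in words)
    return max(-1.0, min(1.0, raw / max(len(words), 1) * 10))

def sentiment_volatility(novel: str, chunk_size: int = 500) -> float:
    """Standard deviation of chunk-level sentiment over the full text."""
    words = novel.split()
    chunks = [" ".join(words[i:i + chunk_size])
              for i in range(0, len(words), chunk_size)]
    arc = [score_sentiment(chunk) for chunk in chunks]
    return statistics.pstdev(arc) if len(arc) > 1 else 0.0
```

A correlation between such a volatility measure and the rate of, say, “suspenseful” in a novel's reviews can then be tested directly; as noted above, in our experiments such correlations proved inconclusive.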
Thus, although we made good progress, we have not yet been able to answer all of the research questions we posed satisfactorily. One reason for this is that currently common models (i.e. topic models and sentiment analysis) do not deliver the type of measurements that are useful from a literary research perspective. The mirror image of this problem is that it remains largely unclear how the literary and narrative concepts colloquially used by literary scholars can be effectively operationalized computationally.
Being able to relate concrete aspects and properties of novels (e.g. character traits of protagonists, complexity and development of story time, sentiment development, and so forth) to the terms readers use to describe such novels could be a means of bringing readers closer to the novels that they like or need but cannot effectively find or identify. We would like to continue our research in the interest of both publishers and readers, so that supply and demand can be matched more effectively. The demo application that we published is a first attempt at offering publishers and readers a tool with some insight into how readers' judgments relate to a novel's genre and topics. A major improvement of such 'dashboards' would be possible if we were able to measure concrete narrative aspects of fiction. Uncovering effective operationalizations of such narrative concepts is therefore a priority in our future research.