CASTEMO data blooms in Neo4j graph databases

Great news! CASTEMO data collected in InkVisitor can now be readily exported into Neo4j graph databases. Neo4j allows the collected data to be flexibly explored, queried, enriched, and analysed with graph tools.

27 Nov 2023, by David Zbíral, Robert Laurence John Shaw, and Tomáš Hampejs

Since the inception of DISSINET, we have been developing a human-controlled but computer-assisted, statement-based approach to the collection of complex syntactic-semantic data from texts. We call it CASTEMO: Computer-Assisted Semantic Text Modelling. If you have followed DISSINET, you will know about CASTEMO; if you haven’t, this is where you can find out more:

Structured data vs. contextual complexity of texts: An unnecessary dilemma?

Model the source first! Towards Computer-Assisted Semantic Text Modelling and source criticism 2.0

CASTEMO allows you to:

Record anything you want from and concerning texts simply by relating entities of various types (Person, Group, Object, Action, Concept, etc.) to each other.
Model the text itself before modelling realities or a research problem: but you can easily proceed to these later from the data you have collected.
Preserve salient lexical, syntactic, semantic, and contextual features: you can retain textual nuances (including original language) and document context (e.g. order of information).
Record conflicting and ambiguous evidence since modelling the text does not require you to make a decision during data collection.
Add analytical layers: create semantic and ontological connections at a level above the text to facilitate research.
Collect data selectively as well as comprehensively: just because you can capture anything doesn’t mean you have to.
Put source criticism at the heart of computational research on texts: because you have modelled the source, the opportunities for source criticism are enhanced, not curtailed.

We have developed a cutting-edge application with an advanced graphical user interface, InkVisitor, which makes the full power of CASTEMO accessible to any interested researcher.

However, it is one thing to have a great data model and data collection interface, and another to find a convenient and well-organised way to explore and analyse that data in a research-oriented manner.

With CASTEMO, thankfully, there is a straightforward solution to this challenge. Thanks to the DISSINET team’s hard work on data model development and data transformation scripts, it is now possible to export CASTEMO data collected in InkVisitor into flexible, operable, and efficient Neo4j graph databases. Graph databases are ideal for exploring the rich web of interconnections that the CASTEMO approach produces. Neo4j Community Edition is the most widespread open-source graph database and can be used free of charge. Furthermore, Neo4j comes with an immense variety of extensions for data visualisation and analysis, and supports export to standard data exchange formats. Neo4j thus places CASTEMO data at your fingertips, allowing you to get on with the fun part: exploration and research.

And this is not all. Neo4j queries also allow you to enrich your CASTEMO data, creating further analytical categorisations and connections in line with your research needs as they emerge. You can thicken the relations between your data and create convenient “shortcuts” to assist your exploration and analysis. For instance, if one Person entity is related to another by way of the Concept entity “sister”, you can use the reciprocal Concept entity (“sibling”) to add a new relationship going the other way, with a simple Neo4j query that scales to cover all such connections.
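To make the “sister”/“sibling” example concrete, here is a minimal sketch of the enrichment logic in Python, working on plain in-memory triples. The triple layout, the names, and the `RELATED_TO` relationship type and `concept` property in the Cypher string are all assumptions made for illustration; they are not the actual schema produced by the DISSINET export scripts.

```python
# Hypothetical, simplified representation: each relation is a
# (source, concept, target) triple. This is NOT the real CASTEMO export schema.
Relation = tuple[str, str, str]

# Assumed mapping from a Concept to the Concept used for the reverse relation,
# following the "sister" -> "sibling" example in the text.
RECIPROCAL = {"sister": "sibling"}


def add_reciprocals(relations: list[Relation], reciprocal: dict[str, str]) -> list[Relation]:
    """Return the relations plus one reversed relation for each mapped concept."""
    existing = set(relations)
    enriched = list(relations)
    for source, concept, target in relations:
        if concept not in reciprocal:
            continue  # no reciprocal concept defined, leave as is
        reverse = (target, reciprocal[concept], source)
        if reverse not in existing:  # skip duplicates, similar in spirit to MERGE
            existing.add(reverse)
            enriched.append(reverse)
    return enriched


# A comparable Cypher query might look like this. The label, relationship type,
# and property name are hypothetical and would need adapting to the real graph.
CYPHER_SKETCH = """
MATCH (a:Person)-[:RELATED_TO {concept: $concept}]->(b:Person)
MERGE (b)-[:RELATED_TO {concept: $reciprocal}]->(a)
"""

relations = [("Alice", "sister", "Bob"), ("Bob", "neighbour", "Carol")]
enriched = add_reciprocals(relations, RECIPROCAL)
print(enriched)
```

Because the function iterates over every relation and only consults the mapping, adding further reciprocal pairs to `RECIPROCAL` extends the enrichment without any change to the logic.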

Tempted by CASTEMO but never tried Neo4j queries? Don’t worry: Neo4j is presently the most widely used graph database management system, so there is a lot of material to learn from.

Don’t like to be bound to a specific database format? Then we have good news for you, too: the important thing is to get the relations between entities right in InkVisitor, in the way you want them to be. After that, the final format is just a question of practicality and preference. It took us just a little more than two weeks to develop a pipeline transforming CASTEMO data collected in InkVisitor into Neo4j format. With a similarly reasonable amount of effort, you can transform the data into any other format: an SQL relational database, a NoSQL database, a collection of simple spreadsheets, or even (if, for some reason, you wanted a hard copy and had a lot of space) a printed card catalogue.
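As a rough illustration of how little is needed once the relations are in place, the sketch below writes entities and relations out as two CSV tables, the kind of “simple spreadsheets” mentioned above. The record fields (`id`, `type`, `label`, `source`, `concept`, `target`) are hypothetical and chosen only for this example; the real InkVisitor data is richer and structured differently.

```python
import csv
import io

# Hypothetical, simplified records standing in for exported CASTEMO data.
entities = [
    {"id": "P1", "type": "Person", "label": "Alice"},
    {"id": "P2", "type": "Person", "label": "Bob"},
    {"id": "C1", "type": "Concept", "label": "sister"},
]
relations = [
    {"source": "P1", "concept": "C1", "target": "P2"},
]


def to_csv(rows: list[dict[str, str]], fieldnames: list[str]) -> str:
    """Serialise a list of dictionaries as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# One table per kind of record; each string could be saved as a .csv file.
entities_csv = to_csv(entities, ["id", "type", "label"])
relations_csv = to_csv(relations, ["source", "concept", "target"])
print(entities_csv)
print(relations_csv)
```

The same two tables could just as well be loaded into a relational database, with the relation rows referencing entity identifiers as foreign keys.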

DISSINET has expended years of work developing the CASTEMO approach, its data model, and the InkVisitor application. With the ability to readily export collected data into powerful research databases, we feel these efforts are coming to full fruition. Think CASTEMO might be right for your research needs? Come join us in the adventure!