Semantics revisited

We have seen many “topics” or subdomains become more or less popular within the wide range of things considered “artificial intelligence” over the last few decades. Twenty years ago even the term artificial intelligence was something to avoid when presenting research proposals. Neural networks have a long history, and yet ten years ago most people had barely heard of them.

Now we see a new comeback: that of semantic technologies. We used to talk a lot about semantics in the nineties and in the early part of this century. Still, the concrete approaches did not become scalable, and other approaches to data management took hold.

Slowly but surely, though, semantics started to appear again, even if it was not so well understood by many who were supposed to apply it. In the last few years people in NLP and related domains started to use the word semantics in the context of word embeddings and document embeddings. Open data became more important, and suddenly more people started to realise that ontologies can be based on or enriched by machine learning approaches.

Some of the more exciting things I have seen out there:

  • the improvement of graph databases such as Neo4j and Amazon Neptune (itself connected to other interesting services)
  • the spread of Query and to a lesser extent Gremlin
  • the appearance of some tools such as Owlready for interaction with the Python sphere
  • the maturity of resources such as DBpedia, Wikipedia and a plethora of projects connected to these
  • the improvement of algorithms for graph manipulation

There are lots of interesting challenges that we need to tackle now in order to solve real-life problems. Among them are how to optimise versioning and automatic growth of ontologies with external resources, how to protect personal data in these systems and how to represent ever more complex relations, especially those explaining “stories”.

I believe this is where we need to explore, first at a very abstract and then at a concrete level, what I will call high order syntax. Linguists have worked for millennia on syntactic problems. Software specialists have worked on the syntax of programming languages for several decades now. Likewise, ontology experts have been developing ever more complex frameworks to express n-ary relationships. Now we need semantic theories that help us tell and manipulate stories with data. And then we will need to spread the knowledge.
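To make the idea of n-ary relationships a bit more tangible, here is a minimal sketch in plain Python. It assumes the common reification pattern, in which a relation with more than two participants is turned into an intermediate node with one binary triple per role. The names `Triple`, `reify_nary`, `fillers` and the purchase example are all hypothetical and only serve as illustration; a real system would use an ontology language such as OWL rather than bare tuples.

```python
from typing import Dict, List, NamedTuple


class Triple(NamedTuple):
    """A plain subject-predicate-object statement (illustrative only)."""
    subject: str
    predicate: str
    obj: str


def reify_nary(relation_id: str, relation_type: str,
               roles: Dict[str, str]) -> List[Triple]:
    """Express one n-ary relation as binary triples around a relation node."""
    # The relation itself becomes a node with a type ...
    triples = [Triple(relation_id, "type", relation_type)]
    # ... and every participant is attached to it through a named role.
    for role, filler in roles.items():
        triples.append(Triple(relation_id, role, filler))
    return triples


def fillers(triples: List[Triple], relation_id: str, role: str) -> List[str]:
    """Return every value that fills the given role of the given relation."""
    return [t.obj for t in triples
            if t.subject == relation_id and t.predicate == role]


# A tiny "story": Alice buys a bicycle from Bob on a given date.
purchase = reify_nary(
    "purchase_1",
    "Purchase",
    {"buyer": "Alice", "seller": "Bob", "item": "bicycle", "date": "2021-05-03"},
)

for triple in purchase:
    print(triple)
print(fillers(purchase, "purchase_1", "buyer"))
```

The point of the sketch is only that a four-participant event cannot be a single edge in a graph of binary relations; it needs a node of its own, and the roles around that node are where the "syntax" of a story begins to show.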

Currently there are lots of people working in the NLP spectrum who understand neither syntax in the linguistic sense nor syntax in the sense of semantic languages such as OWL. They talk about “language models” when they are experimenting with parameters for optimising transformers for this or that recognition of text and images. Computational linguists and semantic specialists are needed in order to develop more comprehensive frameworks so that digital systems can somehow tell or recognise stories and, more importantly, react to them in a reliable way.

I recommend two newish books for those interested in semantic technologies: Knowledge Graphs: Fundamentals, Techniques, and Applications, by Kejriwal et alia (2021) and, a little bit more mundane but still interesting, Ontologies with Python by Jean-Baptiste Lamy (also 2021).