Semantics revisited

We have seen many “topics” or subdomains becoming more or less popular in the wide range of things considered “artificial intelligence” in recent decades. Even the term artificial intelligence became a bit of a term to avoid when presenting research proposals twenty years ago. Neural networks have a long history and yet ten years ago most people had not heard of them.

Now we see a new comeback: that of semantic technologies. We used to talk a lot about semantics in the nineties and in the early part of this century. Still, the concrete approaches did not become scalable. Other approaches to data management took hold.

Slowly but surely, though, semantics started to appear again, even if not so well understood by many who were supposed to apply it. In the last few years people in NLP and related domains started to use the word semantics in the context of word embeddings and document embeddings. Open data became more important and suddenly more people started to realise that ontologies can be based on or enriched by machine learning approaches.

Some of the more exciting things I have seen out there:

  • the improvement of graph databases such as Neo4j and Amazon Neptune (itself connected to other interesting services)
  • the spread of Query and to a lesser extent Gremlin
  • the appearance of some tools such as Owlready for interaction with the Python sphere
  • the maturity of resources such as DBpedia, Wikipedia and a plethora of projects connected to these
  • the improvement of algorithms for graph manipulation

There are lots of interesting challenges that we need to tackle now in order to solve real-life problems. Among them are how to optimise versioning and automatic growth of ontologies with external resources, how to protect personal data in these systems and how to represent ever more complex relations, especially those explaining “stories”.

I believe this is where we need to explore, first at a very abstract and then at a concrete level, what I will call high order syntax. Linguists have worked for millennia on syntactic problems. Software specialists have worked on the syntax of programming languages for several decades now. Likewise ontology experts have been developing ever more complex frameworks to express n-ary relationships. Now we need semantic theories that help us tell and manipulate stories with data. And then we will need to spread the knowledge.
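To make the point about n-ary relationships a bit more tangible, here is a minimal sketch in plain Python of one common way to express them: the fact becomes a node of its own, and each participant hangs off that node through a binary triple. The function names (`reify`, `roles_of`), the `ex:` prefix and the sleigh-journey example are all made up for illustration; this is not the API of any ontology library.

```python
def reify(node_id, relation, **roles):
    """Express one n-ary fact as binary triples around a dedicated node.

    node_id  -- hypothetical identifier for the fact itself, e.g. "_:e1"
    relation -- the kind of fact, e.g. "ex:Journey"
    roles    -- one keyword argument per participant in the fact
    """
    triples = [(node_id, "rdf:type", relation)]
    triples.extend((node_id, role, value) for role, value in roles.items())
    return triples


def roles_of(node_id, triples):
    """Collect the participants of a reified fact back into a dict."""
    return {
        predicate: obj
        for subject, predicate, obj in triples
        if subject == node_id and predicate != "rdf:type"
    }


# A four-place fact: who travelled, from where, to where, and how.
journey = reify(
    "_:e1",
    "ex:Journey",
    traveller="ex:Merchant",
    origin="ex:Tallinn",
    destination="ex:Helsinki",
    means="ex:Sleigh",
)

# A second fact, plus one extra triple that orders the two events.
sale = reify("_:e2", "ex:Sale", seller="ex:Merchant", goods="ex:Grain")
story = journey + sale + [("_:e1", "ex:before", "_:e2")]
```

Because every event is a node, events can be linked to each other (here with a hypothetical `ex:before` predicate), which is roughly the smallest building block of a “story” in data.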

Currently there are lots of people working in the NLP spectrum who do not have an understanding of syntax in the linguistic sense and who also lack knowledge about syntax in the sense of semantic languages such as OWL. They talk about “language models” when they are experimenting with parameters for optimising transformers for this or that recognition of text and images. Computational linguists and semantic specialists are needed in order to develop more comprehensive frameworks so that digital systems can somehow tell or recognise stories and, more importantly, react to them in a reliable way.

I recommend two newish books for those interested in semantic technologies: Knowledge Graphs: Fundamentals, Techniques, and Applications, by Kejriwal et al. (2021) and, a little bit more mundane but still interesting, Ontologies with Python by Jean-Baptiste Lamy (also 2021).

Readings of last year

I did not blog for a while. Last year I read a few books that I found productive, inspiring. On the professional side I read mostly blog posts and specific sites on machine learning. On the general side I found these books particularly compelling:

How We Learn by Stanislas Dehaene. This is the second book I have read by this French neurologist. The first one was about the brain and mathematics. In this book Dehaene goes further into how learning processes happen in the brain and he draws some useful comparisons between our neural networks and the very primitive networks used in machine learning.

Gates of Europe by Harvard professor Serhii Plokhy presents a comprehensive history of Ukraine. This is really fascinating reading for those interested in the whole debate about history and politics of the Poland-Ukraine-Russia-Belarus region.

Tussen drie plagen by Jaan Kross: I have not finished this yet, it is over 1200 pages long, three books in one. Jaan Kross was a famous Estonian writer. He wrote this tetralogy between 1970 and 1980. It deals with the life of Balthasar Russow and the Livonian war. I am reading the Dutch translation of the book, which got a prize for the best translation from Estonian into another language in 2020. It is an amazing book also for those interested in that corner of the world. Trivial but not so trivial fact: back then you could travel in winter by horse-drawn sleigh from Tallinn to what later became Helsinki. Nowadays that part of the Baltic Sea does not freeze like that at all. You learn a lot about the relations with the rising empire of the Muscovites, with the Swedes, with the Germans and also a bit about other Slavic groups living in areas constantly fought over by Germans and Poles.

Jaan Kross in 1938

Indo-Dutch fiction

One of the last fiction books I read last year was Alfred Birney’s De Tolk van Java. As far as I know the book has only appeared in Dutch (I am writing this on 1 January 2020) but an English translation will be available in a few months under the title The Interpreter of Java. It won the Libris prize for best Dutch novel in 2017 and there are good reasons for that: it has a compelling story, the character description is impeccable, there is a wealth of interesting information on the history and culture of Indonesia and the Indonesian community in the Netherlands and the use of the Dutch language is superb.

Dutch army fighting back pro-Independence Indonesians (Wikipedia)

The book is about the complex relation between an “Indo”, a half-Indonesian, half-Dutch man, and his children, in particular one of them. Chapters switch between a narrative told by the father and those told by one of his children. The father was born in Java to a Chinese-Indonesian mother and an Indonesian-born Dutchman. His chapters move from his life during the Japanese occupation of Indonesia to that in the Netherlands, then back to the Japanese occupation and the war of independence. Things keep going back and forth between those sections and the traumatic experience his children and wife had with him in the Netherlands.

For me it was also interesting to connect some of the facts and impressions I learnt about the Japanese occupation of Indonesia with those I read four years ago in Richard Flanagan’s The Narrow Road to the Deep North. Styles and characters are different but both books helped me to puzzle together a bit of the history of a region we do not know much about in the Americas or Europe. Both books are fiction but the historical backgrounds they portray seem to be very accurate, as far as I have been able to judge by delving into a few reference sources here and there. There are many layers in De Tolk van Java: racism in both the Netherlands and Indonesia, identity, ethnicity, family violence, love.

A bit more on the brain

One of the latest books I read on the brain was Hjernen er stjernen by Norwegian neurologist Kaja Nordengen. The book came out in Norwegian back in 2016. The literal translation is “The Brain is a Star” but it got an English translation as “Your Brain is a Superstar”. I find both titles a bit corny but the book is a very pleasant read. It has a lighter style than Swaab’s books I mentioned some months back and it is rather short. There will surely be a few things laymen interested in neurology already know (never mind neurologists) but most, including me, will learn a few things. I am still not sure about what Nordengen wrote on some specific food items and brain health but I will keep on reading on this very topic. At the very least she awakened my interest in the relationship between neuron activity and food (beyond what we read about nuts, omega 3 and the like).

Instagram and Python

There is this article in ZDNet about how Instagram, a heavy user of Python, is trying to push for some processes to tame the Python code mess. I find their ideas quite sensible. I am still amazed at how long it has taken this programming language to develop best practices and mechanisms for large-scale production use.

If you compare it with Java, which is a bit “younger”, the contrast is huge.

Another thing I have been thinking about lately is: why do we have so little to show when it comes to version control and general deployment of machine learning models?

Let’s hope there is more movement in the next couple of years.

French fiction

I had not read French literature in months and I finally got to read Au revoir là-haut by Pierre Lemaitre. It was an excellent book: there is a good plot and, above all, language and character descriptions are superb.

It also helps you understand a bit of France at the end of World War I and what came shortly afterwards.

Here you can watch a trailer for the film, which came out in 2017.

Russia from top to bottom

I have read two interesting books about Russia. The first is A History of Modern Russia by the British historian Robert Service and the second is a book by the Norwegian journalist Øystein Bogen, Russlands hemmelige krig mot Vesten (in English, Russia’s Secret War Against the West), about the hybrid war Russia is waging in the West. Both give a good perspective on what has happened in Russia. The first book starts at the beginning of the twentieth century and runs up to 2014. The second, although it has many references to Soviet times, focuses on Russia’s hybrid war of recent years (but it explains clearly how the FSB was formed and the whole web of the siloviki).

Boris Pasternak and family…one of those great writers who could not publish as he wanted

Another NLP/text analytics book by O’Reilly

I got Applied Text Analysis with Python by Benjamin Bengfort, Rebecca Bilbro and Tony Ojeda. If you have some NLP experience a few of the chapters will be old stuff.

That was the case for me. Still, there were some pieces from which I learnt a bit:

  • custom corpus preparation
  • some about text visualization and graph analysis of text and
  • scaling text analytics with multiprocessing.
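To give an idea of what the last point looks like in practice, here is a minimal sketch using only the standard library. It is not taken from the book: the function names, the whitespace tokenizer and the default parameters are my own assumptions, chosen to keep the example short. Each document is counted independently in a worker process and the partial counts are merged at the end.

```python
from collections import Counter
from multiprocessing import Pool


def count_tokens(text):
    """Count lower-cased, whitespace-separated tokens in one document."""
    return Counter(text.lower().split())


def count_corpus(docs, workers=2, chunksize=16):
    """Count tokens over a corpus, optionally spread over worker processes.

    With workers <= 1 everything runs in the current process, which is
    handy for debugging and for small corpora.
    """
    if workers <= 1:
        partials = [count_tokens(doc) for doc in docs]
    else:
        # Each worker receives batches of `chunksize` documents.
        with Pool(processes=workers) as pool:
            partials = pool.map(count_tokens, docs, chunksize=chunksize)

    # Merge the per-document counts into one corpus-wide counter.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total
```

When the parallel path is used from a script, the call to `count_corpus` should sit under an `if __name__ == "__main__":` guard, since on some platforms the worker processes re-import the main module. For a handful of short documents the process start-up cost outweighs any gain; the approach pays off when the per-document work is heavy.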

The creative brain

I just finished the second book on the brain by Dick Swaab. I do not think the English translation has come out yet but if you read Dutch, here you have a reference about it.
This time Swaab focuses, as the Dutch title indicates, on creativity on a general scale: how it is generated, how it is shaped by our genes, by the way children develop in their mother’s wombs, by every single thing that happens to them, to us, until our deaths.

The book is written with the usual dry Dutch humour. It has lots of references to the ailments and quirks of artists, scientists and other people.

The book can become very technical but for someone really interested in the brain, it is a trove of information. It has very good references and a really complete index.

The only thing I wish could have been done better is the images of the different brain parts: there are several small pictures, mostly at the start and at the end of the book. Thinking about this, I thought it would be good to have something like an online search where you can enter terms about the brain and a 3-D-like visualization rotates and zooms in or out. That would be the perfect addition to this book. But perhaps I am asking too much. It would be a nice application, though, for people like me, a layman, who want to understand a bit more about our brains.

Testing Python

Professionally I started out working with C++, from there moved more and more into Java and from there, without leaving it, more and more into Python. One of the things I discovered with Python was how little there was on structuring compared to Java. The other was that there was less information on testing options than in the Java world.

Percival’s book Test-Driven Development with Python has been an excellent help in getting up to speed quickly with more than the basic options for proper testing in this language. A great thing about this book is that it helps you get a Django project up and running.
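For readers who have not seen the test-first style in Python, here is a minimal sketch with the standard `unittest` module. It is my own toy example, not code from the book: `normalise_title` and the test names are hypothetical, and a real Django project would use Django’s own test runner instead of the small `run_suite` helper shown here.

```python
import unittest


def normalise_title(title):
    """Collapse runs of whitespace and capitalise every word."""
    return " ".join(word.capitalize() for word in title.split())


class NormaliseTitleTest(unittest.TestCase):
    # In the test-first style these tests are written before the
    # function above exists, and fail until it is implemented.

    def test_collapses_whitespace_and_capitalises(self):
        self.assertEqual(normalise_title("  how   we learn "), "How We Learn")

    def test_empty_string_stays_empty(self):
        self.assertEqual(normalise_title(""), "")


def run_suite():
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormaliseTitleTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

In day-to-day work you would normally save this in a file and run it with `python -m unittest`, letting the runner discover the test case by itself.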

Take a look at the book’s site here. It is really good.