Category Archives: machine learning

Semantics revisited

We have seen many “topics” or subdomains becoming more or less popular within the wide range of things considered “artificial intelligence” over the last decades. Even the term artificial intelligence itself was something to avoid when presenting research proposals twenty years ago. Neural networks have a long history, and yet ten years ago most people had not heard of them.

Now we are seeing a new comeback: that of semantic technologies. We used to talk a lot about semantics in the nineties and in the early part of this century. Still, the concrete approaches did not become scalable, and other approaches to data management took hold.

Still, slowly but surely, semantics started to appear again, even if not so well understood by many of those who were supposed to apply it. In the last few years, people in NLP and related domains started to use the word semantics in the context of word embeddings and document embeddings. Open data became more important, and suddenly more people realised that ontologies can be built or enriched with machine learning approaches.
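To make the embedding sense of “semantics” concrete: word embeddings represent words as vectors, and similarity of meaning becomes geometric closeness, usually measured with cosine similarity. A minimal sketch in plain Python, with made-up three-dimensional vectors (real embeddings have hundreds of dimensions and are learned from corpora):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy, hand-crafted "embeddings" -- purely illustrative values.
embeddings = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

# "king" ends up far closer to "queen" than to "apple".
print(cosine_similarity(embeddings["king"], embeddings["queen"]) >
      cosine_similarity(embeddings["king"], embeddings["apple"]))
# True
```

This is, of course, where the “semantics” of embeddings stops: closeness in the vector space, with no explicit relations of the kind an ontology provides.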

Some of the more exciting things I have seen out there:

  • the improvement of graph databases such as Neo4j and Amazon Neptune (itself connected to other interesting services)
  • the spread of SPARQL and, to a lesser extent, Gremlin
  • the appearance of tools such as Owlready for interaction with the Python ecosystem
  • the maturity of resources such as DBpedia and Wikipedia, and a plethora of projects connected to these
  • the improvement of algorithms for graph manipulation
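To illustrate the kind of graph manipulation these tools support, here is a minimal sketch in plain Python: a tiny set of subject–predicate–object triples (hypothetical entities, not from any real dataset) and a breadth-first search for a path between two nodes. Real graph databases index and query this sort of structure at scale.

```python
from collections import deque

# A toy knowledge graph as (subject, predicate, object) triples.
triples = [
    ("Ada",    "knows",     "Grace"),
    ("Grace",  "worksAt",   "Navy"),
    ("Ada",    "livesIn",   "London"),
    ("London", "locatedIn", "England"),
]

def neighbours(node):
    """All nodes directly reachable from `node`, with the linking predicate."""
    return [(p, o) for s, p, o in triples if s == node]

def find_path(start, goal):
    """Breadth-first search over the triples; returns a list of hops or None."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for predicate, nxt in neighbours(node):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [(node, predicate, nxt)]))
    return None

print(find_path("Ada", "England"))
# [('Ada', 'livesIn', 'London'), ('London', 'locatedIn', 'England')]
```

A query language such as SPARQL or Gremlin expresses this kind of traversal declaratively instead of hand-coding the search.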

There are plenty of interesting challenges to tackle now in order to solve real-life problems: how to optimise the versioning and automatic growth of ontologies with external resources, how to protect personal data in these systems, and how to represent ever more complex relations, especially those expressing “stories”.

I believe this is where we need to explore, first at a very abstract level and then at a concrete one, what I will call high-order syntax. Linguists have worked for millennia on syntactic problems. Software specialists have worked on the syntax of programming languages for several decades now. Likewise, ontology experts have been developing ever more complex frameworks to express n-ary relationships. Now we need semantic theories that help us tell and manipulate stories with data. And then we will need to spread that knowledge.
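One common way ontology languages approach n-ary relationships, and a possible building block for representing “stories”, is reification: the relation itself becomes a node, with its participants attached as roles. A minimal sketch in plain Python, with hypothetical event and role names purely for illustration:

```python
# A plain binary triple can say "Alice gave a book", but not to whom or when.
# Reification turns the giving itself into a node carrying all the roles.
events = {
    "event1": {
        "type":      "Giving",
        "agent":     "Alice",      # who gives
        "recipient": "Bob",        # who receives
        "object":    "book",       # what is given
        "time":      "2021-05-01", # when
    }
}

def describe(event_id):
    """Render one reified event back into a natural-language sentence."""
    e = events[event_id]
    return (f"{e['agent']} gave {e['object']} to {e['recipient']} "
            f"on {e['time']}")

print(describe("event1"))
# Alice gave book to Bob on 2021-05-01
```

A “story”, in these terms, would be a sequence of such event nodes linked by temporal and causal relations, which is exactly where current frameworks start to strain.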

Currently there are lots of people working across the NLP spectrum who have no understanding of syntax in the linguistic sense, and who also lack knowledge of syntax in the sense of semantic languages such as OWL. They talk about “language models” while experimenting with parameters to optimise transformers for this or that recognition task on text and images. Computational linguists and semantic specialists are needed to develop more comprehensive frameworks, so that digital systems can somehow tell or recognise stories and, more importantly, react upon them in a reliable way.

I recommend two newish books for those interested in semantic technologies: Knowledge Graphs: Fundamentals, Techniques, and Applications by Kejriwal et al. (2021) and, a little more mundane but still interesting, Ontologies with Python by Jean-Baptiste Lamy (also 2021).

Generative Deep Learning

David Foster’s book Generative Deep Learning offers a general view of generative methods across several ML domains. This is not a book of state-of-the-art recipes on generative DL for the domains you are already working in. The field is evolving very fast, and one will surely want to check the usual data science blogs and, above all, play with the most popular GitHub projects in one’s domains. But if one wants a general view of how generative DL is used in other ML domains, and one likes sound mathematical explanations, this is a very good book.

I come from the NLP world, and I became curious about the parts of this book on painting and composing. As in my free time I love drawing, I got a very nice introduction here to how we can let machines draw a bit. Above all, the book makes you think about how to build solutions combining generative ML across different domains.

Another Pac(k)t

OK, terrible pun. I was not inspired. Anyway: I am going through Natural Language Processing with TensorFlow by Ganegedara, and although I find that Packt books are often published in a bit of a rush, this one presents a neat introduction to TensorFlow as related to NLP. There are some simplifications on the NLP side, but it is a good introduction to TensorFlow in this area.


Hands-On ML

I have been reading Géron’s Hands-On Machine Learning with Scikit-Learn & TensorFlow. The title says it all. It is a good review of ML topics, but above all it is very practical, showing with real examples how to implement ML solutions, from the traditional sort to deep learning. It is rather thin on theory, but that is fine: other books already tackle the theory very well while leaving the practicalities out.


ML books

I read Lantz’s Machine Learning with R. It is a decent introduction to exactly what the title says. The first chapters give an introduction to R and a tiny one to ML. Then it goes into several classification algorithms, with further chapters on clustering, decision trees, and the usual material. I got the most profit from the chapter on neural networks and the final topics on model performance. I wish the book had gone deeper into some R packages, but I suppose the author wanted to keep it accessible for those just getting into ML.
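For a flavour of the classification algorithms such introductory books cover, here is a minimal k-nearest-neighbours classifier. The book works in R; this is my own toy translation into plain Python, with made-up data:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs.
    """
    by_distance = sorted(
        train,
        key=lambda item: math.dist(item[0], query)  # Euclidean distance
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Toy data: two well-separated clusters.
train = [([0.0, 0.0], "a"), ([0.1, 0.2], "a"), ([0.2, 0.1], "a"),
         ([5.0, 5.0], "b"), ([5.1, 4.9], "b"), ([4.9, 5.2], "b")]

print(knn_predict(train, [0.3, 0.3]))  # a
print(knn_predict(train, [4.8, 5.1]))  # b
```

The whole algorithm is essentially these few lines; what the book adds is the discussion of choosing k, scaling features, and measuring performance.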

Now I am delving into Goodfellow’s Deep Learning and I am enjoying it. At first I was just using the online version, but I have now got the paper one: it is a good reference, with the right amount of theory and practice.

Python and machine learning

I have mostly been using R and languages such as Java and C++ for machine learning, but I am learning how to do more with Python. I am currently exploring Python’s scikit-learn library. It’s kind of cool. As support material, I am going through the examples in Python Machine Learning by Sebastian Raschka.

So far, so good. Mind you: the book is not an introduction to ML, even if it offers an overview of topics (there is too much out there). It is rather a practical book about doing ML with Python. It helps if you have already used the scikit-learn library.
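As a taste of the scikit-learn workflow the book assumes — a minimal sketch with made-up one-dimensional data; the estimator is a standard scikit-learn decision tree, and the fit/predict calls are the library’s usual interface:

```python
from sklearn.tree import DecisionTreeClassifier

# Toy data: two well-separated groups on a single feature.
X = [[0.0], [1.0], [10.0], [11.0]]
y = [0, 0, 1, 1]

# The uniform fit/predict interface is what makes scikit-learn pleasant:
# swapping in another estimator changes only this constructor line.
model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)

print(model.predict([[0.5], [10.5]]))
# [0 1]
```

Raschka’s book essentially walks through many such estimators, plus the preprocessing and evaluation steps around them.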

The chapters I like the most so far are the ones on data preprocessing and dimensionality reduction. I still have to reach the one about Theano.
