The Makings of an Art Thesaurus
Updated: Jun 12, 2018
Last time we looked at the five words most “similar” to a specific word from a corpus of 50 art reviews. Using the same 50 art reviews as our database of words, we have now created a more interactive tool which lets you compare the “similarity” of two words and outputs a score between -100 and +100.
VIEW THESAURUS: labs.artimbarc.com
The mechanics are similar to our previous experiment. We use the GloVe vector representation of words to convert our database of words into vectors. We then measure how “similar” two words are by comparing their vectors; in our case we use cosine similarity, which is the cosine of the angle between the two vectors. We can think of two words as being very similar if their vector representations have a similar orientation, which translates to a small angle. At the extreme, when the vectors have exactly the same orientation, the angle is zero and the cosine of zero is 1 (in our tool we have scaled this up by a factor of 100). When the angle is 90 degrees, the vectors are said to be “orthogonal”, meaning uncorrelated, and the cosine similarity is 0. Finally, when the angle is 180 degrees, the vectors have exactly opposite orientations and the cosine similarity is -1.
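To make the scoring concrete, here is a minimal sketch of cosine similarity, cos θ = (a · b) / (‖a‖ ‖b‖), scaled by 100 to match the tool's -100 to +100 range. The function names and the three short vectors are hypothetical, chosen only to reproduce the three cases described above; real GloVe vectors have far more dimensions (pretrained sets typically use 50 to 300), and this is not the tool's actual code.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b: (a . b) / (|a| |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def similarity_score(a, b):
    """Scale cosine similarity from [-1, 1] to a [-100, 100] score (assumed scaling)."""
    return 100 * cosine_similarity(a, b)

# Hypothetical 3-dimensional vectors, not real GloVe embeddings.
v = [1.0, 2.0, 0.0]
print(round(similarity_score(v, [2.0, 4.0, 0.0])))    # same orientation -> 100
print(round(similarity_score(v, [-2.0, 1.0, 0.0])))   # orthogonal -> 0
print(round(similarity_score(v, [-1.0, -2.0, 0.0])))  # opposite orientation -> -100
```

Note that only the direction of each vector matters: doubling `v` to `[2.0, 4.0, 0.0]` leaves the score at 100, which is why cosine similarity is a natural fit for comparing orientation rather than length.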
To understand what a score means when comparing two words, it helps to know a little about how the GloVe vector representation is calculated. An in-depth explanation is outside the scope of this post, but the intuition is as follows. For a given word, e.g. “surrealism”, the algorithm looks at the words surrounding it across many different examples of text and counts how often every other word in its vocabulary appears nearby. It then learns a vector for each word so that, roughly, the dot product of two words' vectors approximates the logarithm of how often they appear together. The upshot is that words which appear in similar contexts end up with vectors pointing in similar directions. So for our example of “surrealism”, words like “tractor” or “coffee” will rarely appear nearby, whereas “art” and “painting” will appear often. Equally, for another example, “impressionism”, “tractor” and “coffee” will also be rare in its surroundings while “art” and “painting” will be common, and so the two vectors are likely to have a similar orientation and we will get a higher score in our tool!
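The co-occurrence intuition above can be illustrated with a deliberately simplified, count-based sketch. This is not GloVe: it only builds the raw "which words appear nearby" counts that GloVe starts from, then compares those count vectors directly with cosine similarity, whereas GloVe goes on to learn compact vectors from the logarithms of such counts. The three-sentence corpus, the window size of 3 and the function names are all hypothetical and not taken from our 50 reviews.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_counts(sentences, window=3):
    """For each word, count the words appearing within `window` positions of it."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the word itself
                    counts[word][tokens[j]] += 1
    return counts

def cosine(c1, c2):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(c1[w] * c2[w] for w in c1.keys() & c2.keys())
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

# Hypothetical toy corpus, for illustration only.
corpus = [
    "surrealism shaped art and painting",
    "impressionism shaped art and painting",
    "a tractor and coffee on the farm",
]
counts = cooccurrence_counts(corpus)
print(round(100 * cosine(counts["surrealism"], counts["impressionism"])))  # 100
print(round(100 * cosine(counts["surrealism"], counts["tractor"])))        # 29
```

“surrealism” and “impressionism” share identical surroundings (“shaped”, “art”, “and”) in this toy corpus and so score 100, while “tractor” overlaps with “surrealism” only through the common word “and” and scores 29. On real text, very common words like “and” blur these comparisons, which is one reason GloVe weights and transforms the counts rather than using them raw.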
We are interested in exploring whether this kind of tool could make art more accessible. In the same way that a classic thesaurus is used to expand one's vocabulary, could we use an algorithm and the vast resources of art-related text out there to create a tool that helps visitors to cultural institutions expand their knowledge? We hope so.