Final Vision

After weeks of research, development, and prototyping, we returned from Christmas knowing exactly what we wanted to build. As I've discussed loosely in my previous posts, we took the title of the brief, "The Matter of Immaterial", and coupled its meaning with Arthur C. Clarke's famous maxim:

Any sufficiently advanced technology is indistinguishable from magic.

Following our research into divination and related topics, we knew we wanted to build a device that blurred the line between magic and technology through the use of metaphor. Specifically, our technology was AI and Big Data: the ubiquitous intelligent systems that are so quickly becoming a driving force in our everyday lives.

These technologies, as new and ground-breaking as they are, remain hidden from the general population, abstracted away so that everything seems as smooth and natural as possible. AI may still feel a long way off, but systems like IBM's Watson are already being applied across many industries and sectors of life, from medicine (Deloitte, 2015) to automobiles (IBM, 2016).

Such abstraction keeps these systems out of view, removing the need to understand them and leading back, perhaps, to Clarke's idea of "magic". Our concept grew from this: to present AI systems as a form of magical device, drawing on our research into divination to build something that felt like communicating with an ethereal being through an almost religious ritual.

AI, our technology, is the ethereal being, and people communicate with it the same way seers or diviners once contacted gods or demons.

And it works!

As of this week, we have finally finished a project we set out on many weeks and many designs ago.

Aesthetic

Following our research, we knew we wanted to build a font in the style familiar from churches: something around which people could congregate to have their discussions. Fortunately, we were able to acquire a wooden plinth onto which we placed half of our technology. The height and size of the plinth were perfect for what we envisioned, giving people a natural place to gather and talk.

We also wanted an ethereal feel to the aesthetic, something that could easily be associated with magic, divination, or the unknown. We therefore opted to use a pair of ultrasonic water misters to produce a misty, smoky effect onto which we could project our visuals. We felt there would already be a strong association between this effect and otherworldliness; after all, magic or illusion is often referred to as "smoke and mirrors".

Focussing on the plan to build a device indistinguishable from magic, we knew we had to remove all technology from sight or else risk ruining any sense of illusion. We hid half of the technology – the half responsible for controlling the lighting and ultrasonic misters – inside the plinth itself underneath a hard foam case, onto which our water bowl and misters could be placed.

Above the font, we needed our projector. Our plan was to hide all of this technology behind a simple lamp shade. Whilst the shade worked and the technology was hidden from view, it was certainly not subtle and perhaps weakened the illusion slightly, drawing many users' eyes to it rather than to the font. In a gallery setting, or in a real-world application, this technology could be implemented the way many conventional projectors are: recessed into the ceiling.


Function

The project was also a functional success when measured against our initial ambitions.

We had aimed for the font to be always listening to users' conversations, analysing them for hidden meanings, topics, and conversational effectiveness (which I covered in greater detail in a previous post), and to present the results of this analysis back to users in a number of ways.

We wanted the font to take the topic of your discussion and inject additional context or information into the conversation. We wanted the font to tell you when your conversation is ineffective, thus giving you the ability to reel it back in and communicate more effectively as a team.

And we achieved all of that!

The font listened to our group discussions, worked out what we were talking about, and displayed hazy yet precise tidbits of additional information, seemingly appearing in the mist. Magical!

It also told us exactly when we were having an ineffective conversation through effective use of light and colour, allowing us to actually improve the way we were speaking in an almost game-like attempt to make the lights go green again.

All of this happened whilst we stored whole conversations and their analysis in a database for future use. More on that below!

Technology

I have gone into some detail about the technologies we employed for the font at different points in previous posts, but here I wish to give a more comprehensive overview of the whole system.

The Server

From the beginning we knew we were going to need some kind of server-side application to act as the "brain" of our entire system. One major reason for this was to allow for the implementation of third-party APIs. Almost all APIs, including IBM's Developer Cloud, require an API key to access their services over the internet. However, it is becoming increasingly common for these services to block client-side (i.e. browser-run) scripts from making requests, so the keys and the calls had to live on a server.

Knowing we needed a robust, scalable server-side system, and knowing from our research that IBM's Watson Developer Cloud was easily accessible as an npm package, we opted for Node.js with a plethora of additional modules to keep the system robust, tidy, and flexible. It was also extremely easy to deploy our Node application to free hosting on Heroku.

We built a RESTful API service in Node for communication between our browser front-end and our Node system, over both HTTP/HTTPS and WebSocket protocols.
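To give a sense of the shape of this setup, here is a minimal sketch of an Express REST API sharing a single port with a ws WebSocket server, as a Heroku-hosted Node app typically must. The route name and the broadcast behaviour are illustrative assumptions, not our actual endpoints:

```js
// Minimal sketch: an Express REST API and a WebSocket server on one port.
// Route names and message handling are illustrative, not our real endpoints.
var express = require('express');
var http = require('http');
var WebSocket = require('ws');

var app = express();

// Example REST endpoint: fetch a stored conversation by id (hypothetical route).
app.get('/api/conversations/:id', function (req, res) {
  res.json({ id: req.params.id, status: 'ok' });
});

var server = http.createServer(app);
var wss = new WebSocket.Server({ server: server });

// Relay every incoming WebSocket message to all other connected clients,
// letting the plinth, projector, and analysis code talk through one hub.
wss.on('connection', function (socket) {
  socket.on('message', function (data) {
    wss.clients.forEach(function (client) {
      if (client !== socket && client.readyState === WebSocket.OPEN) {
        client.send(data);
      }
    });
  });
});

// Heroku supplies the port through an environment variable.
server.listen(process.env.PORT || 3000);
```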

We took advantage of a MongoDB database for its accessibility in a JavaScript-based environment, thanks to its JSON-based data handling and storage. This is where we stored all of our conversations and their analysis.
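As a flavour of how little glue this takes, here is a hedged sketch using the Node MongoDB driver of the time (the 2.x callback API); the collection name and document shape are illustrative, not our actual schema:

```js
// Sketch: persisting a conversation and its analysis as a single JSON document.
// Connection URI, collection name, and document shape are all illustrative.
var MongoClient = require('mongodb').MongoClient;

MongoClient.connect(process.env.MONGODB_URI, function (err, db) {
  if (err) throw err;

  db.collection('conversations').insertOne({
    startedAt: new Date(),
    transcript: ['maybe the projector should sit higher'],
    analysis: { topic: 'projectors', relevance: 0.82 }
  }, function (err) {
    if (err) throw err;
    db.close();
  });
});
```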

Of course, one of the biggest sections of our Node system was our utilisation of the Watson Developer Cloud. I went into greater detail on this in a previous post, but in summary, these services allowed us to analyse conversations in real time as they happened around the font, extracting personality insights and topical conversational data.
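For illustration, here is roughly what a topical-analysis call looked like through the watson-developer-cloud npm package of that era, using the AlchemyLanguage keywords service; the require path, method, and parameter names are from memory of that SDK generation, so treat them as assumptions:

```js
// Hedged sketch: topic extraction via AlchemyLanguage in the old
// watson-developer-cloud SDK. Names are assumptions from that SDK era.
var AlchemyLanguageV1 = require('watson-developer-cloud/alchemy-language/v1');

var alchemy = new AlchemyLanguageV1({ api_key: process.env.ALCHEMY_API_KEY });

function analyseUtterance(text, done) {
  // Each keyword comes back with a relevance score we can track over time.
  alchemy.keywords({ text: text }, function (err, response) {
    if (err) return done(err);
    done(null, response.keywords); // e.g. [{ text: 'projectors', relevance: '0.92' }]
  });
}
```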

In addition to our API, we also set up our own WebSocket client running as part of our Node server, ensuring full control over, and interception of, all communication throughout our system.

Without getting into too much detail, that is an overview of what our server application was designed to do, and it did it very well.

The Font – Plinth

While the server side of things was quite complicated, the rest of the project was technologically rather simple. Inside the plinth, we had a Raspberry Pi connected to WiFi, running its own Node server with an installation of Johnny-Five, the JavaScript robotics framework.

Using Johnny-Five, we were able to control an Arduino (running a modified Firmata) directly from a JavaScript Node environment. The advantage of running microcontrollers such as Arduinos in this setup is that you have access to the whole Node.js ecosystem, which is far more expansive and better documented than the Arduino libraries whilst also being a web technology. We were therefore able to implement additional technologies quickly and easily, particularly WebSockets, for integration with the rest of our project's systems.
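The basic shape of a Johnny-Five program is tiny; a minimal sketch (the pin number is illustrative):

```js
// Minimal Johnny-Five sketch: control an Arduino running Firmata from Node.
var five = require('johnny-five');
var board = new five.Board(); // auto-detects the Arduino over serial/USB

board.on('ready', function () {
  var led = new five.Led(13); // built-in LED pin on most Arduinos
  led.blink(500);             // toggle every 500 ms
});
```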

With these technologies in place, we were able to control an RGB LED strip using input received through our WebSocket; the strip would light in different colours and pulse at different speeds depending on the pace of topic changes and the effectiveness of the conversation.
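Something along these lines, where the pins, the server URL, and the message shape are illustrative assumptions rather than our exact values:

```js
// Sketch: colour and pulse rate of an RGB strip driven by server messages.
// Pins, URL, and message format are illustrative, not our production values.
var five = require('johnny-five');
var WebSocket = require('ws');

var board = new five.Board();

board.on('ready', function () {
  var strip = new five.Led.RGB({ pins: { red: 6, green: 5, blue: 3 } });
  var socket = new WebSocket('wss://example-font-app.herokuapp.com'); // hypothetical

  socket.on('message', function (raw) {
    var state = JSON.parse(String(raw)); // e.g. { effective: false, pace: 0.8 }

    strip.stop(); // clear any running blink animation
    strip.color(state.effective ? '#00ff00' : '#ff0000');

    if (!state.effective) {
      // Faster topic changes -> faster pulsing: 1000 ms down towards 200 ms.
      strip.blink(1000 - 800 * state.pace);
    }
  });
});
```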

And the whole system could be turned on and off, and edited, over SSH from our own laptops directly into the Pi, without needing to remove it from the plinth.

The Font – Projector

The simplest part of the project was our overhead projector system. We had a second Raspberry Pi running a Chromium browser instance in kiosk mode, displaying our relatively simple front-end via a connection to our Heroku app. The front-end communicated with our server application, receiving the topical conversation data and displaying additional related information from the Microsoft Azure Bing Search API (later to be replaced by AI-gathered information from our own database).
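The browser side needs very little code; a hedged sketch (the element id, URL, and message shape are illustrative):

```js
// Browser sketch: receive topic updates pushed by the server and fade the
// related snippet into the mist projection. Names here are illustrative.
var socket = new WebSocket('wss://example-font-app.herokuapp.com'); // hypothetical

socket.onmessage = function (event) {
  var update = JSON.parse(event.data); // e.g. { topic: 'projectors', snippet: '...' }
  var display = document.getElementById('snippet');

  display.textContent = update.snippet;
  display.classList.add('visible'); // CSS handles the hazy fade-in
};
```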

Again, we could control this setup remotely via SSH, freeing us from having to dismantle any part of the installation to access the technology.

Issues & Takeaways

As with all technology projects, the development period for the font was not without issue.

We had issues keeping the Raspberry Pis available, as they would often crash without warning or change IP address (something we had no control over, unfortunately, due to the network we were using). In future it would be worth setting up our own WiFi so that we could assign static IP addresses, and being more aware of the fragility of the Raspbian operating system, as we believe the Pis often crashed due to corruption.

Bigger problems, however, were much farther out of our control.

Firstly, the IBM speech-to-text system we were using proved extremely poor at properly recognising complete sentences in any accent besides a strong, almost stereotypical American one. With our various English accents, we often proved difficult for IBM's Watson to understand, reducing the effectiveness of our project immensely. We toyed with other voice recognition systems, but none fit our needs as well as IBM's and, in fact, none seemed much better at recognising accents. The only system we found that handled different accents well was Google's cloud speech recognition service, but we hadn't realised until it was too late that we could have integrated it into our Node project.

Secondly, we had issues with IBM's Developer Cloud being geared towards commercial projects and enterprises. The system, while free and accessible to anyone, is limited to 1,000 access "tokens" per 24-hour period. We initially assumed this meant 1,000 requests per account per day, but after attempting to implement the AlchemyNews API in our project, we found that a single request can cost multiple tokens.

We were confused for some time about how we could possibly be making 1,000 requests a day, and after trying to use the News API to request stories from 60 days in the past, we found our API keys being blocked after just one request. We learned afterwards that such requests cost well over the 1,000-token daily limit, and that our other requests were likely costing up to 10 tokens each. Making 4-6 requests a minute, and without the funds to pay thousands of dollars a month for proper access to the services, our only solution was for each of us to create a Watson account and to be conservative with the API keys we had.

Finally, the biggest issue we faced, and one that either could have been averted or was never going to work in the first place, concerned topic changes. We decided, based on our research, that "if a conversation changes topic too quickly, it is an ineffective conversation." This felt about right; after all, how can you communicate an idea effectively when you are changing topic too fast to do so? The problem was that when I came to implement the feature, I realised I had no idea what constituted changing topic "too quickly." I had no sources to help me decide, and further research drew a blank; conversation is not a simple thing, and people have spent whole careers studying what makes it effective.

In the end, I wrote code that took the topic's relevance score; if that score dropped too quickly over subsequent analyses, the topic was "changing too fast" and our light system would flash faster as an alert. The aim was that if the members of the conversation hadn't intended to change topic, they would be alerted and could get back on track, whereas if they had changed topic on purpose they could simply ignore the alert.
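A simplified sketch of that heuristic; the window size and drop threshold are illustrative guesses rather than researched constants, which was exactly the problem:

```js
// Sketch: flag a "too fast" topic change when the current relevance score
// falls sharply below the recent average. Constants are illustrative.
var recentScores = [];
var WINDOW = 5;       // number of past analyses to average over
var DROP_LIMIT = 0.3; // relative drop that counts as "too fast"

function topicChangingTooFast(relevance) {
  var changing = false;

  if (recentScores.length === WINDOW) {
    var avg = recentScores.reduce(function (a, b) { return a + b; }, 0) / WINDOW;
    changing = (avg - relevance) / avg > DROP_LIMIT;
    recentScores.shift(); // slide the window forward
  }

  recentScores.push(relevance);
  return changing; // caller speeds up the light pulsing when true
}
```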

Overall, I've learned a huge amount over the past couple of weeks, especially on the technology side of the project. I had always steered a little clear of using Node.js in projects due to its seemingly bloated nature, but having used it extensively I now see that the "bloat" is in fact flexibility: developers can pick and choose the modules and tools they need, and there's one for almost any situation, allowing for extremely fast and efficient development.

Beyond that, I am most grateful for the opportunity this project gave me to explore the relationship between AI data systems and the everyday lives of the tech-accessible world. Using IBM's Watson as extensively as we did has given me a real appreciation for just what AI in the mid-2010s is capable of, and I'm sure that in years to come I will look back on it as "something I helped work on, just a little bit", knowing that through using Watson I was also teaching it to better itself.

References

Deloitte. (2015) Disruption ahead: Deloitte's point of view on IBM Watson. [Online] Available from: https://www2.deloitte.com/content/dam/Deloitte/us/Documents/about-deloitte/us-ibm-watson-client.pdf [accessed 13 January 2017]

IBM. (2016) Hello, OnStar — Meet Watson. [Online] Available from: https://www-03.ibm.com/press/us/en/pressrelease/50838.wss [accessed 13 January 2017]