AI & Chatbots
Umajin started using AI and machine vision systems to build visual chatbots back in 1998. The idea of adding vision, personality and character to make human-computer interaction more fluid has always been an important goal. Voice recognition in 1998 wasn’t great, but having an animated character talking really helps people connect. We did a lot of work on educational materials for kids, and the difference was literally kids participating and learning versus getting bored and checking out.
For a decade or so visual agents were only being considered in the learning and entertainment markets – but it’s very exciting today to have companies of all types engage with us to create these experiences.
The two major innovations in AI over the last decade are the ability to train models from huge amounts of unstructured data without human effort, and the ability to optimize those models against themselves (a bit like evolution, breeding a stronger chess or Go player).
This allows for a growing range of hugely practical but extremely focused use cases – and we are seeing more emerge every day. It might be removing noise from an image, detecting cancer cells in a mammogram or recognizing a face in a photograph.
Each of these is an extremely specialised task, automating small jobs that previously might have required a person.
AI is still very special-purpose and inflexible, but when it comes to the task it is trained for, it can draw on an almost unlimited amount of prior knowledge and is utterly relentless. This is the secret to how AI can do things we don’t expect. An expert human oncologist might have good intuition about whether a photograph of a skin lesion means the patient needs a biopsy – but an AI can consult four million previously known photographs and biopsy results, and weigh them against the patient’s age, gender, weight, height, cholesterol and many other factors. The AI also doesn’t get bored or tired; it can do this constantly, improving its analysis over time as its database continues to grow with use.
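As a rough illustration of that kind of combination, a classifier can fold an image-derived lesion score together with simple patient features. The sketch below is purely hypothetical – the feature names, toy data and scikit-learn pipeline are assumptions for illustration, not a description of any real clinical system:

```python
# Hypothetical sketch: combine an image model's lesion score with patient
# features (age, weight, cholesterol, ...) in a single classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: [image_model_score, age, weight_kg, cholesterol]
# Toy data standing in for the millions of labelled historical cases
# a production system would draw on.
X = np.array([
    [0.91, 67, 82, 6.1],
    [0.12, 34, 70, 4.2],
    [0.78, 71, 90, 5.8],
    [0.05, 25, 60, 3.9],
])
y = np.array([1, 0, 1, 0])  # 1 = biopsy was recommended in the historical record

clf = LogisticRegression().fit(X, y)

new_patient = np.array([[0.66, 58, 85, 5.5]])
print("Probability biopsy is warranted:", clf.predict_proba(new_patient)[0, 1])
```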
We are still a long way from Strong AI, or computers thinking for themselves, but there are two very exciting things happening over the next decade.
The first is smartphones and computers adding custom silicon designed just to run these AI models. This will hugely increase the speed, and reduce the power usage, of AI tasks on our devices. Combine this new runtime capability with the big players offering off-the-shelf, cloud-based solutions for training AI models and AI tasks will become ubiquitous – including improved digital agents and smartphone cameras that understand much more of the world around us.
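As a concrete sketch of what “running an AI model on the device” looks like in code, here is a minimal TensorFlow Lite inference loop. The model file name and input are placeholders, and the note about hardware delegates is an assumption about the deployment target rather than a requirement:

```python
# Minimal sketch: run a pre-trained model on-device with TensorFlow Lite.
# "model.tflite" is a placeholder. On phones with dedicated AI silicon a
# hardware delegate (e.g. NNAPI on Android) can be passed via the
# experimental_delegates argument to move the work off the CPU.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Fake input matching the model's expected shape, standing in for a camera frame.
frame = np.random.rand(*input_details[0]["shape"]).astype(np.float32)

interpreter.set_tensor(input_details[0]["index"], frame)
interpreter.invoke()
prediction = interpreter.get_tensor(output_details[0]["index"])
print(prediction)
```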
The second is incremental improvements to our basic AI systems – not just making them deeper, but making them slightly more flexible. For example, Geoffrey Hinton, one of the formative AI researchers, has just introduced capsule networks, which embed small groups of neurons (capsules) within models to make them more resilient and accurate. Incremental doesn’t sound exciting, but once an AI model can do a specific task better than a human, it becomes revolutionary!
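For the curious, the capsule idea replaces single scalar activations with small vectors whose length encodes confidence; the “squash” non-linearity from the capsule networks paper keeps that length between 0 and 1 while preserving direction. A minimal NumPy version of just that function, as a sketch:

```python
import numpy as np

def squash(s, eps=1e-9):
    """Capsule 'squash' non-linearity: shrinks a capsule's output vector so
    its length lies in (0, 1) while keeping its direction unchanged."""
    norm_sq = np.sum(s ** 2, axis=-1, keepdims=True)
    norm = np.sqrt(norm_sq + eps)
    return (norm_sq / (1.0 + norm_sq)) * (s / norm)

# A long vector keeps most of its length (high confidence)...
print(np.linalg.norm(squash(np.array([3.0, 4.0]))))   # ~0.96
# ...while a short vector is squashed towards zero (low confidence).
print(np.linalg.norm(squash(np.array([0.1, 0.2]))))   # ~0.05
```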
Visual Digital Agents
As discussed, computer systems for machine learning (pattern matching) are just getting better and better. It’s amazing how well a computer can match the words I speak against the millions of sound/word pairs used to train its model. Unfortunately, while computers might be able to match my words, they are still very poor at understanding what I actually mean.
Pattern matching can be applied to tasks like extracting the parts of speech, so we are making baby steps (literally). But computers don’t yet have systems for handling bad grammar, common knowledge, empathy, context, humour, irony, metaphor, implication, inference and the many other aspects of communication that we use naturally as people. The result is that AI is good for giving commands (“what is the time?”) – but for general conversation it is embarrassingly poor.
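To make “extracting the parts of speech” concrete, this is the kind of narrow pattern-matching step toolkits already handle well. The sketch below uses NLTK purely as an example (any tagger would do) and assumes its tokenizer and tagger data have already been downloaded:

```python
# Example of a narrow NLP task computers already do well:
# part-of-speech tagging with NLTK.
# (Assumes the tokenizer and tagger data have been fetched with nltk.download();
# resource names vary slightly between NLTK versions.)
import nltk

sentence = "Book me a table for two at eight"
tokens = nltk.word_tokenize(sentence)
print(nltk.pos_tag(tokens))
# e.g. [('Book', 'VB'), ('me', 'PRP'), ('a', 'DT'), ('table', 'NN'), ...]
# The tags follow the Penn Treebank set; knowing "Book" is a verb here is
# pattern matching, not understanding what the speaker wants.
```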
There are several challenges with chatbots that we set out to solve by adding visuals – not just an animated character, but a useful interface for discovering and reviewing content. This frees us from the limitations of traditional chatbots. It would be a very frustrating experience to try to select a new shirt via a voice interface, first describing visual objects by voice and then listening to long lists of options to make a choice.
Because we mix the fluidity of speaking with people’s ability to quickly review huge amounts of information visually, it’s easy to see a list of shirts, pick one by speaking its name, then review any elements you can customize, like the color or size. As a result we are using visual chatbots for conversations around purchasing, learning, following processes, customizing things and controlling things.
We have a whole visual design to help customers discover what they can do and to make their choices clearer. This actually makes the AI seem much smarter, as it can suddenly handle synonyms and mispronunciations. E.g. if I have a list of 10 things on screen and I say “ate”, the chatbot can work out that I mean “pick number 8” (a minimal sketch of this kind of matching follows the examples below).

Some great examples of chatbots include:
- A digital concierge to help hotel guests control their in-room experience, book restaurants and shows, even organise parking or travel
- A digital stylist to help customers of a fashion brand select items, manage their wardrobe, pick outfits using what they have on hand, and provide options and advice on hair, makeup and accessories, with social sharing/voting capabilities for friends
- A digital chef to help customers of an organic food brand manage their pantry, weekly meal plans, allergies, shopping list and re-ordering – even step them through making recipes, including skill break-outs like video instructions on how to julienne carrots or make a white sauce
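The on-screen matching mentioned above can be sketched very simply: the recogniser’s text is compared against the visible items and a small table of number words and their homophones. Everything in the sketch (the option list, the homophone table, the matching rules) is a made-up illustration, not our production logic:

```python
# Hypothetical sketch: resolve a spoken phrase against the options currently
# on screen, so saying "ate" can be understood as "pick number 8".
ON_SCREEN = ["blue shirt", "white shirt", "striped shirt", "linen shirt",
             "polo shirt", "denim shirt", "black shirt", "floral shirt"]

# Small table of spoken numbers and common homophones (illustrative only).
NUMBER_WORDS = {
    "one": 1, "won": 1, "two": 2, "to": 2, "too": 2, "three": 3,
    "four": 4, "for": 4, "five": 5, "six": 6, "seven": 7,
    "eight": 8, "ate": 8, "nine": 9, "ten": 10,
}

def resolve(utterance, options):
    """Map recognised speech to an on-screen option, by number or by name."""
    text = utterance.lower().strip()
    # 1. Spoken as a digit or a number word (including homophones like "ate").
    for token in text.split():
        index = NUMBER_WORDS.get(token, int(token) if token.isdigit() else None)
        if index is not None and 1 <= index <= len(options):
            return options[index - 1]
    # 2. Spoken by name, longest option first so "linen shirt" beats "shirt".
    for option in sorted(options, key=len, reverse=True):
        if option in text:
            return option
    return None  # fall back to asking the user to repeat or tap

print(resolve("ate", ON_SCREEN))                      # -> "floral shirt" (item 8)
print(resolve("pick number 3", ON_SCREEN))            # -> "striped shirt"
print(resolve("the linen shirt please", ON_SCREEN))   # -> "linen shirt"
```

Because the matcher only ever has to choose between the handful of items currently on screen, even this simple logic feels far smarter than open-ended speech understanding.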