Conversational AI has been on the market for a while now. Large companies have, after all, reached the point where voice commands can be given and personal assistants answer certain questions. In a sense we can call this a conversation, because the communication takes place between human and machine, but there is no truly deep conversation yet.
In fact, there is not much difference in the underlying technology between conversational AI and today's digital assistants. Google's BERT language model (Bidirectional Encoder Representations from Transformers) can serve as the foundation in both cases, so that part does not change, but true conversation requires more performance, and not just a little more. Digital assistants have some time to answer the user's question, so it matters less how quickly the server connected to the device runs inference on a neural network, let alone how it handles continuous training. In effect, existing technology would be good enough for conversational AI; it is the performance that is not yet available.
NVIDIA has just announced several open source technologies that essentially target this problem. Accelerating the training phase is fundamental: existing neural networks can be used for conversational AI, but the best results come from tailoring the model to the specific task, which makes continuous training hard to avoid. That, in turn, is demanding, and not just a little. According to NVIDIA, 92 DGX-2H servers deliver good results in just an hour. The key point is that the company has developed a new multi-GPU approach to training language models. This matters most for models that are larger than usual, because the typical problem with GPUs is that they do not have enough memory to hold a language model with a very large number of parameters. In such cases the CPU is used instead, which offers much more memory but is rather slow. NVIDIA's innovations, however, make it possible to parallelize the language modeling work and genuinely accelerate the training phase. Of course, we are talking about a configuration with a power consumption of more than 1 megawatt. No great miracle should be expected here in the future either, but companies planning such services are obviously aware that the hardware needed for training is quite extreme and, of course, expensive.
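To make the memory argument concrete, here is a rough back-of-the-envelope sketch in Python. It is not NVIDIA's method. The parameter counts, the 32 GiB accelerator size, and the rule of thumb of about 16 bytes of training state per parameter (mixed precision with an Adam-style optimizer) are all assumptions for illustration, and activation memory is ignored entirely.

```python
import math

GIB = 1024 ** 3

# Assumption: mixed-precision training with an Adam-style optimizer keeps
# roughly 16 bytes per parameter (fp16 weights and gradients, fp32 master
# weights, two fp32 optimizer moments). Activations are ignored here.
TRAIN_BYTES_PER_PARAM = 16


def training_state_gib(n_params, bytes_per_param=TRAIN_BYTES_PER_PARAM):
    """Approximate memory for weights, gradients and optimizer state, in GiB."""
    return n_params * bytes_per_param / GIB


def min_gpus_for_model(n_params, gpu_mem_gib, bytes_per_param=TRAIN_BYTES_PER_PARAM):
    """Smallest number of GPUs whose combined memory holds the training state
    when the model is split evenly across them."""
    return math.ceil(training_state_gib(n_params, bytes_per_param) / gpu_mem_gib)


if __name__ == "__main__":
    # Hypothetical model sizes and a hypothetical 32 GiB accelerator.
    for n_params in (300_000_000, 2_000_000_000, 8_000_000_000):
        need = training_state_gib(n_params)
        gpus = min_gpus_for_model(n_params, gpu_mem_gib=32)
        print(f"{n_params:>13,} params: {need:6.1f} GiB -> at least {gpus} GPU(s)")
```

Under these assumptions, once the training state exceeds a single accelerator's memory, the only options are to fall back to slower host memory or to split the model across several GPUs, which is the situation the paragraph above describes.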
More interesting is the second phase of the problem, inference. Powerful machines can already be built for training, so the really big problem is keeping a conversation flowing. Such an AI typically has about 10 milliseconds to process a request with the trained neural network; if the work takes longer, the conversation can stall. Simply put, the machine does not react fast enough, which feels uncomfortable on a human scale. NVIDIA has optimized BERT with TensorRT 5.1 for faster processing, so a system with a Tesla T4 accelerator can complete the task in just 2.2 milliseconds.
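The latency budget can be checked with a few lines of Python. This is a minimal sketch: the 10 ms budget and the 2.2 ms figure come from the text, while `measure_latency_ms`, `headroom_ms`, `within_budget`, and the `fake_infer` stand-in are hypothetical names introduced here and not part of any NVIDIA tool.

```python
import statistics
import time

BUDGET_MS = 10.0  # response budget quoted in the text


def measure_latency_ms(infer, runs=50):
    """Median wall-clock time of one call to `infer`, in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        infer()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def headroom_ms(latency_ms, budget_ms=BUDGET_MS):
    """Time left in the budget after inference, in milliseconds."""
    return budget_ms - latency_ms


def within_budget(latency_ms, budget_ms=BUDGET_MS):
    """True if a single inference fits inside the response budget."""
    return latency_ms <= budget_ms


if __name__ == "__main__":
    def fake_infer():
        # Hypothetical stand-in for a real model call.
        sum(i * i for i in range(10_000))

    latency = measure_latency_ms(fake_infer)
    print(f"measured median latency: {latency:.3f} ms, "
          f"within budget: {within_budget(latency)}")
    print(f"headroom at the quoted 2.2 ms: {headroom_ms(2.2):.1f} ms")
```

The median is used rather than the mean so that an occasional slow run does not distort the figure. A real service would probably also track tail latencies.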
The above is useful not only for conversational AI; it can also improve search engine performance. The company also announced that Microsoft is putting it to use in the development of Bing.