
What are the important points in building an intelligent voice interaction system?

Preface

With the rise of artificial intelligence, a large number of AI-based call center service providers and integrators have emerged in recent years, and many companies promote and operate intelligent outbound calling modules alone. It is fair to say that the whole market built on artificial intelligence technology has begun to flourish.

Let me briefly explain what an intelligent voice interaction platform is. In essence, it takes a call center and integrates ASR (automatic speech recognition), TTS (text-to-speech) and a new call service platform on top of it.

So how do we build our own intelligent voice system?

Let's first list the technologies and services needed to build an intelligent outbound calling system. Personally, I think they are:

· First, and most important, the switch:

  1. A PBX is a switch. Traditional commercial equipment consists of hardware switches from vendors such as Huawei, Avaya, Cisco and Donghui.

  2. There are also software switches such as FreeSwitch, Asterisk and OpenPBX.

· Next, the AI technology: speech recognition, semantic understanding and speech synthesis are the three core components. Speech recognition is the system's "ear": when a call comes in, it processes the caller's speech, converts it into data the system can work with, and ultimately transcribes it into text. Semantic understanding is the "brain": it identifies the caller's intention from that text. Speech synthesis is the "mouth": once the intention is recognized, it replies and steers the conversation according to a predefined answering strategy.

· Then the front-end service platform: the website where users log in, configure call flows, create call tasks, compile call statistics and export call reports. It is the only interface that end users see and operate.

· Finally, the outbound lines: supplied by the three major carriers and by smaller line integrators, and used mainly to place outbound calls or receive inbound ones.
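The ear/brain/mouth division in the AI item above maps naturally onto a single dialogue turn. The Python sketch below wires three engines together; the engine signatures, the `handle_turn` helper and the stub engines are all hypothetical placeholders, not the APIs of Alibaba Cloud, iFlytek or any other vendor.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

# Hypothetical engine signatures; real ASR/NLU/TTS SDKs have their own APIs.
AsrEngine = Callable[[bytes], str]   # "ear": caller audio -> text
NluEngine = Callable[[str], str]     # "brain": text -> intent label
TtsEngine = Callable[[str], bytes]   # "mouth": reply text -> audio


@dataclass
class Turn:
    transcript: str
    intent: str
    reply_text: str
    reply_audio: bytes


def handle_turn(audio: bytes, asr: AsrEngine, nlu: NluEngine, tts: TtsEngine,
                replies: dict[str, str],
                fallback: str = "Sorry, could you say that again?") -> Turn:
    """Run one caller utterance through ear -> brain -> mouth."""
    transcript = asr(audio)                     # speech recognition
    intent = nlu(transcript)                    # semantic understanding
    reply_text = replies.get(intent, fallback)  # scripted answer for the intent
    return Turn(transcript, intent, reply_text, tts(reply_text))  # speech synthesis


# Stub engines for local testing only: the "audio" here is just UTF-8 text.
def stub_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def stub_nlu(text: str) -> str:
    return "book_table" if "table" in text.lower() else "unknown"


def stub_tts(text: str) -> bytes:
    return text.encode("utf-8")


if __name__ == "__main__":
    turn = handle_turn(b"I want to book a table", stub_asr, stub_nlu, stub_tts,
                       {"book_table": "For how many people?"})
    print(turn.intent, "->", turn.reply_text)
```

Keeping the engines as plain callables makes it easy to swap one vendor's ASR or TTS for another's without touching the dialogue logic.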

Some people may ask: "Isn't artificial intelligence the most important thing in an intelligent voice interaction system? What does it have to do with switches, and why call the switch the most important part?" The reason is that whether we make an outgoing call or receive an incoming one, the front-end service platform has to send the call request to the switch, which then dials out over the outbound line. In other words, the switch controls all outbound calling. Hardware switches such as Huawei's cost anywhere from tens of thousands to millions, a price that many small companies wanting to build their own intelligent voice interaction system simply cannot afford. FreeSwitch, a softswitch, makes it far easier for small companies to build one.
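To make "the platform sends the call request to the switch" concrete: with FreeSwitch, a common route is the Event Socket (ESL), where a client connects (by default on port 8021) and issues commands such as `originate`. The helper below, `build_originate`, is a hypothetical sketch that only builds the command text; the gateway name, phone number and caller ID in the example are made up, and opening and authenticating the ESL connection is left out.

```python
from __future__ import annotations


def build_originate(gateway: str, number: str, app: str = "park", app_args: str = "",
                    variables: dict[str, str] | None = None) -> str:
    """Build an ESL 'bgapi originate' command for one outbound call (sketch)."""
    var_block = ""
    if variables:
        for key, value in variables.items():
            if "," in value:
                # Commas separate variables inside {...}; this sketch does not escape them.
                raise ValueError(f"comma in value for {key!r} is not supported here")
        var_block = "{" + ",".join(f"{k}={v}" for k, v in variables.items()) + "}"
    # Channel variables in {...} prefix the dial string for the SIP gateway.
    dial_string = f"{var_block}sofia/gateway/{gateway}/{number}"
    # ESL commands are terminated by a blank line ("\n\n").
    return f"bgapi originate {dial_string} &{app}({app_args})\n\n"


if __name__ == "__main__":
    print(build_originate("carrier_a", "13800000000",
                          variables={"origination_caller_id_number": "10001"}))
```

Using `bgapi` rather than `api` runs the command in the background, so the platform can launch many calls without blocking on each one.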

What is FreeSwitch?

FreeSwitch is an open-source softswitch platform for telephony, designed to make it easier to build voice- and chat-driven products. It can serve as a switching engine, a PBX, a media gateway or a media server. It supports a variety of communication standards, including SIP, H.323, IAX2 and Google Talk, and connects easily with other open-source PBX systems. It is also highly flexible: it aims to route and interconnect communication protocols for audio, video, text or any other form of media.

Typical functions of FreeSwitch:

· Online billing and prepaid functions
· Telephone routing server
· Voice transcoding server
· Server supporting resource priority and QoS
· Multipoint conference server
· IVR and voice notification server
· Voicemail server
· PBX application and softswitch
· Application-layer gateway
· Firewall/NAT traversal application
· Private server
· SIP inter-network gateway
· SBC and security gateway

The most typical use of FreeSwitch is as a server that telephone client software registers with. Although FreeSwitch supports many communication protocols, its main protocol is SIP (Session Initiation Protocol), and it usually connects to carriers through a SIP trunk.
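Since SIP is the protocol FreeSwitch speaks most, it helps to see what a SIP request looks like on the wire. The sketch below builds a minimal INVITE (with no SDP media offer) containing the headers RFC 3261 requires; `build_invite` and the addresses are hypothetical, and in a real deployment FreeSwitch generates these messages itself.

```python
import uuid


def build_invite(caller: str, callee: str, local_ip: str, local_port: int = 5060) -> str:
    """Build a minimal SIP INVITE (no SDP body) with the RFC 3261 mandatory headers."""
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]  # RFC 3261 "magic cookie" prefix
    from_tag = uuid.uuid4().hex[:8]
    call_id = f"{uuid.uuid4().hex}@{local_ip}"
    user = caller.split("@", 1)[0]
    lines = [
        f"INVITE sip:{callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:{local_port};branch={branch}",
        "Max-Forwards: 70",
        f"To: <sip:{callee}>",
        f"From: <sip:{caller}>;tag={from_tag}",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{user}@{local_ip}:{local_port}>",
        "Content-Length: 0",
    ]
    # SIP uses CRLF line endings; an empty line ends the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    print(build_invite("robot@10.0.0.5", "13800000000@carrier.example", "10.0.0.5"))
```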

The advantage of using FreeSwitch as a softswitch is that you can set up your own outbound call center at any time with just one server. FreeSwitch is also cross-platform: it runs natively on many 32- and 64-bit platforms, including Windows, Linux and BSD.

FreeSwitch uses a threading model to handle concurrent requests, with each connection handled in its own thread. Threads access shared resources under mutual exclusion through mutexes, and communicate through messages and asynchronous events. FreeSwitch itself is relatively stable and is excellent open-source software. On the other hand, its development is aggressive: many new features land on the development branch, and when testing is not thorough, instability creeps in easily.

In a production environment, stability decides whether the system can be used at all. During our project we ran into FreeSwitch instability that made outbound calling unsatisfactory. For example, in test calls the audio kept cutting in and out: although the front-end service platform received the data correctly, real conversations with people suffered all kinds of communication problems. To solve this, we spent several months studying FreeSwitch's architecture and finally fixed the problem. Our project could then move forward, and it was eventually deployed in production.
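The threading model described above can be illustrated with a small Python analogy: one thread per call, a mutex around shared counters, and a thread-safe queue standing in for the asynchronous event stream. This is not FreeSwitch's actual C implementation; `CallStats`, `handle_call` and `run_calls` are hypothetical names, and only the event names `CHANNEL_ANSWER` and `CHANNEL_HANGUP` are borrowed from FreeSwitch.

```python
from __future__ import annotations

import queue
import threading


class CallStats:
    """Shared resource: every call thread updates it, so access goes through a mutex."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.active = 0
        self.completed = 0

    def call_started(self) -> None:
        with self._lock:
            self.active += 1

    def call_ended(self) -> None:
        with self._lock:
            self.active -= 1
            self.completed += 1


def handle_call(call_id: int, stats: CallStats, events: queue.Queue) -> None:
    """Runs in its own thread: one thread per call."""
    stats.call_started()
    events.put(("CHANNEL_ANSWER", call_id))  # asynchronous event for listeners
    # ... media handling (RTP, ASR/TTS) would happen here ...
    events.put(("CHANNEL_HANGUP", call_id))
    stats.call_ended()


def run_calls(n: int) -> tuple[CallStats, list[tuple[str, int]]]:
    stats = CallStats()
    events: queue.Queue = queue.Queue()  # thread-safe message channel
    threads = [threading.Thread(target=handle_call, args=(i, stats, events))
               for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    collected = []
    while not events.empty():
        collected.append(events.get())
    return stats, collected
```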

Some people may wonder: "Switching with FreeSwitch is important, but since this is an intelligent voice interaction system, isn't artificial intelligence important too?" Of course it is! Let me take it step by step~

AI technology

1. Communication principles

First, a brief look at the normal call process:

Process: A→PSTN→B

Explanation: PSTN stands for public switched telephone network, i.e. the telephone network run by the carriers.

Example: person A calls the call center at 1***6 and, after dialing, hears a recording: "Hello, to reach a human agent, please press the corresponding key." After pressing the key, A hears a waiting tone, and once the call is actually connected, a customer service agent picks up.

Process: A→PSTN→PBX→IVR→customer service

Explanation: a PBX (private branch exchange) is the switch, acting as the entrance and exit of the whole call center.

IVR (interactive voice response), also called voice navigation, plays prompts such as "For business inquiries, please press the corresponding key," and at this step routes the call to customer service according to the business selected.
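Key-press routing in a traditional IVR boils down to a lookup table plus a retry limit. A minimal sketch, assuming a hypothetical three-option menu and the design choice of handing callers to a human after three invalid presses:

```python
from __future__ import annotations

# Hypothetical menu: key pressed -> queue name
IVR_MENU = {"1": "sales_queue", "2": "billing_queue", "0": "human_agent_queue"}


def ivr_route(key_presses: list[str], menu: dict[str, str] = IVR_MENU,
              max_attempts: int = 3, fallback: str = "human_agent_queue") -> str:
    """Return the queue for the first valid key within max_attempts."""
    for digit in key_presses[:max_attempts]:
        if digit in menu:
            return menu[digit]
        # Invalid key: a real IVR would replay the prompt here.
    return fallback  # too many invalid inputs (or none at all): hand off to a person
```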

How is the intelligent voice interaction platform (intelligent robot) implemented in specific business scenarios?

For example, person A wants to reserve a table at a large hotel.

After dialing, A first hears: "Hello, I'm robot Xiaoyue. Can I help you reserve a table?"

Person A says: "I don't want to talk to a robot, get me a real person."

A then hears a recording: "I'll transfer you to a human agent. You are in the queue, please wait a moment."

After a few minutes the call is connected, and a human agent answers.

Process: A → PSTN → PBX → IVR (TTS → ASR → NLP → TTS) → ACD → customer service

Explanation: the IVR no longer needs key-press prompts. Instead, it asks callers directly what they need, recognizes their speech, understands their intention, and transfers them to the corresponding business queue, from which the ACD (automatic call distributor) assigns an agent.
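The speech-driven IVR replaces the key lookup with intent recognition. The sketch below uses naive keyword matching as a stand-in for the NLP step (a production system would use a trained model or a vendor NLU service); the intents, keywords and queue names are hypothetical and mirror the hotel example above.

```python
from __future__ import annotations

# Hypothetical intents, checked in order; the first match wins.
INTENT_KEYWORDS = {
    "transfer_human": ["real person", "human", "agent"],
    "book_table": ["reserve", "book", "seat", "table"],
}
# Hypothetical ACD queues for each intent
INTENT_QUEUES = {"transfer_human": "human_agent_queue", "book_table": "booking_queue"}


def classify(transcript: str) -> str:
    """Keyword stand-in for the NLP step: transcript -> intent label."""
    text = transcript.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(k in text for k in keywords):
            return intent
    return "unknown"


def route(transcript: str) -> str:
    """Pick the ACD queue for a caller; unknown intents stay with the robot."""
    return INTENT_QUEUES.get(classify(transcript), "robot_continue")
```

Checking `transfer_human` first means a request for a real person wins even if the caller also mentions a booking.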

That is the inbound flow; the outbound flow is the reverse, so I won't go into detail.

2. AI technology on the market today

At present, the ASR, TTS and NLP market in China is dominated by giants such as Alibaba, Baidu and iFlytek, and the landscape is essentially settled. Most ASR engines in use come from Alibaba Cloud, iFlytek Cloud or Baidu Cloud. Alibaba Cloud and iFlytek Cloud have the higher recognition rates, reaching roughly 97%, while Baidu's is noticeably lower. We tested ASR engines during our project, and Alibaba Cloud had the higher recognition rate and could also recognize dialects, so we chose Alibaba Cloud for ASR.

For TTS, we chose iFLYTEK for a simple reason: iFlytek is a giant in artificial intelligence, and its quality is well established.
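The article does not say how the recognition rates were measured. One common metric for Chinese ASR is character error rate (CER), computed from the edit distance between a reference transcript and the engine's output; the sketch below reports accuracy as 1 - CER over a test set. The function names are hypothetical.

```python
from __future__ import annotations


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, at character level."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution (0 if equal)
        prev = cur
    return prev[-1]


def recognition_accuracy(refs: list[str], hyps: list[str]) -> float:
    """1 - character error rate over reference/ASR transcript pairs."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return 1 - errors / total
```

Note that 1 - CER can fall below zero if an engine inserts many extra characters, so it is worth reporting insertions, deletions and substitutions separately when comparing vendors.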

3. Integrating AI capabilities

In practice, players in this field usually have either call center capability or AI capability, and the main integration point is connecting AI capabilities to call center equipment. The conventional protocols for connecting ASR/TTS to call center equipment are mainly MRCP/SIP.

The Media Resource Control Protocol (MRCP) is a communication protocol that lets voice servers provide speech services, such as speech recognition and speech synthesis, to clients. There are two versions: MRCPv2 uses SIP to set up and tear down sessions (the MRCP messages themselves then flow over a separate TCP or TLS connection), while MRCPv1 uses RTSP.
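An MRCPv2 request is a text message: a start line `MRCP/2.0 <message-length> <method> <request-id>`, headers such as `Channel-Identifier`, and an optional body. One subtlety is that message-length counts the whole message, start line included, so the length field has to account for its own digits. The hypothetical helper below handles that by iterating until the value stabilizes; the channel identifier and text are made up, and in practice the MRCP client in the call center or a dedicated MRCP server builds these messages.

```python
def build_mrcp_request(method: str, request_id: int, channel_id: str,
                       body: str = "", content_type: str = "text/plain") -> str:
    """Build an MRCPv2 request whose message-length covers the entire message."""
    headers = [f"Channel-Identifier: {channel_id}"]
    if body:
        headers += [f"Content-Type: {content_type}",
                    f"Content-Length: {len(body.encode('utf-8'))}"]
    rest = "".join(h + "\r\n" for h in headers) + "\r\n" + body
    # message-length includes the start line, which contains the length itself,
    # so recompute until the number of digits no longer changes the total.
    length = 0
    while True:
        start_line = f"MRCP/2.0 {length} {method} {request_id}\r\n"
        total = len((start_line + rest).encode("utf-8"))
        if total == length:
            return start_line + rest
        length = total


if __name__ == "__main__":
    print(build_mrcp_request("SPEAK", 1, "32AECB23433801@speechsynth", "Hello"))
```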

During actual integration we ran into many technical problems. For example, when our ASR/TTS engines were deployed in a private cloud, we had to avoid the extensive firewall configuration and the voice-stream latency that come with traversing between internal and external networks. This took considerable effort when we did the integration.
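Part of the voice-stream delay mentioned above comes from how audio is packaged before it reaches the ASR engine. Assuming 8 kHz, 16-bit linear PCM sent in 20 ms frames (typical telephony values, not figures from the article), a quick sketch of the arithmetic with hypothetical helper names:

```python
from __future__ import annotations

SAMPLE_RATE_HZ = 8000    # narrowband telephone audio
BYTES_PER_SAMPLE = 2     # 16-bit linear PCM (G.711 would be 1 byte per sample)
FRAME_MS = 20            # common RTP packetization interval


def frame_size_bytes(sample_rate: int = SAMPLE_RATE_HZ,
                     bytes_per_sample: int = BYTES_PER_SAMPLE,
                     frame_ms: int = FRAME_MS) -> int:
    """Bytes of audio in one frame."""
    return sample_rate * bytes_per_sample * frame_ms // 1000


def chunk_audio(pcm: bytes, size: int) -> list[bytes]:
    """Split a PCM buffer into fixed-size frames (the last one may be shorter)."""
    return [pcm[i:i + size] for i in range(0, len(pcm), size)]


def buffering_delay_ms(frames_buffered: int, frame_ms: int = FRAME_MS) -> int:
    """Extra latency added before the ASR sees audio if frames are batched."""
    return frames_buffered * frame_ms
```

Under these assumptions each frame is 320 bytes, one second of audio is 50 frames, and every frame buffered before sending adds 20 ms of latency on top of network transit.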

Front-end service platform:

The most important thing is to configure the call flow.

This part is easily overlooked, but it is where real gains can be made. Generally speaking, one well-designed script template can do the work of many, and it has to be grounded in psychology. In short, the aim is to guide the person answering the phone along the script's line of thinking as far as possible, so as to build robot script templates for specific niches and achieve the best outbound results (connection rate, call duration, telemarketing conversion, willingness to pay in debt collection) or inbound results (satisfaction).

The rest is basically web-side functionality: user login, call flow configuration, call task creation, call statistics and call report export can all be implemented without much difficulty. From a product perspective, the most important value lies in placing or receiving users' calls, accurately identifying their intentions and answering them accurately. This is the ultimate goal of an intelligent voice interaction system, and it has always been ours.
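A call flow configured on the front-end platform is essentially a state machine: each node has a prompt, and the recognized intent picks the next node. A minimal sketch, with a hypothetical `Node` type and a hypothetical booking script echoing the hotel example earlier in the article:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    prompt: str
    # Recognized intent -> name of the next node; empty means the call ends here.
    transitions: dict[str, str] = field(default_factory=dict)


# Hypothetical booking script
FLOW = {
    "greeting": Node("Hello, I'm robot Xiaoyue. Can I help you reserve a table?",
                     {"yes": "ask_party_size", "no": "goodbye",
                      "transfer_human": "transfer"}),
    "ask_party_size": Node("For how many people?", {"number": "confirm"}),
    "confirm": Node("Your table is reserved. Goodbye!"),
    "goodbye": Node("Thank you, goodbye!"),
    "transfer": Node("Transferring you to a human agent, please wait."),
}


def run_flow(flow: dict[str, Node], intents: list[str], start: str = "greeting") -> list[str]:
    """Replay a sequence of recognized intents and return the prompts spoken."""
    current = start
    spoken = [flow[current].prompt]
    for intent in intents:
        node = flow[current]
        if not node.transitions:
            break  # terminal node: the call is over
        # Unrecognized intent: stay on the same node and repeat its prompt.
        current = node.transitions.get(intent, current)
        spoken.append(flow[current].prompt)
    return spoken
```

Repeating the current prompt on an unrecognized intent is only one option; real scripts often add a retry counter or a generic clarification node.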

Outbound line providers:

Generally, if you buy a complete system, the vendor provides the lines and you only pay line fees. If you build your own, there are plenty of line providers online and on Taobao; prices are negotiable, and they also provide an interface for line integration.

Conclusion

Although there are many intelligent voice interaction systems on the market, they are mostly limited to telemarketing across industries, and few deliver intelligent voice interaction in the true sense. The reason is simple: although the principle is not difficult, actual implementation hits pitfalls at almost every step. Fortunately, our system is now genuinely in production and performs well in every respect. More than a year of hard work has not been in vain. Haha~

This article tries to give a brief introduction to intelligent voice interaction systems, but it reflects only my limited knowledge, and omissions and shortcomings are inevitable. I offer it as a brick thrown out to attract jade, in the hope of drawing out better ideas.

Because of the scope and length of this article, many details are not covered. If you have any questions, feel free to get in touch.