Progress in the construction and application of Meituan Brain’s tens of billions of knowledge graphs
Speaker: Dr. Zhang Hongzhi, algorithm expert at Meituan
Editor: Liao Yuanyuan, Meituan Group
Production platform: DataFunTalk
Introduction: As China's largest online local life service platform, Meituan connects hundreds of millions of users and tens of millions of merchants, behind which lies a wealth of knowledge about daily life. Since 2018, Meituan's knowledge graph team has focused on graph construction and on using knowledge graphs to empower businesses and improve user experience. Specifically, "Meituan Brain" builds knowledge associations among users, merchants, products, and scenes, forming a knowledge brain for the life-service domain. At present, "Meituan Brain" covers billions of entities and tens of billions of triples, and the effectiveness of the knowledge graph has been verified in catering, takeout, hotels, comprehensive services, and other fields. Today we introduce the construction and application of the life-service knowledge graph in Meituan Brain, focusing on the following three aspects:
--
What is "Meituan Brain"?
The following is the overall roadmap of "Meituan Brain" construction. The team first started building the catering knowledge graph in 2018, conducting preliminary mining of Meituan's rich structured data and user behavior data, as well as in-depth mining along some important data dimensions, such as sentiment analysis of user reviews of dishes. In 2019, the focus turned to in-depth mining of unstructured user reviews, represented by the tag graph. Since 2020, construction has proceeded domain by domain, combining the characteristics of each field for in-depth data mining, covering commodities, food, wine and travel, comprehensive services, cross-domain graphs, and more.
--
In search, users usually need to abstract their intent into a series of refined keywords that the search engine can support. The tag knowledge graph uses "tags" to carry user needs, thereby improving the search experience. For example, with the tag knowledge graph, users can directly search for "taking care of children" or "couples dating" and be returned suitable merchants/content. From the perspective of information gain, unstructured text such as user reviews contains a large amount of knowledge (such as the scenes, crowds, and environment a merchant is suitable for), so mining unstructured data yields information gain. The team uses massive review data in the life-service domain as the main knowledge source and applies key techniques such as tag mining, tag-tag relationship mining, and tag-merchant association to sort out user needs, scenarios, and main concerns bottom-up to complete the graph construction.
The construction of the tag knowledge graph is divided into four parts: knowledge extraction, relationship mining, graph marking, and graph application.
① Knowledge extraction
Tag mining adopts a simple sequence tagging architecture, covering both single-span tag mining and discontinuous (skip-word) tag mining. In addition, it is combined with semantic or context discrimination, and uses distant supervision plus result voting to obtain more accurate tags.
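A minimal sketch of the single-span mining step framed as BIO sequence tagging, using the HuggingFace transformers library. The model name, label scheme, and example text are illustrative assumptions, not Meituan's actual setup:

```python
# Tag mining as BIO token classification: each token is labeled as the
# beginning of, inside of, or outside of a tag span.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

labels = ["O", "B-TAG", "I-TAG"]  # hypothetical BIO scheme for tag spans
tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-chinese", num_labels=len(labels)
)

text = "适合带孩子吃饭的餐厅"  # "a restaurant suitable for dining with children"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits          # (1, seq_len, num_labels)
pred_ids = logits.argmax(dim=-1)[0].tolist()

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for tok, pid in zip(tokens, pred_ids):
    print(tok, labels[pid])  # classifier head is untrained here; outputs are
                             # meaningful only after fine-tuning on labeled spans
```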
② Relationship mining
Synonym mining: the task is defined as, given a pool of N words and M business tag words, finding the synonyms within the N-word pool for each of the M tag words. Existing synonym mining methods include search log mining, encyclopedia data extraction, and rule-based similarity calculation, but they lack generality. Our goal is a tag synonym mining method that is general and can be applied to large-scale data sets.
The following is the specific solution for synonym mining. First, the offline tag pool or online query tags are represented as vectors to build a vector index; then vector hash recall is performed to generate top-N synonym-pair candidates for each tag; finally, a synonym discrimination model filters the candidates. The advantages of this solution are reduced computational complexity and higher efficiency; compared with inverted-index candidate generation, it can recall synonyms that share no literal overlap, with high accuracy and simple parameter control.
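A minimal sketch of the candidate-generation step using FAISS. The exact-search `IndexFlatIP` stands in for the hashed (approximate) recall mentioned above, for which something like `faiss.IndexLSH` could be swapped in; the embeddings here are random stand-ins:

```python
# Embed all tags, build a vector index, and recall the top-N nearest tags
# for each tag as synonym-pair candidates.
import numpy as np
import faiss

dim = 128
tag_vecs = np.random.rand(10000, dim).astype("float32")  # stand-in embeddings
faiss.normalize_L2(tag_vecs)        # inner product == cosine after normalizing

index = faiss.IndexFlatIP(dim)
index.add(tag_vecs)

top_n = 10
scores, neighbors = index.search(tag_vecs, top_n + 1)  # +1: each tag finds itself
candidate_pairs = [
    (i, int(j)) for i, row in enumerate(neighbors) for j in row if j != i
]
# candidate_pairs then go to the synonym discrimination model for filtering
```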
For labeled data, the mainstream tag word embedding methods include word2vec and BERT. The word2vec approach is simple to implement: it averages word vectors and ignores word order. BERT captures richer semantic representations through pre-training, but directly taking the [CLS] vector performs roughly on par with word2vec. Sentence-BERT improves on BERT: it obtains the representation vectors of tagA and tagB through a two-tower pre-trained model and measures their cosine similarity, yielding the semantic similarity of the two tags.
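A minimal sketch of the two-tower (Sentence-BERT style) scoring, using the sentence-transformers library; the model name and tag pair are illustrative assumptions:

```python
# Encode two tags with a shared sentence encoder and compare with cosine.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
tag_a, tag_b = "亲子", "带孩子"   # "parent-child" vs. "with children"
emb = model.encode([tag_a, tag_b], convert_to_tensor=True)
print(util.cos_sim(emb[0], emb[1]))  # cosine similarity of the two tag vectors
```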
For unlabeled data, sentence representations can be obtained through contrastive learning. As shown in the figure, the original BERT model assigns very high vector similarity even to sentences of different similarity; after adjustment by contrastive learning, vector similarity reflects text similarity much better.
Contrastive learning model design: first, given a sentence, perturb it to generate a sample pair; common perturbations include adding an adversarial attack at the embedding layer, shuffling at the token level, or dropping some words. During training, the similarity between the two views of the same sample within a batch is maximized, while the similarity to other samples in the batch is minimized. The final results show that unsupervised learning can approach the effect of supervised learning to a certain extent, and unsupervised + supervised learning improves significantly over supervised learning alone.
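A minimal sketch of this in-batch contrastive objective (in the spirit of SimCSE). Simple additive noise stands in for the embedding-layer adversarial attack or word dropout mentioned above; all names and shapes are illustrative:

```python
# InfoNCE over a batch: two perturbed views of each sentence are positives
# (diagonal of the similarity matrix); all other sentences are negatives.
import torch
import torch.nn.functional as F

def info_nce(view1, view2, temperature=0.05):
    """view1, view2: (batch, dim) embeddings of two views of the same batch."""
    v1 = F.normalize(view1, dim=-1)
    v2 = F.normalize(view2, dim=-1)
    sim = v1 @ v2.T / temperature           # (batch, batch) similarity matrix
    labels = torch.arange(sim.size(0))      # positives lie on the diagonal
    return F.cross_entropy(sim, labels)     # max diag sim, min off-diag sim

batch = torch.randn(32, 768, requires_grad=True)          # stand-in embeddings
loss = info_nce(batch + 0.01 * torch.randn_like(batch),   # perturbed view 1
                batch + 0.01 * torch.randn_like(batch))   # perturbed view 2
loss.backward()
```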
Synonym discrimination model design: concatenate the two tag words into the BERT model and obtain the discrimination result through multi-layer semantic interaction.
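A minimal cross-encoder sketch of this discrimination step: the two tags enter BERT as a sentence pair, so the self-attention layers provide the semantic interaction. Model name and label semantics are assumptions:

```python
# Sentence-pair classification: [CLS] tagA [SEP] tagB [SEP] -> synonym or not.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-chinese", num_labels=2  # 0: not synonyms, 1: synonyms (assumed)
)
inputs = tokenizer("亲子", "带孩子", return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)
print(probs)  # untrained head: meaningful only after fine-tuning on labeled pairs
```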
Tag hypernym-hyponym mining: lexical inclusion is the most important source for mining hyponymy relations; mining methods combining semantics or statistics can also be used. The current difficulty is that the hypernym-hyponym standards are hard to unify, and the algorithm's mining results usually need to be revised according to the needs of the field.
③ Graph marking: how to build the relationship between tags and merchant supply?
Given a tag set, a threshold on the frequency with which the tag and its synonyms appear in a merchant's UGC/group deals yields candidate tag-POI pairs. One problem is that high frequency does not necessarily mean relevance, so a merchant marking and discrimination module is needed to filter bad cases.
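A minimal sketch of this candidate-generation step. The data layout, function name, and threshold are all illustrative assumptions:

```python
# Count how often a tag (or any of its synonyms) appears in a merchant's UGC
# and keep pairs above a frequency threshold as candidates.
def candidate_tag_pois(reviews_by_poi, tag, synonyms, min_freq=5):
    """reviews_by_poi: {poi_id: [review_text, ...]}; returns candidate pairs."""
    surface_forms = {tag, *synonyms}
    candidates = []
    for poi_id, reviews in reviews_by_poi.items():
        freq = sum(any(s in r for s in surface_forms) for r in reviews)
        if freq >= min_freq:   # high frequency != relevant: these candidates
            candidates.append((poi_id, tag, freq))  # still need discrimination
    return candidates
```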
Merchant marking considers three levels of information: tag-merchant, user reviews, and merchant taxonomy. Specifically, at the tag-merchant granularity, the tag and the merchant information (merchant name, third-level category, top merchant tags) are concatenated and fed into the BERT model for judgment.
At the micro granularity of user reviews, the task is to determine whether the relation between each tag and a review mentioning it (called evidence) is positive, negative, irrelevant, or uncertain, so it can be treated as a four-class discrimination model. There are two options. The first is multi-task learning; its drawback is the high cost of adding new tags, since each new tag requires new training data. We finally adopted a discrimination model based on semantic interaction, which takes the tag itself as an input, so the model discriminates by semantics and supports dynamically adding new tags.
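A minimal sketch of this four-class discriminator with the tag passed in as input, so new tags need no retraining. The concat head here is a simplification of the multi-layer semantic interaction described below; all names and shapes are assumptions:

```python
# Four-class tag-evidence discrimination with the tag as a model input.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class TagEvidenceClassifier(nn.Module):
    CLASSES = ["positive", "negative", "irrelevant", "uncertain"]

    def __init__(self, hidden=768):
        super().__init__()
        self.encoder = BertModel.from_pretrained("bert-base-chinese")
        # simple concat head; the talk's model uses richer multi-layer
        # interaction between the tag and evidence representations
        self.head = nn.Linear(hidden * 2, len(self.CLASSES))

    def forward(self, tag_inputs, evidence_inputs):
        tag_vec = self.encoder(**tag_inputs).pooler_output
        evi_vec = self.encoder(**evidence_inputs).pooler_output
        return self.head(torch.cat([tag_vec, evi_vec], dim=-1))

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = TagEvidenceClassifier()
logits = model(tokenizer("亲子", return_tensors="pt"),           # tag
               tokenizer("带娃来吃很方便", return_tensors="pt"))  # evidence
print(dict(zip(TagEvidenceClassifier.CLASSES, logits.softmax(-1)[0].tolist())))
```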
The semantic-interaction discrimination model first computes vector representations, then performs interaction, and finally aggregates the comparison results; it is fast to compute, whereas the BERT-based method is computationally heavy but more accurate. We strike a balance between accuracy and speed: for example, when a POI has more than 30 pieces of evidence, we tend to use the lightweight method; if a POI has only a few pieces of evidence, the more accurate method can be used.
At the macro level, the question is mainly whether the tag and the category match, with three relations: definitely not, possibly, and definitely. The result is generally determined by voting over merchant-level association results, supplemented by rules; when accuracy requirements are high, manual review can be added.
④ Graph application: direct application of the mined data, or application of knowledge vector representations
In merchant knowledge Q&A scenarios, we answer user questions based on the merchant marking results and the evidence associated with each tag.
First, identify the tags in the user query and map them to IDs, then pass them through the search recall or ranking layer to the index layer, so that merchants with marking results are recalled and displayed to C-end users. A/B experiments show that the search experience for users' long-tail needs improves significantly. Some online experiments were also run in hotel search: through supplementary recall such as synonym mapping, search results improved significantly.
This is mainly implemented with a GNN model. Two types of edges are built into the graph, Query-POI click behavior and Tag-POI associations, and GraphSAGE is used for graph learning. The learning objective is to predict whether a Tag and a POI are associated, or whether a Query and a POI have a click relation, with sampling weighted by association strength. After going online, the results show that composing the graph with Query-POI information alone brings no online benefit, while introducing Tag-POI information improves the online effect significantly. A likely explanation is that the ranking model already learns from Query-POI click behavior, so GraphSAGE over clicks is merely a different way of learning the same signal, with little information gain; Tag-POI information introduces genuinely new knowledge and therefore brings significant improvement.
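A minimal GraphSAGE sketch using PyTorch Geometric. For brevity it merges the two edge types into one homogeneous graph with random features, whereas the production graph presumably uses typed nodes and edges; everything here is an illustrative assumption:

```python
# Two-layer GraphSAGE over Query/Tag/POI nodes, scored by dot product
# for link prediction (does this pair have an association/click edge?).
import torch
from torch_geometric.nn import SAGEConv

num_nodes, dim = 1000, 64                      # queries + tags + POIs
x = torch.randn(num_nodes, dim)                # initial node features
query_poi_edges = torch.randint(0, num_nodes, (2, 5000))  # click edges
tag_poi_edges = torch.randint(0, num_nodes, (2, 2000))    # association edges
edge_index = torch.cat([query_poi_edges, tag_poi_edges], dim=1)

conv1, conv2 = SAGEConv(dim, dim), SAGEConv(dim, dim)
h = conv2(conv1(x, edge_index).relu(), edge_index)  # node embeddings

src, dst = edge_index[:, :10]                  # a few candidate pairs
scores = (h[src] * h[dst]).sum(dim=-1)         # dot-product edge scores
print(scores)
```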
In addition, adding only the Query-POI vector similarity as a feature does not improve the online effect much, but adding the Query and POI vectors themselves brings a significant improvement. This may be because search uses high-dimensional features, among which a single similarity feature is easily drowned out; concatenating the Query and POI vectors raises the feature dimensionality instead.
This task uses the currently known items to predict the masked item clicked by the user. For example, when obtaining an item's context representation, the relevant attribute information is also encoded as vectors, and the model judges whether the item carries that attribute information.
In addition, masked item attribute prediction can be added to integrate the tag knowledge graph information into the sequential recommendation task. Experimental results show that introducing knowledge information improves accuracy to varying degrees across data sets. We also did online conversion work, using the item representations for vector recall: based on the items the user clicked historically, the top-N most similar items are recalled to supplement the online recommendation results, which brought a significant improvement on the food list recommendation page.
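A minimal sketch of that vector-recall step; the vectors are random stand-ins for the learned item representations, and the merging policy is an assumption:

```python
# For each item the user clicked, fetch the top-N most similar items by
# cosine similarity and merge them into the recommendation candidates.
import numpy as np

item_vecs = np.random.rand(50000, 64).astype("float32")   # learned item vectors
item_vecs /= np.linalg.norm(item_vecs, axis=1, keepdims=True)

def recall_similar(clicked_ids, top_n=20):
    candidates = {}
    for item_id in clicked_ids:
        scores = item_vecs @ item_vecs[item_id]            # cosine (normalized)
        for j in np.argsort(-scores)[1 : top_n + 1]:       # skip the item itself
            candidates[int(j)] = max(candidates.get(int(j), 0.0), float(scores[j]))
    return sorted(candidates, key=candidates.get, reverse=True)

print(recall_similar([3, 17], top_n=5)[:10])
```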
--
The goal of building the dish knowledge graph is, on the one hand, a systematic understanding of dishes and, on the other hand, a more complete dish knowledge base. Below we illustrate the construction strategy level by level of the hierarchy.
**Understanding of dish names**
Dish names carry the most accurate, lowest-cost dish information, and understanding them is also the basis and prerequisite for the generalization of subsequent explicit knowledge reasoning. First, the essential word / subject dish of the dish name is extracted; then sequence labeling identifies each ingredient mentioned in the name. Different models are designed for two scenarios: for names with word segmentation, the segmentation symbol is added to the model as a special token and the model identifies the type of each token; for names without segmentation, a Span-Trans task is performed first, after which the segmented-case module is reused.
Understanding dish names is an important source of information, but the knowledge they contain is limited. Therefore, preliminary literal inference based on a deep learning model is introduced, which generalizes across different literal expressions. However, it performs poorly on cases requiring professional knowledge, and bad cases occasionally appear even when the literal match is very strong.
The basic knowledge of recipes is mined from knowledge-rich texts to build a source knowledge base, which is then mapped to specific SKUs through generalized reasoning. In ingredient inference, for example, a dish such as braised pork has multiple recipes; statistics show that of 10 recipes, 4 use pork belly and 6 use pork belly with skin, so the ingredient "meat" is normalized to pork belly with skin. Similarly, Buddha Jumps Over the Wall has multiple recipes; by first counting the probability of each ingredient's occurrence and setting a threshold, the dish's ingredient list can be inferred, as sketched below.
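A minimal sketch of this statistical normalization; the recipe data and the 0.5 threshold are illustrative assumptions:

```python
# Count ingredient variants across recipes of the same dish and keep those
# whose occurrence probability exceeds a threshold.
from collections import Counter

recipes = [  # hypothetical ingredient lists for one dish name
    ["pork belly", "soy sauce"],
    ["pork belly with skin", "soy sauce"],
    ["pork belly with skin", "rock sugar"],
    ["pork belly with skin"],
]
counts = Counter(ing for r in recipes for ing in r)
total = len(recipes)
inferred = [ing for ing, c in counts.items() if c / total >= 0.5]  # threshold
print(inferred)  # ingredients appearing in at least half of the recipes
```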
Multi-source data mining builds solid knowledge triples on top of the dish name understanding results, and also relies on generalization rules derived from those results. This strategy mainly suits labels such as ingredients, efficacy, and suitable crowds. Its accuracy is acceptable and it has some generalization ability, but coverage is low.
The business also has some readily usable training data, such as the self-consistent in-store classification trees edited by ten million merchants. From this data, 500 million positive pairs and a 30 GB corpus can be generated.
During model training, the tab/shop of a dish is randomly replaced and the model judges whether the tab/shop was replaced; the shop name is dropped with 50% probability, so that the model remains robust when only the dish name is input. The model itself was also improved: the classification tabs were trained as words in BERT's vocabulary. Applying this pre-training to the downstream model, with 100,000 annotated examples, the accuracy of the dish hyponym/synonym model increased by 1.8%.
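A minimal sketch of that training-sample construction; function name, separators, and label convention are illustrative assumptions:

```python
# Randomly swap the tab/shop so the model learns to detect mismatches, and
# drop the shop name half the time for robustness to dish-name-only input.
import random

def make_example(dish, tab, shop, all_tabs, all_shops, swap_prob=0.5):
    label = 1                                     # 1 = original, 0 = replaced
    if random.random() < swap_prob:
        tab, shop = random.choice(all_tabs), random.choice(all_shops)
        label = 0
    if random.random() < 0.5:
        shop = ""                                 # 50% shop-name dropout
    text = " [SEP] ".join(filter(None, [shop, tab, dish]))
    return text, label

print(make_example("红烧肉", "热菜", "张记小馆", ["凉菜", "主食"], ["李记面馆"]))
```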
First, ResNet encodes the dish images and the BERT model encodes the dish text, and the matching between text and store dishes is learned through a contrastive loss. A two-tower model is used here: on the one hand, it is convenient for downstream applications, since each tower can be used independently and the dish image representations can be inferred offline and cached; on the other hand, the image content is simple, so interactive modeling is unnecessary. The training objectives are matching images with store dishes, aligning images with dish names, and aligning images with tabs.
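A minimal two-tower sketch in PyTorch: a ResNet image tower, a BERT text tower, and an in-batch similarity matrix to which the earlier contrastive loss can be applied. This is a CLIP-style simplification; model choices and dimensions are assumptions:

```python
# Two-tower image/text matching: each tower can also be used on its own,
# e.g. to pre-compute and cache dish image embeddings.
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as tvm
from transformers import BertModel

class DishTwoTower(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        resnet = tvm.resnet18(weights=None)
        resnet.fc = nn.Linear(resnet.fc.in_features, dim)  # image tower head
        self.image_tower = resnet
        self.text_tower = BertModel.from_pretrained("bert-base-chinese")
        self.text_proj = nn.Linear(768, dim)

    def forward(self, images, text_inputs):
        img = F.normalize(self.image_tower(images), dim=-1)
        txt = self.text_tower(**text_inputs).pooler_output
        txt = F.normalize(self.text_proj(txt), dim=-1)
        return img @ txt.T  # (batch, batch): diagonal pairs should match

# usage: images is a (B, 3, 224, 224) tensor; text_inputs is a tokenized
# batch of dish names/tabs; train with the in-batch contrastive loss above.
```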
Based on multi-modal information, the model can predict dish categories or complete dish information; for example, predicting "pork and cabbage" with image information is more intuitive and accurate. We also perform multi-view semi-supervised dish attribute extraction based on the text and visual modalities. Taking cooking-method extraction as an example: first generate cooking-method training samples (e.g., "braised pork" → "braised"); then train a CNN model to predict cooking methods from images and use it to guide fine-tuning of the BERT text model, or let the multimodal model predict cooking methods from merchant/tab/dish-name and review information; finally, vote over the two models or splice their features for prediction, as sketched below.
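A minimal sketch of the voting variant of that final fusion step; the tie-breaking policy is an illustrative assumption, and in practice feature splicing feeds both representations into one classifier instead:

```python
# Vote between the image (CNN) and text (BERT) predictions; on disagreement,
# fall back to the more confident model.
def fuse_predictions(cnn_pred, bert_pred):
    """Each pred is a (label, confidence) pair."""
    if cnn_pred[0] == bert_pred[0]:
        return cnn_pred[0]
    return max(cnn_pred, bert_pred, key=lambda p: p[1])[0]

print(fuse_predictions(("braised", 0.9), ("stir-fried", 0.6)))  # -> "braised"
```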
In summary: dish name understanding is best suited to SKU initialization; deep learning inference and explicit inference models suit synonyms, hyponyms, cuisines, and the like; ultimately, multi-modal + structured pre-training and inference address the problems of incomplete single-modal information, many attribute dimensions, and the need for large amounts of annotated data, so this method applies to almost all scenarios.
That’s it for today’s sharing, thank you all.