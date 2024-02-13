



Document Understanding and Text Chunking

Vertex AI Search does more than extract text from documents. It also identifies structural and content elements, such as titles, section headings, paragraphs, and tables, that define each document's organization and hierarchy. It uses this information to intelligently split documents into smaller retrievable segments (chunks) that minimize noise while preserving semantic coherence. This is more effective than the widely used simple text chunking, such as splitting at a fixed character or token count, which can cut a section or table in half and often fails to preserve semantic coherence.
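To illustrate the difference, here is a minimal sketch contrasting a naive fixed-size splitter with a structure-aware one. It assumes Markdown-style input (headings start with `#`, paragraphs are separated by blank lines); Vertex AI Search's own parser handles far richer layouts such as tables and PDFs, and `Chunk`, `fixed_size_chunks`, and `structure_aware_chunks` are hypothetical names, not part of any Vertex AI API.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    heading: str  # nearest preceding section heading, kept as context
    text: str


def fixed_size_chunks(text: str, size: int) -> list[str]:
    """Naive baseline: cut every `size` characters, ignoring document structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def structure_aware_chunks(doc: str, max_chars: int = 500) -> list[Chunk]:
    """Start a new chunk at every heading, then pack whole paragraphs up to max_chars.

    Assumes Markdown-style input: headings start with '#', paragraphs are
    separated by blank lines. A single paragraph longer than max_chars is kept
    whole rather than cut mid-sentence.
    """
    chunks: list[Chunk] = []
    heading = ""
    buffer: list[str] = []

    def flush() -> None:
        # Emit the paragraphs collected so far under the current heading.
        if buffer:
            chunks.append(Chunk(heading, "\n\n".join(buffer)))
            buffer.clear()

    for block in re.split(r"\n\s*\n", doc.strip()):
        lines = block.strip().splitlines()
        if lines and lines[0].startswith("#"):
            flush()  # a heading always closes the previous section's chunk
            heading = lines[0].lstrip("#").strip()
            lines = lines[1:]
        paragraph = " ".join(line.strip() for line in lines).strip()
        if not paragraph:
            continue
        if buffer and sum(len(p) for p in buffer) + len(paragraph) > max_chars:
            flush()  # adding this paragraph would exceed the size budget
        buffer.append(paragraph)
    flush()
    return chunks
```

The structure-aware version never cuts inside a paragraph and records each chunk's heading, so a retrieved chunk still carries the context of the section it came from.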

Additionally, Vertex AI Search can extract information from each segment and attach it as annotations that improve the search experience. The document segments are then tokenized and embedded to build the indexes used for search and ranking.
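The source does not describe Vertex AI Search's actual tokenizer or embedding models, so the following is only a minimal sketch of the tokenize, embed, and index flow, assuming a toy hashed bag-of-words vector in place of a learned embedding model. `tokenize`, `embed`, `VectorIndex`, and `DIM` are hypothetical names; the index ranks chunks by cosine similarity with a brute-force scan.

```python
import math
import re
import zlib
from collections import Counter

DIM = 256  # toy embedding size (an assumption for this sketch)


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str) -> list[float]:
    """Stand-in for a learned embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * DIM
    for token, count in Counter(tokenize(text)).items():
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
    return [v / norm for v in vec]


class VectorIndex:
    """Brute-force index: stores one vector per chunk and ranks by cosine similarity."""

    def __init__(self) -> None:
        self.chunks: list[str] = []
        self.vectors: list[list[float]] = []

    def add(self, chunk: str) -> None:
        self.chunks.append(chunk)
        self.vectors.append(embed(chunk))

    def search(self, query: str, k: int = 3) -> list[tuple[float, str]]:
        q = embed(query)
        # Non-empty vectors are unit length, so the dot product is the cosine similarity.
        scored = [
            (sum(a * b for a, b in zip(q, v)), chunk)
            for v, chunk in zip(self.vectors, self.chunks)
        ]
        return sorted(scored, reverse=True)[:k]
```

A production system would use a learned embedding model and an approximate nearest-neighbor index instead of a linear scan, but the flow is the same: embed each chunk once at indexing time, embed the query at search time, and rank by similarity.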

Once Vertex AI Search identifies the relevant document segments through retrieval and ranking, it can process that content further or use it as input for generating responses, which yields higher-quality, more relevant responses.
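Feeding ranked segments into response generation can be illustrated with a small, hypothetical helper that keeps the top-k chunks above a score threshold and numbers them so a model can cite its sources. The prompt wording, `build_grounded_prompt`, and its parameters are assumptions for illustration; the code only builds the prompt and does not call any model.

```python
def build_grounded_prompt(
    question: str,
    ranked: list[tuple[float, str]],
    k: int = 3,
    min_score: float = 0.0,
) -> str:
    """Build a prompt that asks a model to answer only from the top-k retrieved chunks.

    `ranked` holds (relevance score, chunk text) pairs from a retriever/ranker.
    The prompt wording is a hypothetical example, not a Vertex AI format.
    """
    # Highest-scoring chunks first; drop anything at or below the score threshold.
    best = [chunk for score, chunk in sorted(ranked, reverse=True) if score > min_score][:k]
    sources = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(best, start=1))
    return (
        "Answer the question using only the numbered sources below, "
        "and cite the sources you use.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

Restricting the model to numbered, ranked sources is one common way to keep generated answers tied to the retrieved content.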

Annotating Documents and Queries Using the Knowledge Graph

As explained in the previous post, keyword search finds relevant information by matching keywords, whereas semantic search looks for similarity in the meaning of content. Knowledge graphs find information by following relationships between entities. They are also useful for extracting entities and their relationships from text, turning unstructured text into structured data. Like embedding, this produces a representation of the text that a system can search, but because the result is structured as a graph, it is much easier for humans to understand. This makes knowledge graphs another promising option for information retrieval in RAG systems.
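To make the contrast with embeddings concrete, here is a minimal sketch of a knowledge graph stored as (subject, relation, object) triples, with a breadth-first lookup that gathers the facts around an entity and renders them as text a RAG system could pass to a model. `KnowledgeGraph`, `to_context`, the relation names, and the sample triples are illustrative assumptions, not how the Google Knowledge Graph is stored or queried.

```python
from collections import defaultdict

Triple = tuple[str, str, str]  # (subject, relation, object)


class KnowledgeGraph:
    def __init__(self, triples: list[Triple]) -> None:
        self.edges: dict[str, list[Triple]] = defaultdict(list)
        for t in triples:
            subject, _, obj = t
            # Index both ends so an entity can be looked up in either direction.
            self.edges[subject].append(t)
            self.edges[obj].append(t)

    def neighborhood(self, entity: str, hops: int = 1) -> list[Triple]:
        """Return every triple within `hops` edges of `entity` (breadth-first search)."""
        seen_entities = {entity}
        seen_triples: set[Triple] = set()
        found: list[Triple] = []
        frontier = [entity]
        for _ in range(hops):
            next_frontier: list[str] = []
            for e in frontier:
                for t in self.edges.get(e, []):
                    if t not in seen_triples:
                        seen_triples.add(t)
                        found.append(t)
                    # Queue newly reached entities for the next hop.
                    for neighbor in (t[0], t[2]):
                        if neighbor not in seen_entities:
                            seen_entities.add(neighbor)
                            next_frontier.append(neighbor)
            frontier = next_frontier
        return found


def to_context(triples: list[Triple]) -> str:
    """Render triples as human-readable lines for a model prompt."""
    return "\n".join(f"{s} --{r}--> {o}" for s, r, o in triples)
```

The rendered triples are readable as-is, which is the property the comparison with embeddings points to: a person can inspect exactly which facts were retrieved.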

Google has been using the Knowledge Graph in Google Search since 2012, which lets Google add context to search queries with information it already knows about things, people, and places. Google Search uses the Knowledge Graph to apply its existing knowledge and understanding of the web when finding and returning results related to your query, including landmarks, celebrities, cities, geographic features, movies, and more.

The Knowledge Graph is also integrated with Vertex AI Search, where it powers capabilities such as web search and media search. When documents and queries are processed or summarized, Vertex AI Search uses the Knowledge Graph to automatically identify relevant entities and add them as annotations.

For example, suppose a document or query contains the keyword “Buffett”, which most likely refers to Warren Buffett. Vertex AI Search automatically annotates the document with additional information about Warren Buffett from the Google Search Knowledge Graph and expands the original query with related keywords. This increases the likelihood that the document will be retrieved by queries that use other keywords and topics related to him.
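The annotation and query-expansion behavior in this example can be sketched in a few lines. Vertex AI Search performs these steps automatically against the Google Knowledge Graph; here a hypothetical hand-written `ENTITIES` dictionary stands in for the Knowledge Graph, and `link_entities`, `annotate`, `expand_query`, and `keyword_score` are illustrative names. Entity linking is reduced to whole-word alias matching, which is much simpler than real disambiguation.

```python
import re

# Hypothetical stand-in for Knowledge Graph lookups: surface forms (aliases)
# that link to an entity, plus related terms used for annotation and expansion.
ENTITIES: dict[str, dict[str, list[str]]] = {
    "Warren Buffett": {
        "aliases": ["buffett", "warren buffett", "oracle of omaha"],
        "related": ["berkshire hathaway", "value investing", "omaha"],
    },
}


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def link_entities(text: str) -> list[str]:
    """Return entities with an alias that appears in `text` as a whole word or phrase."""
    lowered = text.lower()
    return [
        name
        for name, info in ENTITIES.items()
        if any(re.search(rf"\b{re.escape(alias)}\b", lowered) for alias in info["aliases"])
    ]


def annotate(doc: str) -> dict:
    """Attach linked entities and their related keywords to a document."""
    entities = link_entities(doc)
    keywords = sorted({kw for e in entities for kw in ENTITIES[e]["related"]})
    return {"text": doc, "entities": entities, "keywords": keywords}


def searchable_text(annotated: dict) -> str:
    """Index the original text together with its annotations."""
    return " ".join([annotated["text"], *annotated["entities"], *annotated["keywords"]])


def expand_query(query: str) -> str:
    """Append related terms for every entity linked in the query."""
    extra = [kw for e in link_entities(query) for kw in ENTITIES[e]["related"]]
    return " ".join([query, *extra])


def keyword_score(query: str, text: str) -> int:
    """Toy keyword retrieval score: number of distinct query tokens found in the text."""
    return len(tokens(query) & tokens(text))
```

With the sample document “Buffett's annual letter discusses insurance float.”, a query for “Berkshire Hathaway” scores 0 against the raw text but 2 against the annotated text, because annotation adds the related keywords. Expanding the query “Buffett” has the same effect from the query side.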

