Abstract:
When a message having at least one attachment is obtained for indexing, it is indexed as N+1 separate documents (one for the message body and one per attachment), where N is the number of attached documents. If the message is part of a message thread, then information regarding the last message in the thread is retrieved, and the search index attachment metadata for that last message is extracted. A unique identifier is computed for each newly obtained attachment and used to search for matches among the attachments of the last message in the thread. If there is a match, then the newly obtained attachment is not indexed, but the unique identifier of the previously indexed matching attachment is added to a body index document for the new message. A unique identifier associated with the new message body is also added to a list of parent identifiers associated with the attachment. If a search is subsequently issued that matches the contents of the attachment, all message documents whose identifiers are listed as parent identifiers in the attachment document metadata will be returned as matches. If an attachment is obtained for a message that is not part of a previous message thread, or if a newly obtained attachment does not match any previously obtained attachment within the message thread to which it belongs, then the attachment is indexed into the search index, and its unique identifier is included in the index document for the newly obtained message body.
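The flow described in this abstract can be illustrated with a minimal in-memory sketch. It is not the patented implementation: the class and field names (`ThreadAwareIndex`, `BodyDoc`, `AttachmentDoc`, `attachment_refs`) are hypothetical, SHA-256 over the attachment bytes is assumed as the unique identifier, and plain substring matching stands in for a real full text index.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class AttachmentDoc:
    fingerprint: str                      # assumed unique identifier (SHA-256 of the bytes)
    text: str
    parent_ids: List[str] = field(default_factory=list)  # message bodies that reference it


@dataclass
class BodyDoc:
    message_id: str
    text: str
    # fingerprint -> number of the attachment document in the index
    attachment_refs: Dict[str, int] = field(default_factory=dict)


class ThreadAwareIndex:
    def __init__(self) -> None:
        self.bodies: Dict[str, BodyDoc] = {}
        self.attachments: Dict[int, AttachmentDoc] = {}
        self.last_in_thread: Dict[str, str] = {}  # thread id -> id of last indexed message

    def index_message(self, message_id: str, body: str,
                      attachments: List[bytes],
                      thread_id: Optional[str] = None) -> BodyDoc:
        body_doc = BodyDoc(message_id, body)
        # Attachment metadata of the last message in the thread, if there is one.
        previous: Dict[str, int] = {}
        if thread_id is not None and thread_id in self.last_in_thread:
            previous = self.bodies[self.last_in_thread[thread_id]].attachment_refs
        for content in attachments:
            fp = hashlib.sha256(content).hexdigest()
            doc_no = previous.get(fp)
            if doc_no is None:
                # No match in the thread: index the attachment as a new document.
                doc_no = len(self.attachments)
                self.attachments[doc_no] = AttachmentDoc(
                    fp, content.decode("utf-8", errors="replace"))
            # Matched or new, record the new message body as a parent.
            self.attachments[doc_no].parent_ids.append(message_id)
            body_doc.attachment_refs[fp] = doc_no
        self.bodies[message_id] = body_doc
        if thread_id is not None:
            self.last_in_thread[thread_id] = message_id
        return body_doc

    def search(self, term: str) -> Set[str]:
        needle = term.lower()
        hits = {mid for mid, doc in self.bodies.items() if needle in doc.text.lower()}
        for att in self.attachments.values():
            if needle in att.text.lower():
                hits.update(att.parent_ids)  # every referencing message is a match
        return hits
```

In this sketch, a reply that carries the same attachment as the previous message in its thread adds only a parent identifier to the existing attachment document, while the same bytes arriving outside the thread are indexed again.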
Abstract:
A method and system for sharing full text index entries across application boundaries in which documents are obtained by a shared, platform level indexing service, and a determination is made as to whether the received documents are duplicates with regard to previously indexed documents. If a document is determined to be a duplicate, the index representation of the previously indexed copy of the document is modified to indicate that the document is also associated with another application or context. If a document is not a duplicate of a previously indexed document, the document is indexed to support future searches and/or other processing. The index representation of a document includes application category identifiers associating one or more applications or contexts with the document. When a document is indexed, one or more category identifiers are generated and stored in association with that document. The category identifiers for an indexed document may, for example, represent an application that received, stored, or otherwise processed that document. The application category identifiers enable category specific searching by applications sharing a common search index. A software category filter may be provided to process search results from the shared search index, so that only documents associated with certain categories are returned. Accordingly, one or more search categories may be determined for a given search query, based on an application generating the search query, or some other context information, and then used to filter the search results provided from the shared search index.
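As a rough illustration of the duplicate handling and category filter described above, the following sketch keeps one index entry per unique document and attaches application category identifiers to it. The names `SharedIndex`, `IndexEntry`, and the category strings are hypothetical, a SHA-256 content hash is assumed as the duplicate test, and substring matching stands in for full text search.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Set


@dataclass
class IndexEntry:
    text: str
    categories: Set[str] = field(default_factory=set)  # application category identifiers


class SharedIndex:
    def __init__(self) -> None:
        self.entries: Dict[str, IndexEntry] = {}  # content hash -> entry

    def add(self, content: str, category: str) -> bool:
        """Index content for an application; return False if it was a duplicate."""
        key = hashlib.sha256(content.encode("utf-8")).hexdigest()
        entry = self.entries.get(key)
        if entry is not None:
            # Duplicate: only record the additional application or context.
            entry.categories.add(category)
            return False
        self.entries[key] = IndexEntry(content, {category})
        return True

    def search(self, term: str,
               categories: Optional[Iterable[str]] = None) -> List[IndexEntry]:
        needle = term.lower()
        hits = [e for e in self.entries.values() if needle in e.text.lower()]
        if categories is None:
            return hits
        wanted = set(categories)
        # Category filter: keep only documents tied to the requested categories.
        return [e for e in hits if e.categories & wanted]
```

An application issuing a query would pass its own category (for example `index.search("budget", ["mail"])`) so that documents indexed only on behalf of other applications are filtered out of its results.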
Abstract:
A method and system for sharing search index entries across multiple computer systems organized in a peer to peer network, in which unique content is indexed only once, even though the content may be physically duplicated in multiple computer systems in the peer to peer network. Files are obtained by a shared indexing service, and a determination is made as to whether the received files are duplicates of previously indexed files. If a file is determined to be a duplicate, the index representation of the previously indexed copy of the file is modified to indicate that the file is also associated with another computer system in the peer to peer network. If a file is not a duplicate of a previously indexed file, the file is indexed to support future searches. The index representation of a file includes category identifiers associating one or more computer systems with the file. When a file is indexed, one or more category identifiers are generated and stored in association with that file. The category identifiers for an indexed file may represent host computer systems on which copies of the file are stored. The category identifiers enable location specific searching by computer systems in a peer to peer network sharing a common search index. A software category filter may be provided to process search results from the shared search index, so that only files associated with certain categories are returned.
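The peer to peer variant can be sketched with a small inverted index in which each unique file is tokenized once and later copies only add a host identifier. This is an illustrative sketch under assumptions: `PeerSharedIndex`, `add_file`, and the host names are hypothetical, SHA-256 is assumed as the duplicate test, and whitespace tokenization replaces a real analyzer.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set


class PeerSharedIndex:
    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = defaultdict(set)  # token -> file digests
        self.hosts: Dict[str, Set[str]] = defaultdict(set)     # file digest -> host ids

    def add_file(self, host_id: str, content: bytes) -> bool:
        """Register a file held by host_id; return True only if it was newly indexed."""
        digest = hashlib.sha256(content).hexdigest()
        is_new = digest not in self.hosts
        if is_new:
            # Unique content is tokenized and indexed only once.
            text = content.decode("utf-8", errors="replace").lower()
            for token in text.split():
                self.postings[token].add(digest)
        # New or duplicate, record this host as a category identifier.
        self.hosts[digest].add(host_id)
        return is_new

    def search(self, token: str,
               only_hosts: Optional[Iterable[str]] = None) -> Dict[str, Set[str]]:
        """Return digest -> hosts holding a copy, optionally limited to given hosts."""
        results: Dict[str, Set[str]] = {}
        for digest in self.postings.get(token.lower(), ()):
            holders = set(self.hosts[digest])
            if only_hosts is not None:
                holders &= set(only_hosts)  # location specific filter
            if holders:
                results[digest] = holders
        return results
```

Returning the set of hosts with each hit lets the caller decide which peer to fetch the file from, which is one plausible use of the location specific categories the abstract mentions.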
Abstract:
A system for full text indexing optimization that operates based on identification of idle and active content in a content source, and by prioritizing indexing of idle content over active content. Active and idle content items are automatically identified, and idle content items are given a higher priority for indexing, while active content items are given a lower priority. Active content items are generally those that are considered relatively more likely to be located by the user without using the full text indexing function, while idle content items are those that are relatively more likely to be located through use of the full text indexing function. The specific content item attributes that are used to determine whether a given content item is active or idle may depend on the type of content source for which the full text index is being provided. Additionally, the determination of which content items are active and which are idle may be based on predetermined, static criteria, and/or on use patterns determined dynamically by monitoring operations performed on content items by a user.
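The prioritization described above can be sketched as a priority queue in which idle items sort ahead of active ones. The classification rule here is purely an assumption for illustration: an item is treated as active if it was accessed within a hypothetical `idle_after_days` window (a static criterion) or opened at least `open_threshold` times recently (a monitored use pattern). The abstract does not specify these attributes or thresholds.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ContentItem:
    name: str
    days_since_access: int   # hypothetical static attribute
    recent_opens: int = 0    # hypothetical monitored use pattern


def is_active(item: ContentItem, idle_after_days: int = 30,
              open_threshold: int = 3) -> bool:
    """Assumed rule: recently accessed or frequently opened items are active."""
    return (item.days_since_access < idle_after_days
            or item.recent_opens >= open_threshold)


def indexing_order(items: Iterable[ContentItem]) -> List[str]:
    """Return item names in indexing order: idle items first, then active ones."""
    heap = []
    for seq, item in enumerate(items):
        priority = 1 if is_active(item) else 0  # lower value is indexed sooner
        # seq keeps the original order among items with equal priority.
        heapq.heappush(heap, (priority, seq, item.name))
    order = []
    while heap:
        order.append(heapq.heappop(heap)[2])
    return order
```

For a mailbox, for instance, today's inbox messages would be classified as active and queued behind long-untouched archive items, since the user can usually find the former without a full text search.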