Abstract:
A distributed collection of web crawlers gathers information over a large portion of cyberspace. The crawlers divide the overall crawl among themselves through a cyberspace partition scheme, and they collaborate through load balancing to make full use of each crawler's computing resources. The invention takes advantage of the hierarchical nature of the cyberspace namespace and uses the syntactic components of the URL structure as the main vehicle for dividing and assigning crawling workload to individual crawlers. The partition scheme is completely distributed: each crawler makes its partitioning decisions based on its own crawling status and a globally replicated partition tree data structure.
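The URL-based partitioning described in this abstract can be illustrated with a small sketch. It is not the patented scheme: the `PartitionTree` class, the `url_key` helper, the crawler names, and the longest-prefix lookup rule are all assumptions made for illustration, and load balancing and replication are left out.

```python
from urllib.parse import urlsplit

def url_key(url):
    """Split a URL into hierarchical components: reversed host labels, then path segments."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower().split(".")
    path = [seg for seg in parts.path.split("/") if seg]
    return tuple(reversed(host)) + tuple(path)

class PartitionTree:
    """Hypothetical partition tree; every crawler would hold an identical copy."""

    def __init__(self, default_owner):
        self.root = {"owner": default_owner, "children": {}}

    def assign(self, prefix, owner):
        """Give the subtree under `prefix` (a tuple of components) to `owner`."""
        node = self.root
        for comp in prefix:
            node = node["children"].setdefault(comp, {"owner": None, "children": {}})
        node["owner"] = owner

    def owner_of(self, url):
        """Return the owner of the deepest assigned prefix that matches the URL."""
        node = self.root
        owner = node["owner"]
        for comp in url_key(url):
            node = node["children"].get(comp)
            if node is None:
                break
            if node["owner"] is not None:
                owner = node["owner"]
        return owner

tree = PartitionTree("crawler-0")
tree.assign(("com",), "crawler-1")
tree.assign(("com", "example", "www", "docs"), "crawler-2")
print(tree.owner_of("http://www.example.com/docs/a.html"))
print(tree.owner_of("http://www.example.com/index.html"))
```

Because the lookup depends only on the URL and the shared tree, any crawler can decide locally whether a newly discovered link belongs to it or to a peer.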
Abstract:
A system for automatically generating user interest profiles and delivering information to users learns a user's interests by monitoring the user's outbound communication streams, i.e., the information that the user produces either by typing (e.g., while a user is composing an e-mail message or editing a word processor document) or by speaking (e.g., while a user is engaged in a phone conversation or listening to a lecture). The system uses the monitored text to build (and possibly update) a user interest profile. The profile is constructed from current text generated by the user, so that the retrieved information reflects present user interests. The profile may also retain past user interests, so that it reflects a combination of past and present interests. The system then automatically queries diverse databases for information relevant to the interest profile. The databases may include internet web pages, files stored on the user's local network, and other local or remote data repositories. The queries may use a combination of internet search engines, the specific selection of which may depend upon the nature and/or content of the queries. The information retrieved in response to the queries is then presented to the user. The retrieved information may contain, for example, answers to questions that the user might ask and/or data related to the user's current and continuing interests. Because a user's current speech or typed text is highly correlated with the user's current interests, the retrieved information will be relevant to the user's actual interests. The communication stream monitoring, interest profile building, database querying, and presentation of retrieved information are all performed automatically, in real time, and in the background of current user activities.
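One way to picture a profile that blends past and present interests is a weighted term count that is partly retained on each update. The sketch below is only an illustration under assumptions: the `update_interest_profile` and `top_query` functions, the stop-word list, and the `retain` factor are hypothetical and do not come from the abstract.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not from the source).
STOPWORDS = {"the", "and", "for", "with", "that", "this", "are", "was"}

def update_interest_profile(profile, text, retain=0.5):
    """Blend past interests (scaled by `retain`) with terms from newly monitored text."""
    terms = [w for w in re.findall(r"[a-z]+", text.lower())
             if len(w) > 2 and w not in STOPWORDS]
    updated = Counter({term: weight * retain for term, weight in profile.items()})
    updated.update(Counter(terms))  # add counts for the current text
    return updated

def top_query(profile, k=3):
    """Build a keyword query from the k highest-weighted terms."""
    return " ".join(term for term, _ in profile.most_common(k))

profile = update_interest_profile(Counter(), "crawler crawler index")
profile = update_interest_profile(profile, "patent search for the crawler")
print(top_query(profile, 1))
```

The resulting query string would then be handed to whichever search backends the system selects.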
Abstract:
A system generates user interest profiles by monitoring and analyzing a user's access to a variety of hierarchical levels within a set of structured documents, e.g., documents available at a web site. Each information document has parts associated with it, and the documents are classified into categories using a known taxonomy. The user interest profiles are automatically generated based on the type of content viewed by the user. The type of content is determined by the text within the parts of the documents viewed and the classifications of the documents viewed. The profiles are also generated based on other factors, including the frequency and currency of visits to documents having a given classification and/or the hierarchical depth of the levels or parts of the documents viewed. User profiles include an interest category code and an interest score to indicate a level of interest in a particular category. The profiles are updated automatically to accurately reflect the current interests of an individual, as well as past interests. A time-dependent decay factor is applied to the past interests. The system presents to the user documents or references to documents that match the current profile.
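The interest score with a time-dependent decay factor could be sketched as follows. The abstract does not give a formula, so the exponential half-life decay, the use of hierarchical depth as the score increment, and the names `decayed_score` and `update_profile` are all assumptions.

```python
def decayed_score(score, elapsed_days, half_life_days=30.0):
    """Apply an exponential decay to a past interest score (assumed form)."""
    return score * 0.5 ** (elapsed_days / half_life_days)

def update_profile(profile, category, depth, now_day, half_life_days=30.0):
    """Record a visit to a document in `category` at hierarchical `depth`.

    `profile` maps an interest category code to (interest score, day of last update).
    Deeper visits add more to the score, which is an illustrative choice.
    """
    score, last_day = profile.get(category, (0.0, now_day))
    score = decayed_score(score, now_day - last_day, half_life_days)
    profile[category] = (score + depth, now_day)
    return profile

profile = {}
update_profile(profile, "SPORTS", depth=2, now_day=0)
update_profile(profile, "SPORTS", depth=1, now_day=30)
print(profile["SPORTS"])
```

In this sketch a category that is never revisited fades toward zero, while frequent or deep visits keep its score high.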
Abstract:
A method and apparatus for efficiently matching a large collection of user profiles against a large volume of data in a webcasting system. In one embodiment, the invention parallelizes profile matching in four steps. First, an initial profile set is partitioned into several subsets, also referred to as sub-partitions, using various heuristic methods. Second, each sub-partition is mapped onto one or more independent processing units. The processing units are not required to have equal processing performance. For best performance, however, in one embodiment the subset with the highest cost is mapped to the fastest processor and the subset with the next highest cost to the next fastest processor. Where appropriate, the invention evaluates the relative subset processing speed of each processor and adjusts future subset mapping based upon these evaluations. The third and fourth steps are executed for each information item I that needs to be matched with a profile predicate: the third step broadcasts I to all processing units, and the fourth step performs a sequential profile match on I.
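The cost-to-speed mapping and the broadcast-then-match steps in this abstract can be sketched in a few lines. The function names `map_subsets` and `match_item`, the numeric costs and speeds, and the wrap-around rule for surplus subsets are assumptions, and a plain loop stands in for a real broadcast to separate processing units.

```python
def map_subsets(subset_costs, processor_speeds):
    """Pair subsets with processors: costliest subset goes to the fastest processor."""
    subsets = sorted(subset_costs, key=subset_costs.get, reverse=True)
    procs = sorted(processor_speeds, key=processor_speeds.get, reverse=True)
    # Wrap around when there are more subsets than processors (an assumption).
    return {s: procs[i % len(procs)] for i, s in enumerate(subsets)}

def match_item(item, assignment, subsets):
    """Deliver `item` to every subset; each subset's profiles are scanned in sequence."""
    matched = []
    for subset_name in assignment:  # this loop stands in for the broadcast
        for user, predicate in subsets[subset_name]:
            if predicate(item):
                matched.append(user)
    return sorted(matched)

costs = {"A": 50, "B": 120, "C": 80}
speeds = {"p1": 1.0, "p2": 3.0, "p3": 2.0}
assignment = map_subsets(costs, speeds)
print(assignment)
```

Re-running `map_subsets` with measured speeds would be one simple way to model the adjustment of future mappings.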
Abstract:
A tactile notification device that can be embodied in, e.g., a wristwatch, communicates via wireless link with plural personal computing devices, including cellular telephones, pagers, and palm top computers, of the person wearing the notification device. When one of the personal computing devices alerts, e.g., when the telephone receives an incoming call, the pager receives a page, or the palm top computer receives an email, the personal computing device sends a signal to the notification device, which generates a discrete tactile signal against the person's skin. The notification device can generate different tactile signals, and each tactile signal can be correlated as desired by the user to one of the personal computing devices. In one embodiment, opposed pinch bars are provided on the skin-facing tactile surface of a wristwatch to gently pinch the skin and thereby establish a first tactile signal that can be correlated to, for example, an alert for an incoming phone call. Also, a rotating bar can be provided on the tactile surface of the wristwatch, and the tactile signal that corresponds to, e.g., an incoming page can be established by rotating the bar against the skin.
Abstract:
A search engine that forms a compact representation of a plurality of user queries to efficiently find desired information in an information network. The search engine comprises a profile processor having logic to receive the queries from the users and a search module. The search module is coupled to the profile processor and has logic to receive the information content, to combine the user queries into a master query, and to match the master query with the information content to determine matching content. The search engine also includes logic to analyze the matching content to determine if any of the queries has been satisfied.
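A minimal sketch of combining many user queries into one master query is shown below. The abstract does not say how the compact representation is built; here it is assumed to be a term-to-users index, each query is assumed to be a set of keywords that must all appear, and the names `build_master_query` and `satisfied_queries` are hypothetical.

```python
def build_master_query(queries):
    """Merge all user queries into one term -> users index (the 'master query')."""
    master = {}
    for user, terms in queries.items():
        for term in terms:
            master.setdefault(term.lower(), set()).add(user)
    return master

def satisfied_queries(content, queries, master):
    """Match content against the master query once, then report satisfied users."""
    words = set(content.lower().split())
    hits = {}
    for word in words.intersection(master):
        for user in master[word]:
            hits[user] = hits.get(user, 0) + 1
    # A query counts as satisfied when every one of its distinct terms was found.
    return sorted(user for user, n in hits.items()
                  if n == len({t.lower() for t in queries[user]}))

queries = {"alice": {"python", "crawler"}, "bob": {"python", "patent"}}
master = build_master_query(queries)
print(satisfied_queries("A Python crawler sketch", queries, master))
```

Shared terms such as "python" are looked up once per document regardless of how many users asked for them, which is the point of the combined form.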
Abstract:
A pointing device for entering data into an information processing system for a 3-dimensional graphical user interface. The pointing device comprises a switch, mounted on the pointing device, that produces a first signal when actuated by vertical downward pressure on a first region of the switch, and a circuit that couples the first signal to a Z-axis on a display attached to the information processing system so as to control movement along the Z-axis of information presented on the display. In another embodiment, the switch produces a second signal in response to vertical downward pressure on a second region of the switch, and a circuit couples the second signal to the Z-axis on the display so as to move information presented on the display along the Z-axis in the direction opposite to the movement produced by the first signal.
Abstract:
A stylus includes a wireless transceiver, a processor controlling the transceiver, and a data storage device. Data can be selected on a first computer, such as a first personal digital assistant (PDA), and then transmitted via wireless link to the stylus when the user manipulates a button on the stylus to signal to the operating system of the first PDA that the stylus is ready to receive data. The data is transmitted to the stylus and stored therein. Then, the stylus is aimed at a second PDA and the button is manipulated to cause the stylus to transmit the data to the second PDA via wireless link. With this invention, users of the PDAs can, e.g., quickly and efficiently exchange business cards electronically, without excessive manual data entry and without resorting to connecting their PDAs to a network.
Abstract:
A method and apparatus for efficiently matching a large collection of user profiles against a large volume of data in a webcasting system. The method removes redundant patterns in user profiles and information content to improve matching performance, and is based on a Boolean query language. Users select the information content they want by choosing a set of predicates that assert the properties each desired cyberspace document must have. The Boolean operators AND, OR, and NOT connect these predicates to describe the information items that will be pushed to the users. The method includes profile indexing and matching based on dynamic cost/credit adjustment.
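The Boolean profile language can be sketched as nested tuples evaluated against an item. This is an illustration only: the tuple encoding, the field/value form of a predicate, and the per-item cache (a simple stand-in for removing redundant patterns) are assumptions, and the dynamic cost/credit adjustment is not modeled.

```python
def evaluate(expr, item, cache):
    """Evaluate a Boolean profile expression against one information item."""
    op = expr[0]
    if op == "pred":
        _, field, value = expr
        key = (field, value)
        if key not in cache:  # each distinct predicate is tested once per item
            cache[key] = item.get(field) == value
        return cache[key]
    if op == "not":
        return not evaluate(expr[1], item, cache)
    if op == "and":
        return all(evaluate(e, item, cache) for e in expr[1:])
    if op == "or":
        return any(evaluate(e, item, cache) for e in expr[1:])
    raise ValueError(f"unknown operator: {op}")

def match_profiles(profiles, item):
    """Return the users whose profile expression accepts the item."""
    cache = {}  # shared across profiles so repeated predicates are not re-evaluated
    return [user for user, expr in profiles.items() if evaluate(expr, item, cache)]

profiles = {
    "ann": ("and", ("pred", "topic", "sports"), ("pred", "lang", "en")),
    "bob": ("or", ("pred", "topic", "finance"), ("not", ("pred", "lang", "en"))),
    "cat": ("pred", "topic", "sports"),
}
print(match_profiles(profiles, {"topic": "sports", "lang": "en"}))
```

Here the predicate on `topic == "sports"` appears in two profiles but is computed only once for the item.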
Abstract:
A method for automatically limiting access of a client computer to data objects accessed through a server computer dynamically prevents robots or webcrawlers from obtaining too much of the server database and from dramatically reducing server performance. The method includes the steps of receiving a request for a data object, recording a log entry for the request, calculating client request values, and refusing the request if a client request value exceeds one of a set of corresponding predefined maximum request values. Each log entry contains a client identifier, timestamp, and at least one data object identifier for the request. The client request values preferably include a request frequency, which is compared with a predefined maximum request frequency, and a cumulative data request, which is compared with a data access threshold. If the client is refused access, the client identifier is added to a deny list, and future requests from the client are automatically denied. The calculated cumulative data request may be for a single client, or it may be for all clients, in order to detect a robot that is divided among multiple client identifiers. The cumulative data request check may consider the total percentage of server resources being given away, or a pattern in the requests. Also provided is a data protection system containing a log file, a request analyzer, and a dynamically-generated deny list. Requests to the server are intercepted and sent to the data protection system first.
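The log, threshold checks, and deny list described in this abstract can be sketched as a small class. The class name `RequestGuard`, the sliding time window used for request frequency, and the count of distinct objects used for the cumulative data request are assumptions chosen for illustration; the cross-client check and pattern detection are omitted.

```python
class RequestGuard:
    """Hypothetical request analyzer with a log and a dynamically built deny list."""

    def __init__(self, max_requests, window_seconds, max_objects):
        self.max_requests = max_requests  # assumed maximum request frequency per window
        self.window = window_seconds
        self.max_objects = max_objects    # assumed data access threshold per client
        self.log = []                     # entries: (client_id, timestamp, object_id)
        self.deny = set()

    def allow(self, client_id, object_id, now):
        """Log the request and refuse it if the client exceeds either limit."""
        if client_id in self.deny:
            return False
        self.log.append((client_id, now, object_id))
        recent = sum(1 for c, t, _ in self.log
                     if c == client_id and now - t < self.window)
        distinct = len({o for c, _, o in self.log if c == client_id})
        if recent > self.max_requests or distinct > self.max_objects:
            self.deny.add(client_id)      # future requests are denied automatically
            return False
        return True

guard = RequestGuard(max_requests=2, window_seconds=10, max_objects=100)
results = [guard.allow("c1", obj, t) for obj, t in [("a", 0), ("b", 1), ("c", 2)]]
print(results, guard.deny)
```

A real deployment would place this check in front of the server so that every request passes through it first, as the abstract describes.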