Abstract:
A method for retrieving video data from a video server, the video data having been stored on a plurality of disks based on a disk striping technique. In accordance with one illustrative embodiment, the method comprises the steps of retrieving a predetermined number of bits from the plurality of disks in the video server, and storing that predetermined number of bits in a buffer memory, wherein the number of bits retrieved and stored is based on the number of disks and on the capacity of the buffer memory. These steps, which together may illustratively constitute one round of the video retrieval process, may be repeated until the entire video has been retrieved and, for example, transmitted to the intended recipient(s) at a required transmission rate.
Abstract:
A method of scheduling the retrieval of both continuous and non-continuous data retrieves continuous data streams at a predetermined rate. At least one server receives one or more requests for the retrieval of a stream of media data by at least one terminal. Each requested media stream is characterized by a playback rate r.sub.i. A common retrieval time period is established for each requested media stream. The common retrieval time period is a function of the playback rate. The retrieval of the requested media stream is scheduled in the order in which each request is received by the server.
Abstract:
An example of a method includes determining features of a first type for a first web page of a plurality of web pages. The method also includes electronically determining a plurality of rules for an attribute of the first web page, wherein the plurality of rules are determined based on the features of the first type. The method also includes electronically identifying a first rule, from the plurality of rules, that satisfies first predefined criteria. The first predefined criteria include at least one of a first threshold for a precision parameter, a second threshold for a support parameter, a third threshold for a distance parameter, and a fourth threshold for a recall parameter. The method further includes storing the first rule to enable extraction of a value of the attribute from a second web page.
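The rule-selection step can be sketched roughly as follows. This is a minimal illustration only: the `Rule` fields, the XPath-style rule bodies, the threshold values, and the tie-breaking order are all assumptions, since the abstract does not specify them, and only the precision and support thresholds are modeled.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Rule:
    xpath: str        # hypothetical rule body (an XPath-like locator)
    precision: float  # fraction of extracted values that were correct
    support: int      # number of training pages on which the rule fired


def select_rule(rules: Iterable[Rule],
                min_precision: float = 0.9,
                min_support: int = 5) -> Optional[Rule]:
    """Return the best rule that meets both thresholds, or None."""
    qualifying = [r for r in rules
                  if r.precision >= min_precision and r.support >= min_support]
    if not qualifying:
        return None
    # Assumed tie-break: prefer higher precision, then higher support.
    return max(qualifying, key=lambda r: (r.precision, r.support))


candidates = [
    Rule("//div[@class='price']", precision=0.95, support=12),
    Rule("//span[2]", precision=0.99, support=2),       # too little support
    Rule("//td[@id='p']", precision=0.80, support=40),  # too imprecise
]
best = select_rule(candidates)
```

The selected rule would then be stored and applied to a second page; distance and recall thresholds could be added as further filter conditions in the same way.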
Abstract:
Improved techniques are disclosed for processing data stream queries wherein a data stream is obtained, a set of aggregate queries to be executed on the data stream is obtained, and a query plan for executing the set of aggregate queries on the data stream is generated. In a first method, the generated query plan includes generating at least one intermediate aggregate query, wherein the intermediate aggregate query combines a subset of aggregate queries from the set of aggregate queries so as to pre-aggregate data from the data stream prior to execution of the subset of aggregate queries such that the generated query plan is optimized for computational expense based on a given cost model. In a second method, the generated query plan includes identifying similar filters in two or more aggregate queries of the set of aggregate queries and combining the similar filters into a single filter such that the single filter is usable to pre-filter data input to the two or more aggregate queries.
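To illustrate the first method, the sketch below computes one intermediate aggregate over a combined grouping key and then answers two count queries from it instead of scanning the stream twice. The record layout (`src`, `dst`), the function names, and the use of simple counts are assumptions for illustration; the cost model that decides which queries to combine is not shown.

```python
from collections import Counter


def pre_aggregate(stream, keys):
    """Single pass over the stream: count records per combined group key."""
    counts = Counter()
    for record in stream:
        counts[tuple(record[k] for k in keys)] += 1
    return counts


def roll_up(intermediate, keys, target_keys):
    """Answer a coarser group-by count from the intermediate aggregate."""
    idx = [keys.index(k) for k in target_keys]
    out = Counter()
    for group, n in intermediate.items():
        out[tuple(group[i] for i in idx)] += n
    return out


stream = [
    {"src": "a", "dst": "x"},
    {"src": "a", "dst": "y"},
    {"src": "b", "dst": "x"},
    {"src": "a", "dst": "x"},
]
keys = ("src", "dst")
intermediate = pre_aggregate(stream, keys)       # shared intermediate query
by_src = roll_up(intermediate, keys, ("src",))   # COUNT(*) GROUP BY src
by_dst = roll_up(intermediate, keys, ("dst",))   # COUNT(*) GROUP BY dst
```

The saving comes from the intermediate result usually having far fewer rows than the raw stream, so each downstream query processes groups rather than individual records.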
Abstract:
Web pages are efficiently categorized in a data processor without analyzing the content of the web pages. According to at least one embodiment, data is maintained that represents sample URLs grouped into a plurality of clusters. The sample URLs of a cluster are used to produce a URL regular expression pattern (“URL-regex”) that differentiates the sample URLs of the cluster from the sample URLs of other clusters and that covers at least a specified percentage of the sample URLs in the cluster. The process of producing a URL-regex is repeated for each of the clusters producing a URL-regex for each cluster. Web pages are then categorized into one of the clusters by determining which of the URL-regex patterns produced for the clusters match URLs that refer to the web pages. Thus, a web page may be categorized based on a URL that refers to the web page without having to obtain and analyze the content of the web page.
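A minimal sketch of the categorization step, assuming the per-cluster patterns have already been produced. The cluster names, the patterns, and the example URLs are hypothetical; the abstract does not describe how a URL-regex is derived from the sample URLs, so that part is not shown.

```python
import re

# Hypothetical URL-regex patterns, one per cluster.
CLUSTER_PATTERNS = {
    "product": re.compile(r"^https?://[^/]+/item/\d+$"),
    "review": re.compile(r"^https?://[^/]+/item/\d+/reviews$"),
}


def categorize(url):
    """Return the first cluster whose pattern matches the URL, else None."""
    for cluster, pattern in CLUSTER_PATTERNS.items():
        if pattern.match(url):
            return cluster
    return None


def coverage(pattern, sample_urls):
    """Fraction of a cluster's sample URLs that the pattern matches."""
    if not sample_urls:
        return 0.0
    return sum(1 for u in sample_urls if pattern.match(u)) / len(sample_urls)
```

For example, `categorize("http://shop.example/item/42/reviews")` selects the `review` cluster without fetching the page, and `coverage` can be used to check that a candidate pattern covers at least the specified percentage of its cluster's samples.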
Abstract:
A number of configuration elements are associated with a number of devices. Information about input configuration elements is accessed. An input configuration element is associated with one or more input rules. It is determined which of the configuration elements could be accessed by the input rules and any call chains emanating from the rules. Output rules are determined by using the accessed configuration elements, the input rules, and the way the input rule manipulates its accessed configuration elements. Each output rule may be derived from an input rule and corresponds to the same input configuration element associated with that input rule. An executable module is generated that is adapted to access at least a given one of the input configuration elements and to trigger one or more of the output rules corresponding to the given input configuration element. Read and write sets for rules are determined, and the triggered output rules ensure that restrictions associated with a configuration element are not violated.
Abstract:
A method, a system, and a computer program product for maximizing content spread in a social network are provided. Samples of edges are generated from an initial candidate set of edges. Each edge in the samples has a probability value for content flow. A subset of edges is then determined from the samples based on the gain corresponding to each edge, such that each node in the subset has at most ‘K’ incoming edges. The probability of each edge in the subset may be incremented. A final set of edges may then be determined by ensuring that no node has more than ‘K’ incoming edges: when the number of incoming edges for a node of the final set is greater than ‘K’, one or more of its incoming edges are removed.
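The final pruning step, which enforces the limit of K incoming edges per node, might look like the sketch below. The edge representation `(source, target, probability)` and the choice to drop the lowest-probability edges first are assumptions; the abstract does not say which incoming edges are removed.

```python
from collections import defaultdict


def cap_incoming(edges, k):
    """Keep at most k incoming edges per target node.

    edges: iterable of (source, target, probability) tuples.
    Assumption: when a node has more than k incoming edges, the ones
    with the lowest probability are the ones removed.
    """
    incoming = defaultdict(list)
    for edge in edges:
        incoming[edge[1]].append(edge)

    kept = []
    for group in incoming.values():
        group.sort(key=lambda e: e[2], reverse=True)
        kept.extend(group[:k])
    return kept


edges = [
    ("a", "c", 0.9),
    ("b", "c", 0.2),
    ("d", "c", 0.5),
    ("a", "b", 0.4),
]
final_edges = cap_incoming(edges, k=2)
```

Here node `c` starts with three incoming edges, so with `k=2` the edge from `b` is dropped, while node `b` keeps its single incoming edge.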
Abstract:
Techniques for high precision web extraction using site knowledge are provided. Portions of repeating text are identified in unlabeled web pages from a particular web site. Based on the portions of repeating text, the unlabeled web pages are partitioned into a set of segments. Multiple labels are assigned to respectively corresponding multiple attributes in the set of segments, where assigning the multiple labels comprises applying a classification model to each separate segment in the set of segments. One or more first labels that were erroneously assigned to one or more attributes in the set of segments are identified. One or more second, correct labels for those attributes are determined. The first labels in the set of segments are then corrected by assigning the second labels to the one or more attributes.
Abstract:
The invention provides methods and systems for summarizing multiple continuous update streams such that an approximate answer to a query over one or more of the continuous update streams (such as a query requiring a join operation followed by a duplicate elimination step) may be rapidly provided. The systems and methods use multiple (parallel) Join Distinct (JD) Sketch data structures corresponding to hash buckets of at least one initial attribute.
Abstract:
A system for, and method of, determining a physical topology of a network having multiple subnets. In one embodiment, the system includes: (1) a skeleton path initializer that uses addressing information from elements in the network to develop a collection of skeleton paths of direct physical connections between labeled ones of the elements, the skeleton paths traversing multiple ones of the subnets, and (2) a skeleton path refiner, coupled to the skeleton path initializer, that refines the collection by inferring, from the direct physical connections and path constraints derived therefrom, other physical connections in the skeleton paths involving unlabeled ones of the elements.