Abstract:
Techniques are disclosed for efficiently and automatically classifying textual documents or files. In some embodiments, the classification process is integrated into or otherwise made part of the storage function, such that when the user initiates a save process for a given file, the file is processed through a classifier prior to (or contemporaneously with) completing the save function. In some such embodiments, textual content of the file is analyzed using natural language processing to identify a main or substantial concept discussed in the file, and one or more corresponding tags are then assigned to that file. Subsequently, the user can access that file based on the one or more tags, for instance, through a user interface that allows the user to select one or more content categories associated with the assigned tags. The files can be text-based, but may include other content as well, such as images, video, and audio.
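The save-time classification flow can be illustrated with a minimal sketch. The abstract does not specify the classifier, so a simple keyword count stands in for the natural language processing step here. The names `CATEGORY_KEYWORDS`, `classify_text`, and `TaggedStore` are hypothetical and exist only for this illustration.

```python
import re
from collections import Counter, defaultdict

# Hypothetical category vocabulary; a real system would use an NLP model.
CATEGORY_KEYWORDS = {
    "finance": {"invoice", "budget", "payment", "tax"},
    "travel": {"flight", "hotel", "itinerary", "visa"},
}

def classify_text(text, max_tags=1):
    """Return up to max_tags categories whose keywords occur most often."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = Counter()
    for category, keywords in CATEGORY_KEYWORDS.items():
        hits = sum(1 for word in words if word in keywords)
        if hits:
            scores[category] = hits
    return [category for category, _ in scores.most_common(max_tags)]

class TaggedStore:
    """In-memory stand-in for a file store that tags files as they are saved."""

    def __init__(self):
        self.files = {}
        self.by_tag = defaultdict(set)

    def save(self, name, text):
        # Drop tags from any earlier version of this file.
        for names in self.by_tag.values():
            names.discard(name)
        tags = classify_text(text)  # classification runs as part of the save
        self.files[name] = text
        for tag in tags:
            self.by_tag[tag].add(name)
        return tags

    def find(self, tag):
        """Return the names of all saved files carrying the given tag."""
        return sorted(self.by_tag.get(tag, ()))
```

Calling `store.save("q3.txt", ...)` both stores the text and assigns tags in one step, and `store.find("finance")` then retrieves files by category rather than by location.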
Abstract:
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for associating one or more of a plurality of metadata collections with one or more respective identifiers, wherein each metadata collection includes one or more pairings of metadata attributes with metadata values, and wherein each identifier is one of a project identifier, a tag identifier or an instance identifier; identifying, based on identifier information associated with a virtual machine instance, one or more metadata values to be provided to the virtual machine instance, wherein the identifier information specifies one or more of a project identifier, a tag identifier and an instance identifier, and wherein each identified metadata value belongs to a metadata collection associated with an identifier that is specified in the identifier information; and providing, to the virtual machine instance, the identified one or more metadata values.
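One possible reading of the lookup described above is sketched below. The class `MetadataStore`, its method names, and the precedence order (project, then tag, then instance) are assumptions of this sketch; the abstract does not say how conflicting values from different collections are resolved.

```python
from dataclasses import dataclass, field

@dataclass
class MetadataStore:
    # Maps (identifier kind, identifier value) -> {attribute: value}.
    collections: dict = field(default_factory=dict)

    def associate(self, kind, identifier, attributes):
        """Associate a metadata collection with a project, tag, or instance identifier."""
        if kind not in ("project", "tag", "instance"):
            raise ValueError(f"unknown identifier kind: {kind}")
        self.collections.setdefault((kind, identifier), {}).update(attributes)

    def resolve(self, project=None, tags=(), instance=None):
        """Collect the metadata values for the identifiers a VM instance specifies."""
        # Assumed precedence: later updates win, so instance overrides tag,
        # and tag overrides project.
        result = {}
        if project is not None:
            result.update(self.collections.get(("project", project), {}))
        for tag in tags:
            result.update(self.collections.get(("tag", tag), {}))
        if instance is not None:
            result.update(self.collections.get(("instance", instance), {}))
        return result
```

Only collections whose identifier appears in the instance's identifier information contribute values, which matches the condition stated in the abstract.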
Abstract:
A system, method, and apparatus are provided for supporting and/or executing count-distinct queries. A large set of data (e.g., tens or hundreds of millions of event records) is condensed daily to generate presence bitmaps that reflect the distinctiveness of a selected data dimension S (e.g., user ID) for one or more key dimensions g1, g2, . . . (e.g., advertisement ID, campaign ID, advertiser ID). The condensation process eliminates duplication and yields a single value (e.g., 1 or 0) for each tuple [S, g1, . . . ] to represent the distinctiveness of each value in the S dimension with respect to each combination of values in the grouping dimensions. On a monthly basis, the daily values are condensed to yield a single value for the month, and a similar process is applied at any other desired time granularity (e.g., year). The condensed data may be generated for any combination of selected dimension(s) and grouping dimension(s).
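The daily and monthly condensation steps can be sketched with Python integers used as bitmaps. This is a sketch under stated assumptions, not the patented implementation: it uses a single grouping dimension (an advertisement ID), and the function names and the `user_index` mapping from user ID to bit position are hypothetical.

```python
from collections import defaultdict

def daily_bitmaps(events, user_index):
    """Condense one day's (user_id, ad_id) events into one presence bitmap per ad_id."""
    bitmaps = defaultdict(int)
    for user_id, ad_id in events:
        # Assign each new user the next free bit position.
        bit = user_index.setdefault(user_id, len(user_index))
        bitmaps[ad_id] |= 1 << bit  # repeated events collapse into a single 1 bit
    return dict(bitmaps)

def merge_bitmaps(per_day):
    """Condense daily bitmaps into one bitmap per key for a longer period."""
    merged = defaultdict(int)
    for day in per_day:
        for key, bitmap in day.items():
            merged[key] |= bitmap
    return dict(merged)

def count_distinct(bitmaps, key):
    """Count the distinct users present for a key by counting set bits."""
    return bin(bitmaps.get(key, 0)).count("1")
```

Because the monthly value is a bitwise OR of the daily values, a user seen on several days is still counted once, so the monthly count-distinct query never has to rescan the raw event records.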
Abstract:
Improved user interface features for managing a large number of files, and their application to managing a large number of test scripts, are disclosed. The features relate to selecting files of interest, locating files that match (or do not contain) search strings potentially spanning several lines, highlighting occurrences of desired strings in the content of a file, and finding and replacing strings of interest potentially spanning several lines.
Abstract:
In one embodiment, a non-transitory computer-readable medium stores instructions for implementing tagged management of stored items. An embodiment can receive an input indicating the selection of a graphical representation of a file in the GUI of an operating system, and can also receive an input indicating the intent to attach a tag to the file. Responsive to a request to display the set of files having the tag, the system can automatically search the metadata of files associated with the user and the user account to find that set of files. Having located the set of files, the system can display them, regardless of where each file is stored.
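A minimal sketch of tag-based retrieval that ignores storage location follows. It assumes an in-memory index keyed by owner and tag; the class `TagIndex` and its method names are hypothetical, and a real system would read tags from file metadata instead.

```python
from collections import defaultdict

class TagIndex:
    """Maps (owner, tag) pairs to file paths, wherever those files are stored."""

    def __init__(self):
        self._paths = defaultdict(set)

    def attach(self, owner, path, tag):
        """Record that the owner attached a tag to the file at the given path."""
        self._paths[(owner, tag)].add(path)

    def files_with_tag(self, owner, tag):
        """Return every path the owner tagged, across all directories."""
        return sorted(self._paths.get((owner, tag), ()))
```

Files in unrelated directories are returned together because the lookup key is the tag, not the path.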
Abstract:
Systems and methods according to embodiments provide elasticity for complex event processing (CEP) systems. Embodiments may comprise at least the following three components: (1) incremental query optimization, (2) operator placement, and (3) cost explanation. Incremental query optimization avoids simultaneous computation of identical results by performing operator-level query reuse and subsumption. Using automatic operator placement, a centralized CEP engine can be transformed into a distributed one by dynamically distributing and adjusting the execution according to unpredictable changes in data and query load. Cost explanation functionality can provide end users with near real-time insight into the monetary cost of the whole system, down to operator-level granularity. The combination of these components allows a CEP system to be scaled up and down.
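Operator-level reuse can be illustrated with a small sketch. It covers only the reuse of identical operators, not subsumption, placement, or cost explanation. The class `OperatorRegistry`, the function `build_queries`, and the rule that two operators are identical when their kind, parameters, and inputs match are all assumptions of this sketch.

```python
class OperatorRegistry:
    """Shares one operator among all queries that need an identical operator."""

    def __init__(self):
        self._by_signature = {}

    def get_or_create(self, kind, params, inputs=()):
        # Two operators are treated as identical when kind, parameters, and
        # upstream operators all match (an assumption of this sketch).
        signature = (kind, params, tuple(inputs))
        return self._by_signature.setdefault(signature, signature)

    def operator_count(self):
        return len(self._by_signature)

def build_queries(registry):
    """Build two hypothetical queries that share a source and a filter."""
    source = registry.get_or_create("source", "trades")
    first_filter = registry.get_or_create("filter", "price > 100", [source])
    average = registry.get_or_create("avg", "window=60s", [first_filter])
    # The second query asks for the same filter and receives the shared one.
    second_filter = registry.get_or_create("filter", "price > 100", [source])
    count = registry.get_or_create("count", "window=60s", [second_filter])
    return average, count, first_filter, second_filter
```

The two queries together need five operator requests but only four operators are created, so the filter's results are computed once and consumed by both.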
Abstract:
A cost monitoring system can monitor the cost of queries executing in a complex event processing system running on top of a pay-as-you-go cloud infrastructure. Certain embodiments may employ a generic, cloud-platform-independent cost model, multi-query optimization, cost calculation, and/or operator placement techniques in order to monitor and explain query cost down to the operator level. Certain embodiments may monitor costs in near real-time, as they are incurred. Embodiments may function independently of the underlying complex event processing system and the underlying cloud platform. Embodiments can optimize a work plan of the cloud-based system so as to minimize cost for the end user, matching the cost model of the underlying cloud platform.
Abstract:
Methods, systems and computer readable media which use permissions checking when deciding whether to allow access to a file are described. In one exemplary embodiment, a method includes receiving a notification of a change of permissions of a directory in a hierarchical file system and determining, in response to the notification, whether to update partially a permissions cache which is used in screening access based on permissions, such as access to search results. The determining may include a comparison of an identifier of the directory to a data structure of cached directories which have files represented in the permissions cache.
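The partial-update decision can be sketched as follows. This sketch assumes a flat mapping from a directory identifier to the cached files directly inside it; handling a full hierarchy would also require tracking ancestor directories, which is omitted here. The class `PermissionsCache` and its method names are hypothetical.

```python
from collections import defaultdict

class PermissionsCache:
    """Caches per-file access decisions and tracks which directories they came from."""

    def __init__(self):
        self._allowed = {}                      # file_id -> cached access decision
        self._files_by_dir = defaultdict(set)   # dir_id -> cached file_ids in it

    def put(self, dir_id, file_id, allowed):
        self._allowed[file_id] = allowed
        self._files_by_dir[dir_id].add(file_id)

    def get(self, file_id):
        """Return the cached decision, or None if the file is not cached."""
        return self._allowed.get(file_id)

    def on_directory_permissions_changed(self, dir_id):
        """Invalidate only the entries under dir_id; return how many were dropped."""
        # Compare the directory identifier against the cached directories.
        if dir_id not in self._files_by_dir:
            return 0  # no cached files there, so the cache is left untouched
        stale = self._files_by_dir.pop(dir_id)
        for file_id in stale:
            self._allowed.pop(file_id, None)
        return len(stale)
```

A change to a directory with no cached files costs a single lookup, and a change to a cached directory drops only that directory's entries rather than flushing the whole cache.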
Abstract:
Systems and methods are provided for a user interface with an automatic search menu. The interface exposes commands to the user as an instantly searchable hierarchy. Visually, this is represented as a tree view with an edit box above it. There is no “Search” or “Go” button to press. One second after any character is entered in the edit box, the computer reduces the displayed hierarchy to only those items that match the keyword entered. Entering another character before the second expires resets the timer. This allows the user to type as little or as much of the keyword as necessary to reduce the hierarchy to a few items, one of which can then be mouse-clicked. This method scales to a large number of commands.
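The one-second timer and the hierarchy reduction can be sketched without a GUI toolkit by passing timestamps in explicitly. The nested-dictionary tree, the case-insensitive substring match, and the names `filter_tree` and `DebouncedSearch` are assumptions of this sketch; the abstract does not define the matching rule.

```python
def filter_tree(tree, keyword):
    """Keep nodes whose label contains keyword, plus the ancestors leading to them."""
    keyword = keyword.lower()
    kept = {}
    for label, children in tree.items():
        matching_children = filter_tree(children, keyword)
        if keyword in label.lower() or matching_children:
            kept[label] = matching_children
    return kept

class DebouncedSearch:
    """Filters a command tree once no character has been typed for `delay` seconds."""

    def __init__(self, tree, delay=1.0):
        self.tree = tree
        self.delay = delay
        self.text = ""
        self.deadline = None
        self.visible = tree

    def on_char(self, char, now):
        self.text += char
        self.deadline = now + self.delay  # each keystroke resets the timer

    def tick(self, now):
        """Apply the filter if the timer has expired; return the visible tree."""
        if self.deadline is not None and now >= self.deadline:
            self.visible = filter_tree(self.tree, self.text)
            self.deadline = None
        return self.visible
```

In a real interface, `on_char` would be wired to the edit box's change event and `tick` to a timer callback; the explicit `now` argument simply keeps the sketch deterministic.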