摘要:
A method and apparatus for automatically detecting and extracting information from dynamically generated web pages are disclosed. For example, the present method stores user provided information that is entered into a form interface of a web page for a first query. Responsive to the first query, a first response web page is received and stored. The present method then automatically generates a second query to acquire a second response web page that is responsive to the second query. Finally, the present method compares the first response web page and the second response web page. In one embodiment, the present invention extracts information that is dissimilar between the first response web page and the second response web page. This extracted information is deemed to be the pertinent information requested by the user.
摘要:
A website mining tool is disclosed that extracts information from, for example, a company's website and presents the extracted information in a graphical user interface (GUI). In one embodiment, web pages from a website are stored in, for example, computer memory and a structure of the web pages is identified. A plurality of blocks of information is then extracted as a function of this structure and a category is assigned to each block of information. The elements in the blocks of information are then displayed, for example to a salesperson, as a function of these categories. In another embodiment, Document Object Modeling parsing is used to identify the structure of the web pages. In yet another embodiment, a support vector machine is used to categorize each block of information.
摘要:
Disclosed is a system and method for training a spoken dialog service component from website data. Spoken dialog service components typically include an automatic speech recognition module, a language understanding module, a dialog management module, a language generation module and a text-to-speech module. The method includes converting data from a structured database associated with a website to a structured text data set and a structured task knowledge base, extracting linguistic items from the structured database, and training a spoken dialog service component using at least one of the structured text data, the structured task knowledge base, or the linguistic items. The system includes modules configured to implement the method.
摘要:
Disclosed is a system and method for training a spoken dialog service component from website data. Spoken dialog service components typically include an automatic speech recognition module, a language understanding module, a dialog management module, a language generation module and a text-to-speech module. The method includes converting data from a structured database associated with a website to a structured text data set and a structured task knowledge base, extracting linguistic items from the structured database, and training a spoken dialog service component using at least one of the structured text data, the structured task knowledge base, or the linguistic items. The system includes modules configured to implement the method.
摘要:
A website mining tool is disclosed that extracts information from, for example, a company's website and presents the extracted information in a graphical user interface (GUI). In one embodiment, web pages from a website are stored in, for example, computer memory and a structure of the web pages is identified. A plurality of blocks of information is then extracted as a function of this structure and a category is assigned to each block of information. The elements in the blocks of information are then displayed, for example to a salesperson, as a function of these categories. In another embodiment, Document Object Modeling parsing is used to identify the structure of the web pages. In yet another embodiment, a support vector machine is used to categorize each block of information.
摘要:
A system and method are disclosed for generating customized text-to-speech voices for a particular application. The method comprises generating a custom text-to-speech voice by selecting a voice for generating a custom text-to-speech voice associated with a domain, collecting text data associated with the domain from a pre-existing text data source and using the collected text data, generating an in-domain inventory of synthesis speech units by selecting speech units appropriate to the domain via a search of a pre-existing inventory of synthesis speech units, or by recording the minimal inventory for a selected level of synthesis quality. The text-to-speech custom voice for the domain is generated utilizing the in-domain inventory of synthesis speech units. Active learning techniques may also be employed to identify problem phrases wherein only a few minutes of recorded data is necessary to deliver a high quality TTS custom voice.
摘要:
A system and method of generating and operating a spoken dialog service for a web-site are disclosed. The system parses web-site data and organizes the web-site data in a task knowledge data bank. The system receives text associated with a user query; processes the received text in a spoken language understanding (SLU) module, the SLU module using the web-site data from the task knowledge data bank; generates a ranked list of relevant responses to the user query; generates a hierarchical tree using the web-site data and the ranked list of relevant responses to the user query, generates a response to the user query using the hierarchical tree; and presents the response to the user.
摘要:
Disclosed is a system and method for training a spoken dialog service component from website data. Spoken dialog service components typically include an automatic speech recognition module, a language understanding module, a dialog management module, a language generation module and a text-to-speech module. The method includes converting data from a structured database associated with a website to a structured text data set and a structured task knowledge base, extracting linguistic items from the structured database, and training a spoken dialog service component using at least one of the structured text data, the structured task knowledge base, or the linguistic items. The system includes modules configured to implement the method.
摘要:
A method and system are disclosed for providing a dialog interface for a website. The method comprises at each node in a website, computing a summary, a document description and an alias. A dialog manager within a spoken dialog service utilizes the summary, document description and alias for each website node to generate prompts to a user, wherein nodes in the website are matched with user requests. In this manner, a spoken dialog interface to the website content and navigation may be generated automatically.
摘要:
A method and apparatus for automatically detecting and extracting information from dynamically generated web pages are disclosed. For example, the present method stores user provided information that is entered into a form interlace of a web page for a first query. Responsive to the first query, a first response web page is received and stored. The present method then automatically generates a second query to acquire a second response web page that is responsive to the second query. Finally, the present method compares the first response web page and the second response web page. In one embodiment, the present invention extracts information that is dissimilar between the first response web page and the second response web page. This extracted information is deemed to be the pertinent information requested by the user.