Patent search: ap:("Yaxin Zhang" OR "Jianming Song" OR "Anton Madievski") AND inv:"Yaxin Zhang" (page 1)

1.

Granted invention patent
Voiced/unvoiced speech classifier (in force)

Publication No.: US06640208B1

Publication date: 2003-10-28

Application No.: US09659318

Filing date: 2000-09-12

Applicants: Yaxin Zhang, Jianming Song, Anton Madievski

Inventors: Yaxin Zhang, Jianming Song, Anton Madievski

IPC classification: G10L11/06

CPC classification: G10L25/93

Abstract: A voiced/unvoiced speech classifier (30) includes a speech segmentor (34) which segments an input digitized speech waveform into frames of speech and a band-pass filter (36) which filters the frames of speech. A relative energy generator (38) generates a relative energy value for each filtered frame of speech and a decision parameter generator (52) including an autocorrelation calculator (54) and a pitch calculator (56) generates a decision parameter based on an autocorrelation function and a pitch frequency index for the filtered frames of speech. A normalized energy calculator (46) adjusts the threshold and then normalizes the relative energy. A comparator (60) provides a signal indicative of whether a frame of speech is voiced speech or unvoiced speech depending on a comparison of the decision parameter and the normalized relative energy value for each filtered frame of speech.
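The general idea in this abstract (frame the signal, measure per-frame energy, and look for periodicity with an autocorrelation) can be illustrated with a minimal Python sketch. It is not the patented method: the band-pass filter is omitted, and the fixed `peak_threshold` and `energy_floor` cut-offs, the 60 to 400 Hz pitch search range, and all function names are assumptions made for illustration, standing in for the comparator and adaptive threshold the abstract describes.

```python
def split_frames(signal, frame_len, hop):
    """Cut the signal into overlapping frames of frame_len samples."""
    last_start = len(signal) - frame_len
    return [signal[i:i + frame_len] for i in range(0, last_start + 1, hop)]


def autocorr_peak(frame, min_lag, max_lag):
    """Largest autocorrelation value in [min_lag, max_lag], divided by frame energy."""
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return 0.0
    best = 0.0
    for lag in range(min_lag, min(max_lag, len(frame) - 1) + 1):
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        best = max(best, r / energy)
    return best


def classify_voiced(signal, sample_rate=8000, frame_len=240, hop=80,
                    peak_threshold=0.5, energy_floor=0.1):
    """Return one boolean per frame: True if the frame looks voiced.

    Assumed decision rule (not from the source): a frame is voiced when its
    energy relative to the loudest frame is at least energy_floor AND its
    normalized autocorrelation peak is at least peak_threshold.
    """
    frames = split_frames(signal, frame_len, hop)
    energies = [sum(x * x for x in f) for f in frames]
    max_energy = max(energies, default=0.0)
    min_lag = sample_rate // 400   # shortest pitch period searched (400 Hz)
    max_lag = sample_rate // 60    # longest pitch period searched (60 Hz)
    labels = []
    for frame, energy in zip(frames, energies):
        relative = energy / max_energy if max_energy > 0.0 else 0.0
        peak = autocorr_peak(frame, min_lag, max_lag)
        labels.append(relative >= energy_floor and peak >= peak_threshold)
    return labels
```

A strongly periodic frame produces a high autocorrelation peak at its pitch period, while noise-like or silent frames do not, which is why the peak serves as a voicing cue in this sketch.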

2.

Granted invention patent
Tone based speech recognition (in force)

Publication No.: US06553342B1

Publication date: 2003-04-22

Application No.: US09496868

Filing date: 2000-02-02

Applicants: Yaxin Zhang, Jianming Song, Anton Madievski

Inventors: Yaxin Zhang, Jianming Song, Anton Madievski

IPC classification: G10L15/02

CPC classification: G10L15/02, G10L25/15

Abstract: A method and apparatus for speech recognition involves classifying (38) a digitized speech segment according to whether the speech segment comprises voiced or unvoiced speech and utilizing that classification to generate tonal feature vectors (41) of the speech segment when the speech is voiced. The tonal feature vectors are then combined (42) with other non-tonal feature vectors (40) to provide speech feature vectors. The speech feature vectors are compared (35) with previously stored models of speech feature vectors (37) for different segments of speech to determine which previously stored model is a most likely match for the segment to be recognized.
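The combination step in this abstract can be pictured with a small Python sketch. Everything here is an assumption for illustration: the zero-filled tonal part for unvoiced segments, the squared Euclidean distance used for matching, and the names `speech_feature_vector` and `closest_model` do not come from the source, which does not say how the vectors are combined or scored.

```python
def speech_feature_vector(non_tonal, tonal, voiced):
    """Concatenate non-tonal and tonal features into one vector.

    Assumption: for unvoiced segments the tonal part is zero-filled so that
    every vector has the same length.
    """
    tonal_part = list(tonal) if voiced else [0.0] * len(tonal)
    return list(non_tonal) + tonal_part


def closest_model(vector, models):
    """Return the name of the stored model nearest to vector.

    Assumption: models maps a name to a reference vector and similarity is
    squared Euclidean distance; a real recognizer would typically score
    statistical models instead.
    """
    def distance(reference):
        return sum((a - b) ** 2 for a, b in zip(vector, reference))
    return min(models, key=lambda name: distance(models[name]))
```

For example, two stored models that differ only in their tonal component are separated by the tonal part of the combined vector.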

3.

Granted invention patent
Method for Chinese point-of-interest search (in force)

Publication No.: US08521539B1

Publication date: 2013-08-27

Application No.: US13429877

Filing date: 2012-03-26

Applicants: Jianzhong Teng, Yaxin Zhang

Inventors: Jianzhong Teng, Yaxin Zhang

IPC classification: G10L21/00

CPC classification: G10L15/32, G01C21/3608, G01C21/3679, G10L15/30

Abstract: Techniques disclosed herein include systems and methods of automated speech recognition (ASR) for voice destination entry (VDE) that include open voice searching (natural language searching) of destinations. A first part uses a server-based automated speech recognizer. The second part is client-based automatic speech recognition (ASR) processing. Thus, techniques include a hybrid VDE solution that provides users with an accurate and flexible way to use speech recognition technologies. A server-based speech recognizer executes the open-search task, while a client-based recognizer refines the results from the server to deliver an optimized result. This system and method significantly improve recognition accuracy for dictation-engine-based POI search of Mandarin Chinese input and input from other languages. Moreover, the methods herein largely improve the user experience by allowing users to say a partial POI name, an abbreviation, or even a POI name in reversed word order.

4.

Invention patent application
Open vocabulary speech recognition (under examination, published)

Publication No.: US20050049870A1

Publication date: 2005-03-03

Application No.: US10925601

Filing date: 2004-08-24

Applicants: Yaxin Zhang, Xin He, Xiao-Lin Ren, Fang Sun

Inventors: Yaxin Zhang, Xin He, Xiao-Lin Ren, Fang Sun

IPC classification: G10L15/00, G10L15/10

CPC classification: G10L15/10

Abstract: There is described a method (300) for open vocabulary speech recognition performed by an electronic device (100). The method (300) includes receiving an utterance waveform (320) and processing the waveform (350) to provide feature vectors representing the waveform. Then a step of comparing (360) is effected, in which the feature vectors are compared with concatenated isolated word acoustic models from a concatenated isolated word acoustic model list to select a suitable concatenated isolated word acoustic model. Then a step of providing a response (370) provides a response depending on the suitable concatenated isolated word acoustic model. The response typically is a control signal for activating a function of the device (100).

5.

Granted invention patent
Method for estimating a confidence measure for a speech recognition system (in force)

Publication No.: US06735562B1

Publication date: 2004-05-11

Application No.: US09588163

Filing date: 2000-06-05

Applicants: Yaxin Zhang, Ho Chuen Choi, Jian Ming Song

Inventors: Yaxin Zhang, Ho Chuen Choi, Jian Ming Song

IPC classification: G10L15/14

CPC classification: G10L15/01

Abstract: A method of estimating a confidence measure for a speech recognition system involves comparing an input speech signal with a number of predetermined models of possible speech signals. Best scores indicating the degree of similarity between the input speech signal and each of the predetermined models are then used to determine a normalized variance, which is used as the Confidence Measure. In order to determine whether the input speech signal has been correctly recognized, the Confidence Measure is compared to a threshold value. The threshold value is weighted according to the Signal to Noise Ratio of the input speech signal and according to the number of predetermined models used.
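A rough Python sketch of a variance-based confidence check of this kind follows. The abstract does not give the formulas, so several choices are assumptions: the variance is normalized by the squared mean of the scores, a larger normalized variance is taken to mean the best model stands out (so the utterance is accepted), and the SNR and model-count weights are plain multipliers supplied by the caller. The names `normalized_variance` and `is_recognized` are hypothetical.

```python
def normalized_variance(scores):
    """Population variance of the scores divided by their squared mean.

    Assumption: this normalization is one plausible choice; the source
    does not state which normalization is used.
    """
    if not scores:
        raise ValueError("at least one score is required")
    mean = sum(scores) / len(scores)
    if mean == 0:
        return 0.0
    variance = sum((s - mean) ** 2 for s in scores) / len(scores)
    return variance / (mean * mean)


def is_recognized(scores, base_threshold, snr_weight=1.0, model_count_weight=1.0):
    """Compare the confidence measure against a weighted threshold.

    Assumption: the weights simply scale base_threshold, and recognition is
    accepted when the confidence measure reaches the threshold.
    """
    threshold = base_threshold * snr_weight * model_count_weight
    return normalized_variance(scores) >= threshold
```

With this rule, scores that are all equal give a confidence of zero and are rejected, and raising `snr_weight` makes the check stricter.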

6.

Granted invention patent
Automatic creation of an interactive log based on real-time content (in force)

Publication No.: US07844460B2

Publication date: 2010-11-30

Application No.: US11675139

Filing date: 2007-02-15

Applicants: Michael L. Charlier, Sergey N. Baranov, Joong Thye Lee, Carlton J. Sparrell, Yaxin Zhang

Inventors: Michael L. Charlier, Sergey N. Baranov, Joong Thye Lee, Carlton J. Sparrell, Yaxin Zhang

IPC classification: G10L15/18

CPC classification: G10L15/1822, G10L2015/088

Abstract: A system [100] includes an audio reception device [105] to receive audio from a person speaking and convert the audio to a text format. An intelligent agent [110] receives the text format and detects at least one key term in the text format based on predetermined criteria. A logic engine [115] compares the at least one key term with a listener knowledge base [125] corresponding to a listener to determine context information corresponding to the at least one key term. A search device [135] searches for multimedia content corresponding to the context information. A communication device [150] communicates display content comprising at least one of: the multimedia content, and a link to the multimedia content to an electronic display device [155] adapted to display the display content.

7.

Invention patent application
INTELLIGENT GROUP MEDIA REPRESENTATION (under examination, published)

Publication No.: US20080214145A1

Publication date: 2008-09-04

Application No.: US11681763

Filing date: 2007-03-03

Applicants: Jason N. Howard, Thomas J. Weigert, Thomas S. Babin, Sergey N. Baranov, Yaxin Zhang, Chung Kwang Chou

Inventors: Jason N. Howard, Thomas J. Weigert, Thomas S. Babin, Sergey N. Baranov, Yaxin Zhang, Chung Kwang Chou

IPC classification: H04Q7/22

CPC classification: H04L67/306, H04L67/04

Abstract: A method, apparatus, and electronic device for optimizing a media presentation to a group. A memory may store a personal media user profile for a user. A processor may create a group media user profile from the personal media user profile and associated individual media user profiles. A network interface may send a request to a digital media content source for a set of digital media content with a digital media content profile that matches the group media user profile.
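One way to picture building a group profile from individual profiles is the short Python sketch below. It is an illustration only: representing a profile as a dictionary of preference weights, averaging those weights across users, and scoring content by a dot product are all assumptions, as are the names `group_profile` and `best_match`; the abstract does not specify any of these.

```python
def group_profile(profiles):
    """Average per-category weights across users; missing categories count as 0.

    Assumption: each profile is a dict mapping a category name to a weight.
    """
    if not profiles:
        raise ValueError("at least one profile is required")
    categories = sorted({key for profile in profiles for key in profile})
    count = len(profiles)
    return {c: sum(p.get(c, 0.0) for p in profiles) / count for c in categories}


def best_match(group, catalog):
    """Return the catalog entry whose content profile best fits the group.

    Assumption: catalog maps a content name to a weight dict, and fit is the
    dot product of the group weights and the content weights.
    """
    def score(content_profile):
        return sum(group.get(c, 0.0) * w for c, w in content_profile.items())
    return max(catalog, key=lambda name: score(catalog[name]))
```

Averaging is only one possible merge policy; a different rule, such as taking the minimum weight per category to avoid content any member dislikes, would fit the same interface.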

8.

Invention patent application
METHOD AND APPARATUS FOR AUTOMICATION CREATION OF AN INTERACTIVE LOG BASED ON REAL-TIME CONTENT (in force)

Publication No.: US20080201142A1

Publication date: 2008-08-21

Application No.: US11675139

Filing date: 2007-02-15

Applicants: Michael L. Charlier, Sergey N. Baranov, Joong Thye Lee, Carlton J. Sparrell, Yaxin Zhang

Inventors: Michael L. Charlier, Sergey N. Baranov, Joong Thye Lee, Carlton J. Sparrell, Yaxin Zhang

IPC classification: G10L15/00

CPC classification: G10L15/1822, G10L2015/088

Abstract: A system [100] includes an audio reception device [105] to receive audio from a person speaking and convert the audio to a text format. An intelligent agent [110] receives the text format and detects at least one key term in the text format based on predetermined criteria. A logic engine [115] compares the at least one key term with a listener knowledge base [125] corresponding to a listener to determine context information corresponding to the at least one key term. A search device [135] searches for multimedia content corresponding to the context information. A communication device [150] communicates display content comprising at least one of: the multimedia content, and a link to the multimedia content to an electronic display device [155] adapted to display the display content.
