Method for segmenting communication transcripts using unsupervised and semi-supervised techniques
    Type: Granted invention patent
    Legal status: In force

    Publication number: US07912714B2

    Publication date: 2011-03-22

    Application number: US12060469

    Filing date: 2008-04-01

    IPC classification: G10L15/06

    CPC classification: G06F17/3071, G10L15/04

    Abstract: A method is provided for forming discrete segment clusters, each consisting of one or more sequential sentences, from a corpus of communication transcripts of transactional communications. The method comprises: dividing the communication transcripts of the corpus into a first set of sentences spoken by a caller and a second set of sentences spoken by a responder; generating a set of sentence clusters by grouping the first and second sets of sentences according to a measure of lexical similarity, using an unsupervised partitional clustering method; generating a collection of sequences of sentence types by assigning a distinct sentence type to each sentence cluster and representing each sentence of each communication transcript in the corpus by the sentence type assigned to the cluster into which that sentence is grouped; and generating a specified number of discrete segment clusters by successively merging sentence clusters according to a proximity-based measure between the sentence types assigned to those clusters within the sequences of the collection.
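The first three steps of the abstract (splitting by speaker, clustering by lexical similarity, and rewriting transcripts as sentence-type sequences) can be sketched as follows. This is a minimal, stdlib-only illustration under stated assumptions, not the patented implementation: the abstract names neither the clustering algorithm nor the similarity measure, so spherical k-means over bag-of-words vectors with cosine similarity, seeding from the first k sentences, clustering caller and responder sentences separately, the type labels `C0`, `R0` and so on, and all function names (`bow`, `cosine`, `kmeans`, `sentence_type_sequences`) are hypothetical.

```python
import math
from collections import Counter

def bow(sentence):
    """Lower-cased bag-of-words vector (token -> count)."""
    return Counter(sentence.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: c / len(vectors) for t, c in total.items()}

def kmeans(sentences, k, iterations=10):
    """Spherical k-means (assumed partitional method); one cluster index per sentence."""
    vecs = [bow(s) for s in sentences]
    k = max(1, min(k, len(vecs)))
    centers = [dict(v) for v in vecs[:k]]  # assumption: seed with the first k sentences
    labels = None
    for _ in range(iterations):
        # Assign each sentence to the most lexically similar center (first wins ties).
        new = [max(range(k), key=lambda c: cosine(v, centers[c])) for v in vecs]
        if new == labels:
            break  # assignments are stable: converged
        labels = new
        for c in range(k):
            members = [v for v, lab in zip(vecs, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = centroid(members)
    return labels

def sentence_type_sequences(transcripts, k_caller, k_responder):
    """Cluster caller and responder sentences separately (assumption), then
    rewrite each transcript as a sequence of sentence-type labels."""
    by_speaker = {"caller": [], "responder": []}
    positions = {"caller": [], "responder": []}
    for t_idx, transcript in enumerate(transcripts):
        for s_idx, (speaker, sentence) in enumerate(transcript):
            by_speaker[speaker].append(sentence)
            positions[speaker].append((t_idx, s_idx))
    sequences = [[None] * len(t) for t in transcripts]
    for speaker, prefix, k in (("caller", "C", k_caller), ("responder", "R", k_responder)):
        if not by_speaker[speaker]:
            continue
        labels = kmeans(by_speaker[speaker], k)
        for (t_idx, s_idx), label in zip(positions[speaker], labels):
            sequences[t_idx][s_idx] = f"{prefix}{label}"  # one distinct type per cluster
    return sequences

transcripts = [
    [("caller", "my card was stolen"),
     ("responder", "i can block your card now"),
     ("caller", "thank you very much")],
    [("caller", "someone stole my card"),
     ("responder", "i will block the card for you"),
     ("caller", "thank you so much")],
]
print(sentence_type_sequences(transcripts, k_caller=2, k_responder=1))
# prints [['C0', 'R0', 'C1'], ['C0', 'R0', 'C1']]
```

Clustering each speaker's sentences separately guarantees that no sentence type mixes caller and responder turns; a real system would likely use weighted features (for example TF-IDF) and a better seeding strategy than taking the first k sentences.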

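The final step of the abstract, successively merging sentence clusters into a specified number of segment clusters, can be sketched as greedy agglomerative merging over sentence-type sequences. The abstract only says "proximity-based measure", so this sketch assumes that proximity is how often two sentence types occur in adjacent positions, with summed linkage between groups; these choices, the string type labels in the example, and the function names `adjacency_counts`, `merge_sentence_types` and `segment` are hypothetical. The `segment` helper then cuts one transcript into maximal runs of consecutive sentences that fall in the same segment cluster.

```python
from collections import Counter
from itertools import combinations

def adjacency_counts(sequences):
    """Count how often two distinct sentence types occur in adjacent positions
    (order ignored) across all sequences."""
    counts = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            if a != b:
                counts[frozenset((a, b))] += 1
    return counts

def merge_sentence_types(sequences, n_segments):
    """Greedily merge sentence types until n_segments groups remain.
    Proximity of two groups = summed adjacency counts between their members
    (a hypothetical stand-in for the patent's proximity-based measure)."""
    adj = adjacency_counts(sequences)
    groups = [frozenset([t]) for t in sorted({t for seq in sequences for t in seq})]

    def proximity(g1, g2):
        return sum(adj[frozenset((a, b))] for a in g1 for b in g2)

    while len(groups) > max(1, n_segments):
        # Merge the closest pair; max() keeps the first pair encountered on ties.
        i, j = max(combinations(range(len(groups)), 2),
                   key=lambda p: proximity(groups[p[0]], groups[p[1]]))
        merged = groups[i] | groups[j]
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
    return groups

def segment(sequence, groups):
    """Split one sequence into maximal runs of sentences whose types share a
    segment cluster; returns (cluster_index, start, end) with end exclusive."""
    group_of = {t: g for g, members in enumerate(groups) for t in members}
    segments, start = [], 0
    for i in range(1, len(sequence) + 1):
        if i == len(sequence) or group_of[sequence[i]] != group_of[sequence[start]]:
            segments.append((group_of[sequence[start]], start, i))
            start = i
    return segments

sequences = [
    ["C0", "R0", "C0", "R0", "C1", "R1"],
    ["C0", "R0", "C1", "R1", "C1", "R1"],
]
groups = merge_sentence_types(sequences, n_segments=2)
print([sorted(g) for g in groups])    # [['C0', 'R0'], ['C1', 'R1']]
print(segment(sequences[0], groups))  # [(0, 0, 4), (1, 4, 6)]
```

Summed linkage favors merging types that frequently border each other, but it also biases toward growing already-large groups; an average or normalized linkage is a natural alternative if that bias is unwanted.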