Title of invention CORPUS GENERATION DEVICE, CORPUS GENERATION METHOD AND CORPUS GENERATION PROGRAM
Abstract A corpus generation device according to an embodiment includes a web page acquisition unit, a reference word acquisition unit, an attachment unit and an output unit. The web page acquisition unit acquires a web page including description sentence data regarding a presentation target. The reference word acquisition unit acquires a reference word that is an attribute value regarding the presentation target from the web page. The attachment unit extracts a broader word belonging to a layer above the reference word acquired by the reference word acquisition unit from a storage unit that stores hierarchical relationship information indicating a hierarchical relationship between attribute values, and attaches an attribute tag corresponding to the reference word to the broader word included in the description sentence data. The output unit outputs, as corpus data, the description sentence data to which the attribute tag is attached by the attachment unit.
Publication number US2016041951(A1) Publication date 2016.02.11
Application number US201314420424 Filing date 2013.09.30
Applicant RAKUTEN, INC. Inventor SHINZATO Keiji
Classification G06F17/21;G06F17/27 Main classification G06F17/21
Agency Agent
Main claim 1. A corpus generation device comprising: a web page acquisition unit that acquires a web page including description sentence data regarding a presentation target; a reference word acquisition unit that acquires a reference word that is an attribute value regarding the presentation target from the web page; an attachment unit that extracts a broader word belonging to a layer above the reference word acquired by the reference word acquisition unit from a storage unit that stores hierarchical relationship information indicating a hierarchical relationship between attribute values, and attaches an attribute tag corresponding to the reference word to the broader word included in the description sentence data when the broader word is included in the description sentence data; and an output unit that outputs, as corpus data, the description sentence data to which the attribute tag is attached by the attachment unit.
Address Tokyo JP
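The attachment step recited in the main claim can be illustrated with a minimal Python sketch. It is not the applicant's implementation. It assumes that the web page has already been acquired, that the reference words have already been extracted as a mapping from attribute name to attribute value, and that the hierarchical relationship information is a simple child-to-parent dictionary. The names `broader_words`, `attach_tags` and `parent_of`, the wine example, and the inline `<attribute>...</attribute>` tag format are all hypothetical choices made for illustration.

```python
import re


def broader_words(word, parent_of):
    """Return the words in the layers above `word`, nearest layer first.

    `parent_of` is a hypothetical child -> parent mapping that stands in
    for the stored hierarchical relationship information.
    """
    ancestors = []
    seen = {word}
    current = parent_of.get(word)
    while current is not None and current not in seen:  # guard against cycles
        ancestors.append(current)
        seen.add(current)
        current = parent_of.get(current)
    return ancestors


def attach_tags(description, reference_words, parent_of):
    """Tag broader words of each reference word found in the description.

    `reference_words` maps an attribute name (used as the tag) to the
    reference word acquired for that attribute. A broader word is tagged
    only when it actually occurs in the description text.
    """
    tagged = description
    for attribute, reference in reference_words.items():
        for broader in broader_words(reference, parent_of):
            pattern = r"\b" + re.escape(broader) + r"\b"
            # Sketch-level simplification: substitutions are applied one
            # after another, so overlapping terms are not handled.
            tagged = re.sub(
                pattern,
                lambda m, a=attribute: f"<{a}>{m.group(0)}</{a}>",
                tagged,
            )
    return tagged


# Assumed example data, not taken from the patent text.
parent_of = {"Bordeaux": "France", "France": "Europe"}
reference_words = {"region": "Bordeaux"}
description = "A full-bodied red wine from France."

corpus_line = attach_tags(description, reference_words, parent_of)
print(corpus_line)
```

In this sketch the page lists "Bordeaux" as the region while the description mentions only "France", so the `region` tag is attached to the broader word. A description that contains none of the broader words is returned unchanged, which mirrors the claim's condition that tagging happens only when the broader word is included in the description sentence data.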