Title of invention: Creation of a category tree with respect to the contents of a data stock
Abstract: Methods for the automatic creation of a category tree with respect to the contents of a data stock, wherein a taxonomy of the data stock is created on the basis of co-occurrences. A further object of the invention is a data processing system comprising data that represent information in at least one data stock accessible via at least one data source, the system being designed and/or adapted to at least partially carry out a method according to the invention. A further object of the invention is a data processing device for the electronic processing of data, comprising a control and/or computer unit, an input unit and an output unit, the device being designed and/or adapted to at least partially carry out a method according to the invention, preferably using at least a part of a data processing system according to the invention.
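The abstract bases the taxonomy on co-occurrences. As a minimal illustration of that notion only, the sketch below counts how often two words appear in the same element of a data stock. The function name `count_cooccurrences` and the sample data are hypothetical and do not come from the patent.

```python
from collections import Counter
from itertools import combinations


def count_cooccurrences(documents):
    """Count how often each unordered word pair shares a document.

    `documents` is a list of word lists. This is an illustrative
    definition of co-occurrence, not the patent's own.
    """
    pair_counts = Counter()
    for words in documents:
        # Each word counts once per document; sorting makes the pair key canonical.
        unique = sorted(set(words))
        for a, b in combinations(unique, 2):
            pair_counts[(a, b)] += 1
    return pair_counts


docs = [
    ["engine", "car", "wheel"],
    ["car", "wheel", "road"],
    ["bank", "loan"],
]
counts = count_cooccurrences(docs)
```

Pairs are keyed in alphabetical order, so `counts[("car", "wheel")]` is the lookup for that pair, and a pair that never shares a document simply reads as zero.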
Publication number: US8745069(B2) Publication date: 2014.06.03
Application number: US20100941818 Filing date: 2010.11.08
Applicant: IQser IP AG Inventors: Wurzer Joerg; Magnus Christian
Classification: G06F17/30 Main classification: G06F17/30
Agency: Agent:
Main claim: 1. A system for analyzing data to establish a category tree comprising: a data source; an inventory representation of data in communication with the data source; a computer unit having a processor in communication with said data source and said inventory representation of data; software executing on said processor to:
1. create a list of words of each element within the inventory representation of data;
2. filter out stop words in each of said list of words;
3. calculate a significance value for each word remaining in each said list of words;
4. sort said list of words in descending order according to the significance values to create a sorted list of words;
5. reduce said sorted list of words to a maximum number of top elements to create a reduced list of words;
6. store said reduced list of words in a persistent memory;
7. detect co-occurrences within the stored reduced list of words;
8. store said co-occurrences as a table in the persistent memory;
9. retrieve words from the stored reduced list of words which have the highest significance values but which have no co-occurrences with each other;
10. establish a first level of the category tree using said retrieved words;
11. retrieve a list of co-occurrences for each word of said first level from said stored reduced list of words;
12. create a corresponding list of words for each said list of co-occurrences having no co-occurrences with each other;
13. calculate a frequency of co-occurrences for each of said corresponding list of words;
14. sort said corresponding list of words in descending order according to the frequency to create a sorted corresponding list of words;
15. reduce said sorted corresponding list of words to a predetermined maximum number of top elements to create a reduced corresponding list of words;
16. establish a subordinate level of the category tree using said reduced corresponding list of words; and,
17. iteratively repeat steps 11 through 16 while no further co-occurrences can be retrieved from said persistent memory for a set of superior categories, wherein in step 11 the retrieved co-occurrences exists for all superior categories in said category tree;
wherein the category tree is consolidated for display on a display device.
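The claimed steps can be sketched roughly as follows. This is an illustrative reading, not the patented implementation, and it rests on several assumptions the claim leaves open: significance is taken to be a smoothed TF-IDF score, the stop-word list is a toy one, "no co-occurrences with each other" is resolved by a greedy pass, the frequency in step 13 is taken to be the co-occurrence count with the direct parent, and everything is held in memory rather than in persistent storage. Candidates are visited in descending frequency, which is one possible way to combine steps 12 to 14. All function and parameter names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

STOP_WORDS = {"the", "a", "of", "and", "is"}  # assumed toy stop-word list


def reduced_word_lists(elements, max_words=5):
    """Steps 1-6: word list per element, stop-word filter, significance, sort, cut."""
    tokenized = [[w for w in text.lower().split() if w not in STOP_WORDS]
                 for text in elements]
    doc_freq = Counter(w for words in tokenized for w in set(words))
    n = len(elements)
    reduced = []
    for words in tokenized:
        tf = Counter(words)
        # Assumed significance: smoothed TF-IDF (the claim fixes no formula).
        sig = {w: tf[w] * (math.log((1 + n) / (1 + doc_freq[w])) + 1) for w in tf}
        ranked = sorted(sig, key=lambda w: (-sig[w], w))[:max_words]
        reduced.append({w: sig[w] for w in ranked})
    return reduced


def cooccurrence_table(reduced):
    """Steps 7-8: count word pairs that share a reduced list."""
    table = defaultdict(Counter)
    for words in reduced:
        for a, b in combinations(sorted(words), 2):
            table[a][b] += 1
            table[b][a] += 1
    return table


def independent(words, table):
    """Greedily keep words that co-occur with no word already kept."""
    kept = []
    for w in words:
        if all(k not in table[w] for k in kept):
            kept.append(w)
    return kept


def build_tree(reduced, table, max_children=3):
    """Steps 9-17: first level by significance, lower levels by co-occurrence."""
    best = Counter()
    for words in reduced:
        for w, s in words.items():
            best[w] = max(best[w], s)
    # Steps 9-10: most significant words with no co-occurrences among themselves.
    top = independent(sorted(best, key=lambda w: (-best[w], w)), table)

    def expand(path):
        # Step 11: candidates must co-occur with every superior category.
        shared = set(table[path[0]])
        for ancestor in path[1:]:
            shared &= set(table[ancestor])
        shared -= set(path)
        parent = path[-1]
        # Steps 12-15: rank by frequency, drop mutual co-occurrences, cut to the maximum.
        ranked = sorted(shared, key=lambda w: (-table[parent][w], w))
        children = independent(ranked, table)[:max_children]
        # Steps 16-17: recurse; an empty candidate set ends the branch.
        return {w: expand(path + [w]) for w in children}

    return {w: expand([w]) for w in top}


elements = [
    "car engine car wheel",
    "car wheel road",
    "bank loan bank interest",
    "bank interest rate",
]
reduced = reduced_word_lists(elements)
table = cooccurrence_table(reduced)
tree = build_tree(reduced, table)
```

The tree is returned as nested dictionaries, so on the sample data `tree["car"]` holds `wheel`, which in turn holds `engine` and `road`. Each branch ends once no word co-occurs with all of its superior categories, which mirrors the stopping condition of step 17.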
Address: CH