Title of invention |
Creation of a category tree with respect to the contents of a data stock |
Abstract |
Methods for the automatic creation of a category tree with respect to the contents of a data stock, wherein a taxonomy of the data stock is created on the basis of co-occurrences. A further object of the present invention is a data processing system comprising data that represent information in at least one data stock accessible via at least one data source, the system being designed and/or adapted to at least partially carry out a method according to the invention. A further object of the present invention is a data processing device for the electronic processing of data, comprising a control and/or computer unit, an input unit and an output unit, the device being designed and/or adapted to at least partially carry out a method according to the invention, preferably using at least a part of a data processing system according to the invention. |
Publication number |
US8745069(B2) |
Publication date |
2014.06.03 |
Application number |
US20100941818 |
Filing date |
2010.11.08 |
Applicant |
IQser IP AG |
Inventors |
Wurzer Joerg;Magnus Christian |
Classification number |
G06F17/30 |
Main classification number |
G06F17/30 |
Agency |
|
Agent |
|
Principal claim |
1. A system for analyzing data to establish a category tree comprising:
a data source;
an inventory representation of data in communication with the data source;
a computer unit having a processor in communication with said data source and said inventory representation of data;
software executing on said processor to:
1. create a list of words of each element within the inventory representation of data;
2. filter out stop words in each of said list of words;
3. calculate a significance value for each word remaining in each said list of words;
4. sort said list of words in descending order according to the significance values to create a sorted list of words;
5. reduce said sorted list of words to a maximum number of top elements to create a reduced list of words;
6. store said reduced list of words in a persistent memory;
7. detect co-occurrences within the stored reduced list of words;
8. store said co-occurrences as a table in the persistent memory;
9. retrieve words from the stored reduced list of words which have the highest significance values but which have no co-occurrences with each other;
10. establish a first level of the category tree using said retrieved words;
11. retrieve a list of co-occurrences for each word of said first level from said stored reduced list of words;
12. create a corresponding list of words for each said list of co-occurrences having no co-occurrences with each other;
13. calculate a frequency of co-occurrences for each of said corresponding list of words;
14. sort said corresponding list of words in descending order according to the frequency to create a sorted corresponding list of words;
15. reduce said sorted corresponding list of words to a predetermined maximum number of top elements to create a reduced corresponding list of words;
16. establish a subordinate level of the category tree using said reduced corresponding list of words; and,
17. iteratively repeat steps 11 through 16 while no further co-occurrences can be retrieved from said persistent memory for a set of superior categories, wherein in step 11 the retrieved co-occurrences exists for all superior categories in said category tree;
wherein the category tree is consolidated for display on a display device.
|
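The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the stop-word list, the TF-IDF style significance score, the treatment of a co-occurrence as "two words appearing in the same reduced list", and the greedy selection of mutually non-co-occurring words are all assumptions made for illustration, and in-memory dictionaries stand in for the persistent memory. All function names (`reduced_word_lists`, `cooccurrence_table`, `pick_independent`, `build_tree`) are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Assumed stop-word list; the claim does not enumerate one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "is", "to", "for", "with"}


def reduced_word_lists(docs, max_words=5):
    """Steps 1-6: tokenize, drop stop words, score, sort, keep the top words."""
    tokenized = [
        [w for w in re.findall(r"[a-z]+", doc.lower()) if w not in STOP_WORDS]
        for doc in docs
    ]
    doc_freq = Counter(w for words in tokenized for w in set(words))
    reduced = []
    for words in tokenized:
        tf = Counter(words)
        # Assumed significance measure: a TF-IDF style score.
        score = {w: tf[w] * (math.log(len(docs) / doc_freq[w]) + 1.0) for w in tf}
        top = sorted(score, key=lambda w: (-score[w], w))[:max_words]
        reduced.append({w: score[w] for w in top})
    return reduced


def cooccurrence_table(reduced):
    """Steps 7-8: count, per word pair, the reduced lists containing both words."""
    table = Counter()
    for words in reduced:
        for pair in combinations(sorted(words), 2):
            table[pair] += 1
    return table


def pick_independent(ranked, table, limit):
    """Greedily keep ranked words that do not co-occur with any word already kept."""
    chosen = []
    for word in ranked:
        if all(tuple(sorted((word, c))) not in table for c in chosen):
            chosen.append(word)
        if len(chosen) == limit:
            break
    return chosen


def build_tree(reduced, table, path=(), limit=3):
    """Steps 9-17: build one level, then recurse until no candidates remain."""
    if not path:
        # First level: rank every word by its best significance value.
        best = {}
        for words in reduced:
            for w, s in words.items():
                best[w] = max(best.get(w, 0.0), s)
        ranked = sorted(best, key=lambda w: (-best[w], w))
    else:
        # Subordinate level: words co-occurring with ALL superior categories,
        # ranked by how many reduced lists they share with them.
        freq = Counter()
        for words in reduced:
            if all(p in words for p in path):
                freq.update(w for w in words if w not in path)
        ranked = sorted(freq, key=lambda w: (-freq[w], w))
    level = pick_independent(ranked, table, limit)
    return {w: build_tree(reduced, table, path + (w,), limit) for w in level}


docs = [
    "python code python compiler",
    "python snake reptile",
    "guitar music chord",
    "guitar music concert",
]
reduced = reduced_word_lists(docs)
table = cooccurrence_table(reduced)
tree = build_tree(reduced, table)  # nested dict: category -> subcategories
```

The sketch sorts candidates before filtering out mutually co-occurring words, whereas the claim lists filtering (step 12) before sorting (step 14); the recursion stops when no reduced list contains every superior category together with a further word, which corresponds to the termination condition of step 17.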
Address |
CH |