Title of invention: Automated language detection for domain names
Abstract: Methods and systems for automated language detection for domain names are disclosed. In some embodiments, a method for detecting a language of an Internationalized Domain Name (IDN) comprises receiving, by an I/O interface, a string of characters for the IDN; receiving training data, including a plurality of multi-gram analyses for a set of languages; analyzing, by a processor, the string of characters based on the training data, wherein the analyzing includes extracting a set of multi-grams from the string of characters and comparing the extracted set of multi-grams with the training data; and detecting the language of the IDN based on results of the analyzing. In some embodiments, the method further comprises comparing the detected language of the IDN with a user selected language and using the IDN to generate a domain name if the comparing indicates that the detected language of the IDN is consistent with the user selected language.
Publication number: US9218335(B2) Publication date: 2015.12.22
Application number: US201213648645 Filing date: 2012.10.10
Applicant: VERISIGN, INC. Inventors: Hoskinson Ronald Andrew; Arians Lambert; Anderson Marc; Jain Mahendra
Classification: G06F17/27; G06F17/22; H04L29/12 Main classification: G06F17/27
Agency: MH2 Technology Law Group, LLP Agent: MH2 Technology Law Group, LLP
Main claim: 1. A method for detecting a language of an Internationalized Domain Name (IDN), the method comprising: receiving, by an I/O interface, a string of characters for the IDN; receiving a user selected language, via the I/O interface, corresponding to the IDN; determining a plurality of candidate languages based on the user selected language, wherein the plurality of candidate languages comprises the user selected language and other languages that share some or all characters with the user selected language or that belong to the same language family as the user selected language; receiving training data, comprising a plurality of multi-gram analyses for each language of the plurality of candidate languages; analyzing, by a processor, the string of characters based on the training data, wherein the analyzing includes extracting a set of multi-grams from the string of characters and comparing the extracted set of multi-grams with the training data; detecting the language of the IDN based on results of the analyzing; determining that the language of the IDN that was detected does not match the user selected language; rejecting the IDN for generating a domain name in response to the determination that the language of the IDN that was detected does not match the user selected language, wherein rejecting the IDN for generating a domain name comprises transmitting a warning to a user; receiving, in response to the warning, an indication from the user, via the I/O interface, to use the IDN to generate a domain name; and using the IDN to generate a domain name in response to receiving the indication from the user.
Address: Reston VA US
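
Illustrative sketch (not part of the patent record): the flow recited in the abstract and main claim can be pictured roughly as follows. This is a minimal Python sketch under stated assumptions, not the patented implementation. The record does not say how multi-grams are scored, so a log-frequency score over character 1-, 2- and 3-grams is assumed here. All names (extract_ngrams, train_profile, detect_language, validate_idn, TRAINING, RELATED) and the tiny German/Dutch word lists are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def extract_ngrams(text, n_values=(1, 2, 3)):
    """Count the character n-grams ("multi-grams") of a label."""
    text = text.lower()
    grams = Counter()
    for n in n_values:
        for i in range(len(text) - n + 1):
            grams[text[i:i + n]] += 1
    return grams


def train_profile(samples):
    """Build a relative-frequency n-gram profile from sample words."""
    counts = Counter()
    for word in samples:
        counts.update(extract_ngrams(word))
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


def score(label, profile, floor=1e-6):
    """Assumed scoring rule: sum of log frequencies, with a floor for unseen n-grams."""
    return sum(count * math.log(profile.get(gram, floor))
               for gram, count in extract_ngrams(label).items())


def detect_language(label, training_data, candidates):
    """Return the candidate language whose profile best matches the label."""
    return max(candidates, key=lambda lang: score(label, training_data[lang]))


def validate_idn(label, user_language, training_data, related, confirm):
    """Return (accepted, detected_language) for a requested IDN label.

    `related` maps a language to languages sharing characters or a family.
    `confirm` stands in for the warning step: it receives the detected
    language and returns True if the user chooses to proceed anyway.
    """
    candidates = [user_language] + [
        lang for lang in related.get(user_language, []) if lang in training_data
    ]
    detected = detect_language(label, training_data, candidates)
    if detected == user_language:
        return True, detected
    # Mismatch: reject with a warning, then honor an explicit user override.
    return bool(confirm(detected)), detected


# Toy, hypothetical training data; real profiles would come from large corpora.
TRAINING = {
    "de": train_profile(["straße", "müller", "bücher", "schön", "grün"]),
    "nl": train_profile(["straat", "huis", "boek", "mooi", "groen"]),
}
RELATED = {"de": ["nl"], "nl": ["de"]}

if __name__ == "__main__":
    print(validate_idn("bücher", "de", TRAINING, RELATED, lambda d: False))
    print(validate_idn("groen", "de", TRAINING, RELATED, lambda d: False))
    print(validate_idn("groen", "de", TRAINING, RELATED, lambda d: True))
```

In this sketch, restricting the comparison to the user selected language plus related languages mirrors the "plurality of candidate languages" step of the claim, and the confirm callback models the warning and user override; how a registry would actually present that warning is not specified in the record.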