Abstract |
Listings and reviews of those listings can be processed to identify descriptive attributes for the locations associated with the listings. To do this, a corpus of words is generated for various locations based on the listings in those locations and the reviews of those listings. An expected frequency and a per-location frequency are determined for each word. These frequencies are in turn used to determine, for each word, the number of high frequency listing locations and the number of below expected frequency listing locations. Based on a comparison of these two numbers with an attribute reference number, the word is identified either as an attribute that is likely descriptive of the location or as not descriptive. |
Principal claim |
1. A method comprising:
generating a corpus of words present in listings and reviews of the listings, the listings describing goods or services, each listing associated with one of a plurality of locations;
for each of the words in the corpus:
computing an expected frequency for a word to appear in the corpus,
determining, for each of the locations, a per-location frequency for the word,
determining a number of high frequency listing locations comprising locations where the per-location frequency of the word is a first multiple greater than the expected frequency,
determining a number of below expected frequency listing locations comprising locations where the per-location frequency of the word is a second multiple smaller than the expected frequency, and
determining a descriptiveness metric for the word based on the number of high frequency listings locations and the number of low frequency listings locations; and
identifying, as attributes, one or more words in the set of words having a descriptiveness metric within a threshold range of an attribute reference number. |
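The steps of claim 1 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the claimed implementation: the claim does not fix the formulas, so the sketch assumes that the expected frequency is a word's share of all tokens in the corpus, that "a second multiple smaller" means at most the expected frequency divided by that multiple, and that the descriptiveness metric is the fraction n_high / (n_high + n_below). The function name `find_attributes`, its parameters and default values, and the sample data are all hypothetical.

```python
from collections import Counter


def find_attributes(docs_by_location, high_multiple=2.0, low_multiple=2.0,
                    reference=0.5, tolerance=0.25):
    """Return {word: metric} for words treated as location attributes.

    docs_by_location maps a location name to a list of listing/review texts.
    All thresholds and the metric formula are assumptions for illustration.
    """
    # Corpus: per-location word counts drawn from listings and reviews.
    loc_counts = {
        loc: Counter(w for text in texts for w in text.lower().split())
        for loc, texts in docs_by_location.items()
    }
    total_counts = Counter()
    for counts in loc_counts.values():
        total_counts.update(counts)
    total_tokens = sum(total_counts.values())

    attributes = {}
    for word, count in total_counts.items():
        # Assumed expected frequency: share of the word in the whole corpus.
        expected = count / total_tokens
        n_high = n_below = 0
        for counts in loc_counts.values():
            loc_tokens = sum(counts.values())
            if loc_tokens == 0:
                continue
            freq = counts[word] / loc_tokens  # per-location frequency
            if freq >= high_multiple * expected:
                n_high += 1   # high frequency listing location
            elif freq <= expected / low_multiple:
                n_below += 1  # below expected frequency listing location
        if n_high + n_below == 0:
            continue  # word is near its expected frequency everywhere
        # Assumed descriptiveness metric: fraction of flagged locations
        # that are high frequency.
        metric = n_high / (n_high + n_below)
        if abs(metric - reference) <= tolerance:
            attributes[word] = metric
    return attributes


# Hypothetical sample data: three locations, two short texts each.
docs = {
    "coast": ["beach house near the beach", "the beach was great"],
    "city": ["the loft near the museum", "the museum was great"],
    "hills": ["the cabin near the trail", "the trail was great"],
}
result = find_attributes(docs)
```

On this sample, words spread evenly across locations ("the", "near") are neither high nor below expected anywhere and drop out, while a word concentrated in one location ("beach") is high frequency there, below expected in the other two, and is kept as an attribute.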