Invented by David R. Bailey, Todd J. Feldman, Anand Rajaraman, A9 com Inc
The A9 com Inc invention works as follows. A search engine system helps users find web pages where they can purchase products specified by the user. A crawler program scores web pages based on rules indicating the likelihood that they contain an online product offering. A query server searches an index of scored web pages in order to find pages that are both responsive to the query and likely to contain a product offer. In one embodiment, responsive web pages and products that answer the query are displayed on a composite results page.
Background for System and Method for Locating Web-Based Product Offerings
In the electronic commerce field, it is common for online merchants to sell products in many different categories. Amazon.com, Inc., assignee of this application, sells items in the following categories: books, music, videos & DVDs, toys & games, electronics, home improvement, and auctions. Users are usually presented with predefined categories, and the products that belong to them, in a browse tree. Many merchants also provide a search tool for searching for products.
Online merchants often have trouble presenting groups of related products that span multiple categories. The large number of categories and products, as well as the layout of the website, may make it difficult for users to determine the relationships between the products. Imagine, for example, that a web site user is a big fan of American humorist Mark Twain. The user can search for Mark Twain’s books in the book section of an online commerce website. This method of browsing will likely reveal many books written by or about Mark Twain. However, the user may not be aware that other products that could interest Mark Twain fans are also available on the website. A video section on the same website may have video biographies and video adaptations of many of Mark Twain’s classic books. Meanwhile, a music section could include songs inspired by Mark Twain. A section of the site that contains auctions may include products offered for sale by other parties, including Mark Twain memorabilia. The search engine on the website may show some of these products. However, users must usually review a long list of results to find the items or categories that interest them.
Another problem that arises in online commerce is finding a website from which to purchase a specific product. The problem can arise when the online merchants that the consumer knows do not sell the product. In this situation, a consumer can use an Internet search engine such as ALTAVISTA or EXCITE to find a website that sells the item. A general search can be so broad, however, that only a fraction of the many web sites found actually sell the product. The results may include many sites that only provide information about the product, such as reviews, technical assistance, specifications or other details. The sites that are most relevant to the consumer may be buried deep in a long list.
The present invention seeks solutions to these and other problems.
The present invention offers various features to assist users with conducting online searches. These features can be implemented alone or in combination with a search engine for an online merchant, a web search engine or any other type of search system.
A feature of the invention is a method of displaying results of a search in multiple categories according to the importance of each category to the user’s query. This method can be used for displaying the results of any type of search, whether it is for products or other types of items. In a preferred implementation, the method involves receiving a search query from the user and identifying, within each of the multiple item categories, a set of items that matches the query. These sets of items are then used to calculate a score for each category that reflects the relevance or significance of the category in relation to the search. The scores can be based on, for example, the number of hits within each category in relation to the total number of items in the category, the popularity of the items that satisfy the search, or any combination thereof.
The categories and the items associated with them are then displayed to the user in an order based on the scores, preferably from the highest significance down. The display order can also be selected using other significance criteria, such as the category preference profile of a user. Other display methods that highlight the highest ranked categories can be used in addition or as an alternative. This method increases the probability that categories of interest to users will be displayed near the top of the results or otherwise brought to the user’s attention. In order to provide the user with a quick overview of the items located and their categories, it is preferable that no more than N items (e.g., the three most highly ranked items) are displayed on the first search results page.
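The category scoring and ordering described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the `CategoryHits` structure, the popularity values in the range 0 to 1, and the choice to combine hit ratio and mean popularity by multiplication are all hypothetical. Applying the top-N limit within each category is also an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class CategoryHits:
    """Query matches found in one category (hypothetical structure)."""
    name: str
    total_items: int                           # size of the whole category
    hits: list = field(default_factory=list)   # (title, popularity) pairs


def category_score(cat):
    """Score a category by hit ratio times mean popularity of its hits.

    The multiplication is an assumed way of combining the two signals.
    """
    if cat.total_items <= 0 or not cat.hits:
        return 0.0
    hit_ratio = len(cat.hits) / cat.total_items
    mean_popularity = sum(p for _, p in cat.hits) / len(cat.hits)
    return hit_ratio * mean_popularity


def rank_categories(categories, top_n=3):
    """Return (category name, top-N titles) pairs, most significant first."""
    ranked = sorted((c for c in categories if c.hits),
                    key=category_score, reverse=True)
    result = []
    for cat in ranked:
        best = sorted(cat.hits, key=lambda h: h[1], reverse=True)[:top_n]
        result.append((cat.name, [title for title, _ in best]))
    return result


if __name__ == "__main__":
    books = CategoryHits("Books", 1000, [("Tom Sawyer", 0.9), ("Huckleberry Finn", 0.8),
                                         ("Life on the Mississippi", 0.5), ("Roughing It", 0.4)])
    videos = CategoryHits("Videos", 100, [("Twain biography", 0.6), ("Huck Finn film", 0.7)])
    music = CategoryHits("Music", 500)
    for name, titles in rank_categories([books, videos, music]):
        print(name, titles)
```

In this example the Videos category outranks Books even though it has fewer hits, because its hits make up a larger share of the category. Categories with no hits are left out of the display entirely.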
Another feature is a system and method for assisting the user in finding web pages where products specified by the user can be purchased. In a preferred embodiment of the invention, each page found by a crawler is evaluated according to a content-based set of rules to determine a score indicating the likelihood that the page contains a product offering. Scores may also be based on other criteria, such as the content of other pages within the same website. A keyword index stores representations of all or some of the scored pages and maps keywords to the addresses (URLs) of those pages. A query server uses the keyword index to find web pages that are both relevant to a user’s search query and likely to contain a product offering. This can be achieved, for instance, by restricting the scope of a search to web pages whose scores meet a certain threshold.
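As a rough sketch of this pipeline, the Python below scores page text with a few content-based rules, builds a keyword index that maps words to URLs, and answers queries only from pages whose score meets a threshold. The specific rules, weights, threshold value, and example URLs are illustrative assumptions; the source does not specify them.

```python
import re
from collections import defaultdict

# Hypothetical content-based rules: (pattern, weight). Not from the source.
RULES = [
    (re.compile(r"add to cart|buy now", re.I), 0.5),
    (re.compile(r"[$]\d+(\.\d{2})?"), 0.3),
    (re.compile(r"\bshipping\b", re.I), 0.2),
]


def product_score(text):
    """Sum the weights of all matching rules, capped at 1.0."""
    return min(1.0, sum(weight for rx, weight in RULES if rx.search(text)))


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """pages maps URL -> page text. Returns (keyword -> URLs, URL -> score)."""
    index = defaultdict(set)
    scores = {}
    for url, text in pages.items():
        scores[url] = product_score(text)
        for word in set(tokenize(text)):
            index[word].add(url)
    return index, scores


def search(query, index, scores, threshold=0.5):
    """Return URLs containing every query term and scoring at or above threshold."""
    terms = tokenize(query)
    if not terms:
        return []
    matches = set.intersection(*(index.get(t, set()) for t in terms))
    hits = [url for url in matches if scores[url] >= threshold]
    return sorted(hits, key=lambda url: (-scores[url], url))


if __name__ == "__main__":
    pages = {
        "http://shop.example/pen": "Fountain pen $25.00 Add to cart",
        "http://review.example/pen": "Fountain pen review: writes smoothly",
    }
    index, scores = build_index(pages)
    print(search("fountain pen", index, scores))
```

Both example pages match the query terms, but only the shop page passes the threshold, which mirrors the idea of filtering out pages that merely discuss a product.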
In one embodiment, these features are combined within the search engine of an online merchant. From this website, a user can initiate a search of “All Products” that covers multiple product categories. The submitted search query is used to identify products that match the query. A set of web pages that both match the query and are likely to contain product offerings is also identified. The search results are displayed on a composite page that lists at least some of the products located and at least some of the web pages located. The products are preferably displayed in conjunction with their respective product categories, according to the category ranking and display methods described above.
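A composite results page of this kind could be assembled along the following lines. This is a minimal plain-text sketch: the function name, the section heading for web pages, and the `top_n` and `max_pages` limits are assumptions made for illustration, and the inputs are assumed to be already ranked.

```python
def composite_results(category_hits, web_hits, top_n=3, max_pages=5):
    """Build a plain-text composite results listing.

    category_hits: list of (category name, [item titles]), already ranked.
    web_hits: list of URLs of likely product pages, already ranked.
    """
    lines = []
    for category, items in category_hits:
        if not items:
            continue  # skip categories with no matching products
        lines.append(f"{category} ({len(items)} matches)")
        lines.extend(f"  - {item}" for item in items[:top_n])
    if web_hits:
        lines.append("Related product pages on other web sites")
        lines.extend(f"  - {url}" for url in web_hits[:max_pages])
    return "\n".join(lines)


if __name__ == "__main__":
    print(composite_results(
        [("Videos", ["Twain biography", "Huck Finn film"]),
         ("Books", ["Tom Sawyer", "Huckleberry Finn", "Roughing It", "Letters"])],
        ["http://shop.example/twain-bust"],
    ))
```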
One feature of the invention is a method that identifies and displays product information drawn from multiple product categories in response to a query submitted by a user to a search engine. A second feature of the invention is a method that helps users find web pages where specific products can be bought. These two features can be embodied together in a search engine system. As will be evident, they and other features of this invention can also be used separately and may therefore be considered separate inventions. For ease of description, the term “invention” is used in this document to refer collectively to all the inventive features that are disclosed.
The description refers to the drawings and describes a preferred embodiment of the invention, including various details of its implementation on the AMAZON.COM website. These details are provided to illustrate the invention, not to limit it. The scope of the invention is defined only by the appended claims.
A. Overview of Web Site and Search Engine
FIG. 1 shows the AMAZON.COM web site 130, including the components that implement the search engine according to the invention. As is well known in the Internet commerce art, the AMAZON.COM web site includes functionality that allows users to browse and purchase items from an online catalog of book titles, music titles and other types of items via the Internet 120. Because the catalog is made up of millions of items, it is important that the site provide a mechanism that helps users locate items.
As shown in FIG. 1, the web site 130 includes a web server application 132 (“web server”) that processes requests received from the user computers 110 over the Internet 120. These requests include searches of the online catalog submitted by users. The web server 132 keeps a log of all user transactions, including queries submitted by users.
The web site 130 includes a query server 140 that processes queries by searching the databases 141-147. The Books database 141, Music database 142 and Videos database 143 include product identifiers that are used to identify books, music products, and multimedia items, which users can purchase directly through the web site 130. AMAZON.COM includes additional categories of products that can be purchased directly from the website, including Electronics and Toys & Games. These are omitted from FIG. 1 in the interest of clarity. The Books, Music, and Videos databases 141-143 are intended to represent all databases on the web site 130 that are associated with products sold directly by the site merchant.
The Auctions database 144 in FIG. 1 relates to the third-party online auctions hosted by the web site 130. AMAZON.COM also hosts third-party fixed-price offerings known as “zShops,” which correspond to an online version of a “flea market.” The zShops section has a database analogous to the Auctions database 144. This database has been omitted from FIG. 1 in the interest of clarity.
The Affiliated Merchant databases, labeled Software 145 and Electronics 146, contain information about the software and electronics products that are sold on independent web sites affiliated with the host web site 130. AMAZON.COM also includes products in other categories, such as Sports & Outdoors and Toys & Games, from affiliated independent web sites. These are omitted from FIG. 1 in the interest of clarity. The Software and Electronics databases 145, 146 are intended to represent all databases related to products sold by independent merchants affiliated with the web site 130.
The Product Spider database 147 contains information about web sites that are not affiliated with the host web site 130 but have been identified as selling products. This database is especially useful because it allows the host web site 130 to assist a consumer in finding product offerings that are not sold by the host site 130 or by affiliated online merchants.
Each database 141-147 contains data tables that are indexed by keyword to make it easier to search for answers to queries. For convenience, the multiple databases that contain the product offerings are referred to as “categories”; in FIG. 1, for example, each of the databases 141-147 is one such category.
The web site 130 also includes a database of HTML (Hypertext Markup Language) content, which includes, for example, product information pages that show and describe the products associated with the web site 130.
The query server 140 has a ranking system 150 which prioritizes the search results across the different databases 141-147. Prioritization is determined by the user’s search query and the significance of each category to that query. The query server also includes a spell-checker 152, which detects and corrects misspellings in search attempts, and a search tool 154, which generates search results from a database, such as the Books database 141, in response to a query entered by a user. Depending on the database that was searched, the search tool 154 prioritizes the items in a search result based on different criteria. For the Product Spider database 147, one approach ranks the search results using a method known as “term frequency inverse document frequency” (TFIDF), which weights each term of a multi-term query inversely to how often the term appears in the database. The query term that appears least often in the Product Spider database 147 is considered the most discriminating term in the query and is therefore given the highest weight by the search tool 154. The algorithms for implementing this method are well known and can be found in the software development kits associated with commercial search engines such as ALTAVISTA or EXCITE.
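The inverse weighting idea can be illustrated with a small Python sketch. It is a generic TF-IDF ranking, not the algorithm of any particular commercial search engine: the smoothed inverse document frequency formula, the whitespace tokenization, and the sample documents are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def idf_weights(query_terms, tokenized_docs):
    """Weight each query term inversely to the number of documents containing it.

    Uses a smoothed variant, log((1 + N) / (1 + df)) + 1, which is one common
    choice among several.
    """
    n = len(tokenized_docs)
    weights = {}
    for term in set(query_terms):
        df = sum(1 for tokens in tokenized_docs if term in tokens)
        weights[term] = math.log((1 + n) / (1 + df)) + 1.0
    return weights


def tfidf_rank(query, docs):
    """Return indices of matching docs, best first (ties keep document order)."""
    tokenized = [doc.lower().split() for doc in docs]
    terms = query.lower().split()
    weights = idf_weights(terms, tokenized)
    scored = []
    for i, tokens in enumerate(tokenized):
        tf = Counter(tokens)
        score = sum(tf[t] * weights[t] for t in set(terms))
        if score > 0:
            scored.append((score, i))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored]


if __name__ == "__main__":
    docs = [
        "mark twain book",
        "mark twain video",
        "mark knopfler album",
        "twain memorabilia auction",
    ]
    print(tfidf_rank("mark memorabilia", docs))
```

Here "memorabilia" appears in only one document while "mark" appears in three, so "memorabilia" receives the larger weight and the document containing it is ranked first.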