Invented by Allan Henry Vermeulen, Alan B. Atlas, David M. Barth, John David Cormie, Ami K. Fischman, James Christopher Sorenson, III, Eric M. Wagner, Amazon Technologies Inc
The Amazon Technologies Inc invention works as follows
A distributed, web services-based storage system. A system may include a web services interface configured to receive, according to a web services protocol, a client request to access a data object, where the request includes a key value that corresponds to the object. The system may also include storage nodes configured to store replicas of the objects, each replica being accessible through a unique locator value, and a keymap instance configured to store a keymap entry for each object. The keymap entry for a given object includes the object's key value and each locator value that corresponds to one of its replicas. A coordinator may receive a client request from the web services interface and, in response, access the keymap instance to identify the locator values that correspond to the key value. For a particular locator value, the coordinator retrieves the replica from the corresponding storage node.
Background for Distributed Storage System with Web Services Client Interface
Field of Invention
This invention relates to data storage systems and, more specifically, to storage systems configured to provide access to storage as a web service.
Description of Related Art
Many different computing applications rely on some type of storage medium for the persistent storage of various kinds of application data. Office applications and multimedia programs, for example, generate and use application data of various types and formats, including documents, spreadsheets and still images. Such data is often stored so that a user can access or use it repeatedly. For example, a user may want to work on a number of documents over a period of time and expect that they will be available in a consistent state when needed.
In conventional computing systems, the storage medium that applications use for persistent data is most commonly a magnetic fixed drive, or hard drive, although optical and solid-state storage devices are also used. Such devices are either integrated into the computer system that executes the applications or accessible to that system through a local peripheral interface or a network. Devices that serve as application storage are typically managed by an operating system, which controls device-level behavior in order to present a consistent storage interface to the various applications that need storage.
This conventional model of application data storage has several limitations. First, it generally limits the accessibility of application data. If application data is saved on the local drive of a particular computer system, it may not be accessible to applications executing on other systems. Even if the data is stored on a network-accessible device, applications that run on systems outside the immediate network may be unable to access that device. For example, enterprises commonly restrict access to their local area networks for security purposes, so that external systems cannot access resources or systems within the enterprise. As a result, applications that run on portable devices such as mobile phones, PDAs, laptops or handheld computers may have difficulty accessing data that is persistently tied to fixed systems or networks.
The conventional application storage model may also fail to adequately ensure the reliability and integrity of stored data. By default, operating systems typically store a single copy of application data on one storage device. If data redundancy is desired, users or applications have to create and manage their own copies. Although individual storage devices and third-party software can provide a certain degree of redundancy, these features are not consistently available to applications, since the storage resources available to applications may vary greatly. The operating-system-mediated conventional storage model may also limit the cross-platform accessibility of data. For example, different operating systems may store data for the same application in different formats, making it difficult for users of applications running on one platform to access data stored by applications running on other platforms.
The disclosure describes various embodiments of a distributed, web services-based storage system. In one embodiment, the system includes a web services interface that is configured to receive, according to a web services protocol, client requests to access data objects. A client request for a given data object may include a key value corresponding to that data object. The system can also include a set of storage nodes that are configured to store replicas of the data objects, each replica being accessible through a unique locator value. The system can also include a keymap instance that stores a respective keymap entry for each data object. For a given data object, this keymap entry contains the key value and each locator value corresponding to a stored replica of the given object. The system can also include a coordinator configured to receive client requests from the web services interface. The coordinator can be configured to respond to a client request by accessing the keymap instance in order to identify one or more locator values that correspond to the key value. For a particular locator value, the coordinator may access the storage node corresponding to that value to retrieve the replica.
In a specific implementation of the system, the web services interface may be further configured to receive, according to the web services protocol, client requests to store data objects. A particular client request to store a data object includes a key value that corresponds to that data object. The coordinator can be further configured to receive the client requests to store data objects from the web services interface. In response to a particular client request, the coordinator may be configured to store one or more replicas of the data object on one or more storage nodes. In response to storing a given replica of the data object, a storage node may return to the coordinator a locator value for that replica.
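The store and retrieve paths described above can be illustrated with a minimal in-memory sketch. It is not the patented implementation: the class names (StorageNode, Coordinator), the locator format (node id plus a random suffix), the default replica count and the choice of which nodes receive replicas are all assumptions made for illustration.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    """Hypothetical in-memory storage node; replicas are addressed by locator."""
    node_id: str
    replicas: dict = field(default_factory=dict)

    def put(self, data: bytes) -> str:
        # Assumed locator format: "<node id>:<random hex>", unique per replica.
        locator = f"{self.node_id}:{uuid.uuid4().hex}"
        self.replicas[locator] = data
        return locator

    def get(self, locator: str) -> bytes:
        return self.replicas[locator]


class Coordinator:
    """Hypothetical coordinator holding a keymap: key value -> locator values."""

    def __init__(self, nodes, replica_count=2):
        self.nodes = {node.node_id: node for node in nodes}
        self.replica_count = replica_count
        self.keymap = {}

    def put_object(self, key: str, data: bytes) -> list:
        # Store one replica on each of the first `replica_count` nodes
        # (a placeholder placement policy) and record the returned locators.
        targets = list(self.nodes.values())[: self.replica_count]
        locators = [node.put(data) for node in targets]
        self.keymap[key] = locators
        return locators

    def get_object(self, key: str) -> bytes:
        # Look up the locators for the key, then read the first replica
        # that is still present on its storage node.
        for locator in self.keymap[key]:
            node_id = locator.split(":", 1)[0]
            node = self.nodes.get(node_id)
            if node is not None and locator in node.replicas:
                return node.get(locator)
        raise KeyError(key)
```

In this sketch a client never sees locators; it supplies only the key, and the coordinator falls back to another replica when the first one is unavailable.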
Introduction
As computing applications become increasingly data-intensive and geographically dispersed, the need for reliable, location-independent access to application data increases. Multimedia applications such as authoring, storage and playback applications require increasing amounts of storage space as multimedia content becomes more sophisticated. It may also be desirable to access application data from a variety of locations, regardless of where the device storing the data is located. While many computers have large amounts of disk-based storage, it is difficult to access this storage remotely in a uniform, convenient and secure manner.
As an alternative to configuring each computer to rely solely on its own internal storage resources, or to provisioning local network-based storage resources (e.g., Network Attached Storage (NAS), Storage Area Network (SAN), etc.), an Internet-connected data storage service can be configured to provide generic storage services to clients via Internet-based protocols, such as web services (WS) protocols. Internet-based protocols such as web services protocols are typically platform-independent, in that they typically function independently of the underlying software or hardware. Providing storage as a web service can therefore give applications access to large amounts of storage, independent of the storage resources available on the host system or local network. Web service-accessible storage can also generally be accessed from any location that has Internet access. Web service-accessible storage can facilitate a number of computing features, including remote access to common data by different devices and applications, remote access to widely distributed data by individual applications during execution, access to and/or sharing of data among distributed users working in collaboration, and dissemination of application result data among distributed users, among many others.
In the discussion that follows, one possible data storage model that could be used in a web services-based storage system is described. Then, a storage system that can be configured to provide storage services according to that data storage model is described.
Overview of Storage Service and Storage Model User Interface
FIG. 1 illustrates one embodiment of a storage model for providing data storage to users as a service, for example, as a web service. In the model, storage service interface 10 is the customer- or user-facing interface to the storage service. According to the model that interface 10 presents to a user, the storage service may be organized as an arbitrary number of buckets 20a-n accessible through interface 10. Each bucket 20 can be configured to store an arbitrary number of objects 30a-n, which in turn store data specified by a user of the storage service.
As described below in more detail, storage service interface 10 can be configured in certain embodiments to support interaction between the storage service and its users according to a web services model. For example, in one embodiment, interface 10 may be accessible by clients as a web services endpoint having a Uniform Resource Locator (URL), e.g., http://storageservice.domain.com, to which web services calls generated by service clients may be directed for processing. Generally speaking, a web service is any type of computing service that is made available to a requesting client through a request interface that uses one or more Internet-based application layer data transport protocols, such as a version of the Hypertext Transfer Protocol (HTTP) or another suitable protocol.
Web services can be implemented in a variety of architectural styles using a variety of enabling service protocols. In a Representational State Transfer (REST)-style web services architecture, the parameters pertinent to a web services call (e.g., specifying the type of service requested, user credentials, user data to be operated on, etc.) can be specified as parameters to the data transport command that invokes the web services call to the web services endpoint, such as an HTTP GET or PUT command. In some implementations, REST-style web services architectures are stateless, in that each web services call can contain all the information needed to process that call without reference to external state information. In contrast to REST-style architectures, document-based or message-based web services architectures encode the parameters and data pertinent to a web services call as a document that can be transmitted to a web services endpoint and then decoded and acted upon by the endpoint. For example, a version of eXtensible Markup Language (XML) or another suitable markup language can be used to format the web services request document. In some embodiments, the markup language used to format the request document delimits parameters that control the processing of the request, while in other embodiments certain features of the markup language itself (e.g., certain tags) directly control aspects of request processing. Additionally, in some embodiments the resulting document can be encapsulated within another protocol, such as a version of the Simple Object Access Protocol (SOAP), to facilitate processing of the web services request by the endpoint.
Other protocols can also be used within various embodiments of web services architectures. A web services endpoint may, for example, use a version of the Web Services Description Language (WSDL) to publish its interfacing requirements to potential clients. Web services endpoints can make themselves known to potential clients through a directory protocol, such as a version of the Universal Description, Discovery and Integration (UDDI) protocol. Many other protocols relate to the provisioning of computing services through web services interfaces, and any given web services implementation can use any suitable combination of these protocols.
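The contrast between the two styles can be sketched with the Python standard library. The example only constructs the two kinds of request and sends nothing over the network. The endpoint URL is the example URL used in this description; the URL layout (bucket and key as path segments) and the XML element names (PutObject, Bucket, Key, Data) are hypothetical and chosen only for illustration.

```python
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import quote

ENDPOINT = "http://storageservice.domain.com"  # example endpoint from the text


def build_rest_put(bucket: str, key: str, data: bytes) -> urllib.request.Request:
    """REST style: the HTTP method names the operation and the URL names
    the target, so the call carries everything needed to process it."""
    url = f"{ENDPOINT}/{quote(bucket)}/{quote(key)}"  # assumed URL layout
    return urllib.request.Request(url, data=data, method="PUT")


def build_document_put(bucket: str, key: str, text: str) -> bytes:
    """Document style: the operation and its parameters are encoded in an
    XML document that the endpoint would decode and act upon."""
    root = ET.Element("PutObject")  # hypothetical element names throughout
    ET.SubElement(root, "Bucket").text = bucket
    ET.SubElement(root, "Key").text = key
    ET.SubElement(root, "Data").text = text
    return ET.tostring(root, encoding="utf-8")


rest_request = build_rest_put("mybucket", "notes.txt", b"hello")
document = build_document_put("mybucket", "notes.txt", "hello")
```

In the REST-style request the operation is visible in the transport command itself, whereas in the document-style request the transport merely delivers a document whose tags identify the operation. A SOAP-based service would additionally wrap such a document in a SOAP envelope, which is not shown here.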
It is contemplated that in some embodiments, interface 10 can support interfaces other than web services interfaces, either instead of or in addition to a web services interface. For example, an enterprise can implement a storage service for use by clients outside the enterprise, who access the service through web services protocols, as well as by users within the enterprise, who may use a different type of interface (e.g., a proprietary interface tailored to the enterprise's intranet). In some embodiments, interface 10 can support each of the various interfacing protocols through which users of the storage service access the service. In other embodiments, a different instance of interface 10 can be provided for each distinct interface approach. In some embodiments, the aspects of interface 10 that handle client interactions (e.g., receiving and responding to service requests) can be implemented separately from the aspects that implement the general architecture of the storage service (e.g., the organization of the service into a hierarchy of buckets and objects). In some such embodiments, the portion of interface 10 relating to client interaction (e.g., via web services protocols) can be bypassed by certain users, such as those internal to an enterprise. This is described more fully below in conjunction with the description of FIG. 2.
As shown in FIG. 1, interface 10 provides storage service users with access to buckets 20. In general, a bucket 20 can function as the root of an object namespace that is associated with a user of the storage service. A bucket 20 can be compared to a folder or directory in a file system. In some embodiments, individual buckets 20 may also form the basis for accounting for storage service usage. For example, for billing purposes a user can be associated with one or more buckets 20, and that user will be charged for the storage resources (e.g., storage of objects 30) that are hierarchically located within the namespace established by those buckets.
Each bucket 20 can have associated metadata 21. Metadata 21 can include, for example, the date a bucket was created, its creator's identity, whether the bucket has any objects 30 associated with it, or any other suitable information. In certain embodiments, metadata 21 may contain information indicative of the usage characteristics of a bucket 20, such as the total size of the objects 30 associated with bucket 20, the access history of users with respect to bucket 20 or its associated objects 30, the billing history for bucket 20, and any other information relevant to current or historical usage of bucket 20. In one embodiment, each bucket 20 is assigned a unique identifier, which may be specified by the user or assigned automatically by the storage service. The unique identifier can be stored in metadata 21 or as a separate property or field of bucket 20. In some embodiments, a bucket 20 may not contain explicit references, pointers, or other information that corresponds to the objects 30 associated with it. Instead, as described below in greater detail, the location and selection of objects 30 can be performed using a separate mapping facility referred to as a keymap.
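The bucket model described above can be sketched as a small data structure in which a bucket carries only an identifier and metadata, while the association between buckets and objects lives in a separate mapping. This is an illustrative sketch under stated assumptions: the class and field names are hypothetical, and the keymap here maps a (bucket id, key) pair to an object size as a stand-in for real keymap entries.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class BucketMetadata:
    creator: str
    created_at: datetime
    total_object_bytes: int = 0  # usage figure that could feed billing


@dataclass
class Bucket:
    bucket_id: str  # unique; user-supplied or service-assigned
    metadata: BucketMetadata
    # Deliberately no list of objects: membership is resolved via the keymap.


class StorageModel:
    """Hypothetical model: buckets plus a separate keymap."""

    def __init__(self):
        self.buckets = {}
        self.keymap = {}  # (bucket_id, key) -> object size in bytes (assumed)

    def create_bucket(self, bucket_id: str, creator: str) -> Bucket:
        if bucket_id in self.buckets:
            raise ValueError(f"bucket id already in use: {bucket_id}")
        metadata = BucketMetadata(creator, datetime.now(timezone.utc))
        bucket = Bucket(bucket_id, metadata)
        self.buckets[bucket_id] = bucket
        return bucket

    def record_object(self, bucket_id: str, key: str, size: int) -> None:
        # Update the keymap and keep the bucket's usage metadata in step,
        # accounting for an overwrite of an existing key.
        bucket = self.buckets[bucket_id]
        previous = self.keymap.get((bucket_id, key), 0)
        self.keymap[(bucket_id, key)] = size
        bucket.metadata.total_object_bytes += size - previous

    def list_keys(self, bucket_id: str) -> list:
        return sorted(k for (b, k) in self.keymap if b == bucket_id)
```

Keeping object references out of the bucket means that adding or removing objects touches only the keymap and the usage counters, which is one plausible reading of why the mapping is kept separate.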