Invented by Manoj Kumar Vijayan, Ho-Chi Chen, Deepak Raghunath Attarde, Hetalkumar N. Joshi, Commvault Systems Inc
The Commvault Systems Inc invention works as follows
The method and system provide information management for data from hosted services. The system receives the information management policies for a hosted account, requests data related to the account, receives the data, and then provides a preview of the received data on a computing device. In some cases, the system indexes the received data in order to associate it with an information management user, or provides index information about the received data to the computing device.
Background for Information management of data associated with multiple cloud services
People are increasingly generating data and metadata on multiple computing devices and through multiple hosted services. In a single day, for example, an individual may use a desktop computer, laptop computer, tablet computer and smartphone to view and edit files, emails or other data objects. A person may also use hosted solutions such as Facebook, Gmail and Google Docs to conduct business and communicate with others. The resulting files may be scattered over multiple devices and hosted sites, making it difficult for the person to access them easily. Information management systems in current use tend to focus on organizing, protecting, and recovering data stored on fixed computing devices such as desktops or servers. As a result, conventional information management software does not manage or back up a person's hosted data and mobile data. If a mobile device is lost or damaged, or a hosted service is disrupted, the person may lose critical data with no way to recover it.
Systems and methods are needed that overcome the problems above and also provide additional benefits. The examples of prior or related methods and systems, and the limitations they have, are meant to be illustrative and not exhaustive. Those skilled in the art will be able to discern other limitations of prior or existing systems and methods after reading this detailed description.
The headings are provided for convenience and do not affect the scope or the meaning of the disclosure.
Software, firmware and/or hardware systems for comprehensive information management are disclosed. The system collects, manages and distributes data and metadata from multiple sources in a unified manner, including data from mobile devices and hosted services. The system gives a user a unified view of data across multiple devices and can keep data synchronized across many computing devices. Users can define simple or complex data distribution policies that drive how data is distributed across devices. The system allows users to browse files on mobile devices in real time. The system also allows a user of a mobile phone or other limited-feature device to run full-featured applications installed on a remote computing device (e.g., a paired laptop or desktop) and to interact with those applications via the input/output hardware of the limited-feature device.
The system has many benefits. Users can search and browse files on any device or hosted service managed by the system from a single user interface. The system also provides a secure corporate collaboration environment in which data objects can be exchanged, synchronized, and shared across multiple devices using a trusted private information management service, rather than an untested or untrusted third-party service. Copies of data objects belonging to an organization are therefore not exposed to third parties, because third parties do not have to control or store them. The system allows an organization to comply better with data retention laws and other regulations by capturing all of its users' data, not just data that originates from fixed computing devices. Because the system actively manages copies of hosted data and mobile data, it also allows an organization to respond better to unexpected data loss, such as a lost mobile device or a service interruption at a hosted service. Many other benefits can be gained.
The following will describe various examples of the invention. This description contains specific details to enable a full understanding of the examples and provide a detailed description. A person skilled in the art will be able to understand that many of these specific details are not necessary. A person skilled in the art will understand that many obvious features of the invention are not described herein. In order to avoid obscuring the description, certain well-known functions or structures may not be described or shown in detail.
The terminology below should be interpreted in the broadest possible sense, even if it is used to describe certain examples of an invention in detail. In fact, some terms may be highlighted below, but any terminology that is intended to be restricted will be explicitly defined in the Detailed Description section.
Information Management Environment
Aspects of the technologies described in this document can be implemented within an information management environment 100, which will now be explained with reference to FIG. 1. As shown in FIG. 1, computing devices in the environment 100 can include personal computers 110; servers 105 (such as file servers, database servers, print servers and web servers); and workstations or other fixed computing systems, such as mainframes and minicomputers. Servers 105 can include network-attached storage (NAS) file servers.
The environment 100 can include virtualized computing resources, such as a virtual machine 120 that a cloud service provider supplies to the organization, or a virtual machine 125 that runs on a virtual machine host 130 operated by the organization. For example, the organization could use one virtual machine 125A as a database server and another virtual machine 125B as a mail server. The environment 100 can also include mobile computing devices such as laptops 135, tablets 140, personal data assistants 145, smartphones 152, and other mobile devices.
Of course, other types of computing devices can be part of the environment 100. As part of its function, each of these computing devices creates, modifies and writes production copies of data and metadata, which are stored on persistent storage media with fast I/O speeds. For example, each computing device may access and modify data and metadata files stored on semiconductor memory, a local disk drive, or a network-attached storage device. These computing devices typically access data and metadata through a file system supported by the operating system.
The environment 100 can also include hosted services 122 that provide online services to the organization or its members (e.g., the organization's employees, independent contractors or departments), such as social networking services, hosted email services, or other hosted applications such as Microsoft Office 365 or Google Docs. Hosted services may include software-as-a-service (SaaS), platform-as-a-service (PaaS), application service providers (ASPs), cloud services, and any other manner of delivering computing or functionality via a network. As it delivers services to its users, each hosted service can generate 'hosted data and metadata' that is associated with individual users. Facebook, for example, may create and store photos, wall posts, videos and notes that are linked to a specific Facebook account.
The organization uses an information management system 150, directly or indirectly, to protect and manage data and metadata from the computing devices within the environment 100, as well as data and metadata that hosted services maintain on behalf of users. The CommVault Simpana information management system, offered by CommVault Systems, Inc. of Oceanport, N.J., is one example. The information management system creates and manages non-production copies of the data and metadata to meet information management goals, such as: allowing the organization to restore data or metadata if an original copy is lost (e.g., by deletion, corruption, disaster, or a service interruption at a hosted service); allowing data recovery from a previous time; and complying with regulatory data retention requirements, electronic discovery ("e-discovery") requirements, or other data retention policies. The information management system may create additional non-production copies of the data or metadata on any non-production storage media, such as magnetic disks, magnetic tapes, optical disks or solid-state devices, or at cloud data storage sites 170, which third-party vendors may operate. The assignee's U.S. patent application Ser. No. 12/751,850, filed Mar. 31, 2010, entitled DATA OBJECT STORE AND SERVER FOR A CLOUD STORAGE ENVIRONMENT INCLUDING DATA DEDUPLICATION ACROSS MULTIPLE STORAGE SITES, now U.S. Publication No. 2010/0332456, is hereby incorporated herein by reference in its entirety.
The figures also illustrate some differences between 'production copies' and 'non-production copies' of data and metadata in the environment 100. Each computing device 205 within the environment 100 has at least one operating system 210 and one or more applications 215A-D installed, such as mail server applications, file server applications, mail client applications, database applications, word processing applications, spreadsheet applications, presentation applications, browser applications and entertainment applications. Through the operating system 210, each application can access and modify the production copies of files stored on a production data storage medium 218, which may be part of a Hadoop distributed file system, an OpenVMS file system, or another type of distributed storage system. Production copies of files can include structured, unstructured (e.g., documents), or semi-structured data, and thus may include documents 220A-220B, spreadsheets, presentations 230, video files, image files, email mailboxes, or database files. The operating system 210 can also access and modify production copies of files and other data, including files on a boot volume or system volume. Hosted data and metadata are also 'production copies', because the hosted service accesses and modifies its users' data and metadata as part of the services it provides. Production copies of data can include not only files but also subsets of files, each of which the operating system or a related application treats as a separate functional unit even though it is not separately addressed in the associated file system. For example, a single email mailbox may contain multiple emails 245A-C as well as email headers and attachments, and a single database 240 can include multiple tables.
A data object can be divided into one or more data blocks, each of which is a collection of data bits within the data object that may have no particular function for an application or the operating system. In addition to data objects, the operating system 210 and applications 215A-D may also access and modify production copies of metadata, such as boot sectors, partition layouts, file or data object metadata (e.g., file name, file size, creation/modification/access timestamps, file location within a file folder directory structure, user permissions, owners, groups, and access control lists ("ACLs")) and system metadata, such as registry information. In addition to metadata related to operating systems or file systems, some applications maintain indexes of production metadata, such as metadata associated with email messages. Production metadata associated with a data object can be either file system metadata or application-specific metadata.
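The relationship between a data object, its metadata, and its data blocks can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `DataObject` class, its field names, and the tiny `BLOCK_SIZE` are hypothetical and do not come from the patent, which does not specify a block size or data layout.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # hypothetical block size in bytes, kept tiny for illustration


@dataclass
class DataObject:
    """A production data object plus a few pieces of object metadata."""
    name: str
    content: bytes
    metadata: dict = field(default_factory=dict)

    def blocks(self, block_size=BLOCK_SIZE):
        """Split the content into fixed-size blocks; the last may be shorter."""
        return [
            self.content[i:i + block_size]
            for i in range(0, len(self.content), block_size)
        ]


doc = DataObject(
    name="report.txt",
    content=b"quarterly numbers",
    metadata={"owner": "alice", "size": 17},
)
parts = doc.blocks()
```

Each block on its own has no meaning to an application; only the ordered concatenation of the blocks reproduces the data object.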
The information management system 150 receives or accesses the various production copies of data objects and metadata. It then creates non-production copies of these data objects and metadata through information management operations such as backup, archive, or snapshot operations. These non-production copies are often stored on one or more non-production storage media 265 that differ from the production storage medium 218, where the production copies of the data and metadata are located. A non-production copy of a data object represents the production data object and its associated metadata as they existed at a certain point in time. Because a production copy changes over time as it is modified by an application 215, a hosted service 122 or the operating system, the information management system 150 can create and manage several non-production copies of a specific data object. Because a production copy of a data object can be removed from its original file system and production data storage medium, the information management system 150 may go on managing multiple non-production copies of that object, each representing the state of the production data object or metadata at a specific point in time.
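The idea of several point-in-time copies of one data object can be illustrated with a small lookup structure. The following is a minimal sketch, not the patented implementation: the `VersionCatalog` class, its method names, and the integer timestamps are assumptions made for illustration.

```python
from bisect import bisect_right


class VersionCatalog:
    """Tracks non-production copies of one data object by capture time."""

    def __init__(self):
        self._times = []   # capture timestamps, kept sorted
        self._copies = []  # copy contents, parallel to _times

    def add_copy(self, timestamp, content):
        """Record a copy of the object as it existed at `timestamp`."""
        pos = bisect_right(self._times, timestamp)
        self._times.insert(pos, timestamp)
        self._copies.insert(pos, content)

    def as_of(self, timestamp):
        """Return the most recent copy taken at or before `timestamp`."""
        pos = bisect_right(self._times, timestamp)
        return self._copies[pos - 1] if pos else None


catalog = VersionCatalog()
catalog.add_copy(100, b"draft")
catalog.add_copy(200, b"draft, revised")
catalog.add_copy(300, b"final")
```

Calling `catalog.as_of(250)` returns the copy captured at time 200, and the catalog keeps answering such queries even if the production copy has since been deleted.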
The information management system may create non-production copies of discrete data objects stored in a virtual disk file (e.g., documents, email mailboxes, and spreadsheets) and/or non-production copies of the entire virtual disk file itself (e.g., a copy of a .vmdk file).
Each non-production object 260A-C may contain or represent copies of more than one production data object. For example, non-production object 260A represents three separate production data objects 255C, 230, and 245C (shown as 255C′, 230′, and 245C′). As indicated by the prime mark (′), a non-production object can store a representation of a production data object or its metadata in a format that differs from the original format of the data object, for example a compressed, encrypted, deduplicated, or otherwise optimized format. A single non-production object 260 may also contain or represent production data objects that originate from different computing devices.
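As a rough illustration of one non-production object holding several production data objects in a non-native format, the sketch below bundles three byte strings into a single compressed container. The container layout (hex-encoded JSON compressed with `zlib`), the function names, and the object names are all assumptions chosen for brevity; the source does not describe a specific container format.

```python
import json
import zlib


def pack(objects):
    """Bundle several production objects (name -> bytes) into one
    compressed non-production object."""
    payload = {name: data.hex() for name, data in objects.items()}
    return zlib.compress(json.dumps(payload).encode("utf-8"))


def unpack(container):
    """Recover the original objects from a container built by pack()."""
    payload = json.loads(zlib.decompress(container).decode("utf-8"))
    return {name: bytes.fromhex(hexdata) for name, hexdata in payload.items()}


production = {
    "table_255C": b"id,name\n1,alice\n",
    "presentation_230": b"slide one",
    "email_245C": b"Subject: hello",
}
container = pack(production)
restored = unpack(container)
```

The container cannot be opened by the applications that created the original objects; it must first be unpacked, which mirrors the point that non-production objects generally need conversion before a native application can use them.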
Non-production copies include backup copies, archive copies, and snapshot copies. Backup copies are generally used for shorter-term data protection and restoration. They can be in a native format or a non-native format (e.g., compressed, encrypted, deduplicated and/or otherwise modified from the original format). Archive copies may be deduplicated, compressed, encrypted or otherwise modified from the original format. In some cases, an archive copy of a data object is made and a stub or other logical reference is left in place of the production copy on the production storage medium. In these examples, the stub points to the archive copy of the data object stored on the non-production storage medium so that the information management system can retrieve it if necessary. The stub may also include some metadata associated with the data object, so that a file system and/or application can provide some information about the data object and/or a limited-functionality version (e.g., a preview) of the data object. A snapshot copy is a representation of a data object at a specific point in time. Because large amounts of data do not need to be copied or moved, a snapshot copy can be created quickly with little impact on production computing resources. A snapshot copy can include a collection of pointers derived from a file system or application, each of which points to a specific stored data block. Collectively, these pointers represent the state and storage location of the data object as of the point in time when the snapshot was created. If a block is about to be changed or deleted, the block is first written to a separate data storage location and the corresponding pointer is redirected to that location. The snapshot can store the pointers or the blocks that the pointers reference.
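The pointer-based snapshot behavior described above resembles a copy-on-write scheme, which can be sketched as follows. This is a toy model under assumptions: the `Volume` class and its methods are hypothetical, blocks are held in memory, and real snapshot engines differ considerably in detail.

```python
class Volume:
    """A toy block store with copy-on-write snapshots."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []

    def snapshot(self):
        # Creating a snapshot copies no block data; it starts empty and
        # implicitly points at the live blocks.
        snap = {}
        self.snapshots.append(snap)
        return snap

    def write(self, index, data):
        # Before overwriting, preserve the old block in every snapshot that
        # has not already saved its own copy of this block.
        for snap in self.snapshots:
            snap.setdefault(index, self.blocks[index])
        self.blocks[index] = data

    def read_snapshot(self, snap, index):
        # Use the preserved block if there is one, else the live block.
        return snap.get(index, self.blocks[index])


vol = Volume([b"A", b"B", b"C"])
snap = vol.snapshot()
vol.write(1, b"X")
vol.write(1, b"Y")
snapshot_view = [vol.read_snapshot(snap, i) for i in range(3)]
```

After the two writes, the live volume holds the new data while the snapshot still reads as it did at creation time, and only the one changed block was ever copied.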
Non-production copies can be distinguished from production copies in several ways. First, a non-production copy of a data object is created for the information management goals described above and is not used or modified directly by the applications 215A-D, hosted services 122, or the operating system. Second, a non-production copy of a data object is stored in one or more non-production objects 260, which may be formatted differently from the native format of the production data object. This means that a native application or hosted service cannot use the non-production object directly without first modifying it. Third, non-production copies are often stored on non-production storage media 265 that is not accessible to the applications 215A-D on computing devices or to hosted services 122. Some non-production copies are also referred to as 'offline copies' because they are not readily accessible (e.g., on tape or disk that is not mounted). Offline copies include copies that the information management system can access without human intervention (e.g., tapes in an automated tape library that are not yet mounted in a drive) and copies that the information management system can access only with some human intervention (e.g., tapes stored at a remote storage facility).
The information management system 150 also produces information management data 275, such as indexing information, which allows the information management system to perform its various information management functions.