Invented by Deepak S. Vembar, Atsuo Kuwahara, Chandrasekaran Sakthivel, Radhakrishnan Venkataraman, Brent E. Insko, Anupreet S. Kalra, Hugues Labbe, Altug Koker, Michael Apodaca, Kai Xiao, Jeffery S. Boles, Adam T. Lake, David M. Cimini, Balaji Vembu, Elmoustapha Ould-Ahmed-Vall, Jacek Kwiatkowski, Philip R. Laws, Ankur N. Shah, Abhishek R. Appu, Joydeep Ray, Wenyin Fu, Nikos Kaburlasos, Prasoonkumar Surti, Bhushan M. Borole, Intel Corp
The Intel Corp invention works as followsAn embodiment may include a processor and memory communicatively connected to the processor. A collaboration engine communicatively attached to the processor can identify a shared graphic component between two users in an environment and share the shared components with the users. The collaboration engine can include a variety of components, including a centralized or depth sharer as well as a shared preprocessor. “Other embodiments are disclosed, and claims.
Background for Collaborative Multi-User Virtual Reality
Current parallel graphic data processing includes methods and systems developed to perform specific operations such as linear interpolation. tessellation. rasterization. texture mapping. depth testing. Graphics processors traditionally used fixed-function computational units to process graphics. However, in recent years, some portions of graphics processing processors were made programmable. This allows them to support a greater variety of operations when processing vertex or fragment data. Graphics processors can be used for various VR applications.
In the following description, many specific details are provided to give a better understanding of the invention. One skilled in the art will understand that the invention can be implemented without any of these details. Other times, well-known features are not described to avoid confusing the present invention.
FIG. “FIG. A processing subsystem 101 is included in the computing system 100. It includes one or more processors 102 and a system storage 104. They communicate via an interconnection path, which may include a memory hub (105). The memory hub 105 can be either a separate component of a chipset or integrated into the processor(s) 102. Through a communication link 106, the memory hub 105 can be connected to an I/O system 111. I/O subsystem 111 also includes an I/O hub 107 which can allow the computing system 100 receive input from one or multiple input devices (108). The I/O hub (107) can also enable a display controller to be connected to the processor(s). 102 to provide outputs to the display device(s). 110A. One embodiment of the I/O hub107 and one or more display devices 110A can be combined with an embedded, local, or internal display device.
In one embodiment, the processing subsystem 100 includes one or multiple parallel processors 112 coupled with memory hub 105 by a bus or another communication link 113. Communication link 113 can be any of a number of standard-based communication technologies or protocols such as PCI Express or vendor-specific communications interfaces or communications fabrics. In one embodiment, the one or multiple parallel processors 112 form a computingly focused parallel or a vector processing system. This may include a large amount of processing cores or clusters such as a multi-integrated core (MIC). In one embodiment, the one-or-more parallel processor(s 112) form a graphics subsystem which can output pixels via the I/O hub 107 to the one (or more) display device(s 110A. The one or multiple parallel processors 112 may also include a display interface and controller (not shown), allowing a direct connection with one or several display devices 110B.
In the I/O subsystem 112, a system storage device 114 can be connected to the I/O hub107 to provide a storage system for computing system 100. An I/O switch (116) can be used as an interface mechanism to allow connections between I/O hub107 and other components. This includes a network adapter/wireless network adapter (118/119), which may be integrated into platform, and many other devices that can added via add-in devices 120. The network adapter (118) can be either an Ethernet adapter, or another wired networking adapter. Wireless network adapter 119 may include one or several of Wi-Fi, Bluetooth or near field communication (NFC) devices or any other network device that includes one, or more, wireless radios.
The computing system 100 may include additional components that are not shown. These could include USB or other ports connections, optical storage drives and video capture devices. The communication paths connecting the components of FIG. 1. may be implemented using any of the suitable protocols, including PCI (Peripheral Complement Interconnect) based protocols (e.g. PCI-Express) or any other bus/point-to-point communication interfaces or protocols(s), like the NV Link high-speed interconnect or interconnect protocols that are known in the art.
In one embodiment, one or more parallel processing units 112 include circuitry that is optimized for graphics and videos processing, such as video output circuitry. This constitutes a graphics processor unit (GPU). In a second embodiment, the parallel processors 112 include circuitry that is optimized for general-purpose processing while maintaining the computational architecture described herein. In a further embodiment, the components of computing system 100 can be integrated on an integrated circuit with other system elements. The one or more parallel processing units, 112 memory hub 105 and processor(s) 102 can all be integrated in a single integrated circuit. The components of the computing systems 100 can also be integrated in a single package, forming a “system-in-package” (SIP) configuration. In one embodiment, at least a part of the components of computing system 100 may be integrated into a Multi-chip Module (MCM), and this MCM can then be connected to other MCMs into a modular computer system.
It will be understood that the computing system 100 illustrated herein is only an illustration and that modifications and variations are possible. You can modify the connection topology including the number and arrangement for bridges, processor(s), 102, and parallel processor(s), 112, as needed. In some cases, system memory (104) is directly connected to processor(s), 102 via a bridge. Other devices, however, communicate with system memory (104) via the memory hub and processor(s), 102. Other topologies allow the parallel processor(s), 112, to be connected to either the I/O hub107 or directly one or more of the processor(s), 102 rather than the memory hub105. Other embodiments may include the I/O hub (107) and memory hub (105), which can be combined into one chip. One embodiment may have two or more processor(s), 102 connected via multiple sockets. These processors can be paired with one or more instances the parallel processor(s), 112.
Some of the components shown in this document are optional, and may not be included with all implementations 100 of the computing system. Some components, such as add-in cards and peripherals, may not be included in all implementations of the computing system 100. Some architectures use different terms for similar components to those shown in FIG. 1 . In some architectures, for example, the I/O hub may be called a Southbridge, and the memory hub may be called a Northbridge.
FIG. 2A shows a parallel processing 200 according to one embodiment. The parallel processor 200 can be implemented by one or more integrated devices such as programmable CPUs, ASICs or FPGAs. The parallel processor 200 illustrated in FIG. is a variation of the parallel processor(s), 112, shown in FIG. “1, according an embodiment.
In one embodiment, the parallel processor 200 includes an I/O unit 204 that allows communication with other devices including other instances of the parallel processing unit. A parallel processing unit also includes an I/O 204 which allows communication with other devices, such as other instances of parallel processing unit 200. The I/O device 204 can be connected directly to other devices. One embodiment of the I/O device 204 connects to other devices using a switch or hub interface, such memory hub 105. The communication link 113 is formed by the connections between the I/O units 204 and the memory hub 105. The parallel processing unit 200’s I/O unit204 connects to a host interface (206) and a memory crossbar (216). Here, commands are received from the host interface to perform processing operations, while commands to the memory crossbar 226 are directed at performing memory operations.
When the host 206 receives commands via an I/O unit, it can direct the front end 208 to execute those commands. In one embodiment, the front end 208 is coupled with a scheduling 210 that distributes commands or other items of work to a processing array 212. In one embodiment, the scheduler ensures that the cluster array is configured correctly and in a valid condition before tasks are distributed. In one embodiment, the scheduler is implemented by firmware logic running on a microcontroller. The scheduler 210 implemented on a microcontroller is configured to perform complex work distribution and scheduling operations at fine and coarse granularity. This allows for rapid context switching and preemption of threads running on the processing array. In one embodiment, a host software can schedule workloads on the processing array using one or more graphics processing doorbells. The scheduler logic in the scheduler microcontroller can automatically distribute the workloads across the processing array.
The processing cluster array 212 can contain up to?N” Processing clusters (e.g. cluster 214A and cluster 214B through cluster 214N). The clusters 214A-214N can run a large number concurrent threads. The scheduler 220 can assign work to clusters 214A-214N from the processing cluster array 212, using different scheduling and/or work allocation algorithms. These may vary depending upon the type of program or computation being executed. The scheduler 210 can manage the scheduling dynamically or in part with compiler logic when compiling program logic for execution by processing cluster array 212. One embodiment allows for different types or types of programs to be processed by different clusters 214A-214N.
The processing cluster array 212 can be configured to perform different types of parallel processing operations. One embodiment of the processing cluster array is designed to perform general-purpose parallel computation operations. The processing cluster array 212 may include logic that can be used to perform processing tasks such as filtering video and/or audio, modeling operations including physics operations and data transformations.
In one embodiment, the processing cluster array212 is designed to perform parallel graphics processing operations. The processing cluster array 212 may include additional logic that supports the execution of graphics processing operations in embodiments where the parallel processor 200 can be configured to execute them. This includes texture sampling logic to perform texture operations as well as tessellation and other vertex processing logic. The processing cluster array 212 can also be used to run graphics processing-related shader programs, such as vertex shaders and tessellation shadesrs, geometry shadingrs, and pixelshaders. Parallel processing unit 202 can transfer data to system memory via I/O unit. 204 is used for processing. The transferred data can be saved to on-chip memory during processing (e.g. parallel processor memory 222) and then written back into system memory.
In one embodiment, the parallel processing unit (202) can be used to process graphics processing. The scheduler 210 can be set up to split the processing workload into roughly equal-sized tasks to make it easier to distribute the operations to multiple clusters 214A to 214N of the processing cluster array 212. Some embodiments allow for different processing to be performed on portions of the processing cluster array 212. One portion of the processing cluster array 212 may be used to perform topology generation and vertex shading. A second portion might be used to perform geometry shading and tessellation. The third portion could be used to shade pixels or perform other screen space operations to create a rendered image. Buffers may be used to store intermediate data from one or more clusters 214A to 214N, which can be transmitted between the clusters 214A to 214N for further processing.
The processing cluster array 212 may receive processing tasks during operation. This is done via the scheduler210. The scheduler receives commands from the front end 208 defining processing tasks. Processing tasks are data that describe how data will be processed. These include data indices, such as surface (patch), primitive, vertex, and/or pixels data. Also, state parameters and commands can be used to define the process (e.g. what program is to run). The scheduler 210 can be set up to retrieve the indices for the tasks, or it may get them from the front-end 208. Front end 208 can also be configured to ensure that the processing cluster array (212) is in a valid state prior to the workload specified by incoming commands buffers (e.g. batch buffers, push buffers etc.).
Each one or more instances the parallel processing unit 200 can be paired with parallel processor memory 222. The memory crossbar 226 can access the parallel processor memory 222. It can also receive memory requests from both the processing cluster array 212, and the I/O device 204. A memory interface 218 allows the memory crossbar to access parallel processor memory 221. Multiple partition units can be included in the memory interface 218 (e.g. partition unit 220A and partition unit 220-B) which can each couple to a specific portion of parallel processor memory 221 (e.g. memory unit 220-N). One implementation has the number 220A-220N partition units equal to the number memory units. For example, a first partition 220A has a corresponding 1 memory unit 224, a second 220B has the corresponding 284B memory unit, and an Nth 220N has the corresponding 284N memory unit 224. Other embodiments may have the number 220A-220N not equal to the number memory devices.
In different embodiments, the memory unit 224A-224N may include various memory devices such as dynamic random access memory or graphics random memory (DRAM), or synchronous graphics random address memory (SGRAM), and graphics double data rate memory (GDDR). The memory units 224A-249N can also contain 3D stacked memory. This includes but is not limited to high-bandwidth memory (HBM). The specific implementation of the memory unit 224A-224N may vary and can be chosen from a variety of conventional designs. Render targets such as texture maps or frame buffers may be stored across memory units 224A?224N. This allows partition units 220A-220N the ability to write portions of each render goal in parallel, which makes efficient use of the parallel processor memory 222. A local instance of parallel processor memory 222, may not be used in certain embodiments. Instead, a unified memory design which uses system memory and local cache memory is used.
In one embodiment, any cluster 214A-214N in the processing cluster array212 can process data that will then be written to any memory unit 224A-224N within parallel CPU memory 222. The memory crossbar 221 can be used to transfer each cluster’s output 214A-21N to any partition unit 220-222N or another cluster 214A-21N that can perform additional processing operations. Each cluster 214A to 214N can communicate through the memory crossbar 218 to access and write to external memory devices. One embodiment of the memory crossbar 2216 includes a connection with the memory interface 218 for communication with the I/O device 204 and a connection with a local instance the parallel processor memory 222. This allows the processing units in the different processing clusters (214A-214N) to communicate with other memory, or system memory, that is not local to parallel processing unit 200. One embodiment of the memory crossbar 221 can use virtual channels in order to separate traffic streams between clusters 214A-214N, and partition units 220A-220N.
Click here to view the patent on Google Patents.