Invented by Adrian Murtaza, Harald Fuchs, Bernd Czelhan, Jan Plogsties, Matteo Agnelli, Ingo Hofmann; Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
The Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. invention works as follows. In one example, a system is equipped with at least one video decoder that decodes video signals to display a VR, AR, MR, or 360-degree video environment to the user, and with at least one audio decoder that decodes audio streams. The system is configured to request at least one audio stream, at least one audio element of an audio stream, and/or at least one adaptation set from a server, based on at least the user's current viewport, head orientation, movement, interaction metadata, and/or virtual positional data.
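As a rough illustration of this client-side request logic, the following sketch picks which adaptation sets to request based on the user's current head yaw. All names, the tuple layout, and the selection rule are hypothetical, not taken from the patent:

```python
def select_adaptation_sets(available, user_yaw_deg, fov_deg=120.0):
    """Return the URLs of the adaptation sets whose content direction
    falls inside the user's current viewport.

    available    : list of (url, azimuth_deg) pairs describing where each
                   adaptation set's content sits in the scene (assumed layout)
    user_yaw_deg : current head yaw reported by the playback device
    fov_deg      : assumed angular width of the viewport in degrees
    """
    selected = []
    for url, azimuth_deg in available:
        # Smallest signed angular distance between the content direction
        # and the viewing direction, normalized to [-180, 180).
        diff = (azimuth_deg - user_yaw_deg + 180.0) % 360.0 - 180.0
        if abs(diff) <= fov_deg / 2.0:
            selected.append(url)
    return selected
```

A client would re-run such a selection whenever the reported viewport, orientation, or virtual position changes, and request only the returned sets from the server.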
Background for Optimizing Audio Delivery for Virtual Reality Applications
In a Virtual Reality (VR) environment, or similarly in an Augmented Reality (AR), Mixed Reality (MR), or 360-degree video environment, the user can typically visualise 360-degree content using, for example, a Head-Mounted Display, and listen to it over headphones (or over loudspeakers with rendering adapted to the user's position).
In a simple case, the content is authored so that only one audio/video scene (e.g., a 360-degree video) is reproduced at a given moment. The audio/video has a fixed position (e.g., a sphere with the user at its centre), and the user cannot move within the scene; he can only rotate his head in different directions (yaw, pitch, roll). The user's head orientation determines which video and audio are presented (different viewports).
While the video is delivered in 360 degrees, together with metadata describing the rendering process (such as stitching information and projection mapping), the audio content is delivered for the entire scene rather than selected according to the user's current viewport. The audio content is then adapted to the user's current viewport based on metadata; for example, an audio object may be rendered differently depending on information about the user's orientation or viewport. It should be noted that 360-degree media refers to content that can be viewed from more than one viewing direction at a given moment, selected for example by the user's head orientation or by a remote control device.
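For instance, compensating an audio object's direction for the user's head rotation can be as simple as the following sketch. This is a yaw-only simplification for illustration; real renderers such as MPEG-H 3D Audio operate on full 3D rotations:

```python
def render_azimuth(object_azimuth_deg, head_yaw_deg):
    """Azimuth at which a scene-fixed audio object should be rendered,
    relative to the listener's current head orientation (yaw only).
    Result is normalized to [-180, 180)."""
    return (object_azimuth_deg - head_yaw_deg + 180.0) % 360.0 - 180.0
```

An object fixed at 30 degrees in the scene is rendered straight ahead (0 degrees) once the user turns his head 30 degrees towards it.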
In a more complicated scenario, the user may move around in the VR scene, or "jump" from one scene to the next. The audio content may also change from one scene to the next (e.g., audio sources that are not audible in one scene could become audible in the next, as when "a door is opened"). With existing systems, complete audio scenes can be encoded in one stream and, if necessary, in additional streams (dependent on the main stream). Such systems are known as Next Generation Audio systems (e.g., MPEG-H 3D Audio). Some examples of these use cases include:
For the purposes of describing this situation, the notion of Discrete Viewpoints is introduced. This is a discrete location (or VR environment) for which audio/video content can be accessed.
The "straight-forward" solution is to use a real-time encoder that changes the encoding (number of audio elements and spatial information) based on feedback from the playback device about the user's position/orientation. In a streaming environment, for example, this would require complex communication between client and server.
The complexity of such a system is beyond the capabilities and features available in equipment and systems today, or in those that will be developed within the next decade.
Alternatively, the content representing the complete VR environment ("the entire world") could be delivered continuously. This would solve the problem, but the required bandwidth could be so high that existing communication links could not support it.
Enabling this functionality in a real-time environment is therefore a complex use case, and low-complexity alternative solutions are needed to make it possible.
2. Terminology and Definitions
The following terms are used in technical fields:
In this context, the notion of Adaptation Sets is used generically, sometimes referring to the Representations themselves. The media streams (audio/video) are encapsulated into Media Segments, which are the actual files played by the client. Media Segments can use a variety of formats, including the ISO Base Media File Format (similar to the MPEG-4 container format) or the MPEG-2 Transport Stream (TS). The encapsulation into Media Segments and into different Representations/Adaptation Sets is independent of the methods described here; the methods apply to all of these options.
The methods described in this document are based on a DASH server-client communication. However, they can be used with any other delivery environment, including MMT, MPEG-2, DASH-ROUTE and File Format.
In general, an adaptation set can be seen as a layer above a stream, and it may contain metadata (e.g., associated with positions). A stream can contain a number of audio elements, and an audio scene may be associated with a number of streams delivered as part of multiple adaptation sets.
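The layering described above could be modelled as follows. This is a hypothetical sketch of the hierarchy for illustration, not a format definition from the patent; in particular, the position metadata layout is assumed:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class AudioElement:
    """A single audio element, e.g. one object or channel group."""
    name: str

@dataclass
class Stream:
    """One encoded audio stream; it can carry several audio elements."""
    elements: List[AudioElement] = field(default_factory=list)

@dataclass
class AdaptationSet:
    """A layer above the stream; may carry metadata such as a position."""
    position: Tuple[float, float, float]  # assumed (x, y, z) metadata
    streams: List[Stream] = field(default_factory=list)

@dataclass
class AudioScene:
    """One audio scene can span streams in multiple adaptation sets."""
    adaptation_sets: List[AdaptationSet] = field(default_factory=list)
```

Under this model, a client selecting an adaptation set by position implicitly selects the streams, and hence the audio elements, that it carries.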
3. Current Solutions
Current solutions include:
Current solutions only allow the user to rotate within the VR environment, but not to move within it.
According to one embodiment, the system for a 360-degree video, virtual reality, mixed reality or augmented reality environment may include:
According to a second embodiment, the system may include:
One embodiment may include a server that delivers audio and video streams for a virtual reality (VR), augmented reality (AR), mixed reality (MR), or 360-degree video environment, with the video and audio streams to be reproduced on a media consumption device.
Another embodiment may include a server that delivers audio and video streams for a VR, AR, MR, or 360-degree video environment, with the video and audio streams then reproduced on a media consumption device.