Invented by Justin-Josef Angel, Eric Alan Breitbard, Colin Neil Swann, Robert Steven Murdock, Amazon Technologies Inc
The Amazon Technologies Inc. invention works as follows:
Motion sickness caused by virtual reality headsets can be reduced by placing a virtual nose into the user's field of view. Nose data can be obtained by having the user select from different options, or it can be determined dynamically using image analysis algorithms. Analysis of image data of the user's face can determine aspects such as the size, shape, and color of the nose. A three-dimensional model of the nose is created and treated as a virtual object, to which lighting, shadows, and textures can be applied. The pupillary distance is calculated from the image data and used to determine the point of view from which each portion of the nose is rendered. The level of detail in the rendering can change with changes in lighting or expression.
Background for "Realistic rendering in virtual reality applications"
Virtual reality devices such as goggles and headsets are developing rapidly, to the point that they will soon be widely available for various consumer applications. Virtual reality headsets, which immerse the wearer in images of a virtual world, have been demonstrated at various events, and application developers are preparing content for their release. Motion sickness, however, remains a problem: when the perception of reality is distorted, or presented in an unexpected way, it can cause motion sickness and headaches.
The systems and methods of various embodiments in the present disclosure address one or more deficiencies of conventional approaches to rendering virtual reality content in an electronic environment. In particular, various embodiments allow nose data to be generated for a user, which can then be used by a virtual reality device, such as a headset, to render an appropriate nose portion in the displayed content. In certain embodiments, the user is able to "design" a nose by selecting from a set of options. In other embodiments, the nose data can be determined dynamically by analyzing image data, such as an image of the user's face, to determine the relative positions of facial feature points. These feature points can be used to determine the size, location, and shape of the nose. The size and shape data are used to create a virtual nose model, such as a wireframe model or mesh. The location of the user's nose in the image data also allows appearance data for the nose to be determined. The appearance data can include aspects such as base color, variations in color, texture, and reflectivity, which can then be used to apply texture, lighting, and/or shading to the nose model.
When a virtual reality (VR) device is to display VR content, it can obtain the nose data, including the mesh and texture information. Since views are rendered separately for each eye, a point of view can be determined for each eye and the appropriate portion of the nose rendered from that point of view. The nose can be treated like any other object within the virtual environment, with lighting, shading, and other effects applied to it. The view can change as the user changes his or her gaze direction or moves his or her head. The amount of detail applied to the nose (e.g., resolution and texture) can be affected by factors such as lighting and gaze direction, and the nose can be re-rendered to reflect a change in the user's expression. Presenting a substantially accurate nose in the field of view of the virtual reality device can help to reduce motion sickness. The pupillary distance of the user can also be calculated from the image data that was analyzed to determine nose size and shape; in some embodiments, the pupil positions correspond to two of the feature points generated during the feature detection process. Knowing the user's pupillary distance allows virtual content to be rendered from the correct points of view (i.e., with the right amount of disparity), which can further mitigate motion sickness in at least some embodiments.
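The sketch below groups the kinds of nose data mentioned above (mesh, texture, base color, reflectivity, pupillary distance) into a single Python record. The class name, field names, and default values are assumptions made for illustration, not terms taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Illustrative grouping of the per-user nose data discussed above; the field
# names and default values are assumptions for this sketch, not patent terms.
@dataclass
class NoseData:
    mesh_vertices: List[Tuple[float, float, float]]  # 3D points of the wireframe/mesh nose model
    mesh_faces: List[Tuple[int, int, int]]           # triangle indices into mesh_vertices
    base_color: Tuple[int, int, int]                 # average skin color (RGB) of the nose region
    texture: bytes = b""                             # texture data mapped onto the model at render time
    reflectivity: float = 0.1                        # 0..1, e.g. estimated from a flash/no-flash comparison
    pupillary_distance_mm: float = 63.0              # used to set the per-eye points of view
```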
Below are a number of other features and benefits that may be included in the various embodiments.
FIG. 1A shows an example pair of images 102, 104 that can be displayed to a wearer of a virtual reality device or similar device, in accordance with various embodiments. In such devices, a slightly different image is rendered or displayed for each of the user's eyes, because the eyes view the world from slightly different locations. This is due to the pupillary distance between the user's eyes (or any other measure of the separation or location of the eyes). The angle at which an object is viewed by each eye depends in part on the distance of the object from the eyes, and this variation in angle as a function of distance is referred to as disparity. Disparity results in an apparent lateral offset between the left and right images that varies with distance. Objects at 'infinity' appear at essentially the same pixel locations in both images, since the difference in viewing angle is approximately zero degrees. As an object gets closer, the difference in the angles at which the two eyes view it increases, resulting in an increasing difference in its pixel locations between the two images. For an object very close to the user's eyes, the object will appear significantly farther to the left in the right-eye image than in the left-eye image. In FIG. 1A, the portion of road closest to the user has a larger difference in pixel locations between the images than the portions of road farther away, based on this relationship between distance and disparity.
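The relationship between distance and disparity described above can be illustrated with a simple pinhole-camera approximation. The function name, the default pupillary distance, and the focal length value below are assumptions for the sketch, not values from the patent.

```python
def pixel_disparity(depth_m: float, ipd_m: float = 0.063,
                    focal_length_px: float = 1200.0) -> float:
    """Approximate horizontal pixel offset between left- and right-eye images
    for a point at depth_m metres, using a simple pinhole-camera model.
    Disparity shrinks toward zero as depth grows ('objects at infinity')."""
    return focal_length_px * ipd_m / depth_m

# Nearby road shifts far more between the two images than distant road.
print(pixel_disparity(1.0))    # ~75.6 px for an object 1 m away
print(pixel_disparity(100.0))  # ~0.76 px for an object 100 m away
```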
This disparity can be used when rendering a virtual reality scene to show the same scene from two different viewpoints. In the driving scenario, there is a virtual model (wireframe or mesh) of the road and surrounding environment, along with its position relative to the user. The position of each eye can be used to render a separate view of the scene, which provides the correct amount of disparity between the two views. Once the view of the virtual three-dimensional (3D) model is determined for each eye, textures and lighting can be applied (as known for such purposes) to render the scene. The left- and right-eye images may be displayed simultaneously, on two separate screens or on portions of a single screen, or alternately, where a virtual shutter or similar mechanism causes each eye to view the appropriate content in turn. This process creates the impression of a three-dimensional environment. Virtual reality devices typically also include some form of motion or orientation sensor, such as an accelerometer, gyroscope, electronic compass, inertial sensor, or magnetometer, which provides information about movement of the device in response to movement of the user's head. This allows the point of view to be updated as the device moves, so that the view presented to the user is the one expected based on that movement. The views can be re-rendered at a rate such as thirty or sixty frames per second, so that the user feels as though he or she is actually in the virtual world.
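One common way to obtain the two viewpoints is to offset a shared head pose by half the pupillary distance along the inter-eye axis. The sketch below assumes a 4x4 view-matrix convention and uses numpy; it is illustrative, not the patent's rendering pipeline.

```python
import numpy as np

def eye_view_matrices(head_view: np.ndarray, ipd_m: float = 0.063):
    """Given a 4x4 view matrix for the head (world -> head space), return
    left/right eye view matrices offset by half the pupillary distance
    along the head's x-axis. Conventions here are illustrative."""
    def translate_x(dx: float) -> np.ndarray:
        t = np.eye(4)
        t[0, 3] = dx
        return t
    # The left eye sits at -ipd/2 along the head's x-axis, so its view matrix
    # translates the scene by +ipd/2 (and vice versa for the right eye).
    left = translate_x(+ipd_m / 2.0) @ head_view
    right = translate_x(-ipd_m / 2.0) @ head_view
    return left, right

left_view, right_view = eye_view_matrices(np.eye(4), ipd_m=0.063)
```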
As mentioned, however, even minor variations or deviations from what the brain expects can cause problems such as motion sickness. These can be due to dropped frames, delayed point-of-view updates, improper lighting or shading, or other such aspects. It has been shown that displaying a virtual nose in virtual reality views can help to reduce motion sickness. The human brain is used to detecting the user's nose in the information captured by both eyes and filtering it out, so that the nose is not consciously noticed in the field of view. The absence of a nose in virtual reality can therefore cause motion sickness, because the brain senses that something is wrong. To date, however, the discussion of virtual noses in virtual reality has not provided enough information to determine how to create the correct nose appearance in the virtual environment, nor has it addressed the interaction between the virtual nose and the virtual environment, or how the appearance of the nose should change under different environmental conditions.
Approaches in accordance with various embodiments attempt to address these and other deficiencies of existing virtual reality devices and applications by determining a nose appearance that is appropriate for a particular user and rendering the nose in a way that the user expects. The rendering can also be updated so that the nose appears to interact correctly with the environment, which includes not only changing its brightness, color, and overall appearance, but also changing its shape with the user's expression, or adjusting its resolution and focus. A feedback loop can even be used to change the rendering if the user is determined to be looking at the nose rendering in the virtual reality device.
For instance, the left and right images rendered in FIG. 1B include two renderings of the user's nose: a rendering 126 in the left image and a rendering 128 in the right image. The left image 122 includes, as expected, a rendering 126 showing the left side of the user's nose in the lower-right corner of the image, illuminated by the virtual sun located to the left. The right image 124 includes a rendering 128 of the right portion of the user's nose that is somewhat shaded due to its location. The renderings are based on information about the user's nose, the user's current expression, and/or other such information. As the virtual vehicle moves along the virtual road and the user looks around the virtual world, the position of the sun relative to the nose changes, which causes the shading and/or lighting to change accordingly. Aspects such as the resolution or focus of the renderings can also change based on factors such as the lighting, the user's viewing direction, and so on. Various embodiments attempt to determine some or all of these variations and update the rendering of the nose so that it is perceived as expected, or at least as reasonably normal, by the user.
FIG. 2 shows an example virtual reality device 200 that can be used in accordance with various embodiments. Other devices, such as smart goggles, smart glasses, and other virtual reality displays and devices, are also within the scope of the various embodiments. The device in this example includes a housing made of plastic, with a lip or other portion intended to contact the user's face, which provides comfort for the user and also creates a seal that prevents light from reaching the wearer's eyes. The example device includes a strap or similar mechanism for securing the device to the user's head, particularly while the head is moving. The example device has a display screen 208 for the left eye and a display screen 206 for the right eye, although as mentioned, in certain embodiments these can be portions of a single display screen, arrays of screens, or even holographic displays. A convex lens is provided for each eye, and there may be one or more separation elements that limit the view of each eye to the corresponding display. Display circuitry 218 is typically included in the device, such as memory, processors, graphics processors, display driver software, and other components known for displaying content. The circuitry can be shared between the displays 206 and 208, or some components can be duplicated so that they drive only one of the displays. The display screens can be of any appropriate type, such as AMOLED or LED displays, with refresh rates sufficient for virtual reality applications. The device may also include one or more motion and/or orientation sensors 210, such as an accelerometer, magnetometer, gyroscope, electronic compass, inertial sensor, and/or another sensor that provides data on rotation, translation, and/or other movement of the device. This data can be used to determine the point of view (POV) from which to render content. The example device also includes at least one communication component 212 for communicating data over a protocol such as Bluetooth or Wi-Fi. The communication component allows the device 200 to communicate with a computing device for various purposes, such as obtaining content for rendering or receiving additional input. The example device can include other components as well, such as speakers, headsets, microphones, and power components.
The example device 200 can also include one or more cameras 220, 222 or other image capture elements for capturing image data, such as data for light reflected in the ambient or infrared spectrum. One or more cameras can be positioned on the exterior of the device to assist with motion tracking and with determining environmental conditions. For example, the locations of light sources, the intensity of the surrounding ambient light, and nearby objects or people can all be determined and used to render the virtual reality scene, for instance by adjusting the virtual lighting to better match the actual environment or by including representations of objects located near the user. Tracking the motion of objects represented in the captured image data can also assist with motion tracking, since rotation and translation of the surrounding objects in the image data can be indicative of movement of the device itself.
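As a rough illustration of using an exterior camera for lighting, the sketch below estimates an ambient light level from a captured frame by averaging a standard luminance approximation. The luminance weights and the normalization are conventional choices made for this sketch, not values from the patent.

```python
import numpy as np

def ambient_intensity(frame_rgb: np.ndarray) -> float:
    """Estimate ambient light level (0..1) from an exterior-camera RGB frame
    by averaging a simple luminance approximation."""
    luminance = (0.299 * frame_rgb[..., 0] +
                 0.587 * frame_rgb[..., 1] +
                 0.114 * frame_rgb[..., 2])
    return float(luminance.mean() / 255.0)

# Example: a mid-grey frame yields roughly 0.5.
frame = np.full((480, 640, 3), 128, dtype=np.uint8)
print(ambient_intensity(frame))
```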
Further, the inclusion of cameras 220, 222 on the interior of the device can help to determine information such as the user's gaze direction or expression. The device in this example can include at least one IR emitter 224, such as an IR LED, that emits IR radiation that can be reflected by the user. IR is used because it is not visible to the user, and therefore is not a distraction, and it poses no health risk to the user. The radiation from the IR emitter 224 is reflected off the user's face and can be detected by one or more IR detectors or other image capture elements 220, 222. In certain embodiments, the captured image data can be analyzed to determine the user's expression, as may be indicated by variations in the locations of various features of the face. In some embodiments the pupil locations can also be determined, which allows the user's gaze direction to be determined. In some embodiments the gaze direction can affect how objects close to, or far from, the center of the user's field of view are rendered.
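A minimal sketch of turning a detected pupil position into a gaze estimate is shown below, normalizing the pupil's horizontal offset between the detected eye corners. This stands in for the gaze determination mentioned above and is not the patent's algorithm; the coordinates in the example are hypothetical.

```python
def gaze_offset(pupil_xy, inner_corner_xy, outer_corner_xy):
    """Return a rough horizontal gaze value in [-1, 1]: 0 when the pupil is
    centred between the detected eye corners, negative toward the inner
    corner, positive toward the outer corner. Purely illustrative."""
    cx = (inner_corner_xy[0] + outer_corner_xy[0]) / 2.0
    half_width = abs(outer_corner_xy[0] - inner_corner_xy[0]) / 2.0
    if half_width == 0:
        return 0.0
    return max(-1.0, min(1.0, (pupil_xy[0] - cx) / half_width))

# Example with hypothetical pixel coordinates from an interior IR camera.
print(gaze_offset((210, 120), inner_corner_xy=(190, 122), outer_corner_xy=(240, 121)))
```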
As stated, the nose to be rendered in a virtual reality environment can be generated or selected to be appropriate for the current user. FIGS. 3A, 3B, and 3C show examples of nose renderings that can be used in accordance with various embodiments. The noses appropriate for different users may have different features, such as different overall shapes and sizes, as well as different sizes and shapes of specific regions, such as the bridge, tip, and nostrils; the size, shape, and location of these regions can affect how the nose appears in different situations. Reflectivity (based on the oil content of the skin), texture, and other such factors may also vary from user to user. In some embodiments, a user may be presented with a set of virtual noses and asked to select the one that best matches his or her own nose. For example, a user can be shown a number of different nose shapes, such as those shown in FIGS. 3A-3C, and can choose the most suitable nose from the set. In some embodiments, the user can modify the shape of a selected nose, for example by adjusting the size or shape of a particular portion, which adjusts the nose model used to render the image. A user may also be able to choose a color, for example by selecting from a palette or using a slider to pick a specific color. The nose size and shape data can then be stored as nose model data, and the color stored as texture data or other such data, for use in rendering virtual content on a device such as that described with respect to FIG. 2.
In other embodiments, the nose data for a user can be determined using captured image data. For example, a user may take a selfie or other picture that includes at least a portion of the user's face. Various embodiments can instead capture a video stream or a burst of frames; image data captured from different angles, or stereoscopic image data, can provide more accurate shape data, or at least shape data in three dimensions. A facial recognition algorithm or similar process can be used to analyze the image (or image data) to determine the locations of various facial features. FIG. 4A shows an example situation 400 in which such a set of feature points has been determined. In this example, points 402 corresponding to specific facial features are identified and arranged in a way that helps to determine aspects such as facial structure and expression. A subset of these feature points corresponds to the user's nose, as shown in the illustration, and these points can be used to determine the general shape and size of the nose (at least relative to the user's face or head). A nose model for the user can be generated from these points and/or the determined size and shape. In some embodiments, the symmetry of the nose can be assumed, so that only data for one side of the user's nose needs to be stored, such as the data for the bottom, top, left, and right extremities of one side along with the center point. In other embodiments, data for a complete model can be stored to account for asymmetries of the user's nose. The points identifying the location of the nose in the image also allow the corresponding region of the image to be analyzed to determine the appearance of the nose, including factors such as color, skin texture, and reflectivity. This data can be saved as texture data and mapped onto the nose model during the rendering process. This allows the size, shape, and appearance of the user's nose to be determined with little effort on the part of the user. The image can be captured with the VR device, another computing device, or a digital camera, or an existing image can be uploaded for analysis.
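As a hedged sketch of the appearance-analysis step, the code below takes the feature points labelled as belonging to the nose, crops the corresponding image region, and averages its pixel values to obtain a base color. The rectangular crop and the synthetic example data are simplifications, not the patent's method.

```python
import numpy as np

def nose_appearance(image_rgb: np.ndarray, nose_points: np.ndarray):
    """Given an RGB image and an (N, 2) array of (x, y) feature points that the
    landmark detector labelled as belonging to the nose, return the bounding
    box of the nose region and its average (base) colour."""
    xs, ys = nose_points[:, 0], nose_points[:, 1]
    x0, x1 = int(xs.min()), int(xs.max()) + 1
    y0, y1 = int(ys.min()), int(ys.max()) + 1
    region = image_rgb[y0:y1, x0:x1]
    base_color = region.reshape(-1, 3).mean(axis=0)
    return (x0, y0, x1, y1), tuple(int(c) for c in base_color.round())

# Example with synthetic data: a flat skin-toned image and five hypothetical nose points.
img = np.zeros((200, 200, 3), dtype=np.uint8)
img[:] = (205, 160, 140)
pts = np.array([[90, 80], [110, 80], [100, 120], [85, 115], [115, 115]])
print(nose_appearance(img, pts))  # ((85, 80, 116, 121), (205, 160, 140))
```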
As mentioned, the shape and size of the user's nose can change with the user's expression. For example, FIG. 4B shows the relative locations of the feature points 422 for the user as determined from a second image (or video frame) in which the user is laughing or smiling. As can be seen, the relative positions of many of the facial features have changed compared to FIG. 4A, and the relative positions of the points corresponding to the nose have changed as well. In some embodiments, it may be desirable to capture multiple images, or multiple frames of video data, in which the user displays different expressions. The user can be instructed to make certain expressions at specific times, and the corresponding feature positions can then be captured and analyzed. In other embodiments, the user may simply be asked to make different expressions, and a facial analysis algorithm using trained classifiers can determine which expression is being made. A camera inside the virtual reality device, or at another such location, can then monitor the relative positions of the visible facial features in an attempt to determine the user's current expression, which in turn can cause the nose rendered in the VR environment to use the nose model relevant to that expression. In some embodiments, a library of different nose models can be maintained for a user, while in others a single model containing information for various expressions can be stored. In some embodiments, a nose model is determined first and then modified using standard animation or expression modifiers; in many cases the changes in nose shape are subtle enough that such an animation or modeling approach can be sufficient.
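One simple stand-in for the trained expression classifier mentioned above is a nearest-reference comparison of the current landmark positions against landmark sets captured while the user was prompted to make each expression. The sketch below is illustrative only and assumes all landmark arrays have the same shape.

```python
import numpy as np

def detect_expression(landmarks: np.ndarray, references: dict) -> str:
    """Pick the stored expression whose reference landmark set (same shape as
    `landmarks`) is closest to the currently observed landmarks, after
    removing translation. A sketch, not the patent's classifier."""
    centred = landmarks - landmarks.mean(axis=0)
    best_label, best_dist = None, float("inf")
    for label, ref in references.items():
        ref_centred = ref - ref.mean(axis=0)
        dist = np.linalg.norm(centred - ref_centred)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# `references` would map expression names ("neutral", "smiling", ...) to (N, 2)
# landmark arrays captured while the user was prompted to make each expression.
```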
FIG. 5 shows an example process 500 for determining nose information for a user that can be used in accordance with various embodiments. It should be understood that, for this and other processes discussed herein, there can be fewer, additional, or alternative steps performed in similar or alternative orders, or in parallel, within the scope of the various embodiments. In this example, image data including a representation of the user's face is obtained 502. This can involve the user supplying an image of his or her face, retrieving an image stored for a profile or account, or instructing the user to capture an image or video (using a webcam, the camera of a computing device, etc.) showing a view of the user's face, including the nose, from one or more points of view. In certain embodiments, the user may be instructed to pan the camera around his or her face while capturing video or a series of images. The image data can be analyzed 504 to identify facial features, or facial feature points, represented in the image. The algorithms used can include principal component analysis (PCA) algorithms, machine learning algorithms, and Hidden Markov Model analysis, among others, which can determine the relative positions of facial features in the image. Once a set of facial feature points has been determined, the portion of the image corresponding to the user's nose can be determined 506. In some embodiments, the output of the algorithm will include not only a set of facial feature points but also an indication of which feature each point represents (e.g., the tip of the nose). In this example, the region of the image bounded by the nose feature points can be taken as the nose portion. Once the region of the image corresponding to the user's nose has been determined, one or more aspects of the nose can be determined. For example, the size and shape of the nose can be determined 508 based on the relative locations of the feature points of the nose region, and appearance characteristics such as skin tone, color, and texture can be determined 510, for example by analyzing the pixel values within the nose region using skin color and texture analysis. In some embodiments, a base skin color for the nose can be determined, for example by averaging the colors of the pixels within the nose region or at a particular location on the nose. In some embodiments, a hue map or other mapping can be generated that stores, for each position (or polygon) of the nose, the deviation from the base nose color. In certain embodiments, at least one image can be captured using a flash, which can provide an indication of the reflectivity of the nose; the color and illumination can be compared between the flash and non-flash images using a reflectivity algorithm or other such process. Other information can also be calculated from the identified facial features. For example, the pupillary distance of the user can be determined 512 from the features corresponding to the user's eyes, which can then be used to generate virtual reality views with accurate points of view customized to the individual user. The nose data, including the nose model, color and texture data, pupillary distance data, and other such data, can then be stored 514 for the user, such that the data can be used by a virtual reality device or system to render an appropriate nose portion for that user. The nose data may be stored in memory on the virtual reality device itself, but will more likely be stored on a computing device belonging to the user, or by a remote system or service.
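The flash/no-flash reflectivity comparison mentioned in the process above could be approximated as sketched below, by comparing the mean brightness of the nose region in the two captures. The ratio and clipping are assumptions for illustration, not the patent's reflectivity algorithm.

```python
import numpy as np

def estimate_reflectivity(nose_region_no_flash: np.ndarray,
                          nose_region_flash: np.ndarray) -> float:
    """Rough reflectivity score in [0, 1]: how much brighter the nose region
    appears in the flash capture than in the no-flash capture."""
    mean_off = float(nose_region_no_flash.mean()) + 1e-6  # avoid divide-by-zero
    mean_on = float(nose_region_flash.mean())
    return float(np.clip(mean_on / mean_off - 1.0, 0.0, 1.0))

# Example: a flash capture ~40% brighter than the ambient one scores ~0.4.
no_flash = np.full((50, 50), 100.0)
flash = np.full((50, 50), 140.0)
print(estimate_reflectivity(no_flash, flash))
```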
FIG. 6 shows an example process 600 for rendering a user's nose in a virtual reality environment that can be used in accordance with various embodiments. In this example, the virtual reality content to be rendered for a user is determined 602. The content can include information about the virtual reality environment as well as any objects to be displayed within it, can be determined based at least in part on a selected game or application, and can be obtained from any suitable source, such as a computing device connected directly to the virtual reality device or a remote server or system configured to deliver the content to the device. The virtual reality content can be supplemented by nose data useful for rendering the user's nose within the virtual environment. This can include, for instance, determining the user for whom the nose is to be rendered, such as through a login, a device configuration, or facial recognition using at least one camera of the device. The nose model relevant to that user can then be obtained 604, such as nose model information generated using a process like that described with respect to FIG. 5 and associated with the user's account. In some embodiments, a camera of the virtual reality device can capture image data that is used to determine the nose model in near real time. The pupillary distance for the user can also be determined 606, using a process similar to that used for obtaining the nose model, as discussed elsewhere herein.
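A minimal sketch of the lookup step, assuming a hypothetical per-user store keyed by an account or device identifier and a generic fallback model used when no per-user data is on file; the names here are illustrative only.

```python
# Hypothetical per-user store of nose data; DEFAULT_NOSE stands in for a
# generic model used when no per-user data has been generated yet.
DEFAULT_NOSE = "generic-nose-model"
nose_store = {"user-123": "user-123-nose-model"}

def nose_for_user(user_id: str) -> str:
    """Return the stored nose data for the identified wearer, or the default."""
    return nose_store.get(user_id, DEFAULT_NOSE)

print(nose_for_user("user-123"))  # per-user model
print(nose_for_user("guest"))     # falls back to the generic model
```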
Other user data useful for rendering the virtual reality views can also be determined. For example, shading data for the nose model can be determined 608, which can include information such as the skin tone, texture, and/or reflectivity of the nose. This information can also be obtained as discussed with respect to FIG. 5. Other information useful for determining how to render the nose can be determined as well. For example, the lighting effects to be applied to the nose can be determined 610, which can include determining the relative directions of light sources in the virtual environment (e.g., point, unidirectional, or omnidirectional light sources), determining their brightness and color, and so on. In some embodiments, the current expression of the user can be determined 612, such as by using image data captured of the user by at least one camera of the VR device and performing a facial feature detection process to determine the relative locations of facial features, which can then be processed by a trained classifier or other such process to determine the current expression. The expression can determine which version or state of the user's nose model is used for the rendering. Although these steps are described in a sequential order, it should be understood that at least some of the steps, such as 604, 606, 608, 610, and/or 612, can be performed in different orders or concurrently in certain embodiments. For each virtual reality panel (i.e., the left- and right-eye displays), a perspective-appropriate view of the appropriate portion of the user's nose can be rendered 614 with the VR content, with the pupillary distance used to determine the appropriate viewpoint for each panel. This can include determining which portion of the nose should be displayed based on the point of view and field of view for each eye, mapping the appropriate color and texture (or skin tone) onto the nose model, and using the lighting and reflectivity data to light and/or shade the nose portion. The left and right views can be rendered separately, as the lighting effects will vary based on the point of view. The brightness of the surrounding environment and other such factors can affect the resolution and focus with which the nose is rendered. In a dark environment, for example, only the skin tone color might be used, while in brighter environments, or in other situations, a high-quality virtual nose can be rendered using a texture map and tone map. If the lighting is bright on one side, the brightly lit side of the nose could be rendered at high resolution while the darker side is rendered at lower resolution. The nose can otherwise be treated like any other object in the scene, with lighting, shading, and reflection effects applied based on its position, the portion visible from each point of view, and so on. The rendering can also be updated, or new renderings generated, as the user moves his or her head, as the lighting changes, as the user's expression changes, or as other changes occur, as discussed elsewhere herein.
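The level-of-detail behavior described above might be expressed as a simple selection rule, as sketched below. The brightness thresholds and tier names are illustrative assumptions, not values from the patent.

```python
def nose_lod(scene_brightness: float, side_illumination: float) -> str:
    """Pick a rendering level of detail for one side of the virtual nose.
    scene_brightness and side_illumination are assumed to be in [0, 1];
    the thresholds and tier names are illustrative."""
    if scene_brightness < 0.1:
        return "flat-skin-tone"         # dark scene: base colour only
    if side_illumination < 0.3:
        return "low-res-texture"        # shaded side: coarse texture map
    return "full-texture-and-tone-map"  # brightly lit side: full detail

print(nose_lod(scene_brightness=0.8, side_illumination=0.9))  # full detail
print(nose_lod(scene_brightness=0.8, side_illumination=0.1))  # low-res shaded side
```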
In addition, the image data of the user's face can be used to calculate the pupillary distance. The pupillary distance, or distance between the user's pupils, is important for at least some virtual reality experiences because it determines the points of view from which the VR images should be rendered. Conventional systems assume a fixed pupillary distance for all users, but there can be large variations in pupillary distance among users, which can result in inaccurate views and even motion sickness. By determining the pupillary distance for a specific user and using it to determine the points of view from which to render the left and right images, a more accurate view can be presented to that user. The same process discussed above for determining facial features, such as the size and shape of the nose, can be used to calculate the pupillary distance. For example, in FIG. 4A the distance between the feature points located at the center of each pupil can be used as the pupillary distance. This measurement can also be used to customize glasses or goggles for the user, for example by optimizing the distance between the lenses, their orientation, and other such factors. Even if the precise scale of the image cannot be determined, the distance measured relative to other features of the user's face can be used as an approximation. The distance between the center of the nose and the eyes can also be useful, and can be determined based at least in part on accurate eye location information; this distance can be used to determine the expected size of the nose from the user's point of view.
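A small sketch of expressing the pupillary distance relative to another facial measurement when the absolute image scale is unknown; the use of face width as the reference and the example coordinates are assumptions made for illustration.

```python
import numpy as np

def relative_pupillary_distance(left_pupil, right_pupil, face_width_points):
    """Express the pupil-to-pupil distance as a fraction of another facial
    measurement (here, the distance between two outer face feature points),
    so it remains meaningful when the image scale is unknown."""
    pupil_dist = np.linalg.norm(np.subtract(right_pupil, left_pupil))
    face_width = np.linalg.norm(np.subtract(face_width_points[1], face_width_points[0]))
    return float(pupil_dist / face_width)

# Example with hypothetical feature points in pixel coordinates.
print(relative_pupillary_distance((180, 200), (260, 200), ((140, 210), (300, 210))))  # 0.5
```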
As mentioned, one or more cameras can be positioned on the VR device itself to capture various types of data, such as facial data for determining the appropriate nose data, or expression and movement data for rendering an appropriate nose in different situations. One or more cameras can be positioned on the front of the VR device to determine the environment and to capture the user's face as the user picks up the device. Cameras can be positioned in an interior region of the device to capture IR reflected from the pupils, for gaze tracking and other such purposes. A camera or cameras can also be positioned at other locations, such as to the side, to capture data such as how much the user sweats or the expressions the user makes during the VR experience. These cameras will likely be IR-based to minimize distraction of the user, and can be positioned at various locations in order to capture the user's face from different angles.
In some embodiments, the rendering (including the resolution or texture used to render the virtual nose) can depend upon the gaze direction, viewing location, or other aspects of the user's position with respect to the VR content or the displays of the VR device presenting that content. In certain embodiments, a computing device uses at least one camera or other image capture element to image at least a portion of the user. The image capture element can use ambient light surrounding the device or the user, or light from a display element, LED, or other component of the electronic device. In other embodiments, at least one image capture element captures infrared (IR) or other radiation emitted from a component of the device, such as an IR LED or laser diode, and reflected by the user. In some embodiments, both an ambient light camera and one or more infrared detectors are used to determine relative position and/or movement.