Invented by Hae-Jong Seo, Abhishek Bajpayee, David Nister, Minwoo Park, Neda Cvijetic, Nvidia Corp
The Nvidia Corp invention works as follows: In various examples, a deep neural network (DNN) is trained to detect sensor blindness using a context- and region-based approach. From sensor data, the DNN may compute regions of blindness or otherwise compromised visibility, along with blindness classifications, blindness attributes, and/or blindness locations. The DNN may also predict whether each instance of the sensor data is usable for one or more operations, such as those associated with semi-autonomous or autonomous driving. The outputs of the DNN may be combined to filter out instances, or portions of instances, of the sensor data that are compromised and that might otherwise lead to inaccurate or ineffective results when performing one or more system operations.
Background for Deep neural network processing to detect sensor blindness in autonomous machine applications
Autonomous systems and advanced driver assistance systems (ADAS) may use sensors, such as cameras, to perform various tasks, including lane keeping, lane switching, lane assignment, and camera calibration. In order for autonomous systems and ADAS to function independently and efficiently, they must generate an understanding of the environment surrounding the vehicle in real time or near real time. To determine that environment accurately and efficiently, the sensors must produce unobscured data (e.g., images, depth maps). A sensor's ability to perceive its surrounding environment can be degraded by many factors, including blockage, such as from debris or precipitation, as well as blur and glare; snow, rain, sun flares, and similar conditions can all cause sensor blindness.
Conventional methods rely primarily on computer vision techniques, such as analyzing the absence of sharp edge features (e.g., sharp changes in gradient, color, or intensity) in regions of an image, low-level feature analyses, or binary support vector machines that output blind versus not blind. These hand-crafted features are then pieced together to determine whether a sensor-blindness event has occurred. Such feature-based techniques do not scale, because each feature must be analyzed separately, for example, to determine whether it is relevant to sensor blockage, and further analysis is then required to determine how different features combine into a sensor-blindness condition. The computational cost of these conventional approaches renders them unsuitable for real-time or near real-time deployment.
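To make the kind of low-level feature analysis described above concrete, here is a minimal NumPy sketch (not from the patent; the function names and the threshold are illustrative) that scores sharpness as the variance of a Laplacian response and flags frames with few sharp edge features as possibly blind:

```python
import numpy as np

def laplacian_variance(gray):
    """Sharpness score: variance of a 4-neighbor Laplacian response.
    Low variance means few sharp edge features, which conventional
    pipelines read as a possible blur/blockage event."""
    lap = (-4.0 * gray[1:-1, 1:-1]
           + gray[:-2, 1:-1] + gray[2:, 1:-1]
           + gray[1:-1, :-2] + gray[1:-1, 2:])
    return float(lap.var())

def looks_blind(gray, threshold=1.0):
    # Illustrative threshold; a real system would tune it per sensor.
    return laplacian_variance(gray) < threshold

rng = np.random.default_rng(0)
sharp = rng.uniform(0.0, 255.0, size=(64, 64))  # high-frequency texture
flat = np.full((64, 64), 128.0)                 # uniform, e.g. occluded lens
```

Note how the sketch already exhibits the scalability problem the passage describes: it covers only one feature type (edge sharpness), and combining it with other cues such as glare or reflections would require additional hand-tuned analysis for each.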
Further, conventional systems may be unable to distinguish between different types of sensor blindness, such as blurred images and occluded images. Treating all types of sensor blindness the same can cause less severe instances to be deemed unusable even when that is not accurate; for example, a blurred image may be suitable for certain operations while an occluded one is not. Moreover, because hard-coded computer vision techniques cannot learn from past data or improve over time during deployment, such systems may be limited in their ability to adapt to different types of sensor blindness.
The present disclosure includes embodiments of deep neural networks for sensor blindness detection. It describes systems and methods that employ region-based detection techniques for detecting and classifying blindness regions in images or other sensor data representations of an environment. These techniques are used by autonomous vehicles, semi-autonomous vehicles, water vessels, and robots to make autonomous and semi-autonomous control decisions.
In contrast with conventional systems such as those described above, the system of this disclosure may implement deep learning (e.g., a DNN, such as a convolutional neural network (CNN)) that detects, predicts, and reports contextual information indicative of sensor blindness in order to inform decisions regarding the usability, and degree of usability, of collected sensor data. Unlike conventional systems, the system of the current disclosure can identify sensor blindness using a machine-learned approach that leverages the importance of blindness within certain regions. Blindness regions can refer to any implicit or explicit regions represented by the sensor data, including, but not limited to, a driving-surface region, an ego-vehicle region, a sky region, etc.
The approaches described herein may allow the identification and classification of sensor blindness areas (e.g., regions with impaired visibility or other impairments) in situations where conventional approaches would fail, for example, where the sky or the ego vehicle is the source of blindness and the blindness does not actually affect the usability of the sensor data. The output of the DNN can also include the sensor data regions that the system has identified as unusable, requiring little or no post-processing to determine whether the blindness in a region is fatal for driving. Compared to conventional systems, significant computing power can be saved and processing requirements reduced, allowing for a faster run time and lowering the overall burden on the system.
Systems and methods are disclosed related to deep neural network processing for sensor blindness detection in autonomous machine applications. The systems and methods described herein can be used in augmented reality, virtual reality, robotics, security, medical imaging, semi-autonomous or autonomous machine applications, and/or other technology areas where sensor blindness detection may be implemented. The present disclosure may be described with respect to an example autonomous vehicle 700 (also referred to herein as "vehicle 700" or "autonomous vehicle 700"), an example of which is given in the figures; this is not meant to be restrictive. The systems and methods described in this document can also be used by, for example, non-autonomous vehicles, semi-autonomous vehicles (e.g., those with one or more adaptive driver assistance systems (ADAS)), robots, and warehouse vehicles.
Detection of Camera Blindness in Deployment
As described in this document, unlike conventional approaches to sensor blindness detection, the current system detects and classifies sensor blindness using machine learning models trained to analyze several features of an image, in real time or near real time. A deep neural network (DNN) can be used, for example, to detect sensor blindness, and the causes of the blindness may differ in each image. Two or more regions of an image may be classified as blind or partially blind, and the cause, such as blur or occlusion, can then be determined. The DNN can also be trained to produce a binary output (e.g., true or false) indicating whether the sensor data is deemed useful for a specific application (e.g., autonomous or semi-autonomous (ADAS) driving). In some embodiments, the DNN output may include a saliency map that shows the specific regions of the image where blindness has been detected. The DNN can thereby identify and classify sensor blindness more accurately across multiple regions of sensor data representations, such as images, depth maps, etc. In embodiments, this process of determining and classifying sensor blindness may be computationally less expensive than in conventional systems.
Sensor data (e.g., images from image sensors or depth maps from LIDAR sensors) can be used to detect and classify sensor blindness. The sensor data (e.g., from cameras, LIDAR, RADAR, etc.) may be obtained from sensors mounted on, or otherwise associated with, the vehicle. The sensor data can be used to train a neural network (e.g., a DNN such as a convolutional neural network (CNN)) to identify areas of interest in a sensor data representation that relate to sensor blindness, as well as the causes (e.g., blurred, blocked, etc.). The neural network can be a DNN designed to identify blindness markers and to output classifications identifying where the sensor blindness may be located in the sensor data.
In some cases, the DNN can output channels that correspond to different classifications. The channels can correspond to labels such as blurred areas, blocked areas, reflective areas, open areas, vehicle, sky, frame labels, etc., and they may also correspond to attributes such as rain, glare, broken lens, light, mud, paper, person, etc. In some examples, the neural network can output a first binary (e.g., true/false or yes/no) decision indicating that the sensor data is at least partially usable (e.g., true, yes, zero) and a second binary decision indicating that the data is not usable (e.g., false, no, one). Sensor data that is not usable may be ignored, skipped, or used to determine whether to give control back to the driver in semi-autonomous or autonomous applications.
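As a rough sketch of how such channel outputs might be decoded downstream (the decoding logic and class names here are assumptions for illustration, not the patent's implementation), per-pixel class channels can be reduced to a label map with an argmax, alongside the binary usability decision:

```python
import numpy as np

# Illustrative label set; the patent lists channels such as blurred,
# blocked, reflective, and open areas, among others.
CLASSES = ["open", "blurred", "blocked", "reflection"]

def decode_outputs(channels, usable_logit):
    """Reduce per-pixel class channels (C, H, W) and a scalar logit to a
    per-pixel label map plus a binary usability decision (sketch only)."""
    label_map = channels.argmax(axis=0)  # winning class per pixel
    usable = bool(usable_logit > 0.0)    # True => at least partially usable
    return label_map, usable

channels = np.zeros((len(CLASSES), 4, 4))
channels[0] = 1.0          # "open" wins everywhere...
channels[2, :2, :] = 2.0   # ...except the top half, scored as "blocked"
label_map, usable = decode_outputs(channels, usable_logit=-1.3)
```

With `usable` False, the downstream logic would skip this frame or use it as a vote toward handing control back to the driver, as described above.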
After the DNN has output the blindness regions and binary decisions, a number of steps can be taken to determine whether the sensor blindness is fatal. Post-processing can be performed on the DNN outputs to quantify the blindness; if the blindness is above a certain threshold, the data may be deemed unsuitable for autonomous or semi-autonomous driving. In these cases, corrective actions may be taken, such as handing control of the vehicle back to the driver.
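A minimal sketch of this thresholding step, assuming the post-processing reduces the DNN outputs to a single blind-area fraction (the 40% threshold and the action names are illustrative, not from the patent):

```python
def choose_action(blind_fraction, threshold=0.4):
    """If too much of the frame is compromised, take a corrective
    action such as handing control back to the driver."""
    if blind_fraction > threshold:
        return "hand_over_control"
    return "continue_autonomous"

action = choose_action(0.62)  # more than 40% of the frame is blind
```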
In some embodiments, the DNN can generate a saliency map per image frame. This map may indicate the spatial regions of an image that the DNN deems important for autonomous or semi-autonomous driving. For example, the DNN may learn, e.g., during training, that the road in an image is a more important region, while the trees or sky are less important regions. A motion-based sensor blindness detection algorithm may also be used in certain examples to validate the DNN's results. Motion-based sensor blindness algorithms can use feature tracking to compare consecutive images in time and determine whether sensor blindness is present in certain regions of an image. A motion feature analysis over blocks of pixels can be performed to determine the likelihood that a particular region contains sensor blindness. In some cases, a Kanade-Lucas-Tomasi (KLT) algorithm may be used to perform the motion tracking analysis. In these examples, few to no feature tracks will be generated in a region that is blurred or blocked over a series of images. The number of feature tracks detected for each block of non-overlapping pixels, as well as the number of consecutive images used to detect the features, can be analyzed: if many feature tracks are detected over time, the likelihood that a pixel block is blurred or blocked is low. The number of blurred or blocked pixel blocks can then be counted to calculate the percentage of the image affected by sensor blindness. This percentage can be compared to the blindness percentage produced by the DNN, e.g., using an agreement check component or agreement verifier, in order to check the accuracy and reliability of the DNN results with respect to the important regions of the image determined using the saliency map.
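The block counting and agreement check described above can be sketched as follows (a simplified stand-in: a real system would obtain the per-block track counts from a KLT tracker run over consecutive frames, and the thresholds here are illustrative):

```python
import numpy as np

def blocked_percentage(track_counts, min_tracks=3):
    """track_counts[i, j] holds the number of feature tracks observed in
    non-overlapping pixel block (i, j) across consecutive frames; blocks
    with few tracks are treated as likely blurred or blocked."""
    return float((track_counts < min_tracks).mean())

def agrees(motion_pct, dnn_pct, tol=0.15):
    # Agreement check between the motion-based estimate and the DNN's
    # blindness percentage; the tolerance is illustrative.
    return abs(motion_pct - dnn_pct) <= tol

# Toy 3x3 grid of blocks: three blocks accumulated almost no tracks.
counts = np.array([[0, 1, 9],
                   [8, 7, 0],
                   [9, 9, 9]])
```

Here one third of the blocks would be flagged, and that fraction would then be checked against the DNN's own blindness percentage by the agreement verifier.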
Referring to FIG. 1, FIG. 1 shows an example data flow diagram illustrating a process 100 for sensor blindness detection in autonomous machines, in accordance with certain embodiments of this disclosure. The detection types described in this document with reference to FIG. 1 are examples only and are not meant to be restrictive; the process 100 can be used, for example, to detect and classify any attributes or causes of sensor blindness, such as those described in the present disclosure.
The process 100 can include receiving or generating sensor data 102 via one or more sensors of the vehicle 700. Sensor blindness can be detected and classified in real time or near real time using the sensor data 102. The sensor data 102 can include data from any of the sensors of the vehicle 700, or of other objects such as robots, VR systems, AR systems, etc., in some examples. Referring to the figures, the sensor data 102 may include, without limitation, data from global navigation satellite system (GNSS) sensor(s) 758 (e.g., Global Positioning System sensors), RADAR sensor(s) 760, ultrasonic sensor(s) 762, LIDAR sensor(s) 764, and inertial measurement unit (IMU) sensor(s) 766 (e.g., accelerometer(s), gyroscope(s), magnetic compass(es), magnetometer(s), etc.). The sensor data 102 can also include virtual sensor data generated by any number of virtual sensors of a virtual vehicle. In such an example, the virtual sensors can correspond to a vehicle or another virtual object within a simulated or virtual environment (e.g., used for testing or training neural networks, or validating their performance), and the virtual sensor data represents sensor data captured by those virtual sensors within that environment. Using virtual sensor data, the machine learning model(s) 104 described in the present disclosure can be trained, tested, and/or validated on simulated data within a simulated or virtual environment, which allows more extreme scenarios to be tested away from a real-world environment where such tests would be less safe.
The sensor data 102 can include image data representing an image (or images), image data representing a video (e.g., snapshots of video), and/or sensor data representing the sensory fields of sensors (e.g., depth maps for LIDAR sensors, value graphs for ultrasonic sensors). Where the sensor data 102 includes image data, any type of image data format may be used, such as, for example and without limitation, compressed images in Joint Photographic Experts Group (JPEG) or Luminance/Chrominance (YUV) formats, compressed images as frames stemming from a compressed video format such as H.264/Advanced Video Coding (AVC) or H.265/High Efficiency Video Coding (HEVC), raw images such as those originating from Red Clear Clear Blue (RCCB), Red Clear Clear Clear (RCCC), or other types of imaging sensors, and/or other formats. In some cases, the sensor data may be used in the process 100 in its raw format, while in others it may undergo pre-processing; the sensor data 102 can thus refer to unprocessed data, pre-processed data, or a combination of both.
In some embodiments, a sensor data pre-processor may employ an image pre-processing pipeline to process raw images acquired by sensor(s) (e.g., camera(s)) and included in the sensor data 102, producing pre-processed data that may be used as input image(s) to the input layer(s) of the machine learning model(s) 104. A suitable pre-processing pipeline could convert a raw RCCB Bayer image (e.g., 1-channel) from the sensor to an RCB image (e.g., 3-channel) stored in fixed-precision format (e.g., 16 bits per channel). The pre-processing pipeline can include decompanding, noise reduction, demosaicing, white balancing, histogram computing, and adaptive global tone mapping (e.g., in this order or an alternative order).
When noise reduction is employed by the sensor data pre-processor, it can include bilateral denoising in the Bayer domain. When demosaicing is employed, it can include bilinear interpolation. When histogram computing is employed, it can include computing a histogram for the C channel, and in some cases may be combined with the decompanding or noise reduction steps. When adaptive global tone mapping is employed, it can include performing an adaptive gamma-log transform; this can include calculating a histogram, computing a mid-tone level, and/or estimating maximum luminance using the mid-tone level.
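As an illustration of an adaptive gamma-log transform driven by a histogram mid-tone (a sketch under assumptions: the patent does not specify the mid-tone estimator or the gamma formula used here):

```python
import numpy as np

def adaptive_gamma_log(channel, target_mid=0.5):
    """Tone-map sketch: estimate a mid-tone level from the channel's
    histogram, then choose a gamma that maps it to target_mid.
    The estimator and formula here are assumptions for illustration."""
    x = channel.astype(np.float64) / channel.max()   # normalize to [0, 1]
    hist, edges = np.histogram(x, bins=256, range=(0.0, 1.0))
    cdf = np.cumsum(hist) / hist.sum()
    mid = max(edges[int(np.searchsorted(cdf, 0.5))], 1e-6)  # histogram mid-tone
    gamma = np.log(target_mid) / np.log(mid)         # so that mid ** gamma == target_mid
    return x ** gamma

img = np.arange(1.0, 257.0).reshape(16, 16)  # synthetic luminance ramp
out = adaptive_gamma_log(img)
```

Because gamma is chosen per frame from the histogram, dark frames are brightened and bright frames darkened toward the same mid-level, which keeps the DNN input within a consistent dynamic range.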
The outputs of the machine learning model(s) 104 may include blindness output(s) 106, scalar values 114, saliency maps 116, and/or other output types. The blindness output(s) 106 may include blindness region(s) 108, blindness classification(s) 110, and/or blindness attribute(s) 112. The blindness region(s) 108 can identify the pixel locations in the sensor data representation where potential sensor blindness or otherwise compromised visibility may exist. In some non-limiting embodiments, the blindness region(s) 108 may be output on a pixel-by-pixel basis by the machine learning model(s) 104: each pixel, or at least each pixel associated with a prediction of blindness, may have an associated blindness classification 110, and pixels with associated blindness or compromised visibility, e.g., of the same classification and/or in contiguous clusters or other relationships, may be determined to belong to a blindness region 108. Other examples may output the blindness region(s) 108 as the pixel locations of the vertices of a polygon corresponding to the region. In such an example, each blindness region 108 (e.g., an area associated with compromised visibility) may have associated blindness classification(s) 110 and/or blindness attribute(s) 112, and each pixel within the polygon delineating the blindness region 108 may be determined to possess those associated classification(s) and/or attribute(s). The blindness region(s) 108 can thus be defined pixel by pixel, e.g., using clustering or another association technique to determine the blindness pixels associated with the same region, or by the vertices (e.g., pixel coordinates) of polygons that define or delineate the blindness region(s) 108.
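The clustering step mentioned above, grouping contiguous blind pixels of the same classification into regions, can be sketched with a simple 4-connected component search (an illustrative stand-in, not the patent's association technique):

```python
import numpy as np
from collections import deque

def cluster_regions(label_map, blind_class):
    """Group 4-connected pixels of one blindness class into regions,
    returned as lists of (row, col) pixel coordinates."""
    mask = label_map == blind_class
    seen = np.zeros_like(mask, dtype=bool)
    H, W = mask.shape
    regions = []
    for r in range(H):
        for c in range(W):
            if mask[r, c] and not seen[r, c]:
                region, queue = [], deque([(r, c)])
                seen[r, c] = True
                while queue:                      # breadth-first flood fill
                    y, x = queue.popleft()
                    region.append((y, x))
                    for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                        if 0 <= ny < H and 0 <= nx < W and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
                regions.append(region)
    return regions

# Two separate clusters of class-1 ("blind") pixels in a toy label map.
labels = np.array([[1, 1, 0],
                   [0, 0, 0],
                   [0, 1, 1]])
```

Each resulting region could then carry the associated blindness classification and attributes, exactly as the polygon representation does for the pixels it delineates.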