Invented by Jeffrey S. Myers, Kenneth J. Sanchez, Michael L. Bernico, State Farm Mutual Automobile Insurance Co
The State Farm Mutual Automobile Insurance Co invention works as follows:
A method for training and using a machine learning model that controls for the consideration of undesired factors which would otherwise be taken into account by the model when it analyzes new data. The model could be a neural network trained on a set of training images to evaluate an insurance applicant based on an image or audio recording of the applicant. This is done as part of the underwriting process to determine an appropriate premium for life or health insurance. The model is trained to probabilistically associate an aspect of the applicant’s appearance with a personal or health-related characteristic. Undesired factors such as race, age, gender, ethnicity and/or sex are excluded. The trained model receives an image (e.g., a “selfie”) of the insurance applicant, analyzes it without taking the undesired factors into account, and suggests an appropriate insurance premium based only on the remaining desired factors.
Background for Method for controlling for undesirable factors in machine-learning models
Machine learning models can be trained to analyze data for specific purposes, such as identifying correlations or making predictions. During training, a model can learn to include unreliable, irrelevant, misleading or otherwise undesirable factors, especially if these biases exist in the training data sets. Training with structured data limits what a model can consider, whereas training with unstructured data allows it to take into account all of the available data, including background information and other undesirable factors. For example, when a neural network is trained on unstructured data that includes people's appearances in order to make predictions and correlations about them, it may include undesirable factors such as age and sex in its analysis.
Embodiments of the present technology relate to machine learning models that control for the consideration of undesired factors which would otherwise be taken into account by the model when analyzing data. One embodiment, for example, may be configured to train and use a neural network that controls for the consideration of one or more undesired factors which would otherwise be considered when the neural network analyzes new data in an underwriting process for determining an appropriate insurance premium.
In one aspect, a method for training and using a machine learning model that controls for the consideration of an undesired factor which would otherwise be taken into account by the model can broadly include the following. The machine learning model can be trained using a training data set that contains information including the undesired factor. The undesired factor, and one or more relevant interaction terms between undesired factors, can be identified. The machine learning model can then be made to not consider the undesired factor when analyzing new data, in order to avoid unwanted prejudice or discrimination.
In a second aspect, a computer-implemented method for training and using a machine learning model to evaluate an insurance applicant as part of an underwriting process, in order to determine a suitable insurance premium, may include the following. The machine learning model can be trained to probabilistically associate an aspect of appearance with a personal or health-related characteristic by providing the model with images of people who have known personal or health-related characteristics, including the undesired factors. The undesired factors, and one or more relevant interaction terms between them, can be identified. A communication element may receive an image of the insurance applicant. The machine learning model can analyze the image to probabilistically determine personal and/or medical characteristics of the applicant, excluding the undesired factors from this analysis. The machine learning model can then suggest an appropriate insurance premium that is based, at least in part, on the probabilistically determined personal and/or medical characteristics.
Various implementations may include one or more additional features. To identify the undesired factors and relevant interaction terms, a second machine learning model can be trained using a data set that contains only the undesired factors and relevant interaction terms. To make the machine learning model not take the undesired factors into account when analyzing new data, the machine learning model can be combined with the second machine learning model, which eliminates any bias the undesired factors may have created. Additionally or alternatively, the machine learning model itself may be trained to identify the undesired factors and interaction terms, and may then be instructed to ignore them when analyzing new data. The machine learning model can be a neural network, and the second machine learning model can be a linear model. The machine learning model can be trained to analyze new data in an underwriting process to determine an appropriate insurance premium and/or appropriate terms of coverage. This data could include images of a person applying for life or health insurance, or images of property that the person wants to cover with property insurance.
The following description and illustration of exemplary embodiments will make the advantages of these and other embodiments more obvious to those who are skilled in the field. The present embodiments can be adapted to other, different embodiments and their details may be modified in many ways. The drawings and descriptions are intended to be indicative and not restrictive.
The present embodiments can relate, for example, to training and using machine learning models that do not consider one or more undesired factors when analyzing data. One embodiment, for example, may be configured to train and use a neural network that controls for the consideration of one or more undesired factors which would otherwise be considered when the neural network analyzes new data in an underwriting process for determining an appropriate insurance premium and/or terms of coverage.
Machine learning models can be trained to analyze data for specific purposes, such as insurance underwriting, which involves identifying correlations and making predictions. The models can learn to incorporate undesired factors during training, especially if biases exist in the training data sets. Training with structured data limits what a model can consider, whereas training with unstructured data allows it to take into account all of the available data, including background information and undesired factors. A neural network that is trained on unstructured data, including the appearance of people, to make predictions and correlations about them may include undesired factors such as age, race, sex and ethnicity in its analysis of new data.
The present technology is a way of training a machine learning model to filter or control bias based on these undesired factors, including in models with unknown levels of bias. This can be done in a variety of ways. In a first exemplary embodiment, one or more undesired factors can be removed from the neural network's consideration before it performs subsequent analyses of new data. In a second exemplary embodiment, the neural network can be trained to identify one or more undesired factors and then ignore them when performing subsequent analyses of new data.
In the first embodiment, the neural network is trained using data that includes one or more undesired factors. A linear model is then developed based on these undesired factors and one or more relevant interaction terms between them. The two models are then combined to correct for the bias caused by the undesired factors. Referring to the corresponding figure, as shown in 10, the neural network can be trained using a training data set. The training data set may contain images, sounds or other information about example subjects. The information could concern one or more attributes of a person or property being insured (e.g., under life, health or property insurance). The neural network can learn to analyze different subjects based on correlations found during training. If one or more undesired factors are present in the training data set, the neural network can learn to include them in its analysis. This model's output can be called “Y_Initial.”
As shown in 12, a linear model can be constructed based on one or more factors in the data that are deemed undesired, and on the interaction terms between these factors. This second model can have coefficient terms for the undesired factors (e.g., B1, B2, . . . , Bn).
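The construction of main-effect and interaction terms for such a linear model can be sketched as follows. This is a minimal illustration only: the factor names, their numeric encodings, the restriction to pairwise interactions, and the coefficient values are all assumptions made for the example, not details given in the description.

```python
from itertools import combinations


def linear_terms(factors):
    """Return main-effect and pairwise interaction terms.

    `factors` maps a (hypothetical) undesired-factor name to its numeric
    encoding. An interaction term is the product of two factor values.
    """
    terms = dict(factors)  # main effects
    for (name_a, val_a), (name_b, val_b) in combinations(sorted(factors.items()), 2):
        terms[f"{name_a}:{name_b}"] = val_a * val_b  # pairwise interaction
    return terms


def linear_model(terms, coefficients):
    """Evaluate the sum of coefficient * term; a missing coefficient counts as 0."""
    return sum(coefficients.get(name, 0.0) * value for name, value in terms.items())


# Illustrative values only.
terms = linear_terms({"age": 0.5, "sex": 1.0})
prediction = linear_model(terms, {"age": 2.0, "age:sex": -1.0})
```

In practice the coefficients would be fitted to data rather than written by hand; the description does not say how that fitting is done.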
A new model can be created that combines the linear model and the trained neural network, as shown in 14. This new model controls for the undesired factors so that they are not considered when new data is analyzed during the underwriting process to determine an appropriate insurance premium or other terms of coverage. The new model's output may be referred to as “Y_Final,” wherein Y_Final = Y_Initial + B1*X1 + B2*X2 + . . . + Bn*Xn. X1, X2, . . . , Xn can represent the relative portion of the population that contains the bias. Y_Final can then be determined so that it does not depend on any bias the neural network initially learned.
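The combination formula can be written out directly. The sketch below is a literal transcription of Y_Final = Y_Initial + B1*X1 + . . . + Bn*Xn, assuming the coefficients and population proportions are already known; the function name and the numeric values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def y_final(y_initial, coefficients, proportions):
    """Combine the neural network output with the linear correction.

    y_initial    -- output of the trained neural network (Y_Initial)
    coefficients -- B1..Bn from the linear model of undesired factors
    proportions  -- X1..Xn, the relative portion of the population
                    carrying each bias
    """
    if len(coefficients) != len(proportions):
        raise ValueError("need exactly one proportion per coefficient")
    correction = sum(b * x for b, x in zip(coefficients, proportions))
    return y_initial + correction


# Illustrative numbers: 100.0 + (-8.0 * 0.5) + (4.0 * 0.25)
corrected = y_final(100.0, [-8.0, 4.0], [0.5, 0.25])
print(corrected)  # 97.0
```

Whether a given coefficient is positive or negative depends on how the linear model was fitted, which the description leaves open.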
The second embodiment is shown in a corresponding figure. As shown in 20, the neural network can again be trained on data containing one or more undesired factors, as in the first embodiment. As shown in 22, the same neural network can be trained to identify the undesired factors and one or more relevant interaction terms between them. As shown in 24, the neural network can then be instructed not to consider the undesired factors when analyzing new data during the underwriting process to determine the insurance premium or other terms of coverage.
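One way to picture "identify, then ignore" is as masking in an additive score. The sketch below is a toy stand-in, not the patented mechanism: it assumes the model's output can be broken into named per-term contributions, which a real neural network does not expose directly, and all term names and values are hypothetical.

```python
def involves_undesired(term, undesired):
    """True if the term is an undesired factor or an interaction containing one."""
    return any(part in undesired for part in term.split(":"))


def score_without(contributions, undesired):
    """Sum only the contributions that do not involve an undesired factor."""
    return sum(
        value
        for term, value in contributions.items()
        if not involves_undesired(term, undesired)
    )


# Hypothetical per-term contributions; "age:sex" is an interaction term.
contributions = {
    "smoker_likelihood": 12.0,
    "age": 5.0,
    "age:sex": -1.5,
    "bmi_estimate": 3.0,
}
score = score_without(contributions, {"age", "sex"})
print(score)  # 15.0
```

Note that interaction terms are dropped along with the factors themselves, which mirrors the description's point that the relevant interaction terms must be identified as well.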
In an example, the machine learning model could be a neural network trained on a set of training images to evaluate an insurance applicant using an image, as part of the underwriting process, in order to determine an appropriate premium and/or terms of coverage for life or health insurance. The model can be trained to probabilistically associate an aspect of an applicant's appearance with a personal or health-related characteristic. The trained model may receive an image (e.g., a “selfie”) of the insurance applicant, analyze it without taking the undesired factors into account, and suggest an appropriate insurance premium based only on the remaining desired factors.
In greater detail, after the neural network has been trained according to the first or second embodiment described above and shown in the corresponding figures, a processing element employing the neural network may receive still and/or moving (i.e., video) images and/or voice recordings of an insurance applicant, as shown in 30 and 32. As shown in 34, the processing element employing the neural network can extract information in order to complete the underwriting, including verifying information given by the applicant or answering underwriting questions. It may also automate certain aspects of the process by directly predicting the insurance premium and/or terms of coverage. The applicant can then be quickly provided with a quote, as shown in 38.
The neural network can be a convolutional neural network (CNN) or a deep learning neural network. A CNN is a type of feed-forward neural network that is often used in facial recognition systems, in which individual neurons are tiled so as to respond to overlapping regions of the visual field. A CNN can include several layers of small neuron collections, each of which examines a small area of the input image called a receptive field. The results of these collections can be tiled so that they overlap and better represent the original image, and this may be repeated at each layer. Deep learning involves algorithms that attempt to model high-level abstractions in data using complex model architectures or other non-linear transformations. An image can be represented in different ways, including as a vector of intensity values per pixel or as a set of edges. Some representations make it easier to learn, from examples, how to recognize personal and health-related information.
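The idea of small, overlapping receptive fields can be illustrated with a single convolution pass. This is a generic pure-Python sketch, not the network described here: the 4x4 image and the 2x2 vertical-edge kernel are made-up values, and a real CNN would learn its kernels and stack many such layers.

```python
def convolve2d(image, kernel, stride=1):
    """Slide a square kernel over the image (no padding).

    Each output value is computed from one receptive field: the small
    patch of the image currently under the kernel. With stride 1 and a
    2x2 kernel, neighboring receptive fields overlap.
    """
    k = len(kernel)
    rows = (len(image) - k) // stride + 1
    cols = (len(image[0]) - k) // stride + 1
    output = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            top, left = r * stride, c * stride
            total = sum(
                image[top + i][left + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            out_row.append(total)
        output.append(out_row)
    return output


# Dark left half, bright right half: one vertical edge in the middle.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
vertical_edge = [[-1, 1], [-1, 1]]  # responds where brightness rises left to right
feature_map = convolve2d(image, vertical_edge)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The nonzero middle column of the output marks the edge, which is one concrete sense in which an image can be represented as a set of edges rather than raw pixel intensities.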
The large sample of still and/or moving (e.g., video) images and/or voice recordings used for training may have been provided by existing policyholders, volunteers, or social media. The applicant's still and/or moving (e.g., video) image and/or voice recording may be digital or analog, but otherwise conventional and non-diagnostic, and may be taken by the insurance applicant himself or herself. The videos could include audio of the applicants' voices, and the neural network's training and analysis might similarly look for relevant patterns or characteristics in voice. The neural network may analyze images in a probabilistic manner, so that the data it generates may have varying degrees of certainty.
Thus, exemplary embodiments may include probabilistically evaluating insurance applicants, and determining appropriate premiums or other terms of coverage, using voice recordings, still images and/or moving images, without the need for conventional medical exams, while excluding undesired factors from the analysis. The technology is described here, by way of example, in relation to the underwriting of health or life insurance. However, it can be applied to other types of insurance as well, including property insurance.
I. Exemplary computer system
The system 40 shown in FIG. 4 is an example of a computer system that can be used to evaluate an insurance applicant during an underwriting process. This may include determining appropriate terms for life insurance or other types of insurance, including premiums and discounts. The system 40 can broadly comprise a memory element 42 that stores information such as a database of training images and/or voice recordings; a communication element 44 that receives and transmits signals via a network 46, including receiving the applicant's image or voice recording; and a processing element 48 that employs a neural network 54 configured and trained to analyze the applicant's image or voice recording.
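The division of responsibilities among these three elements can be sketched as plain data classes. Everything below is an assumed, simplified layout for illustration: the class and field names are hypothetical, network transport is omitted, and `placeholder_network` stands in for a trained neural network without implementing one.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MemoryElement:
    """Stores training recordings and recordings received from applicants."""
    training_recordings: List[bytes] = field(default_factory=list)
    applicant_recordings: List[bytes] = field(default_factory=list)


@dataclass
class CommunicationElement:
    """Receives an applicant's recording and stores it in memory."""
    memory: MemoryElement

    def receive(self, recording: bytes) -> bytes:
        self.memory.applicant_recordings.append(recording)
        return recording


@dataclass
class ProcessingElement:
    """Applies a trained model to a recording."""
    neural_network: Callable[[bytes], Dict[str, float]]

    def analyze(self, recording: bytes) -> Dict[str, float]:
        return self.neural_network(recording)


def placeholder_network(recording: bytes) -> Dict[str, float]:
    # Stand-in only: returns a fixed probability for a made-up characteristic.
    return {"example_characteristic": 0.5}


memory = MemoryElement()
communication = CommunicationElement(memory)
processing = ProcessingElement(placeholder_network)
result = processing.analyze(communication.receive(b"image-bytes"))
```

The returned dictionary maps each characteristic to a probability, reflecting the probabilistic nature of the analysis described earlier.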
The memory element 42 can store information such as a database of still and/or moving (e.g., video) images and/or voice recordings used to train the processing element 48, and/or still and/or moving (e.g., video) images and/or voice recordings received from applicants. The memory element 42 can include data storage components such as read-only memory (ROM), programmable ROM, erasable programmable ROM, random-access memory (RAM) such as static RAM or dynamic RAM, cache memory, hard disks, floppy disks, optical disks and flash memory. The memory element 42 may be, or may include, a “computer-readable medium.” It may also store instructions, code, code segments, software, firmware, programs, applications, apps, services or daemons that are executed by the processing element 48. The memory element 42 can also store data, documents, sound files, images, photos, movies, videos, databases and the like. The memory element 42 can be electronically coupled to, or in electronic communication with, both the communication element 44 and the processing element 48.