Invented by David H. Gotz, Pei-Yun S. Hsueh, Jianying Hu, Jimeng Sun, International Business Machines Corp

The healthcare industry is constantly evolving, and one of the latest trends is the use of risk-driven stratification to identify individual and group-level risk factors. This approach involves analyzing patient data to identify those who are at a higher risk of developing certain conditions or experiencing certain outcomes, and then tailoring interventions and treatments to meet their specific needs.

There are many benefits to this approach. For one, it allows healthcare providers to focus their resources on those who need them most. By identifying high-risk patients early on, providers can intervene before a condition becomes more serious or costly to treat. This can lead to better outcomes for patients and lower healthcare costs overall.

Another benefit of risk-driven stratification is that it allows providers to tailor their interventions to the specific needs of each patient. For example, a patient who is at high risk of developing diabetes may benefit from a different treatment plan than someone who is at low risk. By tailoring interventions to each patient’s unique needs, providers can improve outcomes and reduce the risk of complications.

There are many different factors that can contribute to a patient’s risk level, including age, gender, family history, lifestyle factors, and medical history. By analyzing these factors, providers can develop a more complete picture of each patient’s risk profile and tailor their interventions accordingly.

Of course, there are also challenges to implementing a risk-driven stratification approach. One of the biggest challenges is collecting and analyzing the data needed to identify high-risk patients. This requires robust data analytics capabilities and a willingness to invest in technology and infrastructure. Another challenge is ensuring that interventions are effective and that patients are engaged in their own care. Providers must work closely with patients to develop treatment plans that are both effective and feasible, and they must provide ongoing support and education to help patients manage their conditions.

Despite these challenges, the market for identifying individual and group-level risk factors through risk-driven stratification is growing rapidly. As healthcare providers seek to improve outcomes and reduce costs, they are increasingly turning to data-driven approaches to identify high-risk patients and tailor interventions to meet their needs. With the right tools and strategies in place, this approach has the potential to transform the healthcare industry and improve outcomes for patients around the world.

The International Business Machines Corp invention works as follows

Systems and methods for individual risk factor identification include identifying common risk factors for one or more risk targets based on population data. The common risk factors are used to stratify individuals into clusters. A processor determines the discriminability of each common risk factor for a target cluster using the individual data for that target cluster.

Background for Identifying individual and group-level risk factors through risk-driven stratification of patients

Technical Field

The present invention relates to risk factor identification and, more specifically, to identifying individual and group-level risk factors through risk-driven stratification of patients.

Description of Related Art

As more and more diverse clinical data becomes available, many features can be constructed for use in predictive modeling. It is important to be able to identify the risk factors associated with a health condition that can lead to adverse outcomes (e.g., congestive cardiac failure) in order to improve healthcare quality and reduce costs. Identifying risk factors can allow early detection of disease onset, so that aggressive interventions can be taken to prevent or slow down potentially life-threatening and expensive conditions.

In personalized care scenarios, it’s common for two or more patients to have the same risk score, but with different risk factors. Traditionally, the risk factor identification process uses feature ranking methods that rank features based on their global utility. Methods based on data from the general population will only reveal common risk factors, and not individual differences in patients.

A method for identifying individual risk factors includes identifying the common risk factors of one or more risk targets based on population data. The common risk factors are used to stratify individuals into clusters. A processor determines the discriminability of each common risk factor for a target cluster using the individual data for that target cluster.

A method for identifying individual risk factors includes identifying the common risk factors of one or more risk targets based on population data. Individuals are stratified into clusters based on the common risk factors. The clusters are classified into a number of risk levels, including at least a high-risk cluster and a low-risk cluster. A processor determines the discriminability of each common risk factor for a target cluster, using the individual data from the target cluster, to provide re-ranked common risk factors as individual risk factors. The discriminability is a measure of how well a risk factor discriminates its cluster from other clusters. The other clusters include at least one of: other high-risk clusters, low-risk clusters, or the general population.

A method for identifying individual risk factors includes identifying the common risk factors of one or more risk targets based on population data. Individuals are stratified into clusters based on the common risk factors. The clusters are identified according to a set of risk levels, including at least one high-risk cluster and one low-risk cluster. A processor determines the discriminability of each common risk factor for a target cluster, using the individual data from the target cluster, to provide re-ranked common risk factors. The other clusters may include other high-risk clusters, low-risk clusters, or the general population. Each re-ranked common risk factor is then validated against the individual data, and common risk factors that do not indicate risk are filtered out, which leaves the individual risk factors for the target cluster.

A system for identifying individual risk factors includes a module that is configured to identify the common risk factors from population data for one or more risk targets. A clustering module stratifies individuals into clusters based on the common risk factors. A ranking module is configured to determine, using a processor, the discriminability of each common risk factor for a target cluster using the individual data for the target cluster, to provide re-ranked common risk factors as individual risk factors.

A system for identifying individual risk factors includes a module that is configured to identify the common risk factors from population data for one or more risk targets. A clustering module stratifies individuals into clusters based on the common risk factors. A group identification module identifies the clusters according to a plurality of risk levels, including at least one high-risk cluster and at least one low-risk cluster. A ranking module determines, using a processor and the individual data from the target cluster, the discriminability of each common risk factor for the target cluster, to provide re-ranked common risk factors as individual risk factors. The other clusters may include other high-risk clusters, low-risk clusters, or the general population.

The following detailed description, in conjunction with the accompanying illustrations, will reveal these and other advantages.

According to the present principles, methods and systems for identifying risk factors at an individual level are provided. Inputs can include both individual data and population data. The individual data can include data for a target cluster. In a preferred embodiment, the target cluster represents a patient or individual to be examined or treated. Individual data may include, e.g., electronic health records, questionnaire data, genetic information, etc.

Using population data, we identify common (i.e., global) risk factors that are associated with one or more risk targets (e.g., diabetes). Each risk factor should be correlated positively or negatively with the target risk. Patients from the population data are then clustered based on the identified risk factors, for example using hierarchical clustering or k-means clustering; other clustering methods may also be used. Each cluster is classified as high-risk or low-risk, for example based on the proportion of at-risk patients in the cluster.
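The clustering and risk-labeling steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the plain k-means routine, the 0.5 threshold on the proportion of at-risk patients, and the sample patients (BMI and fasting glucose as common risk factors) are all assumptions introduced for the example.

```python
import random
from statistics import mean

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means on lists of numbers; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; leave it if the cluster is empty.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [mean(col) for col in zip(*members)]
    return labels

def label_clusters(labels, at_risk, threshold=0.5):
    """Mark a cluster high-risk when its share of at-risk patients exceeds the threshold."""
    result = {}
    for c in set(labels):
        flags = [r for l, r in zip(labels, at_risk) if l == c]
        result[c] = "high-risk" if mean(flags) > threshold else "low-risk"
    return result

# Hypothetical patients: [BMI, fasting glucose]; at_risk marks who has the risk target.
patients = [[22, 85], [23, 90], [24, 88], [33, 130], [35, 140], [34, 135]]
at_risk = [0, 0, 0, 1, 1, 0]

labels = kmeans(patients, k=2)
risk_levels = label_clusters(labels, at_risk)
print(labels, risk_levels)
```

In practice the features would normally be scaled before clustering so that no single factor dominates the distance; that step is omitted here to keep the sketch short.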

The identified risk factors are then re-ranked according to their importance for the target cluster. Each risk factor is ranked according to how well it differentiates its cluster (i.e., the target cluster) from other clusters. A variety of comparison configurations can be used. In one embodiment, the target cluster is compared to all other high-risk clusters. In another embodiment, the target cluster is compared to all low-risk clusters. In a third embodiment, the target cluster is compared to the entire population. Other comparison configurations can also be used. Discriminability is determined under one of these configurations. In one embodiment, discriminability is calculated from each factor’s contribution to training a classifier. In another embodiment, discriminability is calculated by comparing the distribution of each factor in the target cluster with its distribution in the relevant comparison clusters.
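The re-ranking step above can be sketched as follows, using the distribution-comparison variant only. The specific score (absolute difference in means divided by the pooled standard deviation), the configuration names, and the sample clusters are assumptions made for illustration; the source does not specify a particular distance measure.

```python
from statistics import mean, pstdev

def discriminability(target_rows, other_rows, factor):
    """Absolute difference in means, scaled by the pooled spread of the factor."""
    t = [row[factor] for row in target_rows]
    o = [row[factor] for row in other_rows]
    spread = pstdev(t + o)
    return abs(mean(t) - mean(o)) / spread if spread else 0.0

def rerank(clusters, risk_levels, target, factors, config="low_risk"):
    """Re-rank common risk factors for the target cluster under one comparison configuration."""
    if config == "other_high_risk":
        others = [c for c in clusters if c != target and risk_levels[c] == "high-risk"]
    elif config == "low_risk":
        others = [c for c in clusters if risk_levels[c] == "low-risk"]
    else:  # "population": every cluster, target included (an assumed reading)
        others = list(clusters)
    other_rows = [row for c in others for row in clusters[c]]
    scores = {f: discriminability(clusters[target], other_rows, f) for f in factors}
    return sorted(factors, key=lambda f: scores[f], reverse=True), scores

# Hypothetical clusters: A and B are both high-risk, but for different reasons.
clusters = {
    "A": [{"bmi": 34, "sbp": 120}, {"bmi": 36, "sbp": 124}, {"bmi": 35, "sbp": 122}],
    "B": [{"bmi": 24, "sbp": 160}, {"bmi": 25, "sbp": 165}, {"bmi": 23, "sbp": 158}],
    "C": [{"bmi": 23, "sbp": 118}, {"bmi": 24, "sbp": 121}, {"bmi": 22, "sbp": 119}],
}
risk_levels = {"A": "high-risk", "B": "high-risk", "C": "low-risk"}

ranked_a, _ = rerank(clusters, risk_levels, "A", ["bmi", "sbp"])
ranked_b, _ = rerank(clusters, risk_levels, "B", ["bmi", "sbp"])
print(ranked_a, ranked_b)
```

With the low-risk comparison, the two high-risk clusters end up with different top-ranked factors, which is the situation the related-art discussion describes: similar risk levels driven by different risk factors.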

Risk factors that do not reflect actual local risk may be removed. The remaining risk factors are output as individual risk factors. The individual risk factors identify the main risk factors of a cluster or patient. They can be used to customize a personalized care management process, or displayed at the point of care or for patient education.
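One possible way to filter out factors that do not reflect local risk is sketched below. The validation rule is an assumption, since the source does not define it: each common risk factor carries a hypothetical direction (+1 if higher values raise risk, -1 if higher values are protective), and a factor is kept only when the target cluster's mean deviates from the population mean in the risky direction.

```python
from statistics import mean

def filter_local_risk(ranked_factors, target_rows, population_rows, direction):
    """Keep factors whose target-cluster mean is shifted in the risk-raising direction."""
    kept = []
    for f in ranked_factors:
        shift = mean(r[f] for r in target_rows) - mean(r[f] for r in population_rows)
        # A positive product means the cluster differs from the population
        # in the direction that increases risk for this factor.
        if shift * direction[f] > 0:
            kept.append(f)
    return kept

# Hypothetical data: the target cluster has high BMI but also exercises a lot.
target = [{"bmi": 35, "exercise_hours": 6}, {"bmi": 36, "exercise_hours": 7}]
population = target + [{"bmi": 24, "exercise_hours": 2}, {"bmi": 23, "exercise_hours": 3}]
direction = {"bmi": +1, "exercise_hours": -1}  # assumed signs of correlation with risk

individual_factors = filter_local_risk(
    ["exercise_hours", "bmi"], target, population, direction
)
print(individual_factors)
```

Here exercise hours discriminate the cluster strongly but in the protective direction, so only BMI is kept as an individual risk factor.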

As one skilled in the art will appreciate, aspects of the invention can be embodied as a system, method, or computer program product. Aspects of the present invention can take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, etc.), or an embodiment that combines software and hardware aspects, all of which may be referred to as a “circuit,” “module,” or “system.” Aspects of the invention can also be embodied in a computer program product that is stored on one or more computer-readable media containing computer-readable program code.

Any combination of one or more computer-readable media may be used. A computer-readable medium can be either a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium can be, for example but without limitation, an electronic, magnetic, or optical system, apparatus, or device. A non-exhaustive list of more specific examples of the computer-readable storage medium includes: an electrical connection with one or more wires, an optical fiber or cable, a portable computer diskette, a hard drive, a random-access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), or any combination of these. A computer-readable storage medium can be any tangible medium capable of storing a program for use by an instruction execution system, apparatus, or device.

A computer-readable signal medium can include a propagated signal that contains computer-readable program code, for example in baseband or as part of a carrier wave. This propagated signal can take a number of different forms, such as electromagnetic, optical, or any combination of these. A computer-readable signal medium can be any computer-readable medium that is not a computer-readable storage medium and that can communicate, transmit, or transport a program for use by, or in conjunction with, an instruction execution system, apparatus, or device.

The computer program code for carrying out operations in accordance with aspects of the present invention may be written in any combination of one or more programming languages, including an object-oriented language such as Java, Smalltalk, C++, or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The code can be executed on a user’s computer, on a remote computer or server, or on both. The remote computer can be connected to the user’s computer through any type of network, such as a LAN or WAN, or the connection may be made to an external computer, for example through an Internet Service Provider.

Aspects of the invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments. Each block in the flowchart illustrations and/or block diagrams, as well as combinations of blocks, can be implemented using computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, or other programmable device to produce a machine, such that the instructions, when executed by the processor, create the means to implement the functions/acts specified in the flowchart or block diagram.

The computer program instructions can also be loaded onto a computer, another programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, programmable apparatus, or devices. This produces a computer-implemented process.
