Invented by Kevin Moore, Leah McGuire, Eric Wayman, Shubha Nabar, Vitaly Gordon, Sarah Aerni, Salesforce Inc
The Salesforce Inc invention works as follows: methods, systems, and software are provided for determining suitable hyperparameters for a machine learning model or feature engineering process. A dataset can be analyzed to determine a suitable machine learning model and its associated hyperparameters. Suitable hyperparameter values can be determined for machine learning models that share one or more hyperparameters and have a compatible dataset schema. Hyperparameters can be ranked based on their influence on model performance metrics. Values of hyperparameters with a greater influence may be searched more aggressively.
Background for Identification of hyperparameters and their application to machine learning
Hyperparameters can affect the execution of some machine learning algorithms. Hyperparameters can be used to set parameters such as the number of iterations or the size of the samples. They may also reflect assumptions made about the machine-learning model and the training data. There may be hyperparameters for feature engineering algorithms, which can also affect the way feature engineering is done. Data scientists may use heuristics or their own experience to determine the hyperparameters that are best for a machine learning algorithm, feature engineering algorithm, etc. However, this method may not be consistent and reliable across different datasets, machine-learning algorithms, and data scientists.
Hyperparameters can also be searched algorithmically using a brute-force approach, in which a search algorithm looks for the best hyperparameters among the possible combinations. However, the computing time this requires may grow exponentially with the number of hyperparameters. The search algorithm may also require hyperparameters of its own, and tuning them to get a useful search result may itself take a lot of time.
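The growth in search cost can be illustrated with a short Python sketch. The grid contents and the names `hp_0`, `max_depth`, and `learning_rate` are assumptions made for illustration; they do not come from the patent.

```python
from itertools import product


def grid_size(grid):
    """Number of combinations an exhaustive search must evaluate."""
    total = 1
    for values in grid.values():
        total *= len(values)
    return total


# Hypothetical grid: 10 candidate values for every hyperparameter.
candidates = list(range(10))
for n_params in (1, 2, 3, 4):
    grid = {f"hp_{i}": candidates for i in range(n_params)}
    print(n_params, "hyperparameters:", grid_size(grid), "combinations")

# Enumerating a small grid explicitly yields the same count.
small = {"max_depth": [2, 4, 8], "learning_rate": [0.01, 0.1]}
combos = [dict(zip(small, values)) for values in product(*small.values())]
```

With a fixed number of candidate values per hyperparameter, each added hyperparameter multiplies the number of combinations, which is why the later embodiments try to narrow or prioritize the search.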
According to one embodiment, computer-implemented media, methods, and systems may include receiving a dataset with a first ... The method can include selecting a first set of hyperparameters for each of the hyperparameters of a plurality ... The method can also include training the selected machine-learning model's first version using the first group of selected hyperparameters ... The metadata can include one or more of: a size of a training set, the shape of the data, the number ... The method can also include executing a second machine-learning model using the metadata, where the secondary model returns a selection ... One or more of the performance metrics can be accuracy, error, precision, recall, area under the receiver operating characteristic (ROC) curve, and ... The method can also include executing a second machine learning algorithm using the plurality of hyperparameters of the first machine learning ... The method can also include identifying a hyperparameter value based on a search with variable granularity. This ... The method can also include identifying the hyperparameter values within the range of values determined for one or several hyperparameters ... The method can also include identifying a value for a hyperparameter within the range of values determined for the hyperparameter ... The threshold size may be adjusted based on the influence of one or more previously saved hyperparameters over one or more performance ...
The following detailed description, drawings and claims may reveal additional features, benefits and embodiments. It is important to note that the summary above and the detailed description below are intended as illustrations and to explain the subject matter without limiting its scope.
Embodiments disclosed herein provide techniques for identifying hyperparameters to be used in a machine learning model, using repeatable techniques that can be performed efficiently by an automated computerized system. Hyperparameters that are suitable for a machine learning model can be identified by, for instance, comparing the data with previous data used to create other machine learning models. A suitable machine learning model can be selected based on the similarity between the data and other datasets that have been used with machine learning models. The hyperparameters to be searched when training the selected machine learning model can be determined based on the relative contribution each hyperparameter makes to the performance of the model, as determined by the model's performance metrics. The automated, computerized techniques described herein can be used to identify the values that need to be searched and/or how granularly to search individual hyperparameters.
As used in this document, the term "suitable" refers to a parameter or parameter value that achieves correct operation of a system, such as a machine learning system. A suitable value can be the least desirable value in a range, yet still ensure correct system operation. Preferably, a suitable value is one that improves system performance when compared to another value in the range, although it may not be the optimal value.
The term "algorithm" is used here to refer to either a single algorithm or a plurality of algorithms that may be used simultaneously, or successively in a "stacked" manner.
The term "model" is used here to refer to a machine learning algorithm together with one or more parameters or hyperparameters that are associated with it.
A machine learning system can allow a data scientist to create machine learning models. The data scientist can collect datasets from various sources, including databases. A feature engineering algorithm can extract features of interest from the dataset. The feature engineering algorithm can then modify the extracted features, add new features, and remove existing features to create a feature-engineered dataset. The data scientist can select a machine learning algorithm to create a new model using the feature-engineered dataset. This is also called training the model. Hyperparameters are parameterized values that specify how the machine learning algorithm is executed. The data scientist may create custom metrics that reflect what matters most for the problem at hand. Metrics can include accuracy, error rate, development time, and precision. The hyperparameters should be selected so that the algorithm executes as well as possible according to the performance metrics. It should be noted that, as discussed previously, the feature engineering algorithm can also be configured with hyperparameters that affect its execution in a similar way.
The present subject matter discloses a computer-based automated method for identifying hyperparameters and applying them to machine learning algorithms and/or feature engineering. The present subject matter discloses several embodiments that may be used separately, together, or in combination. The processes used within each embodiment can be carried out simultaneously, asynchronously or in an order different from that shown and described.
In one embodiment, the disclosed method can provide for receiving a data set and generating metadata on the basis of its properties. The metadata can be used to identify a suitable machine-learning model and suitable hyperparameters. The machine learning model identified can then be configured to use the hyperparameters identified and the dataset received.
In one embodiment, the disclosed method can select a machine learning model and train one or more models with one or more datasets. From the one or more subsequently trained models, one or more hyperparameters that have a greater influence on model behavior across the one or more datasets can be identified and compiled into a list. For each hyperparameter in the list, a range of values can be searched to find values that cause the machine learning model to perform according to the performance metrics specified by the data scientist. The selected machine learning model can then be configured to use the hyperparameter values that have been identified as suitable.
In one embodiment, the disclosed method can select a machine learning model and a dataset. The dataset can be organized according to a schema. The method can receive version data associated with the selected machine learning model. The method can identify previously used hyperparameter values for a machine learning model that corresponds to the selected model, based on the dataset schema or the version data associated with those values. A range of values within a threshold of the previously used hyperparameter values may be searched to identify values that cause the model to perform according to the performance metrics specified by the data scientist. The selected machine learning model can then be configured to use the hyperparameter values that have been identified as suitable.
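A minimal Python sketch of searching within a threshold of previously used values follows. The function names, the evenly spaced candidates, the one-hyperparameter-at-a-time search order, and the `evaluate` callback (higher score is better) are all assumptions made for illustration; the source does not specify how the range is sampled.

```python
def values_within_threshold(previous, threshold, steps):
    """Evenly spaced candidates in [previous - threshold, previous + threshold]."""
    if steps < 2:
        return [previous]
    low = previous - threshold
    width = 2 * threshold
    return [low + width * i / (steps - 1) for i in range(steps)]


def search_near_previous(previous, thresholds, evaluate, steps=5):
    """Tune each hyperparameter in turn, staying near its previously used value.

    `evaluate` is a hypothetical callback that trains and scores a model for
    a dict of hyperparameter values and returns a score to maximize.
    """
    best = dict(previous)
    for name in previous:
        candidates = values_within_threshold(previous[name], thresholds[name], steps)
        best[name] = max(candidates, key=lambda v: evaluate({**best, name: v}))
    return best


# Toy objective standing in for model training; its optimum is x=0.12, y=5.
def toy_score(hp):
    return -((hp["x"] - 0.12) ** 2) - (hp["y"] - 5) ** 2


result = search_near_previous(
    previous={"x": 0.1, "y": 4},
    thresholds={"x": 0.04, "y": 2},
    evaluate=toy_score,
)
```

Restricting the search to a small window around stored values keeps the number of evaluations low, at the cost of missing good values that lie outside the window.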
FIG. 1 shows an example flow diagram 100 for selecting a machine learning model and associated hyperparameters based on one or more datasets. In 105, a data scientist or another user selects one or more datasets, which the system receives. These datasets could be tenant datasets that contain customer data and are subject to privacy and security protocols. Depending on permission level, a user of the machine learning system (e.g., a data scientist, computer engineer, etc.) may not be able to view some or all of the data within the one or more datasets received in 105. The datasets received in stage 105 can be combined and randomly split into two sets: a training set and a hold-out set. The training set can be used in stage 120 to train the selected machine learning model, while the hold-out set is used to evaluate the accuracy of that model. In 110, metadata may be generated to describe properties of the datasets received in 105. This metadata can be based on the datasets themselves, other data available to the system, and user input. The metadata can be generated for all datasets together or per dataset. The metadata can be generated in a separate pre-processing step or in combination with another machine learning process as described herein. The metadata can include data such as the size and shape of the datasets, the number of fields, the percentage breakdown of field types (e.g., numeric, categorical, or text), the type of classification problem, the variance of the datasets, correlations between the data and the label, statistical distributions of the datasets, and so on. After the metadata is generated in stage 110, it can be stored in a database or other data structure using conventional methods.
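Stages 105 and 110 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dataset is a list of dicts, every field that is not numeric is counted as categorical, and the function names, metadata keys, and 20% hold-out fraction are hypothetical rather than taken from the patent.

```python
import random


def generate_metadata(rows, label_field):
    """Summarize simple dataset properties (an assumed subset of stage 110)."""
    fields = [name for name in rows[0] if name != label_field]
    numeric = [
        f for f in fields
        if all(isinstance(r[f], (int, float)) for r in rows)
    ]
    return {
        "n_rows": len(rows),
        "n_fields": len(fields),
        "pct_numeric": len(numeric) / len(fields),
        # Assumption: any non-numeric field is treated as categorical.
        "pct_categorical": 1 - len(numeric) / len(fields),
        "n_classes": len({r[label_field] for r in rows}),
    }


def split_holdout(rows, holdout_fraction=0.2, seed=0):
    """Randomly split rows into a training set and a hold-out set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


# Hypothetical dataset with one numeric field, one categorical field, a label.
rows = [
    {"age": 20 + i, "plan": "pro" if i % 2 else "free", "churned": i % 2}
    for i in range(10)
]
metadata = generate_metadata(rows, label_field="churned")
train, holdout = split_holdout(rows)
```

Fixing the random seed makes the split repeatable, which matters when models trained with different hyperparameter values are compared on the same hold-out set.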
In stage 115, a suitable machine learning model may be chosen from a number of models based on the metadata generated in stage 110. The machine learning model may be chosen in part based on its known advantages, and one model may be chosen over another depending on the content and metadata of the dataset. For example, if the metadata indicates that a dataset has a high proportion of categorical data, a machine learning model that performs well with categorical data may be selected. A secondary machine learning model may perform stage 115. The secondary machine learning model can accept one or more datasets along with their metadata and return, based on them, a selected machine learning model and suitable values for the hyperparameters of that model. Hyperparameter values can be numeric or non-numeric. The secondary machine learning model can operate according to any conventional machine learning algorithm, such as grid search, random search, or Bayesian methods. The machine learning model selected in 115 can be trained in 120 using the selected hyperparameters and the datasets received in 105.
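The categorical-data example can be sketched as a simple rule in Python. Note that the source describes a secondary machine learning model performing stage 115; the hand-written rule below is a deliberately simpler stand-in, and the candidate names, the `handles_categorical` flag, and the 0.5 threshold are hypothetical.

```python
def select_model(metadata, candidates, categorical_threshold=0.5):
    """Pick a candidate model name from dataset metadata using one simple rule.

    A rule-based stand-in for the secondary model described for stage 115.
    """
    if metadata["pct_categorical"] >= categorical_threshold:
        preferred = [c for c in candidates if c["handles_categorical"]]
        if preferred:
            return preferred[0]["name"]
    # Fall back to the first candidate when the rule does not apply.
    return candidates[0]["name"]


# Hypothetical candidate list; the flags are illustrative, not benchmarks.
candidates = [
    {"name": "linear_model", "handles_categorical": False},
    {"name": "tree_ensemble", "handles_categorical": True},
]
mostly_categorical = select_model({"pct_categorical": 0.8}, candidates)
mostly_numeric = select_model({"pct_categorical": 0.1}, candidates)
```

A learned secondary model would replace the fixed threshold with a mapping fitted on metadata from earlier datasets.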
FIG. 2A shows an example flow diagram 200 for selecting one or more suitable values for the hyperparameters of a machine learning model. In 205, the method receives a selection of a machine learning model and one or more datasets. The machine learning model may be selected according to method 100 by the secondary machine learning model at stage 115, selected by the user, or selected according to conventional methods. The machine learning model selected in 205 may have been previously trained on a variety of datasets, producing data that can be used to determine the degree of influence each hyperparameter has on the performance metrics associated with the model. Performance metrics can be determined automatically or by a data scientist. They may include accuracy, error, precision, recall, area under the precision-recall curve (AuPR), area under the receiver operating characteristic curve (AuROC), etc. The choice of one or more performance metrics can be important in assessing whether one hyperparameter value is better than another.
In stage 210, method 200 can identify and rank the hyperparameters associated with the machine learning model selected in stage 205 according to their influence on one or more performance metrics. This can be done using a secondary machine learning model that takes the previously discussed data resulting from training the selected machine learning model across the plurality of datasets, together with the one or more selected performance metrics, and returns a ranking of the model's hyperparameters according to their influence on the one or more performance metrics. The secondary machine learning model can use a random forest algorithm or any other machine learning algorithm capable of computing hyperparameter importance in a model.
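The ranking step can be approximated without a random forest. The Python sketch below swaps in a much simpler technique: for each hyperparameter, past runs are grouped by value and the variance of the per-value mean metric is used as an influence score. The run records, field names, and scoring rule are assumptions for illustration and are not the method the source describes.

```python
from collections import defaultdict
from statistics import mean, pvariance


def rank_by_influence(runs, metric="accuracy"):
    """Rank hyperparameters by how much the mean metric varies with their value.

    A main-effect variance score used as a stand-in for the random forest
    importance mentioned in the text.
    """
    scores = {}
    for name in runs[0]["hyperparameters"]:
        groups = defaultdict(list)
        for run in runs:
            groups[run["hyperparameters"][name]].append(run[metric])
        group_means = [mean(values) for values in groups.values()]
        scores[name] = pvariance(group_means) if len(group_means) > 1 else 0.0
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical results from earlier training runs of the selected model.
runs = [
    {"hyperparameters": {"max_depth": 2, "subsample": 0.5}, "accuracy": 0.70},
    {"hyperparameters": {"max_depth": 2, "subsample": 1.0}, "accuracy": 0.71},
    {"hyperparameters": {"max_depth": 8, "subsample": 0.5}, "accuracy": 0.90},
    {"hyperparameters": {"max_depth": 8, "subsample": 1.0}, "accuracy": 0.91},
]
ranking = rank_by_influence(runs)
```

In this toy data, changing `max_depth` moves accuracy by about 0.20 while changing `subsample` moves it by about 0.01, so `max_depth` ranks first. Unlike a random forest, this score ignores interactions between hyperparameters.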
After the hyperparameters have been identified and ranked by influence in stage 210, stage 215 may use any conventional machine learning algorithm to search for suitable hyperparameter values. A grid search algorithm may be preferred because it allows the size and/or granularity of the search to be specified for each hyperparameter. Values of hyperparameters that have a greater influence on the performance metrics can be searched with greater granularity, while values of hyperparameters that have a lesser influence can be searched with less granularity. This uses computing resources more efficiently by allocating more time to the searches that are more likely to produce better results. For example, 50 values may be examined for a hyperparameter determined to have a strong influence, whereas only 5 values may be examined for a hyperparameter with a weak influence. The search process of stage 215 can then return one value for each hyperparameter associated with the machine learning model selected in stage 205.
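A variable-granularity grid can be built as in the Python sketch below, which reuses the 50-versus-5 figures from the text. The function name, the `top_k` cutoff separating strong from weak influence, the evenly spaced values, and the example ranges are assumptions for illustration.

```python
def build_grid(ranges, ranked, n_high=50, n_low=5, top_k=1):
    """Build a per-hyperparameter grid, finer for the most influential ones.

    `ranked` lists hyperparameter names from most to least influential.
    The first `top_k` names get `n_high` evenly spaced values and the rest
    get `n_low`. Both counts are assumed to be at least 2.
    """
    grid = {}
    for position, name in enumerate(ranked):
        low, high = ranges[name]
        n = n_high if position < top_k else n_low
        grid[name] = [low + (high - low) * i / (n - 1) for i in range(n)]
    return grid


# Hypothetical search ranges and influence ranking.
ranges = {"learning_rate": (0.001, 0.3), "subsample": (0.5, 1.0)}
ranked = ["learning_rate", "subsample"]
grid = build_grid(ranges, ranked)
n_combinations = len(grid["learning_rate"]) * len(grid["subsample"])
```

Here the grid holds 50 x 5 = 250 combinations instead of the 2,500 that a uniform 50-value grid over both hyperparameters would require.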
In stage 220, the hyperparameter values determined in stage 215 may be stored in a hyperparameter store, which can be implemented using any conventional memory device. The hyperparameter store may be indexed by model and contain data such as the time and date of training, the version of the algorithm used by the model (if applicable), the schema of the dataset used for training, the performance of the model according to the performance metrics discussed above, the value of each model hyperparameter, etc. Future hyperparameter selection can be accelerated by looking up suitable hyperparameter values in the store where matching data is available, instead of performing stages 210 and 215 again. The machine learning model selected in stage 205 can be trained in stage 225 using the dataset selected in 205 and the hyperparameter values selected in stage 215.
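One possible shape for such a store is sketched below in Python as an in-memory structure. The class and field names, the exact-match lookup on schema, and the choice to return the most recently saved match are assumptions for illustration; the source only lists the kinds of data the store may hold.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class StoreEntry:
    model: str
    algorithm_version: str
    schema: tuple  # e.g. (("age", "numeric"), ("plan", "categorical"))
    metrics: dict
    hyperparameters: dict
    trained_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class HyperparameterStore:
    """In-memory store indexed by model name (a hypothetical design)."""

    def __init__(self):
        self._by_model = {}

    def save(self, entry):
        self._by_model.setdefault(entry.model, []).append(entry)

    def lookup(self, model, schema, algorithm_version=None):
        """Return the most recently saved matching entry, or None."""
        for entry in reversed(self._by_model.get(model, [])):
            if entry.schema != schema:
                continue
            if algorithm_version is not None and entry.algorithm_version != algorithm_version:
                continue
            return entry
        return None


store = HyperparameterStore()
schema = (("age", "numeric"), ("plan", "categorical"))
store.save(StoreEntry("churn_model", "1.0", schema, {"accuracy": 0.91}, {"max_depth": 8}))
hit = store.lookup("churn_model", schema)
miss = store.lookup("churn_model", (("age", "numeric"),))
```

When `lookup` returns an entry, its stored values can be reused directly or used as the starting point for a narrower search; when it returns `None`, the ranking and search stages run as usual.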
FIG. 2B shows an example flow diagram 250 for selecting one or more suitable values for the hyperparameters of a machine learning model. In 255, the method receives a selection of a machine learning model and one or more datasets. The machine learning model can be selected according to method 100 by the secondary machine learning model at stage 115, selected by the user, or selected according to conventional methods. The machine learning model selected in stage 255 may have an associated version, which can be identified in 260. The version may correspond to the version of the machine learning algorithm used by the model. A newer version of an algorithm may introduce new hyperparameters and/or eliminate others. In general, most or all of the hyperparameters remain the same across different versions of a machine learning algorithm, which is why storing previously used, suitable hyperparameters in the hyperparameter store is advantageous.
In stage 265, method 250 can retrieve from the previously described hyperparameter store the hyperparameters, and their associated values, that were previously used with the selected machine learning model. The retrieved hyperparameters and values may have been used with the same version of the selected machine learning model or with a different version. The version of the machine learning algorithm can be stored in the hyperparameter store, as previously discussed for stage 220. The hyperparameter store can also record the schema of the dataset that was used to train the model. Because the dataset can affect the suitability of hyperparameters, stage 265 can also compare the schema of the dataset selected in 255 with the dataset schema recorded in the hyperparameter store to determine their similarities and differences.
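The source does not say how the similarities and differences between schemas are measured. One simple possibility, sketched here in Python, is a Jaccard similarity over (field name, field type) pairs together with a list of differing fields. The function names and the example schemas are hypothetical.

```python
def schema_similarity(schema_a, schema_b):
    """Jaccard similarity over (field name, field type) pairs, from 0.0 to 1.0."""
    a, b = set(schema_a.items()), set(schema_b.items())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def schema_differences(stored, selected):
    """Fields added, removed, or retyped in `selected` relative to `stored`."""
    return {
        "added": sorted(set(selected) - set(stored)),
        "removed": sorted(set(stored) - set(selected)),
        "retyped": sorted(
            name for name in set(stored) & set(selected)
            if stored[name] != selected[name]
        ),
    }


# Hypothetical schemas: one recorded in the store, one for the new dataset.
stored = {"age": "numeric", "plan": "categorical", "notes": "text"}
selected = {"age": "numeric", "plan": "categorical", "region": "categorical"}
similarity = schema_similarity(stored, selected)
differences = schema_differences(stored, selected)
```

A high similarity suggests the stored hyperparameter values are a reasonable starting point for the new dataset, while a low one suggests that a wider search may be needed.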