Given multiple numeric columns and one label column with fewer than 3 distinct values, Build Decision Tree tries to build multiple queries, each of which represents a single label value.
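The idea can be illustrated with a small, self-contained Python sketch. It is not the operator's implementation. It assumes Gini impurity for choosing splits, and it assumes that the node impurity compared against the threshold is 1 minus the share of the majority label, which fits the "0.01 means 99% certain" description of the impurity parameter below. Each leaf becomes one query (the conjunction of the conditions on its path) tagged with a single predicted label. The names `gini`, `best_split`, and `build_rules` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Row = Dict[str, float]
Rule = Tuple[str, int, float, bool]  # (query, predicted label, node impurity, decided?)


def gini(labels: List[int]) -> float:
    """Gini impurity; this sketch uses it only to choose splits."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows: List[Row], labels: List[int],
               features: List[str]) -> Optional[Tuple[float, str, float]]:
    """Return (weighted Gini, feature, threshold) of the best binary split, or None."""
    best = None
    for f in features:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint, so both sides are non-empty
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_rules(rows: List[Row], labels: List[int], features: List[str],
                max_depth: int, threshold: float, path: Tuple[str, ...] = ()) -> List[Rule]:
    """Grow the tree greedily and return one query per leaf."""
    label, count = Counter(labels).most_common(1)[0]
    node_impurity = 1.0 - count / len(labels)  # assumed: 1 - share of the majority label
    decided = node_impurity <= threshold + 1e-9  # small tolerance for float rounding
    split = None if decided or max_depth == 0 else best_split(rows, labels, features)
    if split is None or split[0] >= gini(labels):  # decided, too deep, or no gain
        return [(" and ".join(path) or "true", label, node_impurity, decided)]
    _, f, t = split
    rules: List[Rule] = []
    for cond, keep in ((f"{f} <= {t:.3f}", lambda v: v <= t),
                       (f"{f} > {t:.3f}", lambda v: v > t)):
        part = [(r, y) for r, y in zip(rows, labels) if keep(r[f])]
        rules += build_rules([r for r, _ in part], [y for _, y in part],
                             features, max_depth - 1, threshold, path + (cond,))
    return rules


# Sample rows labelled with the same rule as the LQL example below.
rows = [
    {"f1": 0.9, "f2": 0.1}, {"f1": 0.8, "f2": 0.7}, {"f1": 0.7, "f2": 0.3},
    {"f1": 0.3, "f2": 0.9}, {"f1": 0.1, "f2": 0.6},
    {"f1": 0.2, "f2": 0.2}, {"f1": 0.4, "f2": 0.15},
]
labels = [1, 1, 1, 2, 2, 0, 0]
for rule in build_rules(rows, labels, ["f1", "f2"], max_depth=5, threshold=0.05):
    print(rule)
```

On these rows the sketch yields three decided queries: `f1 <= 0.550 and f2 <= 0.400` for label 0, `f1 <= 0.550 and f2 > 0.400` for label 2, and `f1 > 0.550` for label 1. Lowering `max_depth` to 1 stops after the first split and leaves the `f1 <= 0.550` node undecided, with impurity 0.5.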
Operator Usage in Easy Mode
- Click + on the parent node.
- Enter Build Decision Tree in the search field and select the operator from the Results to open the operator form.
- In the Table drop-down, enter or select the table from which to create the model.
- In the Max Depth field, enter the maximum depth of the decision tree.
- In the Impurity field, enter the impurity threshold that determines whether a node is counted as decided or undecided.
- Optional. In the Columns field, click Add More to add columns. The first column is treated as the label column and the remaining columns as feature columns.
- Click Run to view the result.
- Click Save to add the operator to the playbook.
- Click Cancel to discard the operator form.
Usage Details
LQL Command
buildDecisionTree(table: TableReference, maxDepth: Long, impurity: Double, columns:String*)
Parameters:
table (TableReference) - The table from which to create the model.
maxDepth (Long) - The maximum depth of the decision tree.
impurity (Double) - The impurity threshold for a node (that is, a query) to be counted as decided rather than undecided. You can think of it as inverse confidence: 0.01 means an uncertainty of 0.01, so a rule is created when the node is 99% certain.
columns (String*) - The columns that manually specify the label and feature columns. The first column is the label and the rest are features. If you specify only one column, it is used as the label and all remaining numeric columns are automatically used as feature columns.
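The `columns` and `impurity` rules above can be read as in the following Python sketch. It is a hypothetical illustration, not the operator's code: `resolve_columns`, `is_decided`, and the `schema` dictionary (column name to a type string such as "numeric") are made up for this example, and reading impurity as "1 minus the share of the majority label" is an assumption based on the 0.01 / 99% example.

```python
from typing import Dict, List, Sequence, Tuple


def resolve_columns(columns: Sequence[str], schema: Dict[str, str]) -> Tuple[str, List[str]]:
    """Split the `columns` argument into (label, features).

    `schema` maps column name -> type; the type name "numeric" is hypothetical.
    """
    if not columns:
        # Inferring the label when no column is given is not modelled in this sketch.
        raise ValueError("specify at least the label column")
    label, *features = columns
    if not features:
        # Only the label was given: every other numeric column becomes a feature.
        features = [c for c, t in schema.items() if c != label and t == "numeric"]
    return label, list(features)


def is_decided(majority_count: int, total: int, impurity: float, eps: float = 1e-9) -> bool:
    """Assumed reading: uncertainty = 1 - share of the node's majority label."""
    uncertainty = 1.0 - majority_count / total
    return uncertainty <= impurity + eps  # eps absorbs float rounding, e.g. 1 - 0.99


schema = {"f1": "numeric", "f2": "numeric", "f3": "numeric", "host": "string", "label": "numeric"}
print(resolve_columns(["label"], schema))              # label plus every other numeric column
print(resolve_columns(["label", "f1", "f2"], schema))  # label plus the explicit features
print(is_decided(99, 100, 0.01), is_decided(98, 100, 0.01))
```

With `impurity` set to 0.01, a node whose majority label covers 99 of 100 rows counts as decided, while one covering 98 of 100 does not.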
Returns:
One row with one column whose cell contains a JSON object with `TreeModel` and `Data` objects. `TreeModel` contains the array of queries and `Data` contains the predicted data.
Example:
```sql
// Three columns of random numbers
df = select rand() as f1, rand() as f2, rand() as f3 from table
// label = 1 if f1 > 0.5, else 2 if f2 > 0.5, else 0
df1 = select *, case when f1 > 0.5 then 1 else case when f2 > 0.5 then 2 else 0 end end as label from df
df2 = buildDecisionTree(df1, 5, 0.05, "label", "f1", "f2", "f3")
// buildDecisionTree(df1, 5, 0.5, "label") should also work
```
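For reference, the following Python sketch mimics the `df`/`df1` data generation from the example and spells out the queries an ideal tree would be expected to recover (`f3` is pure noise and never appears in them). The names `make_example` and `EXPECTED_QUERIES` are hypothetical, and Python's `random.random()` stands in for `rand()`.

```python
import random
from typing import Callable, Dict, List

Row = Dict[str, float]


def make_example(n: int, seed: int = 0) -> List[Row]:
    """Mimic df (three rand() columns) and df1 (nested CASE label) from the example."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        f1, f2, f3 = rng.random(), rng.random(), rng.random()
        # case when f1 > 0.5 then 1 else case when f2 > 0.5 then 2 else 0 end end
        label = 1 if f1 > 0.5 else (2 if f2 > 0.5 else 0)
        rows.append({"f1": f1, "f2": f2, "f3": f3, "label": label})
    return rows


# One query per label that an ideal depth-2 tree would recover; f3 never appears.
EXPECTED_QUERIES: Dict[int, Callable[[Row], bool]] = {
    1: lambda r: r["f1"] > 0.5,
    2: lambda r: r["f1"] <= 0.5 and r["f2"] > 0.5,
    0: lambda r: r["f1"] <= 0.5 and r["f2"] <= 0.5,
}

rows = make_example(1000)
# Every row satisfies exactly one query, and it is the one for its own label.
for r in rows:
    matches = [lbl for lbl, q in EXPECTED_QUERIES.items() if q(r)]
    assert matches == [r["label"]]
```

Because the labels are defined by these three conditions, a depth of 2 is enough to separate them, so the `maxDepth` of 5 used in the example leaves room to spare.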
Output:
The output should be a JSON object containing `TreeModel` and `Data` objects. `TreeModel` contains the tree nodes (the conditions), and `Data` contains the data with these additional columns, which are the output of decisionTreeModel:
- lhub_decision_tree_node_impurity
- lhub_decision_tree_path
- lhub_decision_tree_predicted_label
- lhub_isDecided

```json
"data": { "TreeModel": [...], "Data": [...] }
```
You can also extract the data into a table as follows:
```sql
jsonToTable(df2, "RESULT.Data")
```
This recreates the input table with the additional columns listed above, which hold the prediction and path information.
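Outside LQL, the same extraction can be sketched in Python. This assumes that `RESULT` names the output column and that its cell holds a JSON object with a `Data` key whose value is an array of row objects. The helper `json_to_rows` and the sample cell are hypothetical: `TreeModel` is left empty and the `lhub_` columns hold placeholder values.

```python
import json
from typing import Any, List, Tuple


def json_to_rows(cell: str, path: str = "Data") -> Tuple[List[str], List[List[Any]]]:
    """Follow a dotted path into a JSON cell and flatten the array of objects found there."""
    node: Any = json.loads(cell)
    for key in path.split("."):
        node = node[key]
    # Union of keys across rows, in first-seen order, becomes the column list.
    columns = list(dict.fromkeys(k for row in node for k in row))
    return columns, [[row.get(c) for c in columns] for row in node]


# Hypothetical RESULT cell: TreeModel omitted, lhub_ values are placeholders.
cell = json.dumps({
    "TreeModel": [],
    "Data": [
        {"f1": 0.9, "label": 1, "lhub_decision_tree_node_impurity": 0.0,
         "lhub_decision_tree_path": "placeholder", "lhub_decision_tree_predicted_label": 1,
         "lhub_isDecided": True},
        {"f1": 0.2, "label": 0, "lhub_decision_tree_node_impurity": 0.0,
         "lhub_decision_tree_path": "placeholder", "lhub_decision_tree_predicted_label": 0,
         "lhub_isDecided": True},
    ],
})

columns, rows = json_to_rows(cell)  # roughly what jsonToTable(df2, "RESULT.Data") does per cell
print(columns)
for row in rows:
    print(row)
```

Collecting the column list as the union of keys keeps the sketch working even if some rows omit a column; missing cells come back as `None`.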