
matchSimilarFromCorpus

This operator works together with the buildTermCorpus operator: buildTermCorpus builds the model, and matchSimilarFromCorpus matches text against the corpus stored in that model and adds the columns that were kept when the corpus was built.

Operator Usage in Easy Mode

  1. Click + on the parent node.
  2. Enter the Match Similar From Corpus operator in the search field and select the operator from the Results to open the operator form.
  3. In the Table drop-down, enter or select the name of the table to run this operator on.
  4. In the Model Name drop-down, enter or select the name of the model.
  5. In the Column drop-down, enter or select the name of the column that contains the text from which to extract TF-IDF features.
  6. In the Number of Matches field, enter the number of best matches to return.
  7. Click Run to view the result.
  8. Click Save to add the operator to the playbook.
  9. Click Cancel to discard the operator form.

Usage Details

Uses the processed corpus from buildTermCorpus and a new column of text to compute and return the cosine similarity between each input text and the corpus entries.
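To make the idea concrete, here is a minimal Python sketch of TF-IDF weighting followed by cosine similarity. It is not the operator's implementation: the whitespace tokenization, the smoothed IDF formula, and the helper names `build_idf`, `tfidf_vector`, and `cosine_similarity` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def build_idf(corpus_docs):
    """Compute a smoothed inverse document frequency for each corpus term."""
    n_docs = len(corpus_docs)
    doc_freq = Counter()
    for doc in corpus_docs:
        doc_freq.update(set(doc.split()))  # count each term once per document
    # Assumed formula: log((1 + N) / (1 + df)) + 1, so every known term gets a positive weight.
    return {term: math.log((1 + n_docs) / (1 + df)) + 1.0 for term, df in doc_freq.items()}


def tfidf_vector(text, idf):
    """Weight each term count by its IDF; terms unseen in the corpus are dropped."""
    counts = Counter(text.split())
    return {term: count * idf[term] for term, count in counts.items() if term in idf}


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


corpus = ["h a c d i j b", "gg aa ff jj c i b", "k o m p n l q"]
idf = build_idf(corpus)
corpus_vectors = [tfidf_vector(doc, idf) for doc in corpus]

# Score one new text against every corpus entry and keep the best one.
query = tfidf_vector("h a c d i j b", idf)
scores = [cosine_similarity(query, vec) for vec in corpus_vectors]
best_index = max(range(len(scores)), key=scores.__getitem__)
print(best_index, round(scores[best_index], 3))  # prints: 0 1.0
```

A text identical to a corpus entry scores 1.0, a text sharing no terms with an entry scores 0.0, and partial overlaps fall in between.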

LQL Command

```cplusplus
matchSimilarFromCorpus(table: TableReference, modelName: String, column: String, numberOfMatches: Int*)
```

**Input Parameters**  
_table_ (TableReference): Table name  
_modelName_ (String): Model name  
_column_ (String): Name of the column that contains the text from which to extract TF-IDF features  
_numberOfMatches_ (Int\*): Optional. Number of best matches to return

**Returns**:  
Returns the greatest cosine similarity score, 'lhub_cosineSimilarity', computed from the TF-IDF terms of the saved corpus. The score ranges from 0.0 (no match) to 1.0 (perfect match). The result also includes the columns specified in the columnsKeep argument at corpus creation, each prefixed with 'lhub_'.
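The shape of the returned rows can be sketched in Python as follows. This is an illustration under stated assumptions, not the operator's code: the helper `top_matches` is hypothetical, the default of one match when `numberOfMatches` is omitted is a guess based on the description above, and the scores are placeholders rather than real operator output.

```python
import heapq


def top_matches(scored_rows, number_of_matches=1):
    """Keep the highest-scoring corpus rows and prefix their kept columns.

    scored_rows: list of (score, kept_columns) pairs, one per corpus row.
    number_of_matches: assumed to default to 1 (the single best match).
    """
    best = heapq.nlargest(number_of_matches, scored_rows, key=lambda pair: pair[0])
    results = []
    for score, kept_columns in best:
        row = {"lhub_cosineSimilarity": score}
        # Kept columns are carried over with the 'lhub_' prefix described above.
        row.update({f"lhub_{name}": value for name, value in kept_columns.items()})
        results.append(row)
    return results


# Illustrative scores only; they are not output from the operator.
scored = [
    (0.9, {"label": "x", "domain": "google"}),
    (0.4, {"label": "y", "domain": "facebook"}),
    (0.0, {"label": "z", "domain": "apple"}),
]
print(top_matches(scored, number_of_matches=2))
```

Sorting by score and truncating to the requested count keeps the best match first, so a caller that only needs the top result can read the first row.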

## Example

**Input**  
Table and model name from the buildTermCorpus operator

<style>
  th {
    border: 1px solid #cccccc;
    background-color: #eeeeee;
    padding: 8px 5px 8px 5px;
    text-align: left
  }
</style>

<div><table>
<thead>
<tr>
<th>corpus</th>
</tr>
</thead>
<tbody>
<tr><td>h a c d i j b</td></tr>
<tr><td>gg aa ff jj c i b</td></tr>
<tr><td>k o m p n l q</td></tr>
</tbody>
</table></div>

LQL command
``` {java}
matchSimilarFromCorpus(inputTable, "corpusModel", "corpus")
// table = inputTable
// model name that was created by the buildTermCorpus operator = "corpusModel"
// column that contains the text to match = "corpus"
```

Output

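<div><table>
<thead>
<tr>
<th>corpus</th>
<th>label</th>
<th>domain</th>
<th>lhub_confidence</th>
</tr>
</thead>
<tbody>
<tr><td>h a c d i j b</td><td>x</td><td>google</td><td>more than 0.5</td></tr>
<tr><td>gg aa ff jj c i b</td><td>y</td><td>facebook</td><td>more than 0.5</td></tr>
<tr><td>k o m p n l q</td><td>z</td><td>apple</td><td>more than 0.5</td></tr>
</tbody>
</table></div>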

The label and domain columns come from corpusModel, whose parameters were set to keep the ["label", "domain"] columns; they are added to the output based on the matches. lhub_confidence is the confidence score of the best match (e.g. cosine similarity).