Description

Replaces the first substring of a given string that matches a regular expression with the indicated template. If no match is found, the operation returns the original string or, if specified, an optional fail value. To replace every match in the given string, use the Substitute all (subsall) operation.

You can use the Template (template) and Regular expression, regexp (re) operations to transform the values in a string column into the required template and regexp data types.
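
As a rough illustration of the behavior described above, the following Python sketch mimics the operation with the standard re module. It is only an analogy: the function name subs_sketch and its parameters are hypothetical, and Python's regular expression dialect is assumed to be close enough to Devo's for these simple patterns.

```python
import re
from typing import Optional


def subs_sketch(text: str, pattern: str, template: str,
                fail_value: Optional[str] = None) -> str:
    """Hypothetical stand-in for Substitute (subs), not Devo's implementation."""
    regex = re.compile(pattern)
    if regex.search(text) is None:
        # No match: return the fail value if one was given, else the input.
        return text if fail_value is None else fail_value
    # count=1 replaces only the first match, leaving later ones untouched.
    return regex.sub(template, text, count=1)


print(subs_sketch("10:45:30", ":", "-"))            # 10-45:30
print(subs_sketch("no colon", ":", "-"))            # no colon
print(subs_sketch("no colon", ":", "-", "failed"))  # failed
```

Dropping count=1 in this sketch would replace every match, which corresponds to what the text attributes to Substitute all (subsall).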

How does it work in the search window?

Select Create field in the search window toolbar, then select the Substitute operation. You need to specify at least three arguments:

  • String to scan (mandatory) - Data type: string. You can select a column in the table or enter a value manually.

  • Regular expression (mandatory) - Data type: regexp. You can select a column in the table or enter a value manually. If you enter it manually, you can use the regexp syntax to establish grouping patterns.

  • Template (mandatory) - Data type: template. You can select a column in the table or enter a value manually. If you enter it manually, you can use the capturing group syntax to refer to specific groups established by the regular expression.

  • Fail value (optional) - Data type: string. You can select a column in the table or enter a value manually.

The data type of the values in the new field is string.

Note that Devo automatically converts strings entered manually in the Regular expression and Template arguments to the required regexp and template data types. If you want to use a column for these arguments, it must be a regexp or template type column. You can use the Regular expression, regexp (re) and Template (template) operations to convert a string column to the required data type.

Example

In the demo.ecommerce.data table, we want to replace the first colon (:) in each value of the timestamp field with a hyphen (-). To do so, we will create a new field using the Substitute operation with the following arguments:

  • String to scan - timestamp field

  • Regular expression - Click the pencil icon and enter → :

  • Template - Click the pencil icon and enter → -

Click Create field and you will see the following result:

We can also create a field in the demo.ecommerce.data table that replaces the first dot in each IP address with a space. To do so, we will create a new field named Substitute using the Substitute operation. First, we need to convert the clientIpAddress column to string type using the to string (str) operation; we will call the resulting column IPstring.

Once we have the IPstring column, the arguments needed to create the new Substitute field are:

  • String to scan - IPstring column

  • Regular expression - Click the pencil icon and enter the following syntax to group up to the first dot → ([0-9]+)\.*

  • Template - Click the pencil icon and reference the capturing group defined by the regular expression, followed by a space → \1SPACE (where SPACE stands for a single space character)

If you are going to use the same regular expression and template several times, it is advisable to create columns using the Regular expression, regexp (re) and Template (template) operations and use them as arguments in the Substitute operation.

Click Create field and you will see the following result:

  • The first dot of the IP addresses has been substituted by a space.
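
The first-dot example can be reproduced outside Devo with a short Python sketch. The sample address 192.168.1.10 is made up for illustration, the helper name replace_first_dot is hypothetical, and Python's re module is assumed to treat this pattern and the \1 reference the same way Devo does.

```python
import re

# Same pattern and template as in the example; the template ends in a real space.
FIRST_GROUP = re.compile(r"([0-9]+)\.*")
TEMPLATE = r"\1 "


def replace_first_dot(ip: str) -> str:
    """Replace only the first 'digits followed by dots' match (count=1)."""
    return FIRST_GROUP.sub(TEMPLATE, ip, count=1)


print(replace_first_dot("192.168.1.10"))  # 192 168.1.10
```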

If you want to substitute all the dots in the IP addresses, you can use either the Substitute all (subsall) operation with the same arguments or keep using this operation with some adjustments:

  • String to scan - IPstring column

  • Regular expression - Repeat the regular expression syntax used before once for each group needed → ([0-9]+)\.*([0-9]+)\.*([0-9]+)\.*

  • Template - Reference as many capturing groups as the regular expression defines, each followed by a space → \1SPACE\2SPACE\3SPACE

Click Create field and you will see the following result:

  • The dots of the IP addresses have been substituted by spaces.
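
To compare the two approaches mentioned above, this Python sketch applies the three-group pattern once and, separately, the single-group pattern to every match. It is an analogy built on Python's re module, not Devo code, and the sample address is invented.

```python
import re

ip = "192.168.1.10"  # made-up sample value

# One substitution with three groups, mirroring the adjusted Substitute arguments.
three_groups = re.sub(r"([0-9]+)\.*([0-9]+)\.*([0-9]+)\.*", r"\1 \2 \3 ", ip, count=1)

# Every match of the single-group pattern replaced, in the spirit of subsall.
every_match = re.sub(r"([0-9]+)\.*", r"\1 ", ip)

print(repr(three_groups))  # '192 168 1 10'
print(repr(every_match))   # '192 168 1 10 '
```

In this Python analogue the replace-all variant leaves a trailing space, because the last group of digits also matches the pattern. Whether subsall in Devo behaves identically is not confirmed here.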

How does it work in LINQ?

Use the select... as... operator and add the operation syntax to create the new column. These are the valid formats for the Substitute operation:

  • subs(string, re(string), template(string))

  • subs(string, re(string), template(string), fail_value_string)

  • subs(string, regexp, template)

  • subs(string, regexp, template, fail_value_string)

Note that when you enter a string value as a regular expression or template in LINQ, you have to convert it to the regexp or template data type using the Regular expression, regexp (re) and Template (template) operations, as shown in the examples. This is not needed if you perform the operation directly from the search window interface, as noted above.

Example

You can copy the following LINQ script and try the previous examples on the demo.ecommerce.data table.

from demo.ecommerce.data
  select subs(timestamp, re(":"), template("-")) as substitute_timestamp

from demo.ecommerce.data
  select str(clientIpAddress) as IPstring,
    subs(IPstring, re("([0-9]+)\\.*([0-9]+)\\.*([0-9]+)\\.*"), template("\\1 \\2 \\3 ")) as Substitute