title

Identify Similar Data Rows with the Fuzzy Grouping Transformation

description

Identify Similar Data Rows with the Fuzzy Grouping Transformation

author

chugugrace

ms.author

chugu

ms.date

03/14/2017

ms.service

sql

ms.subservice

integration-services

ms.topic

how-to

helpviewer_keywords

Fuzzy Grouping transformation

match similar data [Integration Services]

similar data rows [Integration Services]

fuzzy matches

Identify Similar Data Rows with the Fuzzy Grouping Transformation

[!INCLUDEsqlserver-ssis]

To add and configure a Fuzzy Grouping transformation, the package must already include at least one Data Flow task and a source.

To implement Fuzzy Grouping transformation in a data flow

In [!INCLUDEssBIDevStudioFull], open the [!INCLUDEssISnoversion] project that contains the package you want.
In Solution Explorer, double-click the package to open it.
Click the Data Flow tab, and then, from the Toolbox, drag the Fuzzy Grouping transformation to the design surface.
Connect the Fuzzy Grouping transformation to the data flow by dragging the connector from the data source or a previous transformation to the Fuzzy Grouping transformation.
Double-click the Fuzzy Grouping transformation.
In the Fuzzy Grouping Transformation Editor dialog box, on the Connection Manager tab, select an OLE DB connection manager that connects to a [!INCLUDEssNoVersion] database.

[!NOTE]
The transformation requires a connection to a [!INCLUDEssNoVersion] database to create temporary tables and indexes.
Click the Columns tab and, in the Available Input Columns list, select the check box of the input columns to use to identify similar rows in the dataset.
Select the check box in the Pass Through column to identify the input columns to pass through to the transformation output. Pass-through columns are not included in the process of identification of duplicate rows.

[!NOTE]
Input columns that are used for grouping are automatically selected as pass-through columns, and they cannot be unselected while used for grouping.
Optionally, update the names of output columns in the Output Alias column.
Optionally, update the names of cleaned columns in the Group OutputAlias column.

[!NOTE]
The default names of columns are the names of the input columns with a "_clean" suffix.
Optionally, update the type of match to use in the Match Type column.

[!NOTE]
At least one column must use fuzzy matching.
Specify the minimum similarity level columns in the Minimum Similarity column. The value must be between 0 and 1. The closer the value is to 1, the more similar the values in the input columns must be to form a group. A minimum similarity of 1 indicates an exact match.
Optionally, update the names of similarity columns in the Similarity Output Alias column.
To specify the handling of numbers in data values, update the values in the Numerals column.
To specify how the transformation compares the string data in a column, modify the default selection of comparison options in the Comparison Flags column.
Click the Advanced tab to modify the names of the columns that the transformation adds to the output for the unique row identifier (_key_in), the duplicate row identifier (_key_out), and the similarity value (_score).
Optionally, adjust the similarity threshold by moving the slider bar.
Optionally, clear the token delimiter check boxes to ignore delimiters in the data.
Click OK.
To save the updated package, click Save Selected Items on the File menu.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Identify Similar Data Rows with the Fuzzy Grouping Transformation

To implement Fuzzy Grouping transformation in a data flow

See Also

FilesExpand file tree

identify-similar-data-rows-by-using-the-fuzzy-grouping-transformation.md

Latest commit

History

identify-similar-data-rows-by-using-the-fuzzy-grouping-transformation.md

File metadata and controls

Identify Similar Data Rows with the Fuzzy Grouping Transformation

To implement Fuzzy Grouping transformation in a data flow

See Also