Ungroup Words Node
The Ungroup Words node takes a user-selected column and Ungroups the Words found in each input String into separate rows.
Chinese Words are identified by referring to a Word Dictionary. The Word Dictionary consists of a large internal Chinese Matching Words list built into the Ungroup Words node. The user may optionally add a Supplemental Dictionary containing their own set of Matching Words. These supplemental words are typically things like Brand Names that would not normally be included in a standard language dictionary.
Words are delimited by any end-of-word marker (such as a space) and by any punctuation. Digits are treated as if they were letters of the English alphabet. The period (full stop) is treated as punctuation and not as a decimal point, which means the Ungroup Words node will split a decimal number into two separate number Words; for example, 3.14 becomes 3 and 14.
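The splitting rules above can be sketched in a few lines of Python. This is only an illustration of the described behaviour for English letters and digits, not the node's actual implementation; the function name split_words and the regular expression are assumptions made for this example, and Chinese text is not handled here.

```python
import re

# Assumption for this sketch: a Word is an unbroken run of ASCII letters
# and digits. Anything else (spaces, commas, the period, and so on) is
# treated as an end-of-word marker.
WORD_PATTERN = re.compile(r"[A-Za-z0-9]+")


def split_words(text):
    """Return the Words found in text, in order of appearance."""
    return WORD_PATTERN.findall(text)


print(split_words("Model X200, price 3.14 USD"))
# ['Model', 'X200', 'price', '3', '14', 'USD']
```

Note how X200 stays together because digits count as letters, while 3.14 is split at the period into 3 and 14.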
If a match is found, the Matched Word is added to the output collection. If two Chinese Words are matched from the same starting point in the string, the longer Word is kept and the shorter Word is discarded. An English-equivalent example would be matching ‘cat’ and ‘catch’: in this case the Word ‘catch’ would be retained and ‘cat’ would be discarded.
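The longest-match rule can be illustrated with a small Python sketch that uses the English-equivalent example. The function names, the sample dictionary, and the way the scan moves on after a match or a miss are assumptions for illustration; the node's internal matching may differ.

```python
def longest_match_at(text, start, dictionary):
    """Return the longest dictionary Word beginning at start, or None."""
    max_len = max((len(word) for word in dictionary), default=0)
    # Try the longest candidate first, so a shorter Word such as 'cat'
    # is discarded whenever a longer Word such as 'catch' also matches.
    for length in range(min(max_len, len(text) - start), 0, -1):
        candidate = text[start:start + length]
        if candidate in dictionary:
            return candidate
    return None


def match_words(text, dictionary):
    """Scan text left to right and collect the Matched Words."""
    matched = []
    pos = 0
    while pos < len(text):
        word = longest_match_at(text, pos, dictionary)
        if word is None:
            pos += 1  # assumption: skip one character when nothing matches
        else:
            matched.append(word)
            pos += len(word)  # assumption: resume after the Matched Word
    return matched


# Hypothetical dictionary standing in for the Matching Words list.
sample_dictionary = {"cat", "catch", "fish"}
print(match_words("catchfish", sample_dictionary))  # ['catch', 'fish']
```

The same function works on Chinese strings, since Python strings are sequences of Unicode characters; only the dictionary contents would change.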
The results can be used to identify a Product Name, SKU Number, or Brand from a general Description of the Product. Both Chinese and English are currently supported.
Input Product Array
The Input Product Array, or another table, containing the column of input Strings that will be Ungrouped into Words.
The optional Supplemental Dictionary: a user-defined set of Matching Words that supplements the internal Chinese Matching Words list.
The user selects which column from the Input Product Array contains the input Strings to be Ungrouped into Words. The Ungroup Words node can also Regroup a number of adjacent Matched Words and add them to the output collection. The user can choose to Regroup Couplets (two adjoining Words), Triplets (three adjoining Words), Quadruplets (four adjoining Words), Quintuplets (five adjoining Words), or Singlets (don’t Regroup).
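One way to picture the Regroup options is as a sliding window over the list of Matched Words. The Python sketch below is illustrative only: the function name regroup, joining the Words with a space, and appending only groups of the selected size after the single Words are all assumptions, since the exact output format of the node is not described here.

```python
# Number of adjoining Words in each Regroup option.
GROUP_SIZES = {
    "Singlets": 1,
    "Couplets": 2,
    "Triplets": 3,
    "Quadruplets": 4,
    "Quintuplets": 5,
}


def regroup(words, mode):
    """Return the single Words plus every run of adjoining Words of the chosen size."""
    size = GROUP_SIZES[mode]
    output = list(words)  # the individual Matched Words are always kept
    if size > 1:
        # Slide a window of `size` Words across the list.
        for i in range(len(words) - size + 1):
            # Assumption: regrouped Words are joined with a single space.
            output.append(" ".join(words[i:i + size]))
    return output


print(regroup(["red", "cotton", "shirt"], "Couplets"))
# ['red', 'cotton', 'shirt', 'red cotton', 'cotton shirt']
```

With Singlets selected the list is returned unchanged, and when the input has fewer Words than the selected group size no regrouped entries are added.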