KNIME Nodes

Advanced Data Generation

In most of the workflows found in this blog, the input data was generated by a Market Simulation node, such as:

  1. Customer Distributions node
  2. Matrix Distributions node

But KNIME itself offers many powerful data generation capabilities – particularly in the KNIME Data Generation Extension. To install this extension, check out the Getting Started Guide: Installing the KNIME Analytics Platform.

This workflow generates Customer Profile sample data using some of these advanced data generation techniques. All of the nodes used in this example come from KNIME, but both the nodes and this type of data generation can be easily integrated into a Market Simulation workflow.

This KNIME Node Use Case provides an example of a useful KNIME workflow. These workflows do not depend upon Market Simulation but can supplement a Market Simulation workflow. If you have not yet installed KNIME, go to Getting Started.

#1 New Customers

This first step creates 200 random Customers (48% Male / 52% Female) with Customer RowIDs running from c0 to c199 and CustomerIDs running from 1 to 200.

Empty Table Creator

Create a column of 200 Customers in an Empty Table.

Column Rename

Rename the CustomerID Column.

Counter Generator

Generate a Counter for each CustomerID.

Random Label Assigner

Randomly assign Customers to be either Male (48%) or Female (52%).

Output Data

Adds the CustomerID and the Gender data columns.
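For readers who want to see the logic outside of KNIME, the step above can be approximated in a few lines of Python. This is only a rough sketch, not the workflow itself: the function name `new_customers` and the seed are hypothetical, and it assumes each Customer's Gender is drawn independently, so the realized split will be close to, but not exactly, 48% / 52%.

```python
import random

def new_customers(n=200, p_male=0.48, seed=42):
    """Build n customers keyed by RowID (c0..c{n-1}) with a counter and a random gender.

    Hypothetical helper: the name, signature, and seed are illustrative only.
    """
    rng = random.Random(seed)
    rows = {}
    for i in range(n):
        # Each customer is drawn independently, so the split is approximate.
        gender = "Male" if rng.random() < p_male else "Female"
        rows[f"c{i}"] = {"CustomerID": i + 1, "Gender": gender}
    return rows

customers = new_customers()
males = sum(1 for row in customers.values() if row["Gender"] == "Male")
print(f"{len(customers)} customers, {males} Male / {len(customers) - males} Female")
```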

#2 Age Profile

This second step uses 6 Age Pyramid Profiles to randomly assign each Customer an Age between 17 and 100 years old.

Random Label Assigner

Create a set of Population Age Pyramids.

Double To Int

Round the generated Age.

Gaussian Distributed Assigner

Define the probable Age for each Pyramid.

Column Filter

Remove the Pyramid Profile column.

Output Data

Adds the Age data column.
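The same idea can be sketched in Python: pick one of six profiles at random, draw an Age from a Gaussian centered on that profile, and round it. The profile names, weights, means, and standard deviations below are invented for illustration and are not the values used in the workflow. Clamping to the 17 to 100 range is also an assumption about how out-of-range draws might be handled.

```python
import random

# Hypothetical pyramid profiles: name -> (weight, mean age, standard deviation).
PYRAMIDS = {
    "Pyramid 1": (0.15, 21.0, 3.0),
    "Pyramid 2": (0.20, 30.0, 5.0),
    "Pyramid 3": (0.20, 40.0, 6.0),
    "Pyramid 4": (0.20, 52.0, 6.0),
    "Pyramid 5": (0.15, 65.0, 7.0),
    "Pyramid 6": (0.10, 80.0, 8.0),
}

def assign_ages(n=200, seed=42, low=17, high=100):
    """Return n integer ages drawn from a mixture of Gaussian profiles."""
    rng = random.Random(seed)
    names = list(PYRAMIDS)
    weights = [PYRAMIDS[name][0] for name in names]
    ages = []
    for _ in range(n):
        profile = rng.choices(names, weights=weights)[0]  # pick a pyramid
        _, mean, sd = PYRAMIDS[profile]
        age = round(rng.gauss(mean, sd))                  # draw and round the age
        ages.append(min(high, max(low, age)))             # assumed: clamp to range
    return ages

ages = assign_ages()
print(min(ages), max(ages))
```

The profile label is dropped once the Age has been drawn, which mirrors the Column Filter at the end of this step.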

#3 Occupations

Assign an Occupation to each Customer based upon their Age and an Occupation Probability.

Numeric Binner

Bin Customers by Age into Generations.

Conditional Label Assigner

Assign a Family status to each Customer by their Age Generation.

Gaussian Distributed Assigner

Assign Occupations to each Customer randomly by their Age Generation.

Output Data

Adds the Generation, Occupation, and Family columns.
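As a rough illustration of binning followed by conditional label assignment, the sketch below bins an Age into a Generation and then draws an Occupation and a Family status from probabilities that depend on that Generation. Every bin edge, label, and probability here is a made-up placeholder. It also uses simple weighted draws for both labels, which may differ from the node configuration in the workflow.

```python
import random
from bisect import bisect_right

# Hypothetical bin edges and labels (not taken from the workflow).
EDGES = [25, 41, 57, 76]
GENERATIONS = ["Gen Z", "Millennial", "Gen X", "Boomer", "Silent"]

# Hypothetical label probabilities, conditional on Generation.
OCCUPATION_PROBS = {
    "Gen Z": {"Student": 0.60, "Non-Student": 0.40},
    "Millennial": {"Student": 0.10, "Non-Student": 0.90},
    "Gen X": {"Student": 0.02, "Non-Student": 0.98},
    "Boomer": {"Student": 0.01, "Non-Student": 0.99},
    "Silent": {"Student": 0.01, "Non-Student": 0.99},
}
FAMILY_PROBS = {
    "Gen Z": {"Single": 0.85, "Family": 0.15},
    "Millennial": {"Single": 0.45, "Family": 0.55},
    "Gen X": {"Single": 0.30, "Family": 0.70},
    "Boomer": {"Single": 0.35, "Family": 0.65},
    "Silent": {"Single": 0.50, "Family": 0.50},
}

def generation_for(age):
    """Bin an age: values below the first edge fall in the first generation."""
    return GENERATIONS[bisect_right(EDGES, age)]

def pick(probs, rng):
    """Draw one label according to its probability."""
    labels = list(probs)
    return rng.choices(labels, weights=[probs[label] for label in labels])[0]

def profile(age, rng):
    gen = generation_for(age)
    return {
        "Age": age,
        "Generation": gen,
        "Occupation": pick(OCCUPATION_PROBS[gen], rng),
        "Family": pick(FAMILY_PROBS[gen], rng),
    }

rng = random.Random(7)
sample = [profile(age, rng) for age in (19, 33, 48, 62, 85)]
for row in sample:
    print(row)
```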

#4 Income

Generate an appropriate Income for all Customers based upon their Age Generation and whether or not they are a Student.

Nominal Value Row Filter

Filter by Occupation = Non-Student.

Nominal Value Row Filter

Filter by Occupation = Student.

Gamma Distributed Assigner

Generate Incomes by Age Generation.

Gamma Distributed Assigner

Generate Incomes randomly for Students.

Output Data

Adds the Income column.
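A minimal Python sketch of this split-then-generate pattern follows. Students draw from one Gamma distribution and everyone else draws from a Gamma distribution chosen by Generation. The Generation labels and all shape and scale values are hypothetical. Python's `random.gammavariate(alpha, beta)` takes a shape and a scale, so the mean of each distribution is shape times scale.

```python
import random

# Hypothetical (shape, scale) pairs per Generation; mean income = shape * scale.
GAMMA_BY_GENERATION = {
    "Gen Z": (2.0, 12000.0),
    "Millennial": (3.0, 15000.0),
    "Gen X": (3.0, 20000.0),
    "Boomer": (3.0, 18000.0),
    "Silent": (2.0, 14000.0),
}
# Hypothetical parameters for Students, regardless of Generation.
STUDENT_GAMMA = (2.0, 4000.0)

def assign_income(generation, occupation, rng):
    """Draw an income from the Gamma distribution that applies to this customer."""
    if occupation == "Student":
        shape, scale = STUDENT_GAMMA
    else:
        shape, scale = GAMMA_BY_GENERATION[generation]
    return rng.gammavariate(shape, scale)

rng = random.Random(11)
people = [("Gen Z", "Student"), ("Gen Z", "Non-Student"), ("Gen X", "Non-Student")]
incomes = [assign_income(gen, occ, rng) for gen, occ in people]
for (gen, occ), income in zip(people, incomes):
    print(f"{gen:<10} {occ:<12} {income:>12.2f}")
```

A Gamma distribution suits this kind of data because it is strictly positive and right-skewed, so most draws are modest while a few are large.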

#5 Shred Income

Shred the Income (that is, set Income = 0.0) of some Customers based upon the Probability that they are not currently working.

Conditional Label Assigner

Flag which Customers keep their Income, based on the Probability that the Customer is working.

Constant Value Column

Shred Income by overriding the Income column and setting it to 0.0.

Row Splitter

Split rows so Top = Has Income and Bottom = No Income.

Output Data

Sets the Income of some Customers to 0.0.
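The flag, split, and override sequence can be sketched in Python as shown below. This is an illustration under stated assumptions: the working probabilities are invented, and conditioning them on Occupation is a guess, since the workflow may condition on a different column.

```python
import random

# Hypothetical probability that a customer is currently working, by Occupation.
P_WORKING = {"Student": 0.30, "Non-Student": 0.90}

def shred_income(rows, rng):
    """Flag each row as Has Income or not, split, zero the bottom split, recombine."""
    top, bottom = [], []
    for row in rows:
        flagged = dict(row)
        flagged["Has Income"] = rng.random() < P_WORKING[row["Occupation"]]
        (top if flagged["Has Income"] else bottom).append(flagged)
    for row in bottom:
        row["Income"] = 0.0  # override the Income column with a constant
    return top + bottom

rows = [
    {"CustomerID": 1, "Occupation": "Student", "Income": 8200.0},
    {"CustomerID": 2, "Occupation": "Non-Student", "Income": 51000.0},
    {"CustomerID": 3, "Occupation": "Student", "Income": 6100.0},
    {"CustomerID": 4, "Occupation": "Non-Student", "Income": 43500.0},
]
shredded = shred_income(rows, random.Random(3))
for row in shredded:
    print(row)
```

Because the two splits are concatenated, the result is no longer in CustomerID order, which is why a later sort is needed.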

#6 Clean Up

Clean up the final Customer Profiles by rounding the Income, removing extra columns, and sorting by CustomerID.

Double To Int

Round the Income to the nearest Integer.

Sorter

Sort all the rows by the original CustomerID.

Column Filter

Remove the Generation column and the Has Income column.

Output Data

Removes the Generation and Has Income columns, then sorts by CustomerID.
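The clean-up can also be sketched in Python, assuming each Customer is a dictionary. The function name `clean_up` and the sample rows are hypothetical. Note that Python's built-in `round` rounds exact halves to the nearest even integer, which may differ from the rounding mode selected in the Double To Int node.

```python
def clean_up(rows, drop=("Generation", "Has Income")):
    """Round Income, drop helper columns, and sort by CustomerID."""
    cleaned = []
    for row in rows:
        out = {key: value for key, value in row.items() if key not in drop}
        out["Income"] = round(out["Income"])  # float -> nearest int
        cleaned.append(out)
    return sorted(cleaned, key=lambda r: r["CustomerID"])

# Hypothetical sample rows, deliberately out of CustomerID order.
rows = [
    {"CustomerID": 3, "Generation": "Gen X", "Has Income": True, "Income": 48250.7},
    {"CustomerID": 1, "Generation": "Gen Z", "Has Income": False, "Income": 0.0},
    {"CustomerID": 2, "Generation": "Boomer", "Has Income": True, "Income": 39999.2},
]
final = clean_up(rows)
for row in final:
    print(row)
```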