Advanced Data Generation
In most of the workflows in this blog, the input data was generated by a Market Simulation node.
But KNIME itself offers powerful data generation capabilities, particularly in the KNIME Data Generation Extension. To install this extension, see the Getting Started Guide: Installing the KNIME Analytics Platform.
This workflow generates sample Customer Profile data using some of these advanced data generation techniques. All of the nodes in this example come from KNIME itself, but these nodes (and this style of data generation) can easily be integrated into a Market Simulation workflow.
#1 New Customers
This first step creates 200 random Customers (48% Male / 52% Female), with RowIDs running from c0 to c199 and CustomerIDs running from 1 to 200.
Empty Table Creator
Create a table with one column and 200 rows, one row per Customer.
Rename that column to CustomerID.
Fill CustomerID with a counter so that each Customer gets a unique ID from 1 to 200.
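Outside KNIME, the effect of these three steps can be illustrated in a few lines of plain Python. This is a minimal sketch, not KNIME code: the function name `new_customers` and the list-of-dicts row format are assumptions made for illustration.

```python
# Illustrative sketch (not KNIME code): an empty 200-row table whose single
# column is named CustomerID and filled with a counter starting at 1.
N_CUSTOMERS = 200

def new_customers(n=N_CUSTOMERS):
    """Return n rows with RowIDs c0..c{n-1} and CustomerIDs 1..n."""
    return [{"RowID": f"c{i}", "CustomerID": i + 1} for i in range(n)]

customers = new_customers()
print(customers[0], customers[-1])
# {'RowID': 'c0', 'CustomerID': 1} {'RowID': 'c199', 'CustomerID': 200}
```

RowIDs are zero-based while CustomerIDs start at 1, which is why the two ranges differ by one.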
Random Label Assigner
Randomly assign Customers to be either Male (48%) or Female (52%).
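A weighted random label can be sketched in plain Python with `random.choices`. This is only an analogy for what the Random Label Assigner does; the function name `assign_gender`, the `Gender` column name, and the fixed seed are assumptions. Each row is drawn independently, so a given run will be close to, but not exactly, 96 Male / 104 Female.

```python
import random

def assign_gender(customers, p_male=0.48, seed=42):
    """Label each row Male with probability p_male, otherwise Female."""
    rng = random.Random(seed)  # seeded so the sample data is reproducible
    for row in customers:
        row["Gender"] = rng.choices(["Male", "Female"], weights=[p_male, 1 - p_male])[0]
    return customers

customers = assign_gender([{"CustomerID": i + 1} for i in range(200)])
males = sum(row["Gender"] == "Male" for row in customers)
print(f"{males} Male / {len(customers) - males} Female")
```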
#2 Age Profile
This second step uses six Age Pyramid Profiles to randomly assign each Customer an age between 17 and 100.
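The text does not say how the six profiles are defined or combined, so the following is a hypothetical sketch of one way to sample ages from population pyramids: pick one of six profiles, pick an age band using that profile's weights, then pick an age uniformly within the band. The band edges, the weights, and the uniform choice of profile are all invented for illustration.

```python
import random

# Hypothetical age bands covering 17..100 inclusive (not from the original workflow).
AGE_BANDS = [(17, 24), (25, 34), (35, 44), (45, 54), (55, 64), (65, 74), (75, 100)]

# Six hypothetical pyramid profiles: relative weight of each age band above.
PROFILES = [
    [14, 18, 17, 16, 14, 12, 9],   # younger, "expansive" shape
    [10, 15, 17, 18, 17, 13, 10],
    [12, 16, 18, 18, 15, 12, 9],
    [8, 12, 15, 18, 19, 16, 12],
    [9, 13, 16, 17, 18, 15, 12],
    [6, 10, 13, 17, 20, 19, 15],   # older, "constrictive" shape
]

def sample_age(rng):
    profile = rng.choice(PROFILES)                           # one of the six profiles
    low, high = rng.choices(AGE_BANDS, weights=profile)[0]   # weighted band choice
    return rng.randint(low, high)                            # uniform age in the band

rng = random.Random(7)
ages = [sample_age(rng) for _ in range(200)]
print(min(ages), max(ages))
```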
Assign an Occupation to each Customer based upon their Age and an Occupation Probability.
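One way to express "Occupation by Age and Probability" is a lookup table of age ranges, each with its own weighted list of occupations. The sketch below is hypothetical: the age ranges, occupation names, and probabilities are placeholders, not the values used in the workflow.

```python
import random

# Hypothetical occupation probabilities per age range (each list sums to 1.0).
OCCUPATION_TABLE = [
    ((17, 22), [("Student", 0.70), ("Retail", 0.20), ("Unemployed", 0.10)]),
    ((23, 66), [("Professional", 0.40), ("Trade", 0.30), ("Retail", 0.20), ("Unemployed", 0.10)]),
    ((67, 100), [("Retired", 0.90), ("Professional", 0.10)]),
]

def assign_occupation(age, rng):
    """Draw an occupation using the probabilities for the range containing age."""
    for (low, high), options in OCCUPATION_TABLE:
        if low <= age <= high:
            names = [name for name, _ in options]
            probs = [p for _, p in options]
            return rng.choices(names, weights=probs)[0]
    raise ValueError(f"no occupation rule covers age {age}")

rng = random.Random(1)
print([assign_occupation(age, rng) for age in (18, 40, 80)])
```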
Bin Customers by Age into Generations.
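Binning by age amounts to finding which interval an age falls into. The sketch below does this with `bisect_right` from Python's standard library. The bin edges and generation labels are hypothetical, because the workflow's actual cut points are not given in the text.

```python
from bisect import bisect_right

# Hypothetical bins: 17-27 Gen Z, 28-43 Millennial, 44-59 Gen X,
# 60-78 Baby Boomer, 79+ Silent. Each edge is the first age of the next bin.
EDGES = [28, 44, 60, 79]
GENERATIONS = ["Gen Z", "Millennial", "Gen X", "Baby Boomer", "Silent"]

def generation(age):
    """Return the generation label for an age in the 17..100 range."""
    return GENERATIONS[bisect_right(EDGES, age)]

print([generation(a) for a in (17, 30, 50, 65, 90)])
# ['Gen Z', 'Millennial', 'Gen X', 'Baby Boomer', 'Silent']
```

Using `bisect_right` makes each edge the inclusive lower bound of the next bin, so an age of exactly 28 falls into "Millennial".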
Conditional Label Assigner
Assign a Family status to each Customer by their Age Generation.
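A conditional label assignment can be pictured as a separate weighted distribution for each condition value, which in this case is the Age Generation. The sketch below is a plain-Python analogy and not the Conditional Label Assigner's actual configuration. The family-status labels and their probabilities are hypothetical.

```python
import random

# Hypothetical family-status probabilities for each generation (rows sum to 1.0).
FAMILY_BY_GENERATION = {
    "Gen Z":       {"Single": 0.80, "Couple": 0.15, "Family": 0.05},
    "Millennial":  {"Single": 0.35, "Couple": 0.25, "Family": 0.40},
    "Gen X":       {"Single": 0.20, "Couple": 0.20, "Family": 0.60},
    "Baby Boomer": {"Single": 0.25, "Couple": 0.55, "Family": 0.20},
    "Silent":      {"Single": 0.45, "Couple": 0.50, "Family": 0.05},
}

def assign_family(generation, rng):
    """Draw a family status from the distribution for the given generation."""
    probs = FAMILY_BY_GENERATION[generation]
    return rng.choices(list(probs), weights=list(probs.values()))[0]

rng = random.Random(3)
print(assign_family("Gen Z", rng), assign_family("Gen X", rng))
```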
Generate an appropriate Income for all Customers based upon their Age Generation and whether or not they are a Student.
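The text does not specify the income distribution, so the following is a hypothetical sketch. Income is drawn from a normal distribution whose mean and standard deviation depend on the generation, Students use a separate lower distribution, and negative draws are clamped to zero. All of these numbers are placeholders.

```python
import random

# Hypothetical (mean, standard deviation) of annual income per generation.
INCOME_BY_GENERATION = {
    "Gen Z": (30000, 8000),
    "Millennial": (55000, 15000),
    "Gen X": (65000, 20000),
    "Baby Boomer": (50000, 18000),
    "Silent": (30000, 10000),
}
STUDENT_INCOME = (12000, 4000)  # hypothetical part-time income for Students

def generate_income(generation, is_student, rng):
    """Draw an income for one Customer; Students ignore the generation table."""
    mean, sd = STUDENT_INCOME if is_student else INCOME_BY_GENERATION[generation]
    return max(0.0, rng.gauss(mean, sd))  # clamp so income is never negative

rng = random.Random(5)
print(round(generate_income("Gen X", False, rng)), round(generate_income("Gen Z", True, rng)))
```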
#5 Shred Income
Shred the Income (that is, set Income = 0.0) of some Customers based upon the Probability that they are not currently working.
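Shredding is a per-row Bernoulli draw: with probability p, the Income is replaced by 0.0. The sketch below assumes that each row already carries its probability of not working in a hypothetical column named `P_NotWorking`. The text does not say where the workflow stores that probability.

```python
import random

def shred_income(customers, rng):
    """Set Income to 0.0 with each row's (hypothetical) P_NotWorking probability."""
    for row in customers:
        if rng.random() < row["P_NotWorking"]:  # rng.random() is in [0.0, 1.0)
            row["Income"] = 0.0
    return customers

rows = [
    {"CustomerID": 1, "Income": 52000.0, "P_NotWorking": 0.10},
    {"CustomerID": 2, "Income": 18000.0, "P_NotWorking": 0.60},
]
print(shred_income(rows, random.Random(11)))
```

Because `rng.random()` never returns 1.0, a probability of 1.0 always shreds the Income and a probability of 0.0 never does.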
#6 Clean Up
Clean up the final Customer Profiles by rounding the Income, removing extra columns, and sorting by CustomerID.
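The three clean-up operations map directly onto plain Python, as sketched below. The list of columns to keep and the choice to round to two decimal places are assumptions, since the text does not list the final columns or the rounding precision.

```python
# Hypothetical set of final columns; anything else (helper columns) is dropped.
KEEP = ["CustomerID", "Gender", "Age", "Generation", "Occupation", "Family", "Income"]

def clean_up(customers, keep=KEEP):
    """Keep only the listed columns, round Income to 2 decimals, sort by CustomerID."""
    cleaned = [
        {col: (round(row[col], 2) if col == "Income" else row[col]) for col in keep}
        for row in customers
    ]
    return sorted(cleaned, key=lambda r: r["CustomerID"])
```

Building new dictionaries instead of deleting keys in place leaves the input rows unchanged, which makes it easy to compare the profiles before and after clean-up.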