Node Description
Customer Distributions
The Customer Distributions node generates input data for a Market Simulation. It takes an optional Input Attribute List to create a set of Customer Distributions representing the Willingness To Pay (WTP) of Customers in the Market. Each row in the set of Output Customer Distributions corresponds to the part-worth value of a Feature, or the WTP of a Product, for a Virtual Customer.
The Input Attribute List can define the Distribution Type and Input Parameters of each Output Customer Distribution. If the Input Attribute List does not define an Output Customer Distribution, then the Input Parameters from the Configuration Dialog are used. Unlike the similar Matrix Distributions node, the Output Customer Distributions from this node will not be correlated.
For example, if the user wishes to create a Normal (Gaussian) Customer Distribution, then the Mean and Standard Deviation (SD) are set according to the Configuration Dialog, or overridden by the ‘A’ column (corresponding to the Mean) and the ‘B’ column (corresponding to the SD) in the Input Attribute List.
Similarly, if the user wishes to create a Uniform Customer Distribution, then the Minimum Value and the Maximum Value are set according to the Configuration Dialog, or overridden by the ‘A’ column (now corresponding to the Minimum Value) and the ‘B’ column (now corresponding to the Maximum Value) in the Input Attribute List.
The Output Customer Distributions from this Customer Distributions node can become part of a Customer Willingness To Pay Matrix (WTP Matrix) for a set of Products. The Input WTP Matrix can feed a downstream Market Simulation node or a Market Tuning node.
The Input Attribute List is optional. Missing values will be replaced by the defaults in the Configuration Dialog. If no input table is provided, then the Customer Distributions node will generate a single Customer Distribution with a Distribution Type and Input Parameters set according to the Configuration Dialog.
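The fallback behaviour described above can be sketched in a few lines of Python. This is only an illustration of the rule, not the node's implementation: the function names and the ‘Name’ and ‘Type’ keys are hypothetical, while ‘A’ and ‘B’ follow the column names used on this page.

```python
from typing import Optional


def resolve_parameters(row: dict, defaults: dict) -> dict:
    """Fill missing values in one Input Attribute List row from the dialog defaults."""
    resolved = dict(defaults)
    for key, value in row.items():
        if value is not None:  # None stands in for a missing cell
            resolved[key] = value
    return resolved


def resolve_all(rows: Optional[list], defaults: dict) -> list:
    """Return one parameter set per Output Customer Distribution."""
    if not rows:
        # No input table: a single distribution built from the dialog defaults
        return [dict(defaults)]
    return [resolve_parameters(row, defaults) for row in rows]


# Hypothetical dialog defaults and one partially filled input row
dialog = {"Name": "Distribution", "Type": "Normal", "A": 10.0, "B": 2.0}
table = [{"Name": "Price", "Type": "Linear", "A": None, "B": 5.0}]
print(resolve_all(table, dialog))
```

In this sketch the missing ‘A’ value of the ‘Price’ row falls back to the dialog default, while the supplied ‘B’ value is kept.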
The available list of Distribution Types for the user to select from includes:
Normal (Gaussian): (Wikipedia) Generates a set of part-worth values for each Virtual Customer in the shape of a Normal (Gaussian) Distribution. The part-worth values can be drawn randomly or can have evenly changing gaps within a Normal Distribution of a given Mean and Standard Deviation (SD). The output values can be truncated by the Minimum and Maximum limits (if enabled). The Distribution can be sorted in Ascending, Descending, or Random order. Configuration parameters include:
- Mean (A): Any floating-point (double) value
- Standard Deviation (B): Any value greater than 0.0
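As a rough illustration of the difference between randomly drawn values and values with evenly changing gaps, the Python sketch below draws either random samples or evenly spaced quantiles of a Normal Distribution. The function name, the `smooth` flag, and the midpoint-quantile rule are assumptions made for this example; the node itself may space its values differently.

```python
import random
from statistics import NormalDist


def normal_part_worths(n, mean, sd, smooth=False, seed=None):
    """Return n part-worth values shaped like a Normal(mean, sd) distribution."""
    dist = NormalDist(mean, sd)
    if smooth:
        # Evenly spaced probabilities (i + 0.5) / n stay strictly inside (0, 1),
        # so the inverse CDF is always defined; the output is in ascending order.
        return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]
    rng = random.Random(seed)
    return [rng.gauss(mean, sd) for _ in range(n)]


smooth_values = normal_part_worths(5, 100.0, 10.0, smooth=True)
random_values = normal_part_worths(5, 100.0, 10.0, seed=42)
```

The smooth variant always produces the same symmetric set of values around the Mean, whereas the random variant changes with the seed.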
Linear: (Wikipedia) Generates a set of part-worth values for each Virtual Customer in the shape of a Uniform (Linear) Distribution. The part-worth values can be drawn randomly or can be evenly spaced between the Starting Value and the Ending Value, optionally truncated by Minimum and Maximum limits. The Distribution can be sorted in Ascending, Descending, or Random order. Configuration parameters include:
- Starting Value (A): Any floating-point (double) value (inclusive)
- Ending Value (B): Any floating-point (double) value (inclusive)
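A minimal Python sketch of the two Linear modes, assuming a hypothetical `smooth` flag that switches between evenly spaced values and uniform random draws:

```python
import random


def linear_part_worths(n, start, end, smooth=True, seed=None):
    """Return n values between start and end (both inclusive when smooth)."""
    if smooth:
        if n == 1:
            return [start]
        step = (end - start) / (n - 1)
        return [start + i * step for i in range(n)]
    rng = random.Random(seed)
    return [rng.uniform(start, end) for _ in range(n)]


print(linear_part_worths(5, 0.0, 10.0))  # [0.0, 2.5, 5.0, 7.5, 10.0]
```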
Asymptote End: (Wikipedia) Generates a set of part-worth values from an Exponential Function of the form [a * EXP(-b * CustomerID) + c]. The values selected from this Exponential Function will be between the Start value and 0.0, such that the beginning of the curve declines steeply but then rounds off and hugs the end value of 0.0. Configuration parameters include:
- Start (A): Any value greater than 0.0
- Curviness (B): The ‘Curviness’ of the Output Customer Distribution. Decreasing the Curviness will flatten the output curve, while increasing the Curviness will cause the output to be more curvy. A Curviness = 1.0 has been pre-set to provide a reasonable curve for about 10,000 Customer rows.
Asymptote Start: (Wikipedia) Generates a set of part-worth values from an Exponential Function of the form [a * EXP(-b * CustomerID) + c]. The values selected from this Exponential Function will be between the Start value and 0.0, such that the curve initially hugs the Start value and then declines steeply towards 0.0. Configuration parameters include:
- Start (A): Any value greater than 0.0
- Curviness (B): The ‘Curviness’ of the Output Customer Distribution. Decreasing the Curviness will flatten the output curve, while increasing the Curviness will cause the output to be more curvy. A Curviness = 1.0 has been pre-set to provide a reasonable curve for about 10,000 Customer rows.
Beta: (Wikipedia) Generates a set of random part-worth values for each Virtual Customer in the shape of a Beta Distribution with a user-specified Alpha and Beta:
- Alpha (A): Any value greater than 0.0
- Beta (B): Any value greater than 0.0
Binomial: (Wikipedia) Generates a set of random integer part-worth values for each Virtual Customer in the shape of a Binomial Distribution with a user-specified Number of Trials and Probability of Success. Note that the Bernoulli distribution is a special case of the binomial distribution where just a single trial is conducted (Trials = 1). Configuration parameters include:
- Trials (A): The Number of Trials, any integer greater than 0
- Probability (B): The Probability of Success, any value between 0.0 and 1.0 (exclusive)
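To make the relationship between the Binomial and Bernoulli cases concrete, the Python sketch below counts successes over a number of independent trials. It is an illustration only, with a hypothetical function name, and says nothing about how the underlying library samples the distribution.

```python
import random


def binomial_part_worths(n, trials, probability, seed=None):
    """Return n integer values, each the number of successes in `trials` trials."""
    rng = random.Random(seed)
    return [
        sum(1 for _ in range(trials) if rng.random() < probability)
        for _ in range(n)
    ]


many_trials = binomial_part_worths(10, trials=20, probability=0.3, seed=1)
bernoulli = binomial_part_worths(10, trials=1, probability=0.3, seed=1)  # only 0 or 1
```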
Cauchy: (Wikipedia) Generates a set of random part-worth values for each Virtual Customer in the shape of a Cauchy Distribution with a user-specified Median and Scale:
- Median (A): Any floating-point (double) value
- Scale (B): Any value greater than 0.0
Chi-Square: (Wikipedia) Generates a set of random part-worth values for each Virtual Customer in the shape of a Chi-Square Distribution with a user-specified ‘Degrees of Freedom’. After the part-worth value is calculated, the fixed value from ‘Input Parameter B’ is added to shift the result:
- Degrees of Freedom (A): Any value greater than 0.0
- Then Add Fixed Value (B): Any floating-point value added after the random value is calculated
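The ‘draw, then add a fixed value’ pattern can be sketched in Python as follows. The sketch relies on the standard identity that a Chi-Square Distribution with k Degrees of Freedom is a Gamma Distribution with shape k/2 and scale 2; the function name is hypothetical and the node's own sampling method may differ.

```python
import random


def shifted_chi_square(n, degrees_of_freedom, fixed_value, seed=None):
    """Draw n Chi-Square values, then add the fixed value (Parameter B) to each."""
    rng = random.Random(seed)
    shape = degrees_of_freedom / 2.0
    # random.gammavariate(alpha, beta) takes the shape first and the scale second
    return [rng.gammavariate(shape, 2.0) + fixed_value for _ in range(n)]


values = shifted_chi_square(1000, degrees_of_freedom=4, fixed_value=10.0, seed=3)
```

Because Chi-Square values are never negative, every shifted value is at least the fixed value, and the average is close to the Degrees of Freedom plus the fixed value.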
Exponential: (Wikipedia) Generates a set of random part-worth values for each Virtual Customer in the shape of an Exponential Distribution with a user-specified Mean. After the part-worth value is calculated, the fixed value from ‘Input Parameter B’ is added to shift the result:
- Mean (A): Any value greater than 0.0
- Then Add Fixed Value (B): Any floating-point value added after the random value is calculated
F: (Wikipedia) Generates a set of random part-worth values for each Virtual Customer in the shape of an F Distribution with a user-specified ‘Degrees of Freedom Numerator’ and ‘Degrees of Freedom Denominator’:
- Degrees of Freedom Numerator (A): Any value greater than 0.0
- Degrees of Freedom Denominator (B): Any value greater than 0.0
Gamma: (Wikipedia) Generates a set of random part-worth values for each Virtual Customer in the shape of a Gamma Distribution with a user-specified Shape and Scale:
- Shape (A): Any value greater than 0.0
- Scale (B): Any value greater than 0.0
Inverse Gaussian: (Wikipedia) Generates a set of random part-worth values for each Virtual Customer in the shape of an Inverse Gaussian Distribution with a user-specified Mu and Lambda. As Lambda tends to infinity, the Inverse Gaussian distribution becomes more like a Normal (Gaussian) distribution:
- Mu (A): The Mean, any value greater than 0.0
- Lambda (B): The Shape Parameter, any value greater than 0.0
Poisson: (Wikipedia) The Poisson Distribution can be used for modeling the number of times an event occurs in an interval of time or space. Generates a set of random part-worth values for each Virtual Customer in the shape of a Poisson Distribution with a user-specified Lambda and Entropy:
- Lambda (A): The Poisson Mean, any value greater than 0.0
- Entropy (B): The Convergence criterion for cumulative probabilities (set to 0.0 by default)
Quadratic: (Wikipedia) The Quadratic Distribution starts at the Y-Intersection, decreases (or increases) to touch the X-Axis once, then increases (or decreases) again. The Distribution follows the equation [y = a ( x^2 - b )] with only one X-Intersection occurring at the minimum (or maximum) of the y-value. The Quadratic Distribution can be used to model the ‘Cost To Make’ (CTM) a Product where the Marginal Cost initially falls with increased production, but then starts to increase again as resources become scarce and operational inefficiencies are magnified. As the minimum value is fixed at 0.0, it may be necessary to shift the values in this Distribution before using it in a Market Simulation model. Configuration parameters include:
- X-Intersection (A): The CustomerID row in the Output Distribution where the curve touches the X-Axis once (the X-Intersection cannot equal 0.0)
- Y-Intersection (B): The starting value of the Output Distribution where the curve intersects the Y-Axis (the Y-Intersection cannot equal 0.0)
Sawtooth: (Wikipedia) The Sawtooth wave distribution looks like the teeth of a plain-toothed saw. The raw (unsorted) Distribution starts at zero and ramps upwards towards the Distribution’s Amplitude. It reaches the Amplitude after the Distribution’s Period, then drops to zero and starts again. Configuration parameters include:
- Amplitude (A): The maximum height of the wave
- Period (B): The number of Customer rows in the Output Distribution (greater than 0.0) before the wave repeats itself
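One way to picture the raw (unsorted) Sawtooth wave is the Python sketch below. It assumes the row index starts at 0 and that the ramp restarts exactly every Period rows; the function name is hypothetical.

```python
def sawtooth(n, amplitude, period):
    """Ramp from 0.0 towards the amplitude, restarting every `period` rows."""
    return [amplitude * (row % period) / period for row in range(n)]


print(sawtooth(8, 4.0, 4))  # [0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0]
```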
Sigmoid: (Wikipedia) Has the characteristic horizontal ‘S-shaped’ curve and is part of the family of Logistic Functions of the form [a / (1 + EXP(-b * (row - Customers/2)))]. The values selected from this function will be between the Start value and 0.0, such that the beginning of the curve hugs the Start value, then steepens, then the end of the curve hugs the end value of 0.0. Configuration parameters include:
- Start (A): Any value greater than 0.0
- Curviness (B): The ‘Curviness’ of the Output Customer Distribution. Decreasing the Curviness will flatten the output curve, while increasing the Curviness will cause the output to be more curvy. A Curviness = 1.0 has been pre-set to provide a reasonable curve for about 10,000 Customer rows.
Simple Bimodal: (Wikipedia) Generates a simple Bimodal Distribution (a ‘two-humped’ Customer Distribution) from two Normal (Gaussian) Distributions. The user specifies the ‘First Mean’ and the ‘Second Mean’ with the Standard Deviation (SD) automatically calculated to be a quarter of the distance between the two Means. The user specifies:
- First Mean (A): Half of the Virtual Customers will be distributed around the ‘First Mean’
- Second Mean (B): Half of the Virtual Customers will be distributed around the ‘Second Mean’. The ‘First Mean’ cannot equal the ‘Second Mean’.
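The Simple Bimodal rule can be sketched in Python as two Normal samples that share one automatically derived Standard Deviation. The function name and the way an odd number of Customers is split between the two halves are assumptions made for this example.

```python
import random


def simple_bimodal(n, first_mean, second_mean, seed=None):
    """Half the values around each mean, with SD = |second - first| / 4."""
    if first_mean == second_mean:
        raise ValueError("The First Mean cannot equal the Second Mean")
    sd = abs(second_mean - first_mean) / 4.0
    rng = random.Random(seed)
    half = n // 2
    values = [rng.gauss(first_mean, sd) for _ in range(half)]
    values += [rng.gauss(second_mean, sd) for _ in range(n - half)]
    return values


values = simple_bimodal(1000, first_mean=10.0, second_mean=30.0, seed=5)
```

With Means of 10.0 and 30.0 the derived SD is 5.0, so the two humps sit four Standard Deviations apart and remain clearly separated.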
Sinusoidal: (Wikipedia) The smooth periodic oscillation generated from the sine function rising and falling between 0.0 and the Amplitude. The raw (unsorted) Distribution starts rising at half-Amplitude and reaches the Amplitude after a quarter-Period. It then curves downward and reaches 0.0 after three quarters of a Period. Configuration parameters include:
- Amplitude (A): The maximum height of the wave
- Period (B): The number of Customer rows in the Output Distribution (greater than 0.0) before the wave repeats itself
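A sine wave with the shape described above (half-Amplitude at the start, the Amplitude after a quarter-Period, 0.0 after three quarters of a Period) can be written as a shifted and scaled sine. The Python sketch below is one such formula, assuming the row index starts at 0; the function name is hypothetical.

```python
import math


def sinusoidal(n, amplitude, period):
    """Sine wave oscillating between 0.0 and the amplitude."""
    return [
        amplitude / 2.0 * (1.0 + math.sin(2.0 * math.pi * row / period))
        for row in range(n)
    ]


wave = sinusoidal(4, 2.0, 4)  # approximately [1.0, 2.0, 1.0, 0.0]
```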
Spike: (Wikipedia) A vertical ‘S-shaped’ curve that looks similar to a rotated Sigmoid function but is generated from a pair of Exponential Functions of the form [a * EXP(-b * CustomerID) + c]. The values selected from these Exponential Functions will be between the Start value and 0.0, such that the beginning of the curve declines steeply, then rounds off, but then declines steeply again towards the end value of 0.0. Note that a sorted Normal Distribution will also generate a similar-looking vertical S-shaped curve. Configuration parameters include:
- Start (A): Any value greater than 0.0
- Curviness (B): The ‘Curviness’ of the Output Customer Distribution. Decreasing the Curviness will flatten the output curve, while increasing the Curviness will cause the output to be more curvy. A Curviness = 1.0 has been pre-set to provide a reasonable curve for about 10,000 Customer rows.
Square: (Wikipedia) The Square wave distribution alternates at a steady frequency between the Amplitude and 0.0. The raw (unsorted) Distribution starts at the Amplitude and drops to 0.0 after a half-Period. After the Distribution’s Period, the wave is reset to its Amplitude and starts again. Configuration parameters include:
- Amplitude (A): The maximum height of the wave
- Period (B): The number of Customer rows in the Output Distribution (greater than 0.0) before the wave repeats itself
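The raw (unsorted) Square wave can be sketched in Python as follows, assuming the row index starts at 0 and the first half of each Period sits at the Amplitude. The function name is hypothetical.

```python
def square_wave(n, amplitude, period):
    """Amplitude for the first half of each period, 0.0 for the second half."""
    return [
        amplitude if (row % period) < period / 2.0 else 0.0
        for row in range(n)
    ]


print(square_wave(8, 5.0, 4))  # [5.0, 5.0, 0.0, 0.0, 5.0, 5.0, 0.0, 0.0]
```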
T: (Wikipedia) Generates a set of random part-worth values for each Virtual Customer in the shape of a T Distribution with a user-specified Degrees of Freedom. After the part-worth value is calculated, the fixed value from ‘Input Parameter B’ is added to shift the result:
- Degrees of Freedom (A): Any value greater than 0.0
- Then Add Fixed Value (B): Any floating-point value added after the random value is calculated
Triangle: (Wikipedia) The Triangle wave distribution rises and falls linearly between 0.0 and the Amplitude. The raw (unsorted) Distribution climbs steadily from half-Amplitude and reaches the Amplitude after a quarter-Period. It then falls steadily and reaches 0.0 after three quarters of a Period. Configuration parameters include:
- Amplitude (A): The maximum height of the wave
- Period (B): The number of Customer rows in the Output Distribution (greater than 0.0) before the wave repeats itself
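The Triangle wave described above can be sketched in Python as a piecewise-linear function of the position within each Period. This is an illustration under the assumption that the row index starts at 0; the function name is hypothetical.

```python
def triangle_wave(n, amplitude, period):
    """Piecewise-linear wave: half-amplitude, up to the peak, down to 0.0, back up."""
    values = []
    for row in range(n):
        t = (row % period) / period  # position within the current period, 0 <= t < 1
        if t < 0.25:
            level = 0.5 + 2.0 * t  # climb from half-amplitude to the peak
        elif t < 0.75:
            level = 1.5 - 2.0 * t  # fall from the peak to 0.0
        else:
            level = 2.0 * t - 1.5  # climb back towards half-amplitude
        values.append(amplitude * level)
    return values


print(triangle_wave(8, 4.0, 8))  # [2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0, 1.0]
```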
Weibull: (Wikipedia) Generates a set of random part-worth values for each Virtual Customer in the shape of a Weibull Distribution with a user-specified Shape and Scale:
- Shape (A): Any value greater than 0.0
- Scale (B): Any value greater than 0.0
Note: technical details concerning how the data generation is performed can be found by referring to the Apache Commons Math Library.
This Community Node documentation assumes you have already downloaded the open-source KNIME analytics platform and installed the free Market Simulation (Community Edition) plugin. If not, start by returning to Getting Started.
Downloads
#1 Normal Distribution
Inputs
None
No inputs required.
Node
Configuration
For a ‘Normal’ Distribution Type, Input Parameter A = Mean and Input Parameter B = Standard Deviation (SD).
Outputs
Attribute List
The Output Attribute List is empty if the Input Attribute List is missing.
Customer Distributions
There will be a Customer Distribution column for each row in the Input Attribute List. But if the Input Attribute List is missing then the node will generate just a single Customer Distribution called ‘Distribution’.
#2 Inverse Gaussian
Inputs
None
No inputs required.
Node
Configuration
For an ‘Inverse Gaussian’ Distribution Type, Input Parameter A = Mu and Input Parameter B = Lambda. See the Wikipedia page for more details.
Outputs
Attribute List
The Output Attribute List is empty if the Input Attribute List is missing.
Customer Distributions
There will be a Customer Distribution column for each row in the Input Attribute List. But if the Input Attribute List is missing then the node will generate just a single Customer Distribution called ‘Distribution’.
#3 Simple Bimodal
Inputs
None
No inputs required.
Node
Configuration
For a ‘Simple Bimodal’ Distribution Type, Input Parameter A = First Mean and Input Parameter B = Second Mean. The Standard Deviation (SD) of both Normal Distributions is equal to a quarter of the distance between the two Means. See the Wikipedia page for more details.
Outputs
Attribute List
The Output Attribute List is empty if the Input Attribute List is missing.
Customer Distributions
There will be a Customer Distribution column for each row in the Input Attribute List. But if the Input Attribute List is missing then the node will generate just a single Customer Distribution called ‘Distribution’.
#4 Many Distributions
Inputs
Attribute List
To generate many Customer Distributions at the same time, the ‘Table Creator’ node can be used to define the Distribution Type as well as the Input Parameters in the ‘Input Attribute List’.
Node
Configuration
The Distribution Type and Input Parameters in the Configuration Dialog are used as default values in case they are not defined in the upstream Input Attribute List.
Outputs
Attribute List
The Output Attribute List adds the Mean and Standard Deviation (SD) columns to the Input Attribute List.
Customer Distributions
There will be a Customer Distribution column for each row in the Input Attribute List.
Update #1 – Chains

This update covers two changes:
- Distribution Naming, and
- Chaining together multiple Customer Distribution nodes.
Distribution Naming: In the past, solo Customer Distributions would all be named ‘Distribution’. To change this name it was necessary to use the KNIME ‘Column Rename’ node. It was only possible to directly specify a Customer Distribution Name if an optional ‘Input Attribute List’ was connected to the top-port of the node. Now it is possible to set a user-defined name for each solo Customer Distribution within the node’s Configuration Dialog.
Chaining Customer Distributions: In the past, the KNIME ‘Column Appender’ node or the KNIME ‘Joiner’ node was required to collect together multiple Customer Distributions into a single table. Now a new (optional) ‘Input Customer Distributions’ port has been added to the bottom of the Customer Distributions node. Using this bottom-port allows the user to link upstream Customer Distributions so that they are automatically appended before the new Customer Distributions generated downstream.
Update #2 – Linear Types

Two Customer Distribution Types have been deprecated and replaced with a new ‘Linear’ type:
- Deprecated: Uniform
- Deprecated: Ordered
- New: Linear
Linear: (Wikipedia) Generates a set of part-worth values for each Virtual Customer in the shape of a Uniform (Linear) Distribution. The part-worth values can be drawn randomly or can be evenly spaced between the Starting Value and the Ending Value by setting the ‘Smooth’ parameter. The Distribution can optionally be truncated by Minimum and Maximum limits. The Distribution can be sorted in Ascending, Descending, or Random order. Configuration parameters include:
- Starting Value (A): Any floating-point (double) value (inclusive)
- Ending Value (B): Any floating-point (double) value (inclusive)
Update #3 – Min / Max

Maximum: If enabled, the data generated for the Customer Distribution will be capped at this ceiling Maximum. If a randomly generated data point is greater than this Maximum value, then a second randomly generated data point will be used instead. The final data point will only be set to this Maximum value after multiple attempts to generate an acceptable random data point have failed. This Configuration Dialog default can be overridden by a ‘Maximum’ column in the Input Attribute List.
Minimum: If enabled, the data generated for the Customer Distribution will be limited by this floor Minimum. If a randomly generated data point is less than this Minimum value, then a second randomly generated data point will be used instead. The final data point will only be set to this Minimum value after multiple attempts to generate an acceptable random data point have failed. This Configuration Dialog default can be overridden by a ‘Minimum’ column in the Input Attribute List.
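The ‘redraw first, clamp only as a last resort’ behaviour can be sketched in Python as shown here. The function name and the limit of 10 attempts are assumptions for illustration; the text above only says that multiple attempts are made.

```python
import random


def draw_within_limits(draw, minimum=None, maximum=None, max_attempts=10):
    """Redraw out-of-range values; clamp to the violated limit only if all attempts fail."""
    value = draw()
    for _ in range(max_attempts - 1):
        too_low = minimum is not None and value < minimum
        too_high = maximum is not None and value > maximum
        if not (too_low or too_high):
            return value
        value = draw()  # try again with a fresh random data point
    # Last resort: set the final data point to the limit it violates
    if minimum is not None and value < minimum:
        return minimum
    if maximum is not None and value > maximum:
        return maximum
    return value


rng = random.Random(11)
capped = [
    draw_within_limits(lambda: rng.gauss(100.0, 20.0), minimum=80.0, maximum=120.0)
    for _ in range(1000)
]
```

Redrawing keeps the shape of the distribution inside the limits instead of piling rejected values up at the Minimum or Maximum, which is why clamping is used only after the attempts run out.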
In the example below, the first Distribution has been capped between a Maximum and Minimum range, while the second Distribution has not. The difference between the two outputs can be seen in the histogram.
Update #4 – New Types

A number of new Customer Distribution Types have been implemented, including:
- Exponential Functions:
  - Asymptote Start
  - Asymptote End
  - Sigmoid
  - Spike
- Periodic Functions:
  - Square
  - Triangle
  - Sawtooth
  - Sinusoidal
- Other Functions:
  - Quadratic