# Correlation Extension

The Correlation Extension node is designed to take an ‘Input Correlation Matrix’ and intelligently extend it using a list of ‘Input Correlation Pairs’.

The Correlation Matrix represents the degree of Horizontal Differentiation between Features, Benefits, Attributes, Levels, and Products. The Correlation Matrix may be used by a downstream node (such as the Matrix Distributions node or the Feature Generation node) to generate a set of Customer Distributions comprising the Willingness To Pay (WTP) of individual Virtual Customers.

For example, the ‘Input Correlation Matrix’ may be a 3×3 matrix of correlation values (doubles between -1.0 and +1.0) with row names and column names of ‘A’, ‘B’, and ‘C’. The names A, B, and C may be Features or Products. The matrix describes all the correlations between Customer Distribution A, Customer Distribution B, and Customer Distribution C.

The list of ‘Input Correlation Pairs’ contains additional relationships used to extend (grow) the original ‘Input Correlation Matrix’. These input correlation values change the Correlation Matrix in two ways:
(a) Replace existing correlations already found in the Correlation Matrix, and
(b) Add new rows of correlations to the Correlation Matrix.

For example, if the ‘Input Correlation Pairs’ list contained the relationship (‘A’, ‘B’, 0.7) then the node would simply replace the existing (single) correlation value already found in the original Correlation Matrix.

However, if the ‘Input Correlation Pairs’ list contained the relationship (‘A’, ‘X’, 0.5) then the node would add a whole new row (and the matching column) to the Correlation Matrix with values for A:X, B:X, and C:X. The correlation for the pair A:X would be set to the 0.5 value found in the ‘Input Correlation Pairs’ relationship. The correlations for B:X and C:X would be derived from it: B:X is set to 0.5 multiplied by the existing A:B correlation, and C:X is set to 0.5 multiplied by the existing A:C correlation.
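The two cases above can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not the node's actual implementation: the function name `apply_pair`, the dict-of-dicts layout, the sample correlations (A:B = 0.6, A:C = 0.2, B:C = 0.4), and the error raised when neither name exists are all assumptions.

```python
def apply_pair(names, corr, first, second, value):
    """Apply one (first, second, value) relationship to a symmetric matrix.

    `corr` is a dict of dicts, corr[row][col] -> correlation.
    Returns a new (names, corr) tuple; the inputs are left unchanged.
    """
    names = list(names)
    out = {row: dict(corr[row]) for row in names}

    if first in out and second in out:
        # Case (a): both names exist, so only the single value is replaced.
        out[first][second] = value
        out[second][first] = value
        return names, out

    if first not in out and second not in out:
        # Assumption: the documentation does not cover this case.
        raise ValueError("at least one name must already be in the matrix")

    # Case (b): exactly one name is new, so a row and a column are added.
    anchor, new = (first, second) if first in out else (second, first)
    out[new] = {new: 1.0}
    for other in names:
        # The anchor gets the stated value; every other entry is scaled
        # by its existing correlation with the anchor.
        estimate = value if other == anchor else value * corr[anchor][other]
        out[new][other] = estimate
        out[other][new] = estimate
    names.append(new)
    return names, out


# Illustrative 3x3 matrix (values are made up for this sketch).
names = ["A", "B", "C"]
corr = {
    "A": {"A": 1.0, "B": 0.6, "C": 0.2},
    "B": {"A": 0.6, "B": 1.0, "C": 0.4},
    "C": {"A": 0.2, "B": 0.4, "C": 1.0},
}

_, replaced = apply_pair(names, corr, "A", "B", 0.7)        # case (a)
names_x, extended = apply_pair(names, corr, "A", "X", 0.5)  # case (b)
```

With these sample values, the new row holds A:X = 0.5, B:X = 0.5 × 0.6, and C:X = 0.5 × 0.2.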

The user does not have to specify each correlation in every new row of the Correlation Matrix – the node will do this automatically. But if the user wishes to specify each correlation value themselves, then they can do so by adding additional rows to the bottom of the ‘Input Correlation Pairs’ list. The node processes this list of relationships row-by-row, so after adding a new row for [A:X, B:X, C:X] the user could override the calculated correlations for B:X and C:X.

Multiple relationships are used together when filling in missing correlation values. For example, if the list contained the relationships (‘A’, ‘X’, 0.5) and (‘B’, ‘X’, 0.4) then the missing correlation for the pair C:X would be blended from the two available chains: A:C:X (through A, using A:X and A:C) and B:C:X (through B, using B:X and B:C).
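As a rough illustration, each known relationship yields its own candidate estimate for C:X. How the node blends those candidates is not stated here, so the plain average below is a placeholder assumption, as are the function name `candidate_estimates` and the sample values A:C = 0.2 and B:C = 0.4.

```python
def candidate_estimates(corr, new_pairs, target):
    """Return one estimate of target:X per known anchor relationship.

    `new_pairs` maps an anchor name to its stated correlation with X.
    """
    return {
        anchor: value * corr[anchor][target]
        for anchor, value in new_pairs.items()
    }


# Existing correlations with C (illustrative values only).
corr = {"A": {"C": 0.2}, "B": {"C": 0.4}}

# Relationships (A, X, 0.5) and (B, X, 0.4) from the pairs list.
new_pairs = {"A": 0.5, "B": 0.4}

candidates = candidate_estimates(corr, new_pairs, "C")

# Assumption: a simple mean stands in for the node's blending rule.
blended = sum(candidates.values()) / len(candidates)
```

Here the chain through A suggests 0.5 × 0.2 and the chain through B suggests 0.4 × 0.4, so any reasonable blend lands between those two values.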

Note that the ‘Input Correlation Matrix’ will be converted into a clean and symmetrical Correlation Matrix when first loaded. That means: (a) the diagonal A:A, B:B, C:C correlations will be set to 1.0; (b) correlation values will be range-limited to between -1.0 and +1.0; (c) missing correlations will be set to 0.0; and (d) the correlation for A:B will be set the same as the correlation for B:A (hence lower-left-triangle and upper-right-triangle correlation matrices can be input).
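The four cleaning rules can be sketched as follows. This is a minimal illustration, not the node's code: the function name `clean_matrix` and the dict-of-dicts input are assumptions, and so is the tie-break used when both A:B and B:A are supplied (the row-before-column entry is taken first), which the documentation does not specify.

```python
def clean_matrix(names, raw):
    """Return a clean, symmetric correlation matrix as a dict of dicts.

    `raw` may be a full matrix or only one triangle; missing cells may be
    absent or None.
    """
    clean = {row: {col: 0.0 for col in names} for row in names}
    for i, row in enumerate(names):
        clean[row][row] = 1.0  # rule (a): diagonal is always 1.0
        for col in names[i + 1:]:
            value = raw.get(row, {}).get(col)
            if value is None:
                # Fall back to the mirrored cell (triangle input).
                value = raw.get(col, {}).get(row)
            if value is None:
                value = 0.0  # rule (c): missing correlations become 0.0
            value = max(-1.0, min(1.0, value))  # rule (b): range limit
            clean[row][col] = value
            clean[col][row] = value  # rule (d): enforce symmetry
    return clean


# Lower-left-triangle input with a bad diagonal, an out-of-range value,
# and a missing cell (all values are illustrative).
raw = {
    "A": {"A": 0.5},
    "B": {"A": 0.6},
    "C": {"A": 1.7, "B": None},
}
cleaned = clean_matrix(["A", "B", "C"], raw)
```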

The correlations found in the list of ‘Input Correlation Pairs’ are not range-limited when first loaded. They can be set outside the [-1.0, +1.0] range to boost the correlations derived from existing entries. The final correlations in the output tables, however, are always range-limited to [-1.0, +1.0].
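A small sketch of the boosting effect, assuming the derived values are computed from the unclipped input and range-limited only at the end. The helper `clip` and the sample correlations (A:B = 0.6, A:C = 0.2) are illustrative assumptions.

```python
def clip(value, low=-1.0, high=1.0):
    """Limit a correlation to the valid [-1.0, +1.0] range."""
    return max(low, min(high, value))


# Existing correlations (illustrative values only).
a_b, a_c = 0.6, 0.2

# Relationship (A, X, 1.5): deliberately outside the valid range.
a_x_input = 1.5

# Derived entries use the unclipped input, which boosts B:X and C:X.
raw_row = {"A": a_x_input, "B": a_x_input * a_b, "C": a_x_input * a_c}

# Only the final output is range-limited.
output_row = {name: clip(value) for name, value in raw_row.items()}
```

With an in-range input of 1.0 the derived B:X would be 0.6; the boosted input of 1.5 raises it to about 0.9, while A:X itself is limited to 1.0 in the output.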

The purpose of this node is to allow the user to quickly extend an existing Correlation Matrix with new rows given limited available data. For example, a Willingness To Pay (WTP) Matrix may have been calculated by an upstream node for the user’s own Products, and now the user wishes to add Competitive Products to this WTP Matrix. The user may know each Competitive Product is a ‘Perfect Match’ (correlation = 0.9) or a ‘Near Match’ (correlation = 0.7) to one of their own Products (‘Perfect Match’ Products tend not to be perfectly correlated as the buying experience from the Competitor’s store will still be different). But the user may not know the correlation between the matched Competitive Product and all of the other Products in the Market. If the matched Product is also the most similar Product, then this node can approximate the correlations to all other Products.

This Community Node documentation assumes you have already downloaded the open-source KNIME analytics platform and installed the free Market Simulation (Community Edition) plugin. If not, start by returning to Getting Started.

# Extend Correlation Matrix

The first workflow replaces the Correlation for A:B and adds a new row of Correlations around A:X. The second workflow adds multiple relationships to the new distribution X and helps estimate the missing correlation for C:X. Both operate according to the ‘Input Correlation Pairs’ list.

## Inputs

#### Correlation Pairs

The input set of correlations as a list of pairs. Each pair should quantify the correlation between a single row and a single column for all unique row-column combinations for the Output Correlation Matrix.

#### Correlation Matrix

The input set of correlations that define the relationship between Customer Distributions of the same name. The Correlation Matrix must be square, so that the number of data rows matches the number of columns. Each row Distribution Name should be unique and correspond to a column of the same name.

## Node

#### Configuration

The Correlation Extension node does not require additional configuration.

## Outputs

#### Correlation Matrix

The output set of correlations that define the relationship between all Customer Distributions. The Correlation Matrix will be symmetrical, and the number of data rows will match the number of columns. Each row Distribution Name will be unique and correspond to a column of the same name.

#### Repaired Matrix

The repaired output set of correlations that define the relationship between Customer Distributions. Repairing is required when the correlations are unrealistic, meaning mutually inconsistent. For example, if A is highly correlated with B (for example, A:B = +0.99) and A is highly correlated with C (for example, A:C = +0.99) then B must also be highly correlated with C (that is, B:C >> 0.0). More precisely, the Correlation Matrix must be positive definite, meaning that all of its eigenvalues are positive. Note that downstream nodes that generate Customer Distributions (such as the Matrix Distributions node or the Feature Generation node) do not need to use this Output Correlation Repaired Matrix, as they will always self-repair their Input Correlation Matrix first. The Output Correlation Repaired Matrix will contain the same columns as the Output Correlation Matrix.
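To see why the example above needs repair, the check below tests positive definiteness with a Cholesky factorization, which succeeds only for positive definite matrices. It is a sketch of the consistency test only; the repair algorithm used by the node is not described here and is not reproduced. The function name `is_positive_definite` and the B:C values are illustrative assumptions.

```python
import math


def is_positive_definite(matrix):
    """Return True if a symmetric matrix (list of lists) is positive definite.

    Attempts a Cholesky factorization and fails as soon as a pivot
    is not strictly positive.
    """
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= 0.0:
                    return False
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return True


# Rows and columns are A, B, C with A:B = A:C = 0.99.
inconsistent = [
    [1.00, 0.99, 0.99],
    [0.99, 1.00, 0.00],  # B:C = 0.0 contradicts the two 0.99 values
    [0.99, 0.00, 1.00],
]
consistent = [
    [1.00, 0.99, 0.99],
    [0.99, 1.00, 0.97],  # B:C = 0.97 is compatible with them
    [0.99, 0.97, 1.00],
]

needs_repair = not is_positive_definite(inconsistent)
already_valid = is_positive_definite(consistent)
```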

#### Error Matrix

The difference between the Output Correlation Matrix and the Output Correlation Repaired Matrix. This is a convenience output to show how the Correlation Matrix needs to be repaired before Customer Distributions can be generated. The Output Correlation Error Matrix will contain the same columns as the Output Correlation Matrix.