Data Generation

Market Basket Analysis

This workflow demonstrates how the data for the Market Basket Analysis was generated. The simulated supermarket offers 48 Products across 10 Categories. Ten thousand Shopping Baskets were generated, each containing between 2 and 15 Products.

Sales data was generated by first estimating the affinity between Categories and between Products. For example, a strong affinity was assumed to exist between the “Meat & Fish” Category and the “Vegetables” Category because Customers shopping for one typically purchase the other. Affinities were also assumed to exist between Products spanning different Categories. For example, the workflow assumes there is a relationship between “Cornflakes” (in the “Cereal” Category) and “Full Cream Milk” (in the “Dairy” Category).

This KNIME Node Use Case provides an example of a useful KNIME workflow. These workflows do not depend upon Market Simulation but can supplement a Market Simulation workflow. If you have not yet installed KNIME, go to Getting Started.

Part-Worth Distributions

The workflow generates the part-worth Willingness To Pay (WTP) Customers have for each of the Categories and each of the Products.

Define Products

The supermarket sells 48 Products across 10 Categories. Yogurt falls within the Dairy Category and is priced at $5.99.

Category WTP

There is a part-worth WTP for each Category. Customers place an average value on the Dairy Category of $12.

Category Correlations

The affinity between Categories is measured in terms of Correlation. For example, Dairy and Cereals are purchased together about 25% of the time.

Category Correlation Matrix

Category Correlations are transformed into a Correlation Matrix.
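
The transformation from a list of pairwise Correlations into a square matrix can be sketched in Python as follows. This is only an illustration, not the KNIME nodes the workflow actually uses: the function name build_correlation_matrix is hypothetical, the 0.25 value is illustrative, and any pair that is not listed is assumed to have a Correlation of zero.

```python
def build_correlation_matrix(names, pairs):
    """Return a symmetric matrix (list of lists) with 1.0 on the diagonal.

    names: ordered list of Category names.
    pairs: iterable of (name_a, name_b, correlation) tuples.
    Pairs that are not listed are assumed to be uncorrelated (0.0).
    """
    index = {name: i for i, name in enumerate(names)}
    n = len(names)
    matrix = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for a, b, rho in pairs:
        if not -1.0 <= rho <= 1.0:
            raise ValueError(f"correlation out of range: {rho}")
        i, j = index[a], index[b]
        # Write both cells so the matrix stays symmetric.
        matrix[i][j] = rho
        matrix[j][i] = rho
    return matrix


# Illustrative input only; these are not the workflow's actual values.
categories = ["Dairy", "Cereals", "Vegetables"]
pairs = [("Dairy", "Cereals", 0.25)]
category_matrix = build_correlation_matrix(categories, pairs)
```

Writing each Correlation into both the (i, j) and (j, i) cells keeps the matrix symmetric, which a valid Correlation Matrix requires.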

Category Distributions

The Category Correlations and Mean WTP values are combined to generate part-worth WTP Category Distributions.
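
One common way to combine a Correlation Matrix with mean values is to draw correlated normal samples through a Cholesky decomposition. The sketch below assumes normally distributed part-worths, which the workflow description does not state. The standard deviations and the Cereals mean are hypothetical, and the function names are made up for illustration.

```python
import math
import random


def cholesky(matrix):
    """Return lower-triangular L such that L * L^T equals matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                d = matrix[i][i] - s
                if d <= 0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(d)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def sample_wtp(means, std_devs, corr, n_customers, seed=0):
    """Draw one row of correlated part-worth WTP values per Customer."""
    rng = random.Random(seed)
    lower = cholesky(corr)
    n = len(means)
    rows = []
    for _ in range(n_customers):
        # Independent standard normal draws ...
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # ... mixed by L so that they carry the requested correlations.
        mixed = [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
        rows.append([means[i] + std_devs[i] * mixed[i] for i in range(n)])
    return rows


# Dairy mean of $12 is from the text; all other numbers are hypothetical.
means = [12.0, 8.0]       # Dairy, Cereals
std_devs = [3.0, 2.0]     # assumed spread of each distribution
corr = [[1.0, 0.25], [0.25, 1.0]]
wtp = sample_wtp(means, std_devs, corr, n_customers=10_000)
```

Each row of wtp holds one Customer's part-worth WTP for every Category. The same routine could be reused for Products by passing the Product Correlation Matrix and Product means instead.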

Product Correlations

The affinity between Products is also measured in terms of Correlation. Sales of Yogurt are correlated with sales of Milk.

Product Correlation Matrix

Product Correlations are likewise transformed into a Correlation Matrix, just as the Category Correlations were.

Product Distributions

The Product Correlations and Mean WTP values are again combined to generate part-worth WTP Product Distributions.

Product WTP

The part-worth WTP values are summed to generate the Willingness To Pay (WTP) Customers have for each Product in each Category.
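
The summation for a single Customer can be sketched as below. The dollar amounts are hypothetical and the function name product_wtp is made up for illustration. Only the additive structure (Category part-worth plus Product part-worth) comes from the description above.

```python
def product_wtp(category_part_worth, product_part_worth, product_category):
    """Total WTP per Product = Category part-worth + Product part-worth."""
    return {
        product: category_part_worth[product_category[product]] + part_worth
        for product, part_worth in product_part_worth.items()
    }


# Which Category each Product belongs to (taken from the examples in the text).
product_category = {
    "Yogurt": "Dairy",
    "Full Cream Milk": "Dairy",
    "Cornflakes": "Cereal",
}

# Hypothetical part-worth values for one Customer, in dollars.
category_part_worth = {"Dairy": 4.0, "Cereal": 3.0}
product_part_worth = {"Yogurt": 2.5, "Full Cream Milk": 1.0, "Cornflakes": 1.5}

total_wtp = product_wtp(category_part_worth, product_part_worth, product_category)
```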

Basket Aggregation

Shred Demand

Customers buy only their top choices, so their demand for less desirable Products must be shredded (removed). There is a 90% chance that a Customer will not buy the next (Nth) Product.

Top Choices

Shredding Demand leaves only the Products each Customer will buy.
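
One possible reading of the shredding step is sketched below, under several assumptions that the description does not confirm. Products are ranked by surplus (WTP minus price). The top two are always kept, since baskets hold at least 2 Products. Each further Product is then dropped with a 90% chance, which ends the basket. The function name shred_demand, the ranking rule, and all values other than the Yogurt price are hypothetical.

```python
import random


def shred_demand(wtp, prices, drop_chance=0.9, min_items=2, max_items=15, rng=None):
    """Return the Products a Customer buys, best choice first.

    Assumed rule: rank by surplus (WTP - price), always keep the first
    min_items, then stop at the first Product that is "shredded".
    """
    rng = rng or random.Random()
    surplus = {product: wtp[product] - prices[product] for product in wtp}
    ranked = sorted(surplus, key=surplus.get, reverse=True)
    kept = []
    for product in ranked[:max_items]:
        if len(kept) >= min_items and rng.random() < drop_chance:
            break  # demand for this and all lower-ranked Products is shredded
        kept.append(product)
    return kept


# Hypothetical WTP and prices for one Customer (only the Yogurt price is from the text).
wtp = {"Yogurt": 7.0, "Full Cream Milk": 4.0, "Cornflakes": 6.5, "Carrots": 2.0}
prices = {"Yogurt": 5.99, "Full Cream Milk": 2.49, "Cornflakes": 4.49, "Carrots": 1.99}

top_choices = shred_demand(wtp, prices, rng=random.Random(42))
```

Passing a seeded random.Random makes the result reproducible. Setting drop_chance to 0.0 or 1.0 gives the two extremes: every ranked Product up to max_items, or only the first min_items.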

Merging Baskets

The Products Customers buy are merged into baskets.

Final Baskets

Each Product purchased by each Customer is ungrouped into transaction rows.
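
The ungrouping step amounts to flattening each basket into one row per Product. The sketch below shows the idea in Python rather than with KNIME nodes. The basket contents, the function name ungroup_baskets, and the (basket_id, product) row layout are assumptions made for illustration.

```python
def ungroup_baskets(baskets):
    """Flatten {basket_id: [products]} into (basket_id, product) rows."""
    return [
        (basket_id, product)
        for basket_id, products in baskets.items()
        for product in products
    ]


# Hypothetical merged baskets, keyed by Customer / basket identifier.
baskets = {
    1: ["Cornflakes", "Full Cream Milk"],
    2: ["Yogurt", "Full Cream Milk", "Carrots"],
}

transactions = ungroup_baskets(baskets)
for basket_id, product in transactions:
    print(basket_id, product)
```

This long format, with one row per basket and Product, is a typical input layout for Market Basket Analysis tools.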