Original poster: shadowaver

[Code sharing] apriori: Frequent itemsets via the Apriori algorithm

Posted on 2025-3-31 14:57:38

apriori: Frequent itemsets via the Apriori algorithm

Apriori function to extract frequent itemsets for association rule mining

from mlxtend.frequent_patterns import apriori

Overview

Apriori is a popular algorithm [1] for extracting frequent itemsets, with applications in association rule learning. It is designed to operate on databases containing transactions, such as purchases by customers of a store. An itemset is considered "frequent" if it meets a user-specified support threshold. For instance, if the support threshold is set to 0.5 (50%), a frequent itemset is a set of items that occur together in at least 50% of all transactions in the database.
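
As a rough illustration of the support threshold described above, the following plain-Python sketch computes support by hand. The transactions and the helper name `support` are made up for this example and are not part of mlxtend.

```python
# Hypothetical toy transactions (not the dataset used in the examples of this post).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


min_support = 0.5

# {"bread", "milk"} occurs in 2 of 4 transactions.
bread_milk = support({"bread", "milk"}, transactions)
# {"bread", "butter", "milk"} occurs in 1 of 4 transactions.
bread_butter_milk = support({"bread", "butter", "milk"}, transactions)

print(bread_milk, bread_milk >= min_support)
print(bread_butter_milk, bread_butter_milk >= min_support)
```

With a threshold of 0.5, the pair counts as frequent while the triple does not.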
References

[1] Agrawal, Rakesh, and Ramakrishnan Srikant. "Fast algorithms for mining association rules." Proc. 20th Int. Conf. Very Large Data Bases (VLDB). Vol. 1215. 1994.

Example 1 -- Generating Frequent Itemsets

The apriori function expects data in a one-hot encoded pandas DataFrame. Suppose we have the following transaction data:

dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
           ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
           ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]

We can transform it into the right format via the TransactionEncoder as follows:

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder

te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)
df

   Apple   Corn   Dill   Eggs  Ice cream  Kidney Beans   Milk  Nutmeg  Onion  Unicorn  Yogurt
0  False  False  False   True      False          True   True    True   True    False    True
1  False  False   True   True      False          True  False    True   True    False    True
2   True  False  False   True      False          True   True   False  False    False   False
3  False   True  False  False      False          True   True   False  False     True    True
4  False   True  False   True       True          True  False   False   True    False   False

Now, let us return the items and itemsets with at least 60% support:

from mlxtend.frequent_patterns import apriori

apriori(df, min_support=0.6)

    support  itemsets
0       0.8  (3)
1       1.0  (5)
2       0.6  (6)
3       0.6  (8)
4       0.6  (10)
5       0.8  (3, 5)
6       0.6  (8, 3)
7       0.6  (5, 6)
8       0.6  (8, 5)
9       0.6  (10, 5)
10      0.6  (8, 3, 5)

By default, apriori returns the column indices of the items, which may be useful in downstream operations such as association rule mining. For better readability, we can set use_colnames=True to convert these integer values into the respective item names:
apriori(df, min_support=0.6, use_colnames=True)

    support  itemsets
0       0.8  (Eggs)
1       1.0  (Kidney Beans)
2       0.6  (Milk)
3       0.6  (Onion)
4       0.6  (Yogurt)
5       0.8  (Eggs, Kidney Beans)
6       0.6  (Eggs, Onion)
7       0.6  (Kidney Beans, Milk)
8       0.6  (Kidney Beans, Onion)
9       0.6  (Yogurt, Kidney Beans)
10      0.6  (Kidney Beans, Eggs, Onion)

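
To see where these 11 itemsets come from, the level-wise search can be sketched in plain Python. This is a simplified illustration of the Apriori idea under my own assumptions, not mlxtend's actual implementation; the function name `apriori_sketch` is made up for this post.

```python
from itertools import combinations

dataset = [
    ["Milk", "Onion", "Nutmeg", "Kidney Beans", "Eggs", "Yogurt"],
    ["Dill", "Onion", "Nutmeg", "Kidney Beans", "Eggs", "Yogurt"],
    ["Milk", "Apple", "Kidney Beans", "Eggs"],
    ["Milk", "Unicorn", "Corn", "Kidney Beans", "Yogurt"],
    ["Corn", "Onion", "Onion", "Kidney Beans", "Ice cream", "Eggs"],
]


def apriori_sketch(transactions, min_support, max_len=None):
    """Return {frozenset: support} for all frequent itemsets (illustrative only)."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: frequent single items.
    items = sorted({item for b in baskets for item in b})
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}

    k = 1
    while current and (max_len is None or k < max_len):
        k += 1
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Keep only the candidates that meet the support threshold.
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
    return frequent


result = apriori_sketch(dataset, min_support=0.6)
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sup, sorted(itemset))
```

The prune step is what makes the search tractable: a candidate is discarded without counting as soon as one of its subsets is known to be infrequent.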
Example 2 -- Selecting and Filtering Results

The advantage of working with pandas DataFrames is that we can use their convenient features to filter the results. For instance, let's assume we are only interested in itemsets of length 2 that have a support of at least 80 percent. First, we create the frequent itemsets via apriori and add a new column that stores the length of each itemset:

frequent_itemsets = apriori(df, min_support=0.6, use_colnames=True)
frequent_itemsets['length'] = frequent_itemsets['itemsets'].apply(lambda x: len(x))
frequent_itemsets

    support  itemsets                     length
0       0.8  (Eggs)                       1
1       1.0  (Kidney Beans)               1
2       0.6  (Milk)                       1
3       0.6  (Onion)                      1
4       0.6  (Yogurt)                     1
5       0.8  (Eggs, Kidney Beans)         2
6       0.6  (Eggs, Onion)                2
7       0.6  (Kidney Beans, Milk)         2
8       0.6  (Kidney Beans, Onion)        2
9       0.6  (Yogurt, Kidney Beans)       2
10      0.6  (Kidney Beans, Eggs, Onion)  3

Then, we can select the results that satisfy our desired criteria as follows:
frequent_itemsets[ (frequent_itemsets['length'] == 2) &
                   (frequent_itemsets['support'] >= 0.8) ]

    support  itemsets              length
5       0.8  (Eggs, Kidney Beans)  2

Similarly, using the Pandas API, we can select entries based on the "itemsets" column:
frequent_itemsets[ frequent_itemsets['itemsets'] == {'Onion', 'Eggs'} ]

    support  itemsets       length
6       0.6  (Eggs, Onion)  2

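
Beyond exact matches, one may also want every itemset that contains a given item. A possible sketch, assuming only pandas is installed: the DataFrame below is rebuilt by hand from the result table earlier in this example (rather than by calling apriori), and the variable names are my own.

```python
import pandas as pd

# Frequent itemsets copied by hand from the min_support=0.6 results.
itemsets = [
    frozenset({"Eggs"}),
    frozenset({"Kidney Beans"}),
    frozenset({"Milk"}),
    frozenset({"Onion"}),
    frozenset({"Yogurt"}),
    frozenset({"Eggs", "Kidney Beans"}),
    frozenset({"Eggs", "Onion"}),
    frozenset({"Kidney Beans", "Milk"}),
    frozenset({"Kidney Beans", "Onion"}),
    frozenset({"Yogurt", "Kidney Beans"}),
    frozenset({"Kidney Beans", "Eggs", "Onion"}),
]
frequent_itemsets = pd.DataFrame({
    "support": [0.8, 1.0, 0.6, 0.6, 0.6, 0.8, 0.6, 0.6, 0.6, 0.6, 0.6],
    "itemsets": pd.Series(itemsets, dtype="object"),
})

# Boolean mask: True where the itemset contains "Onion".
contains_onion = frequent_itemsets["itemsets"].apply(lambda s: "Onion" in s)
onion_sets = frequent_itemsets[contains_onion]

# Itemsets that contain both "Eggs" and "Kidney Beans" (superset test).
wanted = {"Eggs", "Kidney Beans"}
supersets = frequent_itemsets[frequent_itemsets["itemsets"].apply(wanted.issubset)]

print(onion_sets)
print(supersets)
```

Using apply with a membership or subset test sidesteps element-wise equality and works for any number of required items.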
Frozensets
Note that the entries in the "itemsets" column are of type frozenset, a built-in Python type that is similar to a Python set but immutable, which makes it more efficient for certain query or comparison operations (https://docs.python.org/3.6/library/stdtypes.html#frozenset). Since frozensets are sets, the item order does not matter. That is, the query
frequent_itemsets[ frequent_itemsets['itemsets'] == {'Onion', 'Eggs'} ]
is equivalent to any of the following three:

frequent_itemsets[ frequent_itemsets['itemsets'] == {'Eggs', 'Onion'} ]
frequent_itemsets[ frequent_itemsets['itemsets'] == frozenset(('Eggs', 'Onion')) ]
frequent_itemsets[ frequent_itemsets['itemsets'] == frozenset(('Onion', 'Eggs')) ]
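
The order-independence and immutability mentioned above can be checked with the standard library alone. A small sketch (the variable names are only for illustration):

```python
a = frozenset(("Eggs", "Onion"))
b = frozenset(("Onion", "Eggs"))

# Equality ignores the order in which the items were given,
# and a frozenset also compares equal to a plain set with the same items.
same_regardless_of_order = (a == b)
equals_plain_set = (a == {"Onion", "Eggs"})

# Frozensets are hashable, so they can serve as dictionary keys.
support_by_itemset = {a: 0.6}
lookup = support_by_itemset[b]

# A plain (mutable) set is not hashable.
try:
    hash({"Onion", "Eggs"})
    set_is_hashable = True
except TypeError:
    set_is_hashable = False

# Frozensets expose no mutating methods such as add().
frozenset_has_add = hasattr(a, "add")

print(same_regardless_of_order, equals_plain_set, lookup, set_is_hashable, frozenset_has_add)
```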

Example 3 -- Working with Sparse Representations

To save memory, you may want to represent your transaction data in a sparse format. This is especially useful if you have lots of products and small transactions.

oht_ary = te.fit(dataset).transform(dataset, sparse=True)
sparse_df = pd.DataFrame.sparse.from_spmatrix(oht_ary, columns=te.columns_)
sparse_df

   Apple   Corn   Dill   Eggs  Ice cream  Kidney Beans   Milk  Nutmeg  Onion  Unicorn  Yogurt
0  False  False  False   True      False          True   True    True   True    False    True
1  False  False   True   True      False          True  False    True   True    False    True
2   True  False  False   True      False          True   True   False  False    False   False
3  False   True  False  False      False          True   True   False  False     True    True
4  False   True  False   True       True          True  False   False   True    False   False

apriori(sparse_df, min_support=0.6, use_colnames=True, verbose=1)

Processing 21 combinations | Sampling itemset size 3

    support  itemsets
0       0.8  (Eggs)
1       1.0  (Kidney Beans)
2       0.6  (Milk)
3       0.6  (Onion)
4       0.6  (Yogurt)
5       0.8  (Eggs, Kidney Beans)
6       0.6  (Eggs, Onion)
7       0.6  (Kidney Beans, Milk)
8       0.6  (Kidney Beans, Onion)
9       0.6  (Yogurt, Kidney Beans)
10      0.6  (Kidney Beans, Eggs, Onion)

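
Whether a sparse format pays off depends on how many cells of the one-hot table are actually True, since a sparse layout stores only those entries. A rough way to estimate this for the toy dataset, in plain Python and without mlxtend (the variable names are my own):

```python
dataset = [
    ["Milk", "Onion", "Nutmeg", "Kidney Beans", "Eggs", "Yogurt"],
    ["Dill", "Onion", "Nutmeg", "Kidney Beans", "Eggs", "Yogurt"],
    ["Milk", "Apple", "Kidney Beans", "Eggs"],
    ["Milk", "Unicorn", "Corn", "Kidney Beans", "Yogurt"],
    ["Corn", "Onion", "Onion", "Kidney Beans", "Ice cream", "Eggs"],
]

# One column per distinct item, one row per transaction.
items = sorted({item for t in dataset for item in t})
n_cells = len(dataset) * len(items)

# A cell is True once per distinct item in a transaction (duplicates collapse).
n_true = sum(len(set(t)) for t in dataset)

density = n_true / n_cells
print(f"{n_true} of {n_cells} cells are True (density {density:.2f})")
```

Here nearly half of the cells are True, so the saving is small; with many products and small transactions the density drops sharply, which is the case the sparse format is meant for.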
API

apriori(df, min_support=0.5, use_colnames=False, max_len=None, verbose=0, low_memory=False)

Get frequent itemsets from a one-hot DataFrame
Parameters

df : pandas DataFrame
pandas DataFrame in one-hot encoded format. Also supports DataFrames with sparse data; for more info, please see (https://pandas.pydata.org/pandas ... rse-data-structures)
Please note that the old pandas SparseDataFrame format is no longer supported in mlxtend >= 0.17.2.
The allowed values are either 0/1 or True/False. For example,

   Apple  Bananas   Beer  Chicken   Milk   Rice
0   True    False   True     True  False   True
1   True    False   True    False  False   True
2   True    False   True    False  False  False
3   True     True  False    False  False  False
4  False    False   True     True   True   True
5  False    False   True    False   True   True
6  False    False   True    False   True  False
7   True     True  False    False  False  False

min_support : float (default: 0.5)
A float between 0 and 1 for the minimum support of the itemsets returned. The support is computed as the fraction transactions_where_item(s)_occur / total_transactions.
use_colnames : bool (default: False)
If True, uses the DataFrame's column names in the returned DataFrame instead of column indices.
max_len : int (default: None)
Maximum length of the itemsets generated. If None (default), all possible itemset lengths (under the apriori condition) are evaluated.
verbose : int (default: 0)
Shows the number of iterations if >= 1 and low_memory is True. If >= 1 and low_memory is False, shows the number of combinations.
low_memory : bool (default: False)
If True, uses an iterator to search for combinations above min_support. Note that low_memory=True should only be used for large datasets if memory resources are limited, because this implementation is approx. 3-6x slower than the default.

Returns
pandas DataFrame with columns ['support', 'itemsets'] of all itemsets whose support is >= min_support and whose length is <= max_len (if max_len is not None). Each itemset in the 'itemsets' column is of type frozenset, which is a Python built-in type that behaves similarly to sets except that it is immutable (for more info, see https://docs.python.org/3.6/library/stdtypes.html#frozenset).
Examples
For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/frequent_patterns/apriori/
