Analyzing User Behavior with the RFM Model

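According to research by Arthur Hughes of the Database Marketing Institute in the United States, a customer database contains three key elements that together make the best indicators for data analysis: the time of the most recent purchase (Recency), purchase frequency (Frequency), and purchase amount (Monetary).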

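The RFM model:

R (Recency) measures how long ago the customer last made a purchase. The more recent the last purchase, the more likely the customer is to respond to a timely offer of goods or services.
F (Frequency) is the number of purchases the customer made in a recent period. Customers who buy often are usually also the most satisfied customers.
M (Monetary) is the amount the customer spent in a recent period, taken as the average amount of recent purchases. It is an important variable reflecting the customer's short-term value. When the budget is limited, service information should go first to the customers who contribute the most revenue.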

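The raw data usually has three fields: customer ID, purchase time (in a date format), and purchase amount. Data mining software processes these fields and weights them to produce an RFM score, which then supports customer segmentation, customer tier classification, and ranking by Customer Level Value score, the basis for database marketing.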
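As a rough illustration of the weighting step, the sketch below derives R, F and M from the three raw fields and combines their percentile ranks into a single score. The column names (`customer_id`, `purchase_time`, `amount`), the toy records and the weights are all assumptions made for this example; none of them come from the case data.

```python
import pandas as pd

# Toy transactions; column names and values are hypothetical.
tx = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "purchase_time": pd.to_datetime([
        "2010-01-05", "2010-06-01", "2010-03-10",
        "2010-05-20", "2010-06-15", "2010-06-30",
    ]),
    "amount": [100.0, 250.0, 80.0, 40.0, 60.0, 50.0],
})

# Reference date for recency: the latest purchase in the data.
now = tx["purchase_time"].max()

rfm = tx.groupby("customer_id").agg(
    recency_days=("purchase_time", lambda s: (now - s.max()).days),
    frequency=("purchase_time", "count"),
    monetary=("amount", "sum"),
)

# Percentile ranks in (0, 1]; a smaller recency is better, so rank it descending.
scores = pd.DataFrame({
    "R": rfm["recency_days"].rank(pct=True, ascending=False),
    "F": rfm["frequency"].rank(pct=True),
    "M": rfm["monetary"].rank(pct=True),
})

# Hypothetical weights; in practice they are a business decision.
weights = {"R": 0.5, "F": 0.3, "M": 0.2}
rfm["score"] = sum(scores[k] * w for k, w in weights.items())

print(rfm.sort_values("score", ascending=False))
```

Ranking before weighting keeps the three measures on a comparable scale, so a large monetary value cannot swamp the other two.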

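(Codes are written in R, F, M order; 1 means high, 0 means low.)

Key value customers (111): recent last purchase, high frequency, and high spend. These are the VIPs.

Key retention customers (011): the last purchase was a while ago, but frequency and spend are both high. This is a loyal customer who has not visited for some time, so we should reach out and stay in contact.

Key development customers (101): recent last purchase and high spend, but low frequency. Loyalty is still low, yet the potential is large, so these customers deserve focused development.

Key win-back customers (001): the last purchase was a while ago and frequency is low, but spend is high. These customers may be about to churn or may have churned already, so retention measures should be taken.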
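The four codes above can be held in a small lookup table. This is only a sketch: the English segment names are translations, and the `"other"` bucket for codes the text does not name is an assumption added for the example.

```python
# Segment names for the four codes described above, keyed by (R, F, M).
SEGMENTS = {
    (1, 1, 1): "key value customer",
    (0, 1, 1): "key retention customer",
    (1, 0, 1): "key development customer",
    (0, 0, 1): "key win-back customer",
}


def segment(r: int, f: int, m: int) -> str:
    """Return the segment name for binary R, F, M flags (1 = high, 0 = low)."""
    # Codes with M = 0 are not named in the text, so they fall into a
    # hypothetical catch-all bucket here.
    return SEGMENTS.get((r, f, m), "other")


print(segment(0, 1, 1))
```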

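The RFM model is applied by building a user behavior report, which becomes an important reference for retaining customers.

As a case study, consider a Taobao shop that wants to reactivate its customers. RFM_TRAD_FLOW.csv holds the customers' purchase records over a period of time.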

import pandas as pd
trad_flow = pd.read_csv('data/RFM_TRAD_FLOW.csv', encoding='gbk')
trad_flow.head(10)

Sample of the data:

transID	cumid	time	amount	type_label	type
9407	10001	14JUN09:17:58:34	199	正常	Normal
9625	10001	16JUN09:15:09:13	369	正常	Normal
11837	10001	01JUL09:14:50:36	369	正常	Normal
26629	10001	14DEC09:18:05:32	359	正常	Normal
30850	10001	12APR10:13:02:20	399	正常	Normal
32007	10001	04MAY10:16:45:58	269	正常	Normal
36637	10001	04JUN10:20:03:06	0	赠送	Presented
43108	10001	06JUL10:16:56:40	381	正常	Normal

Computing F: customer preference for discounts

Here F is computed to reflect how strongly each customer prefers discounted products.

F = trad_flow.groupby(['cumid', 'type'])[['transID']].count()
F.head()

Build a pivot table

F_trans = pd.pivot_table(F, index='cumid', columns='type', values='transID')
F_trans.head()

Fill missing values with zero

F_trans['Special_offer'] = F_trans['Special_offer'].fillna(0)
F_trans.head()

Compute each customer's interest, the share of special-offer purchases

F_trans["interest"] = F_trans['Special_offer'] / (F_trans['Special_offer'] + F_trans['Normal'])
F_trans.head()
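To see what the count, pivot, fill and ratio steps produce, here is a minimal sketch on made-up records. It uses `unstack(fill_value=0)` instead of `pivot_table` plus `fillna`, which is a swapped-in shortcut rather than the method used in this case; the toy values are assumptions, while the column names follow `trad_flow`.

```python
import pandas as pd

# Toy records; the values are made up, the column names follow trad_flow.
toy = pd.DataFrame({
    "transID": [1, 2, 3, 4, 5],
    "cumid": [10001, 10001, 10001, 10002, 10002],
    "type": ["Normal", "Special_offer", "Normal", "Normal", "Normal"],
})

# Count transactions per customer and type; unstack() spreads the types into
# columns and fill_value=0 fills the combinations that never occurred.
counts = toy.groupby(["cumid", "type"])["transID"].count().unstack(fill_value=0)

# Guard against a type that is absent from the data altogether.
counts = counts.reindex(columns=["Normal", "Special_offer"], fill_value=0)

# Share of special-offer purchases among Normal and Special_offer purchases.
counts["interest"] = counts["Special_offer"] / (
    counts["Special_offer"] + counts["Normal"]
)
print(counts)
```

Customer 10002 never bought a special offer, so the count is filled with 0 and the interest ratio comes out as 0 rather than missing.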

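Computing M: customer value

Here M is computed to reflect each customer's value.

M = trad_flow.groupby(['cumid', 'type'])[['amount']].sum()
M.head()

Pivot the table, fill missing values with zero, and compute customer value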

M_trans=pd.pivot_table(M,index='cumid',columns='type',values='amount')
M_trans['Special_offer']= M_trans['Special_offer'].fillna(0)
M_trans['returned_goods']= M_trans['returned_goods'].fillna(0)
M_trans["value"]=M_trans['Normal']+M_trans['Special_offer']+M_trans['returned_goods']
M_trans.head()

Computing R to tell whether a customer is dormant

Define a function that converts the text timestamp into a time value

import time

def to_time(t):
    # Parse text such as '14JUN09:17:58:34' into seconds since the epoch (local time)
    out_t = time.mktime(time.strptime(t, '%d%b%y:%H:%M:%S'))
    return out_t

Convert the time column

trad_flow["time_new"] = trad_flow.time.apply(to_time)
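As an alternative sketch, the same text format can be parsed for a whole column at once with `pd.to_datetime`, which returns timestamps instead of epoch seconds. The two sample strings are copied from the data shown earlier; the "days since the latest purchase" measure is an assumption added for illustration, not part of the original workflow.

```python
import pandas as pd

# Two timestamps in the same text format as the `time` column.
raw = pd.Series(["14JUN09:17:58:34", "06JUL10:16:56:40"])

# Parse the whole column at once; %b matches English month abbreviations,
# which assumes the default C/English locale.
parsed = pd.to_datetime(raw, format="%d%b%y:%H:%M:%S")

# Whole days elapsed since each purchase, measured from the latest one.
days_ago = (parsed.max() - parsed).dt.days
print(parsed.tolist(), days_ago.tolist())
```

Working with timestamps keeps date arithmetic readable, while the ordering of customers by recency is the same as with epoch seconds.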

Get each customer's most recent purchase time

R = trad_flow.groupby(['cumid'])[['time_new']].max()
R.head()

Build the model and select target customers

from sklearn import preprocessing

# Interest flag: 1 if the customer's interest is above the median, else 0
threshold = pd.qcut(F_trans['interest'], 2, retbins=True)[1][1]
binarizer = preprocessing.Binarizer(threshold=threshold)
interest_q = pd.DataFrame(binarizer.transform(F_trans['interest'].values.reshape(-1, 1)))
interest_q.index = F_trans.index
interest_q.columns = ["interest"]

# Value flag: 1 if the customer's total value is above the median, else 0
threshold = pd.qcut(M_trans['value'], 2, retbins=True)[1][1]
binarizer = preprocessing.Binarizer(threshold=threshold)
value_q = pd.DataFrame(binarizer.transform(M_trans['value'].values.reshape(-1, 1)))
value_q.index = M_trans.index
value_q.columns = ["value"]

# Time flag: 1 if the last purchase is more recent than the median, else 0
threshold = pd.qcut(R["time_new"], 2, retbins=True)[1][1]
binarizer = preprocessing.Binarizer(threshold=threshold)
time_new_q = pd.DataFrame(binarizer.transform(R["time_new"].values.reshape(-1, 1)))
time_new_q.index = R.index
time_new_q.columns = ["time"]

# Combine the three flags into one table, one row per customer
analysis = pd.concat([interest_q, value_q, time_new_q], axis=1)
analysis = analysis[['interest', 'value', 'time']]
analysis.head()
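The thresholding above amounts to a median split. The sketch below shows this on made-up values with pandas alone: with two quantile bins, the middle edge returned by `pd.qcut` is the median, and a strict greater-than comparison gives the same 0/1 flags as `Binarizer`. The series `s` is a toy stand-in, not case data.

```python
import pandas as pd

# Toy values standing in for a column such as M_trans['value'].
s = pd.Series([10.0, 40.0, 20.0, 30.0, 50.0], index=[1, 2, 3, 4, 5])

# With two quantile bins, the middle bin edge returned by qcut is the median.
edges = pd.qcut(s, 2, retbins=True)[1]
threshold = edges[1]

# Binarizer maps values strictly greater than the threshold to 1, else 0;
# the comparison below does the same and keeps the original index.
flag = (s > threshold).astype(int)
print(threshold, flag.tolist())
```

Note that a value equal to the median is flagged 0, because the comparison is strict.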

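# Map each (interest, value, time) combination to a readable segment label
label = {
    (0,0,0):'not interested / low value / dormant',
    (1,0,0):'interested / low value / dormant',
    (1,0,1):'interested / low value / active',
    (0,0,1):'not interested / low value / active',
    (0,1,0):'not interested / high value / dormant',
    (1,1,0):'interested / high value / dormant',
    (1,1,1):'interested / high value / active',
    (0,1,1):'not interested / high value / active'
}
# Look the columns up by name; positional access such as x[0] on a labelled row is deprecated in recent pandas
analysis['label'] = analysis[['interest','value','time']].apply(lambda x: label[(int(x['interest']), int(x['value']), int(x['time']))], axis = 1)
analysis.head()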
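Once every customer carries the three flags, picking targets is a row filter. The sketch below is one possible rule for a reactivation campaign: interested, high value, and currently dormant. Both the rule and the toy table `analysis_demo` are assumptions for illustration; the case itself does not state which segment to target.

```python
import pandas as pd

# Toy stand-in for the analysis table; the values are made up.
analysis_demo = pd.DataFrame(
    {"interest": [1, 0, 1, 1], "value": [1, 1, 0, 1], "time": [0, 0, 1, 1]},
    index=pd.Index([10001, 10002, 10003, 10004], name="cumid"),
)

# Hypothetical targeting rule: interested, high value, currently dormant.
target_mask = (
    (analysis_demo["interest"] == 1)
    & (analysis_demo["value"] == 1)
    & (analysis_demo["time"] == 0)
)
targets = analysis_demo.index[target_mask].tolist()
print(targets)
```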