Statistics - 经管之家 (Jingguan Zhijia) official site

insight 2015-10-24 16:36

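After the third-quarter statistics were released, two questions drew wide attention. First, GDP growth was only 6.9%, below 7%: how should this be viewed? Second, industrial profits are falling and some have proposed raising investment: is more investment at this stage a good way to address both problems? Judging from the data, the direct cause of the slowdown in third-quarter GDP growth was falling investment, so more investment looks like exactly the right remedy.

Fixed-asset investment by month (growth in %, year on year)
Period    2015    2014    Growth
Jan-Feb   34477   30283   13.9
Mar       43034   38039   13.1
Apr       42468   38756   9.6
May       51266   46638   9.9
Jun       65887   59054   11.6
Jul       51337   46723   9.9
Aug       50508   46239   9.2
Sep       55554   52001   6.8

But after looking more closely at how investment actually behaves, I find that raising investment is hard to make work. Real estate, secondary industry and infrastructure together account for more than 80% of investment in China; each is analyzed below.

Japan's data show that real-estate sales as a share of GDP trace a parabola. The share rose in the early period as the population grew and urbanization accelerated, peaked in 1973, and then declined gradually as urbanization slowed, the population aged and the total population fell. After 1998 the share continued to fall.

Japan: real-estate sales / GDP (%), years 1955-1998
Year   55   56   57   58   59   60   61   62   63   64   65   66
Share  3.1  3.4  3.4  3.5  3.5  3.9  4.1  4.3  4.6  5.1  5.6  5.5
Year   67   68   69   70   71   72   73   74   75   76   77
Share  5.9  6.2  6.4  6.5  6.3  7    8.3  7.4  7.0  7.3  6.9
Year   78   79   80   81   82   83   84   85   86   87   88
Share  6.8  6.9  6.4  5.8  5.6  5.0  4.7  4.6  4.7  5.6  5.9
Year   89   90   91   92   93   94   95   96   97   98
Share  5.8  5.9  5.2  4.8  5.0  5.4  5.0  5.5  4.7  4.0

China: real-estate sales / GDP (%), years 1998-2014
Year   98   99   00   01   02   03   04   05   06
Share  3    3.3  4.0  4.4  5.0  5.9  6.5  9.5  9.6
Year   07   08   09   10   11   12   13   14
Share  11.1 7.9  12.9 12.9 12.2 12   14   12

Setting aside 2008, when the financial crisis hit, the share of real-estate sales in China's GDP rose quickly from 1998 on. In 2014, however, it fell markedly without any obvious external shock, and in the first three quarters of this year it was only 11.6%. In addition, growth in commercial housing sales is slowing, the unsold floor area and its ratio to the floor area sold each year keep rising, and the floor area under construction and its ratio to completed floor area are rising. These signs suggest that the long-cycle turning point of the real-estate market may already have arrived or be arriving.

Real-estate sales by year (growth in %)
Year   Sales   Growth
1998   2513    n/a
1999   2988    18.9
2000   3955    32.4
2001   4863    23
2002   6032    24
2003   7956    31.9
2004   10376   30.4
2005   17576   69.4
2006   20826   18.5
2007   29889   43.5
2008   25068   -16.1
2009   44355   76.9
2010   52721   18.9
2011   58589   11.1
2012   64456   10
2013   81428   26.3
2014   76292   -6.3

On monthly data, the 100-city house price index has been recovering since May and sales are up sharply on 2014. Compared with the same months of 2013, however, the increases are modest apart from a surge in June, and the compound growth rate remains low.

Real-estate sales by month, 2015 against 2014 and 2013 (growth in % is 2015 over 2014)
Period    2015   2014   Growth   2013
Jan-Feb   5972   7090   -15.8    7361
Mar       6051   5973   1.3      6631
Apr       5716   4944   15.6     5855
May       6670   5367   24.3     6017
Jun       9886   7459   32.5     7512
Jul       6912   5182   33.4     6173
Aug       6871   5346   28.5     6175
Sep       8703   7566   15       8304

Unsold commercial housing floor area, and its ratio to floor area sold (%)
Year        Unsold area   Ratio
2005        14679         26.5
2006        14550         23.5
2007        13463         17.4
2008        18626         28.2
2009        19947         21.1
2010        n/a           n/a
2011        27194         24.9
2012        36460         32.8
2013        49285         37.8
2014        62169         51.5
2015 (Sep)  66510         52.6
(I could not find the 2010 unsold-area figure on the National Bureau of Statistics website.)

Real-estate floor area under construction, and its ratio to completed floor area
Year   Under construction   Ratio
2005   164445               2.96
2006   194090               3.14
2007   235882               3.05
2008   274149               4.16
2009   319650               3.37
2010   405539               3.87
2011   507959               4.64
2012   573418               5.15
2013   665572               5.1
2014   726482               6.02

A side note on GDP growth over the next five years. During the real-estate upswing of 1956-1973, Japan's GDP grew 9.1% a year on average; after the long-term turning point, growth averaged only 3.8% a year in 1974-1990. The oil crises and other factors played a part, but because the real-estate supply chain is so long, its effect was still pronounced. There are now reports that the GDP growth target may be lowered to 6.5%; Japan's experience suggests that reaching 6.5% over the next five years will not be easy. I suggest playing down the GDP growth target and paying more attention to employment and to guarding against financial risk. That task will be very heavy over the next five years: one institution that studied the financial statements of bond issuers found 2 trillion of debt at risk of default ( http://www.zerohedge.com/news/2015-10-01/chinese-cash-flow-shocker-more-half-commodity-companies-cant-pay-interest-their-debt ), and another, after studying financial data for 2,700 non-financial listed companies, concluded that potential non-performing loans in China's banking sector are as high as 8% ( http://www.zerohedge.com/news/2015-10-13/clsa-just-stumbled-neutron-bomb-chinas-banking-system ).

Next, industrial investment.

Investment by industrial enterprises by month (growth in %, year on year)
Period    2014    2015    Growth
Jan-Feb   11705   13055   11.5
Mar       16549   18306   10.6
Apr       16543   17816   7.7
May       19797   21625   9.2
Jun       24529   26644   8.3
Jul       19934   21445   7.6
Aug       19632   20835   6.1
Sep       21428   22459   4.8
Oct       20081
Nov       17929
Dec       19917

Secondary-industry investment has slid all year, and on annual data its growth has been falling since 2011.

Growth of secondary-industry investment (%)
Year     2011   2012   2013   2014
Growth   27.3   20.2   17.4   13.2

I believe the slowdown is related to overcapacity. The National Bureau of Statistics does not publish capacity utilization (I again publicly urge it to do so), but total asset turnover computed from the bureau's published financial data for industrial enterprises also began to fall in 2011, and by 2014 it was close to its 2009 level.

Total asset turnover, industrial enterprises above designated size
Year       2014  2013  2012  2011  2010  2009  2008  2007  2006  2005  2004
Turnover   1.25  1.29  1.30  1.39  1.34  1.24  1.33  1.27  1.17  1.11  1.03

Under these conditions, further increases in industrial investment would clearly raise the risk of bad loans at banks.

(Why have debt problems at state-owned enterprises, such as the Tianwei and Sinosteel bonds, been so frequent this year? Look at the data.

Debt-to-asset ratio of industrial enterprises (%)
Year   State-controlled   Private   Joint-stock
2001   59.5               60.4      57.0
2002   59.2               59.6      57.2
2003   59.4               60.6      58.3
2004   59                 60.2      59
2005   57.2               59.8      59.6
2006   56.6               59.3      59.1
2007   57.5               59.5      60.4
2008   60                 58.8      60.8
2009   61.4               57.4      60.4
2010   61.1               56.4      59.3
2011   61                 54.6      58.5
2012   61.1               54.1      58.3
2013   61.9               53.4      58.6
2014   61.3               52        57.5

Private and joint-stock enterprises lowered their debt-to-asset ratios after the 2008 financial crisis to reduce risk, while state-owned enterprises raised theirs markedly. With large numbers of migrant workers returning home in 2008, this may have been unavoidable. But the ratio did not come down once employment had stabilized, which is surely related to GDP growth targets being set too high and state-owned enterprises continuing to expand under pressure from governments at all levels. Yet total asset turnover in 2014 was 1.98 for private industrial enterprises, 1.32 for joint-stock enterprises and only 0.7 for state-controlled enterprises. From a financial standpoint, the expansion of state-owned enterprises is hard to justify.)

Finally, I do not oppose more infrastructure investment when employment is in trouble, but infrastructure is funded mostly by government. Japan's experience is that as the population ages, growth stagnates and tax revenue falls while fiscal spending climbs with pension and health-care costs, so fiscal pressure in the future will be very heavy. When employment is not a problem, I do not recommend raising infrastructure investment.

How, then, can falling corporate profits and similar problems be solved? China is still only a middle-income country, and many problems are hard to solve without development.

My first answer is to increase exports. More exports support growth and raise corporate profits, and unlike investment they do not add to corporate debt: they show up in a company's financial statements as higher revenue, which helps reduce financial risk. With the global recovery unsteady, exports cannot be raised by yuan depreciation alone, or international pressure would be heavy; the increase has to come from reform and innovation. This year the State Council has repeatedly studied and issued measures that streamline administration and make exporting easier, and Chongqing has changed its approach to attracting investment by recruiting entire industrial chains, which lowers firms' transaction costs. Both have clearly worked very well.

Raising exports requires close attention to neighbouring countries, and I suggest that the Belt and Road strategy focus on neighbours in its early phase. The largest export markets of the United States are Canada and Mexico, not China; Germany's largest export market is France, not the United States. The development of neighbouring countries is very favourable for our exports.

In the contest with the United States, neighbouring countries, especially in Southeast Asia, also carry great weight. China and the United States are both huge and their economic interests are intertwined; if the United States confronts China directly, it has to worry about economic losses and struggles to unify domestic opinion. Since its return to the Asia-Pacific, the United States has worked by stirring up confrontation between China and its neighbours. Reviewing US policy toward China after the financial crisis, I find that the United States turned hostile only after the Cheonan incident, the hijacking of a Hong Kong tour group in the Philippines, and Japan's detention of a Chinese captain near the Diaoyu Islands had occurred one after another. According to an article by Joseph Nye, Clinton, while in office, wanted to deny China most-favoured-nation trade status and take a hard line, but had to reverse course because neighbouring countries objected. Why does the United States oppose the Asian Infrastructure Investment Bank? I believe an important reason is that its founding clearly weakens America's ability to set Southeast Asian countries against China. I suggest integrating diplomatic, outbound-investment and foreign-trade resources into a comprehensive strategy for managing relations with the neighbourhood.

My second answer is to lift family-planning restrictions. The Northeast shows how important this is for economic growth. Take real estate: among the 70 large and medium-sized cities whose house prices the National Bureau of Statistics publishes, prices in September fell in 21 cities, rose in 39 and were flat in 10. But among the eight northeastern cities, only Harbin rose month on month and Mudanjiang was flat; all the rest fell. Because the Northeast is a step ahead of the rest of the country in urbanization and population ageing, its housing market can be treated as a leading indicator. The slowdown of the Northeast's economy is closely tied to real-estate investment. In Liaoning, for example, real-estate development investment in 2014 was 530.13 billion yuan, down 17.8%, while national real-estate investment grew 10.5% in 2014. The contrast is stark.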
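The "compound growth rate" the article mentions for 2015 against 2013 can be checked directly from the monthly sales figures it quotes. The following is a minimal sketch, not the author's own calculation: the function names are made up for illustration, and it assumes the 2013 and 2015 rows cover the same periods in the same units.

```python
# Monthly real-estate sales quoted in the article (units as in the source).
PERIODS = ["Jan-Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep"]
SALES_2013 = [7361, 6631, 5855, 6017, 7512, 6173, 6175, 8304]
SALES_2015 = [5972, 6051, 5716, 6670, 9886, 6912, 6871, 8703]


def compound_growth(start, end, years):
    """Average annual growth rate that turns `start` into `end` over `years` years."""
    return (end / start) ** (1.0 / years) - 1.0


def two_year_growth_table():
    """Return (period, total growth %, compound annual growth %) rows for 2013 -> 2015."""
    rows = []
    for period, s13, s15 in zip(PERIODS, SALES_2013, SALES_2015):
        total = (s15 / s13 - 1.0) * 100
        annual = compound_growth(s13, s15, years=2) * 100
        rows.append((period, round(total, 1), round(annual, 1)))
    return rows


if __name__ == "__main__":
    for period, total, annual in two_year_growth_table():
        print(f"{period:8s} total {total:6.1f}%   per year {annual:6.1f}%")
```

Taking the square root of the two-year ratio is what makes the rate "compound": a month that is up on 2014 can still show little or negative annualized growth relative to 2013.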

35 views | 0 comments

Nonparametric statistics

西门高 2015-10-19 21:29

Applications of order statistics

13 views | 0 comments

Several tests that strengthen statistical inference

西门高 2015-10-12 09:43

Sensitivity tests, "document" tests (文件检验 in the original), placebo tests

9 views | 0 comments

Probability and statistics

accumulation 2015-10-6 20:17

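1. Random variables and probability
(1) Classical and geometric probability models ***
(2) Event probabilities in the Bernoulli model **
2. Random variables and their distributions
(1) Probability density functions and distribution functions ***
(2) Common one-dimensional random variables and their distributions ***
3. Multidimensional random variables and their distributions
(1) Joint, marginal and conditional distributions and densities of two-dimensional random variables ***
(2) Independence and uncorrelatedness of random variables **
4. Numerical characteristics of random variables
(1) Expectation, variance and their properties ***
(2) Covariance, correlation coefficient and their properties ***
5. Law of large numbers and central limit theorem
(1) Law of large numbers *
(2) Central limit theorem ***
(3) Chebyshev's inequality **
6. Basic concepts of mathematical statistics
(1) Chi-square, t and F distributions ***
(2) Statistics and their numerical characteristics; unbiasedness and consistency **
7. Parameter estimation
(1) Method of moments and maximum likelihood estimation ***
(2) Interval estimation and confidence intervals **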
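Item 7(1) of the outline pairs the method of moments with maximum likelihood. A small sketch of how the two can disagree, using a Uniform(0, theta) sample as a textbook case chosen here for illustration (it is not part of the original outline); the function names and the simulation settings are assumptions.

```python
import random


def moment_estimate(sample):
    """Method of moments for Uniform(0, theta): E[X] = theta / 2, so theta_hat = 2 * mean."""
    return 2.0 * sum(sample) / len(sample)


def mle_estimate(sample):
    """Maximum likelihood for Uniform(0, theta).

    The likelihood is theta**(-n) for theta >= max(sample) and 0 otherwise,
    so it is maximised by the smallest admissible theta: the sample maximum.
    """
    return max(sample)


def simulate(theta=5.0, n=10_000, seed=0):
    """Draw n values from Uniform(0, theta) and return both estimates."""
    rng = random.Random(seed)
    sample = [rng.uniform(0.0, theta) for _ in range(n)]
    return moment_estimate(sample), mle_estimate(sample)


if __name__ == "__main__":
    mom, mle = simulate()
    print("method of moments:", mom)
    print("maximum likelihood:", mle)
```

The maximum-likelihood estimate can never exceed the true theta, so it is biased downward, while the moment estimate is unbiased but noisier; this ties in with the unbiasedness and consistency topics in item 6(2).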

Category: Financial engineering | 0 comments

Probability and statistics vocabulary

accumulation 2015-10-6 19:08

T (continued)
tolerance limits 容许限; total 总共，和; transformation 转换; treatment 处理; trimmed mean 截尾均值; true value 真值; t-test t 检验; two-tailed test 双侧检验
U
unbalanced 不平衡的; unbiased estimation 无偏估计; unbiasedness 无偏性; uniform distribution 均匀分布
V
value of estimator 估计值; variable 变量; variance 方差; variance components 方差分量; variance ratio 方差比; various 不同的; vector 向量
W
weight 加权，权重; weighted average 加权平均值; within groups 组内的
Z
Z score Z 分数

Category: Econometrics | 0 comments

Probability and statistics vocabulary

accumulation 2015-10-6 19:04

R
random 随机的; random number 随机数; random sampling 随机取样; random seed 随机数种子; random variable 随机变量; randomization 随机化; range 极差; rank 秩; rank correlation 秩相关; rank statistic 秩统计量; regression analysis 回归分析; regression coefficient 回归系数; regression line 回归线; reject 拒绝; rejection region 拒绝域; relationship 关系; reliability 可靠性; repeated 重复的; report 报告，报表; residual 残差; residual sum of squares 剩余平方和; response 响应; risk function 风险函数; robustness 稳健性; root mean square 均方根; row 行; run 游程; run test 游程检验
S
sample 样本; sample size 样本容量; sample space 样本空间; sampling 取样; sampling inspection 抽样检验; scatter chart 散点图; S-curve S 形曲线; separately 单独地; sets 集合; sign test 符号检验; significance 显著性; significance level 显著性水平; significance testing 显著性检验; significant 显著的，有效的; significant digits 有效数字; skewed distribution 偏态分布; skewness 偏度; small sample problem 小样本问题; smooth 平滑; sort 排序; sources of variation 方差来源; space 空间; spread 扩展; square 平方; standard deviation 标准离差; standard error of mean 均值的标准误差; standardization 标准化; standardize 标准化; statistic 统计量; statistical quality control 统计质量控制; std. residual 标准残差; stepwise regression analysis 逐步回归; stimulus 刺激; strong assumption 强假设; stud. deleted residual 学生化剔除残差; stud. residual 学生化残差; subsamples 次级样本; sufficient statistic 充分统计量; sum 和; sum of squares 平方和; summary 概括，综述
T
table 表; t-distribution t 分布; test 检验; test criterion 检验判据; test for linearity 线性检验; test of goodness of fit 拟合优度检验; test of homogeneity 齐性检验; test of independence 独立性检验; test rules 检验法则; test statistics 检验统计量; testing function 检验函数; time series 时间序列

Category: Econometrics | 0 comments

Probability and statistics vocabulary

accumulation 2015-10-6 19:02

M
main effect 主效应; matrix 矩阵; maximum 最大值; maximum likelihood estimation 极大似然估计; mean squared deviation (MSD) 均方差; mean sum of square 均方和; measure 衡量; median 中位数; M-estimator M 估计; minimum 最小值; missing values 缺失值; mixed model 混合模型; mode 众数; model 模型; Monte Carlo method 蒙特卡罗法; moving average 移动平均值; multicollinearity 多元共线性; multiple comparison 多重比较; multiple correlation 多重相关; multiple correlation coefficient 复相关系数，多元相关系数; multiple regression analysis 多元回归分析; multiple regression equation 多元回归方程; multiple response 多响应; multivariate analysis 多元分析
N
negative relationship 负相关; nonadditivity 不可加性; nonlinear 非线性; nonlinear regression 非线性回归; nonparametric tests 非参数检验; normal distribution 正态分布; null hypothesis 零假设; number of cases 个案数
O
one-sample 单样本; one-tailed test 单侧检验; one-way ANOVA 单向方差分析; one-way classification 单向分类; optimal 优化的; optimum allocation 最优配置; order 排序; order statistics 次序统计量; origin 原点; orthogonal 正交的; outliers 异常值
P
paired observations 成对观测数据; paired-sample 成对样本; parameter 参数; parameter estimation 参数估计; partial correlation 偏相关; partial correlation coefficient 偏相关系数; partial regression coefficient 偏回归系数; percent 百分数; percentiles 百分位数; pie chart 饼图; point estimate 点估计; Poisson distribution 泊松分布; polynomial curve 多项式曲线; polynomial regression 多项式回归; polynomials 多项式; positive relationship 正相关; power 幂; P-P plot P-P 概率图; predict 预测; predicted value 预测值; prediction intervals 预测区间; principal component analysis 主成分分析; probability 概率; probability density function 概率密度函数; probit analysis 概率单位分析; proportion 比例
Q
quadratic 二次的; Q-Q plot Q-Q 概率图; quadratic term 二次项; quality control 质量控制; quantitative 数量的，度量的; quartiles 四分位数

Category: Econometrics | 0 comments

Probability and statistics vocabulary

accumulation 2015-10-6 18:59

E (continued)
eigenvalue 特征值; equal size 等含量; equation 方程; error 误差; estimate 估计; estimation of parameters 参数估计; estimations 估计量; evaluate 衡量; exact value 精确值; expectation 期望; expected value 期望值; exponential 指数的; exponential distribution 指数分布; extreme value 极值
F
factor 因素，因子; factor analysis 因子分析; factor score 因子得分; factorial designs 析因设计; factorial experiment 析因试验; fit 拟合; fitted line 拟合线; fitted value 拟合值; fixed model 固定模型; fixed variable 固定变量; fractional factorial design 部分析因设计; frequency 频数; F-test F 检验; full factorial design 完全析因设计; function 函数
G
gamma distribution 伽玛分布; geometric mean 几何均值; group 组
H
harmonic mean 调和均值; heterogeneity 不齐性; histogram 直方图; homogeneity 齐性; homogeneity of variance 方差齐性; hypothesis 假设; hypothesis test 假设检验
I
independence 独立; independent variable 自变量; independent-samples 独立样本; index 指数; index of correlation 相关指数; interaction 交互作用; interclass correlation 组间相关; interval estimate 区间估计; intraclass correlation 组内相关; inverse 倒数的; iterate 迭代
K
kernel 核; Kolmogorov-Smirnov test 柯尔莫哥洛夫-斯米诺夫检验; kurtosis 峰度
L
large sample problem 大样本问题; layer 层; least-significant difference 最小显著差数; least-square estimation 最小二乘估计; least-square method 最小二乘法; level 水平; level of significance 显著性水平; leverage value 中心化杠杆值; life 寿命; life test 寿命试验; likelihood function 似然函数; likelihood ratio test 似然比检验; linear 线性的; linear estimator 线性估计; linear model 线性模型; linear regression 线性回归; linear relation 线性关系; linear term 线性项; logarithmic 对数的; logarithms 对数; logistic 逻辑的; loss function 损失函数

Category: Econometrics | 0 comments

Probability and statistics vocabulary

accumulation 2015-10-6 18:55

English-Chinese glossary of probability theory and mathematical statistics
A
absolute value 绝对值; accept 接受; acceptable region 接受域; additivity 可加性; adjusted 调整的; alternative hypothesis 对立假设; analysis 分析; analysis of covariance 协方差分析; analysis of variance 方差分析; arithmetic mean 算术平均值; association 相关性; assumption 假设; assumption checking 假设检验; availability 有效度; average 均值
B
balanced 平衡的; band 带宽; bar chart 条形图; beta-distribution 贝塔分布; between groups 组间的; bias 偏倚; binomial distribution 二项分布; binomial test 二项检验
C
calculate 计算; case 个案; category 类别; center of gravity 重心; central tendency 中心趋势; chi-square distribution 卡方分布; chi-square test 卡方检验; classify 分类; cluster analysis 聚类分析; coefficient 系数; coefficient of correlation 相关系数; collinearity 共线性; column 列; compare 比较; comparison 对照; components 构成，分量; compound 复合的; confidence interval 置信区间; consistency 一致性; constant 常数; continuous variable 连续变量; control charts 控制图; correlation 相关; covariance 协方差; covariance matrix 协方差矩阵; critical point 临界点; critical value 临界值; crosstab 列联表; cubic 三次的，立方的; cubic term 三次项; cumulative distribution function 累积分布函数; curve estimation 曲线估计
D
data 数据; default 默认的; definition 定义; deleted residual 剔除残差; density function 密度函数; dependent variable 因变量; description 描述; design of experiment 试验设计; deviations 差异; df. (degree of freedom) 自由度; diagnostic 诊断; dimension 维; discrete variable 离散变量; discriminant function 判别函数; discriminatory analysis 判别分析; distance 距离; distribution 分布; D-optimal design D-优化设计
E
equal 相等; effects of interaction 交互效应; efficiency 有效性

Category: Econometrics | 0 comments

How I ended up here!

wandousw 2015-8-19 20:32

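I have just graduated from university and am waiting for graduate school to start. I first ran into big data and intricate statistical material while working on my senior-year thesis, when I borrowed my supervisor's account to download a few statistical yearbooks. Now that I have graduated I am working for my graduate supervisor and need to collect and organize a lot of data, so registering here seemed well worth it. I am not after piles of forum coins, just enough to get by, and I hope to share the useful material I have with everyone. That's all!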

Category: Life | 38 views | 0 comments

Descriptive statistics

xiongjerry 2015-8-19 12:05

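xtdes: gives an overall description of the panel data: the number of cross-sectional units and the time span.
xtsum: computes basic statistics for each variable within groups, between groups and for the whole sample (it displays only Mean, Std. Dev., Min, Max and Observations).
xttab: tabulates the distribution of a variable (Freq. and Percent).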
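For readers without Stata at hand, the overall / between / within split that xtsum reports can be sketched in plain Python. This is an illustration of the decomposition only, not Stata's implementation: the function name `xtsum` and the dictionary layout are assumptions, and every standard deviation here uses the ordinary sample formula, which may not match the denominators Stata uses.

```python
from statistics import mean, stdev


def xtsum(panel):
    """Summarise one panel variable.

    panel: dict mapping a unit id to the list of that unit's observations.
    Returns overall, between (unit means) and within (deviations from the
    unit mean, re-centred on the grand mean) summaries.
    """
    all_values = [x for obs in panel.values() for x in obs]
    grand_mean = mean(all_values)
    unit_means = {unit: mean(obs) for unit, obs in panel.items()}
    # Within transformation: x_it - mean_i + grand mean.
    within = [x - unit_means[unit] + grand_mean
              for unit, obs in panel.items() for x in obs]
    between = list(unit_means.values())

    def summary(values):
        return {"sd": stdev(values), "min": min(values), "max": max(values)}

    return {
        "mean": grand_mean,
        "overall": summary(all_values),
        "between": summary(between),
        "within": summary(within),
        "N": len(all_values),  # total number of observations
        "n": len(panel),       # number of cross-sectional units
    }


if __name__ == "__main__":
    demo = {"firm_a": [1, 2, 3], "firm_b": [4, 5, 6]}
    for key, value in xtsum(demo).items():
        print(key, value)
```

The `N` and `n` fields give a rough equivalent of what xtdes reports about the size of the panel; at least two units with data are needed, otherwise the between standard deviation is undefined.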

Category: Stata commands | 58 views | 0 comments

Stochastic processes and statistical physics

accumulation 2015-7-14 23:06

Ge Hao (葛颢): Stochastic mathematical theory of nonequilibrium statistical physics (非平衡态统计物理的随机数学理论). 数学进展 43, 161 (2014)

Equilibrium thermodynamics
Fermi, E.: Thermodynamics. Dover Publications, Inc., New York (1936)

Nonequilibrium thermodynamics
I. Prigogine: Introduction to Thermodynamics of Irreversible Processes. Wiley, 3rd edition (1968)
S.R. de Groot and P. Mazur: Non-equilibrium Thermodynamics. Dover Publications, Inc., New York (1984)

Equilibrium statistical physics
Wang Zhicheng (汪志诚): Thermodynamics and Statistical Physics (热力学统计物理), 4th edition. Higher Education Press (高等教育出版社) (2008)
T.L. Hill: An Introduction to Statistical Thermodynamics. Dover Books on Physics (1987)
D.A. McQuarrie: Statistical Mechanics. University Science Books (2000)
Hugo Touchette: The large deviation approach to statistical mechanics. Physics Reports 478, 1-69 (2009)
David Chandler: Introduction to Modern Statistical Mechanics. OUP USA (1987)

Nonequilibrium statistical physics (modelled with stochastic processes)
Sekimoto, K.: Stochastic Energetics. Springer, Berlin (2010)
Seifert, U.: Stochastic thermodynamics, fluctuation theorems and molecular machines. Rep. Prog. Phys. 75, 126001 (2012)
Zhang, X.J., Qian, H. and Qian, M.: Stochastic theory of nonequilibrium steady states and its applications. Part I. Phys. Rep. 510, 1-86 (2012)

Rigorous mathematical theory
Khinchin, A.I.: Mathematical Foundations of Statistical Mechanics. Dover Publications (1949)
Jiang, D-Q., Qian, M. and Qian, M-P.: Mathematical Theory of Nonequilibrium Steady States. On the Frontier of Probability and Dynamical Systems. Springer, Berlin (2004)

Category: Financial engineering | 0 comments

A complete piece of statistics code

xulimei1986 2015-7-8 14:49

#!/usr/bin/env python
# -*- coding:utf-8 -*-
'''
created on 2015-05-08
author g6591
'''
# Python 2 script; bracketed subscripts and comparison operators were lost
# when the page was extracted, and restored pieces are marked below.
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
import time
import datetime
import pymongo
from pymongo import MongoClient
import os
import re
import gzip
#from datetime import datetime
import hashlib
import mysql.connector
import random
import MySQLdb
import json
import operator
from operator import itemgetter, attrgetter
import urllib2
import urllib
urllib.getproxies_registry = lambda: {}  # add this line when going through a proxy
from common import *


# Fetch the contents of a web page
def geturl(url):
    res = urllib2.urlopen(url)
    html = res.read()
    res.close()
    return html


# Convert a 'YYYY-mm-dd HH:MM:SS' string to a Unix timestamp
def Changetime(datetime1):
    Unixtime = time.mktime(time.strptime(datetime1, '%Y-%m-%d %H:%M:%S'))
    return Unixtime


# Convert a Unix timestamp to a local-time string
def Localtime(datetime1):
    local_str = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(datetime1))
    return local_str


# Convert a time string to a datetime object
def Normaltime(datetime1):
    Normaltime = datetime.datetime.strptime(datetime1, '%Y-%m-%d %H:%M:%S')
    return Normaltime


# Room classification: official rooms, entertainment guilds, game guilds.
# The room-id lists are missing in the source; empty placeholders are used here.
official_guild = []
game_guild = []
ent_guild = []

# Set the dates
today = datetime.date.today() - datetime.timedelta(days=0)
ydate = (today - datetime.timedelta(days=1)).strftime("%Y%m%d")
tdate = today.strftime("%Y%m%d")
ydate1 = (today - datetime.timedelta(days=1)).strftime("%Y-%m-%d")
tdate1 = today.strftime("%Y-%m-%d")

# Read data from MongoDB
HOST = "cc-log.gameyw.netease.com"
PORT = 30000
USER = "pt_app_readonly"
PWD = "pt_app_readonly"
DB = "cc_pt_app"
conn = MongoClient(HOST, PORT, read_preference=pymongo.read_preferences.ReadPreference.SECONDARY)
db = conn[DB]  # subscript restored from context (assumption)
db.authenticate(USER, PWD)


def import_mongo1(file_name, time_key, start_datetime, end_datetime):
    # The comparison operator is missing in the source; '>' is an assumption,
    # chosen so that the short key 'start' is queried as a Unix timestamp.
    if len(time_key) > 10:
        res = db[file_name].find({time_key:
                                  {"$gte": Normaltime(start_datetime),
                                   "$lte": Normaltime(end_datetime)}})
    else:
        res = db[file_name].find({time_key:
                                  {"$gte": Changetime(start_datetime),
                                   "$lte": Changetime(end_datetime)}})
    return res


def import_mongo2(file_name):
    res2 = db[file_name].find()
    return res2


# Read data from DDB
conn =
MySQLdb.connect(host='cc-log.gameyw.netease.com',user='cc_readonly',passwd='cc_readonly',port=6688,db='cconlineddb4_mirror',charset="utf8") cur = conn.cursor() def import_ddb(file_name): query = "select * from %s;" %file_name try: cur.execute(query) except MySQLdb.Error,e: print "Mysql Error %d: %s" % (e.args , e.args ) results = cur.fetchall() return results # 读取log数据 def read_cc(log_id, date_period, vars, dif): 'read log...' start = time.clock() print 'read %s...' % log_id line_list = for log_date in date_period: try: file_path = u'Y:/%s/%s.log' % (log_id, log_date) fp = open(file_path, 'rb') except IOError: file_path = u'Y:/%s/%s.log.gz' % (log_id, log_date) fp = gzip.open(file_path, 'rb') for line in fp.readlines(): line = line.split('room_name=') line_dict = {} for var in vars.keys(): if vars == 'time': # cost_time=2015-05-08 23:59:59 value0 = re.compile(r',,%s=\s*\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}' % var).findall(line) if value0: value0 = value0 .split('=') date = datetime.datetime.strptime(value0.split(' ') , '%Y-%m-%d') value = datetime.datetime.strptime(str(value0), '%Y-%m-%d %H:%M:%S') if str(date) != (today+datetime.timedelta(days=dif)).strftime('%Y-%m-%d %H:%M:%S'): line_dict.clear() break else: line_dict.clear() break elif vars == 'int': value0 = re.compile(r',,%s=\s*-*\d+' % var).findall(line) if value0: value = int(value0 .split('=') ) else: line_dict.clear() break else: if line.find(',,%s=' % var) != -1: value0 = line.split(',,%s=' % var) .split(',,') if value0: value = value0 else: line_dict.clear() break else: line_dict.clear() break line_dict = value # 若数据每个字段都不为空，行字典不为空 if len(line_dict.keys()) == len(vars): # 若该数据中包括coin_cost字段（item_gain日志中特有的字典字段），且不为空{} if line_dict.get('coin_cost') not in : coincost_dict = dict(i.split('=', 1) for i in line_dict .split(';')) for c in coincost_dict.keys(): line_dict = int(coincost_dict ) dataset.append(line_dict) else: line_dict.clear() fp.close() end = time.clock() print 'log(%s): %s rows. 
Read: %f s' % (log_id, len(dataset), (end - start)) return dataset # 按照by变量汇总，id为用户id，分别计算付+免用户数和付费用户数，*var为累计汇总变量序列，自动计算累计次数 def sta(d, by_dict, by, date, id, room_type, template_type, terminal, *var): if by not in by_dict: by_dict = {} by_dict = datetime.datetime(int(date.split('-') ),int(date.split('-') ),int(date.split('-') ),0,0,0) by_dict = room_type by_dict = template_type by_dict = terminal by_dict = set() for v in var: by_dict = 0 # 上档总量 by_dict .add(d ) by_dict = len(by_dict ) # 合并两个字典的函数，相同key的数值型数据相加 def union_dict(obj1, obj2): skey = set(list(set(obj1.keys()).intersection(set(obj2.keys()))) ) obj = {} for k1,v1 in obj1.iteritems(): if k1 in skey: obj = {} for k2 in v1.keys(): if k2 != 'uid' and k2 != 'date': obj = obj1 +obj2 else: obj = obj1 else: obj = v1 for k1,v1 in obj2.iteritems(): if k1 not in skey: obj = v1 return obj # 主播签约表及主播上档表 start_datetime = str(datetime.datetime(int(ydate1.split('-') ),int(ydate1.split('-') ),int(ydate1.split('-') ),0,0,0)) end_datetime = str(datetime.datetime(int(tdate1.split('-') ),int(tdate1.split('-') ),int(tdate1.split('-') ),0,0,0)) game_anchor_sign = import_mongo2('gamelive_anchor_current_signing') game_anchor_sign2 = import_mongo2('gamelive_anchor_current_signing') ent_sign = import_ddb('TB_USER_ARTIST') anchor_live = import_mongo1('video_livelog', 'start', start_datetime, end_datetime) anchor_live2 = import_mongo1('video_livelog', 'start', start_datetime, end_datetime) # 下面的字典分别对应：主播的上档、收入、分成、明细、分布数据 live_detail = {} #主播上档明细表 zb_live = {} #主播上档汇总表 zb_income = {} # 按主播统计的收入表 income_huiz = {} # 主播收入汇总表 zb_fc1 = {} # 主播分成明细表1 zb_fc2 = {} # 主播分成明细表2 zb_fc = {} # 按主播统计的分成表 fc_huiz = {} # 主播分成汇总表 # 以下列表对应：娱乐签约主播、游戏签约主播 ent_anchor = for j in ent_sign: ent_anchor.append(j ) for k in game_anchor_sign: game_anchor.append(k ) # 主播上档表:包含娱乐、游戏的所有签约主播 for d in anchor_live: start = time.clock() if d in official_guild: d.update(room_type = 'official_guild') elif d in game_guild: d.update(room_type='game_guild') elif d in 
ent_guild: d.update(room_type='ent_guild') else: d.update(room_type='other') room_id = d room_type = d uid = d livetype = int(d ) terminal = 'all' if livetype == 0: template_type = 'ent' elif livetype == 1: template_type = 'game' else: template_type = 'other' date = Normaltime(Localtime(int(d ))).strftime('%Y-%m-%d') t = str(date)+'_'+ terminal # 所有主播的上档 m = str(date)+'_'+template_type # 按模板类型统计 f = str(date)+'_'+room_type # 按房间类型统计 r = str(date)+'_'+str(room_id) # 按房间统计 j1 = str(date)+'_'+template_type+'_'+str(room_type) # 按模板房间类型交叉统计 j2 = str(date)+'_'+template_type+'_'+str(room_id) # 按模板房间交叉统计 if date == ydate1: if (livetype ==0 and uid in ent_anchor) or (livetype ==1 and uid in game_anchor): #总体 sta(d, zb_live, t, date, 'uid', 'all', 'all', 'all', 'total_num') if (livetype ==0 and uid in ent_anchor) or (livetype ==1 and uid in game_anchor): #房间类型总体 sta(d, zb_live, f, date, 'uid', room_type, 'all', 'all', 'total_num') if (livetype ==0 and uid in ent_anchor) or (livetype ==1 and uid in game_anchor): #房间总体 sta(d, zb_live, r, date, 'uid', room_id, 'all', 'all', 'total_num') if livetype == 0 and uid in ent_anchor: sta(d, zb_live, m, date, 'uid', 'all', 'ent', 'all', 'total_num') sta(d, zb_live, f, date, 'uid', room_type, 'ent', 'all', 'total_num') sta(d, zb_live, r, date, 'uid', room_id, 'ent', 'all', 'total_num') sta(d, zb_live, j1, date, 'uid', room_type, 'ent', 'all', 'total_num') sta(d, zb_live, j2, date, 'uid', room_id, 'ent', 'all', 'total_num') if livetype == 1 and uid in ent_anchor: sta(d, zb_live, m, date, 'uid', 'all', 'game', 'all', 'total_num') sta(d, zb_live, f, date, 'uid', room_type, 'game', 'all', 'total_num') sta(d, zb_live, r, date, 'uid', room_id, 'game', 'all', 'total_num') sta(d, zb_live, j1, date, 'uid', room_type, 'game', 'all', 'total_num') sta(d, zb_live, j2, date, 'uid', room_id, 'game', 'all', 'total_num') end = time.clock() print 'log(%s): Read: %f s' % ('zb_live', (end - start)) temp1 = {} for i in zb_live.keys(): temp1 = {} for k in 
zb_live .keys(): if type(zb_live ) != set: temp1 = zb_live live_result = temp1.values() ''' for i in live_result: print i ''' # 物品获得表:核算主播付费礼物的收入,游戏主播需要乘以与OW的分成比例 ig_vars = {'uid':'int','gain_time' : 'time', 'item_id' :'str','item_type' :'str', 'count':'int', 'coin_cost':'str',\ 'room_id':'int', 'template_type':'str','anchor_uid':'int'} item_gain = read_cc('item_gain', , ig_vars,-1) # 主播付费c券及金币的收入表:包含娱乐、游戏的所有签约主播 for d in item_gain: start = time.clock() date = d .strftime('%Y-%m-%d') room_id = int(d ) item_id = str(d ) template_type = str(d ) uid = int(d ) anchor_uid = int(d ) if d.has_key('pquan') == False: pquan = 0 else: pquan = int(d ) if d.has_key('goldcoin') == False: goldcoin = 0 else: goldcoin = int(d ) if item_id in ('game_60060','ent_60060','game_60332','ent_60332','game_60330','ent_60330','game_60062','ent_60062'): d.update(gift_type='guizu') elif item_id in ('game_60064','ent_60064'): d.update(gift_type='shouhu') elif item_id in ('game_60061','ent_60061','game_60333','ent_60333','game_60331','ent_60331','game_60063','ent_60063'): d.update(gift_type='guizu_xf') elif item_id in ('game_60205','ent_60205'): d.update(gift_type='shouhu_xf') else: d.update(gift_type='comn') gift_type = d i = str(date)+'_'+str(anchor_uid) # 按日期、主播 # 先统计每个主播的数据，再判断是娱乐主播还是游戏主播 if date == ydate1: if i not in zb_income: zb_income = {} zb_income = datetime.datetime(int(date.split('-') ),int(date.split('-') ),int(date.split('-') ),0,0,0) zb_income = anchor_uid zb_income = template_type zb_income = room_id zb_income = 'all' zb_income = ' ' for var in ('total_income', 'gz_income', 'sh_income', 'gzxf_income','shxf_income', 'gift_income'):\ zb_income = 0 if gift_type == 'guizu': zb_income += round(float(pquan+goldcoin)/1000,2) elif gift_type == 'shouhu': zb_income += round(float(pquan+goldcoin)/1000,2) elif gift_type == 'guizu_xf': zb_income += round(float(pquan+goldcoin)/1000,2) elif gift_type == 'shouhu_xf': zb_income += round(float(pquan+goldcoin)/1000,2) elif gift_type == 'comn': 
zb_income += round(float(pquan+goldcoin)/1000,2) zb_income = round(zb_income +zb_income +\ zb_income +zb_income +zb_income ,2) if zb_income 100: zb_income = '100元' elif zb_income 500: zb_income = '100~500元' elif zb_income 1000: zb_income = '500~1000元' elif zb_income 5000: zb_income = '1000~5000元' elif zb_income 10000: zb_income = '5000~10000元' else: zb_income = '10000元以上' end = time.clock() print 'log(%s): Read: %f s' % ('zb_income', (end - start)) # 主播收入汇总表 for d in zb_income.keys() : start = time.clock() uid = zb_income .get('uid') total_income = zb_income .get('total_income') if total_income 0: if uid in game_anchor or uid in ent_anchor: date = zb_income .strftime('%Y-%m-%d') room_id = zb_income template_type = zb_income terminal = 'all' if room_id in official_guild: room_type = 'official_guild' elif room_id in game_guild: room_type = 'game_guild' elif room_id in ent_guild: room_type = 'ent_guild' else: room_type = 'other' k = str(date)+'_'+terminal # 整体收入 m = str(date)+'_'+template_type # 模板收入 f = str(date)+'_'+room_type # 房间类型收入 r = str(date)+'_'+str(room_id) # 房间收入 j1 = str(date)+'_'+str(room_type)+'_'+template_type # 模板房间类型的交叉分布 j2 = str(date)+'_'+str(room_id)+'_'+template_type # 模板房间的交叉分布 for i in (k, m, f, r, j1, j2): if i not in income_huiz: income_huiz = {} income_huiz = datetime.datetime(int(date.split('-') ),int(date.split('-') ),int(date.split('-') ),0,0,0) income_huiz = uid income_huiz = zb_income .get('terminal') income_huiz = set() income_huiz = 0 for var in ('total_income', 'gz_income', 'sh_income', 'gzxf_income','shxf_income', 'gift_income'):\ income_huiz = 0 if i == k: income_huiz = 'all' income_huiz = 'all' if i == m: income_huiz = 'all' income_huiz = template_type if i == f: income_huiz = room_type income_huiz = 'all' if i == r: income_huiz = room_id income_huiz = 'all' if i == j1: income_huiz = room_type income_huiz = template_type if i == j2: income_huiz = room_id income_huiz = template_type for var in ('total_income', 'gz_income', 
'sh_income', 'gzxf_income','shxf_income', 'gift_income'):\ income_huiz += round(zb_income ,2) income_huiz .add(zb_income ) income_huiz = len(income_huiz ) end = time.clock() print 'log(%s): Read: %f s' % ('income_huiz', (end - start)) temp2 = {} for i in income_huiz.keys(): temp2 = {} for k in income_huiz .keys(): if type(income_huiz ) != set: temp2 = income_huiz income_result = temp2.values() ''' for i in income_result: print i ''' for dict in income_result: mongdb_dm.anchor_income.insert(dict) distribute_huiz = {} # 主播收入区间汇总表 for d in zb_income.keys(): start = time.clock() date = zb_income uid = zb_income template_type = zb_income room_id = zb_income income_group = zb_income if room_id in official_guild: room_type = 'official_guild' elif room_id in game_guild: room_type = 'game_guild' elif room_id in ent_guild: room_type = 'ent_guild' else: room_type = 'other' s = str(date)+'_'+income_group # 整体收入分布 ms = str(date)+'_'+income_group+'_'+template_type # 按模板收入区间的分布 fs = str(date)+'_'+income_group+'_'+room_type # 按房间类型收入区间的分布 rs = str(date)+'_'+income_group+'_'+str(room_id) # 按房间收入区间的分布 js1 = str(date)+'_'+income_group+'_'+template_type+'_'+room_type # 模板房间类型的交叉收入区间的分布 js2 = str(date)+'_'+income_group+'_'+template_type+'_'+str(room_id) # 模板房间的交叉收入区间的分布 if zb_income .get('total_income') 0: for i in (s, ms, fs, rs, js1, js2): if i not in distribute_huiz: distribute_huiz = {} distribute_huiz = date distribute_huiz = 'all' distribute_huiz = set() distribute_huiz = 0 distribute_huiz = 0 distribute_huiz = 0 if i == s: distribute_huiz = 'all' distribute_huiz = 'all' distribute_huiz = income_group if i == ms: distribute_huiz = template_type distribute_huiz = 'all' distribute_huiz = income_group if i == fs: distribute_huiz = 'all' distribute_huiz = room_type distribute_huiz = income_group if i == rs: distribute_huiz = 'all' distribute_huiz = room_id distribute_huiz = income_group if i == js1: distribute_huiz = template_type distribute_huiz = room_type distribute_huiz = 
income_group if i == js2: distribute_huiz = template_type distribute_huiz = room_id distribute_huiz = income_group distribute_huiz .add(zb_income ) distribute_huiz = len(distribute_huiz ) distribute_huiz += round(zb_income ,2) end = time.clock() print 'log(%s): Read: %f s' % ('distribute_huiz', (end - start)) temp3 = {} for i in distribute_huiz.keys(): temp3 = {} for k in distribute_huiz .keys(): if type(distribute_huiz ) != set: temp3 = distribute_huiz distribute_total = temp3.values() ''' for i in distribute_total: print i ''' for dict in distribute_total: mongdb_dm.anchor_income_dist.insert(dict) # 货币获得表：核算主播分成数据及后台监控数据 cg_vars = {'gain_time':'time', 'uid':'int', 'coin_type':'str', 'coin_num':'int', 'room_id':'int', 'template_type':'str', 'reason_type':'int'} coin_gain = read_cc('coin_gain', , cg_vars,-1) for d in coin_gain: start = time.clock() date = d .strftime('%Y-%m-%d') template_type = str(d ) room_id = int(d ) uid = int(d ) i = str(date)+'_'+str(uid) #按日期、主播 # 娱乐主播礼物分成，娱乐主播及游戏主播的贵族、守护、守护续费是当天记录当天的数据 if date == ydate1 and room_id != -2: if i not in zb_fc1: zb_fc1 = {} zb_fc1 = datetime.datetime(int(date.split('-') ),int(date.split('-') ),int(date.split('-') ),0,0,0) zb_fc1 = uid zb_fc1 = 'all' for var in ('gift_fenc_ent', 'ow_gift_fenc_ent','ow_gift_fcday_ent', 'gz_fenc_ent', 'sh_fenc_ent', 'shxf_fenc_ent','del_notaxepay_ent','del_taxepay_ent', \ 'update_notaxepay_ent', 'update_taxepay_ent','gift_fcday_ent','gift_fenc_game','gift_fcday_game', 'ow_gift_fenc_game','ow_gift_fcday_game', 'gz_fenc_game', 'sh_fenc_game', 'shxf_fenc_game','del_notaxepay_game','del_taxepay_game', \ 'update_notaxepay_game', 'update_taxepay_game'): zb_fc1 = 0 if d == 8 and d == 'ent_notaxepay': zb_fc1 += round(float(d )/1000,2) if d == 13 and d == 'ent_notaxepay ': zb_fc1 += round(float(d )/1000,2) if d == 13 and d == 'gold_ingot ': zb_fc1 += round(float(d )/1000,2) if d == 14 and d == 'ent_notaxepay': zb_fc1 += round(float(d )/1000,2) if d == 14 and d == 'gold_ingot': zb_fc1 += 
round(float(d )/1000,2) if d == 15 and d == 'ent_notaxepay': zb_fc1 += round(float(d )/1000,2) if d == 15 and d == 'gold_ingot': zb_fc1 += round(float(d )/1000,2) if d == 27 and d == 'ent_notaxepay': zb_fc1 += round(float(d )/1000,2) if d == 34 and d == 'ent_notaxepay': zb_fc1 += round(float(d )/1000,2) if d == 35 and d == 'ent_taxepay': zb_fc1 += round(float(d )/1000,2) end = time.clock() print 'log(%s): Read: %f s' % ('zb_fc1', (end - start)) # 娱乐主播礼物分成日结奖励，以及游戏主播的礼物分成、日结奖励数据为当天记录昨天的数据，游戏佣金与人民币的转换100：1 coin_gain2 = read_cc('coin_gain', , cg_vars,0) for d in coin_gain2: start = time.clock() date = (d -datetime.timedelta(days=1)).strftime("%Y-%m-%d") template_type = str(d ) room_id = int(d ) uid = int(d ) i = str(date)+'_'+str(uid) # 按日期、主播;日结奖励及游戏主播的分成收入room_id=-2 if date == ydate1 and d == -2: if i not in zb_fc2: zb_fc2 = {} zb_fc2 = datetime.datetime(int(date.split('-') ),int(date.split('-') ),int(date.split('-') ),0,0,0) zb_fc2 = uid zb_fc2 = 'all' for var in ('gift_fenc_ent', 'ow_gift_fenc_ent','ow_gift_fcday_ent', 'gz_fenc_ent', 'sh_fenc_ent', 'shxf_fenc_ent','del_notaxepay_ent','del_taxepay_ent', \ 'update_notaxepay_ent', 'update_taxepay_ent','gift_fcday_ent','gift_fenc_game','gift_fcday_game', 'ow_gift_fenc_game','ow_gift_fcday_game', 'gz_fenc_game', 'sh_fenc_game', 'shxf_fenc_game','del_notaxepay_game','del_taxepay_game', \ 'update_notaxepay_game', 'update_taxepay_game'): zb_fc2 = 0 if d == 9 and d == 'ent_notaxepay': zb_fc2 += round(float(d )/2000.0,2) if d == 10: zb_fc2 += round(float(d /2000.0), 2) if d == 11: zb_fc2 += round(float(d /2000.0), 2) if d == 12: zb_fc2 += round(float(d /2000.0), 2) if d == 28: zb_fc2 += round(float(d /2000.0), 2) if d == 29 and d == 'game_notaxepay': zb_fc2 += round(float(d /200.0), 2) if d == 30 and d == 'game_taxepay': zb_fc2 += round(float(d /200.0), 2) if d == 36 and d == 'game_notaxepay': zb_fc2 += round(float(d /200.0), 2) if d == 37 and d == 'game_taxepay': zb_fc2 += round(float(d /200.0), 2) end = time.clock() print 
'log(%s): Read: %f s' % ('zb_fc2', (end - start)) zb_fc = union_dict(zb_fc1,zb_fc2) for d in zb_fc.keys(): start = time.clock() zb_fc .update(gift_fenc=zb_fc .get('gift_fenc_ent')+zb_fc .get('gift_fenc_game')) zb_fc .update(gift_fcday=zb_fc .get('gift_fcday_ent')+zb_fc .get('gift_fcday_game')) zb_fc .update(ow_gift_fenc=zb_fc .get('ow_gift_fenc_ent')+zb_fc .get('ow_gift_fenc_game')) zb_fc .update(ow_gift_fcday=zb_fc .get('ow_gift_fcday_ent')+zb_fc .get('ow_gift_fcday_game')) zb_fc .update(gz_fenc=zb_fc .get('gz_fenc_ent')+zb_fc .get('gz_fenc_game')) zb_fc .update(sh_fenc=zb_fc .get('sh_fenc_ent')+zb_fc .get('sh_fenc_game')) zb_fc .update(shxf_fenc=zb_fc .get('shxf_fenc_ent')+zb_fc .get('shxf_fenc_game')) zb_fc .update(update_notaxepay=zb_fc .get('update_notaxepay_ent')+zb_fc .get('update_notaxepay_game')) zb_fc .update(update_taxepay=zb_fc .get('update_taxepay_ent')+zb_fc .get('update_taxepay_game')) zb_fc .update(del_notaxepay=zb_fc .get('del_notaxepay_ent')+zb_fc .get('del_notaxepay_game')) zb_fc .update(del_taxepay=zb_fc .get('del_taxepay_ent')+zb_fc .get('del_taxepay_game')) zb_fc .update(total_fenc=zb_fc .get('gift_fenc')+zb_fc .get('gift_fcday')+zb_fc .get('ow_gift_fenc')+zb_fc .get('ow_gift_fcday')+\ zb_fc .get('gz_fenc')+zb_fc .get('sh_fenc')+zb_fc .get('shxf_fenc')) uid = zb_fc .get('uid') if uid in game_anchor: zb_fc .update(template_type = 'game') zb_fc .update(terminal = 'all') elif uid in ent_anchor: zb_fc .update(template_type = 'ent') zb_fc .update(terminal = 'all') else: zb_fc .update(template_type = 'none') zb_fc .update(terminal = 'all') if zb_fc 50: zb_fc .update(fenc_group= '50元') elif zb_fc 250: zb_fc .update(fenc_group= '50~250元') elif zb_fc 500: zb_fc .update(fenc_group= '250~500元') elif zb_fc 2500: zb_fc .update(fenc_group= '500~2500元') elif zb_fc 5000: zb_fc .update(fenc_group= '2500~5000元') else: zb_fc .update(fenc_group= '5000元以上') end = time.clock() print 'log(%s): Read: %f s' % ('zb_fc', (end - start)) # 主播分成汇总表 for d in zb_fc.keys(): 
if zb_fc .get('total_fenc') 0: date = zb_fc .strftime('%Y-%m-%d') terminal = zb_fc .get('terminal') template_type = zb_fc .get('template_type') uid = zb_fc .get('uid') k = str(date)+'_'+terminal # 整体分成 m = str(date)+'_'+template_type # 模板分成 for i in (k, m): if i not in fc_huiz: fc_huiz = {} fc_huiz = datetime.datetime(int(date.split('-') ),int(date.split('-') ),int(date.split('-') ),0,0,0) fc_huiz = uid fc_huiz = zb_fc .get('terminal') fc_huiz = set() fc_huiz = 0 fc_huiz = 0 fc_huiz = 0 fc_huiz = 0 fc_huiz = 0 for var in ('total_fenc', 'gift_fenc', 'gift_fcday', 'ow_gift_fenc','ow_gift_fcday', 'gz_fenc', 'sh_fenc', 'shxf_fenc','del_notaxepay','del_taxepay', \ 'update_notaxepay', 'update_taxepay'): fc_huiz = 0 if i == k: fc_huiz = 'all' fc_huiz = 'all' if i == m: fc_huiz = 'all' fc_huiz = template_type for var in ('total_fenc', 'gift_fenc', 'gift_fcday', 'ow_gift_fenc','ow_gift_fcday', 'gz_fenc', 'sh_fenc', 'shxf_fenc','del_notaxepay','del_taxepay', \ 'update_notaxepay', 'update_taxepay'): fc_huiz += zb_fc fc_huiz .add(zb_fc ) fc_huiz = len(fc_huiz ) if fc_huiz 0: fc_huiz += 1 if fc_huiz 0: fc_huiz += 1 if fc_huiz 0: fc_huiz += 1 if fc_huiz 0: fc_huiz += 1 end = time.clock() print 'log(%s): Read: %f s' % ('fc_huiz', (end - start)) temp4 = {} for i in fc_huiz.keys(): temp4 = {} for k in fc_huiz .keys(): if type(fc_huiz ) != set: temp4 = fc_huiz fenc_result = temp4.values() ''' for i in fenc_result: print i ''' for dict in fenc_result: mongdb_dm.anchor_fenc.insert(dict) # 主播分成区间汇总表 anchor_fenc_dist = {} for d in zb_fc.keys(): if zb_fc .get('total_fenc') 0: start = time.clock() date = zb_fc uid = zb_fc template_type = zb_fc .get('template_type') fenc_group = zb_fc .get('fenc_group') f = str(date)+'_'+fenc_group # 整体分成分布 mf = str(date)+'_'+fenc_group+'_'+template_type # 按模板分成区间的分布 for i in (f,mf): if i not in anchor_fenc_dist: anchor_fenc_dist = {} anchor_fenc_dist = date anchor_fenc_dist = 'all' anchor_fenc_dist =set() anchor_fenc_dist = 0 anchor_fenc_dist = 0 if i == 
f: anchor_fenc_dist = 'all' anchor_fenc_dist = 'all' anchor_fenc_dist = fenc_group if i == mf: anchor_fenc_dist = template_type anchor_fenc_dist = 'all' anchor_fenc_dist = fenc_group anchor_fenc_dist .add(zb_fc ) anchor_fenc_dist = len(anchor_fenc_dist ) anchor_fenc_dist += round(zb_fc ,2) end = time.clock() print 'log(%s): Read: %f s' % ('anchor_fenc_dist', (end - start)) temp5 = {} for i in anchor_fenc_dist.keys(): temp5 = {} for k in anchor_fenc_dist .keys(): if type(anchor_fenc_dist ) != set: temp5 = anchor_fenc_dist distribute_total2 = temp5.values() ''' for i in distribute_total2: print i ''' for dict in distribute_total2: mongdb_dm.anchor_fenc_dist.insert(dict) for dict in live_result: mongdb_dm.anchor_live.insert(dict) # 三张明细表： # 每个主播的上档明细表：live_detail for d in anchor_live2: start = time.clock() room_id = d uid = d livetype = int(d ) n = 0 while (n3) : try: url = 'http://uid.cc.163.com/userprofile?uid=%d' % uid info = geturl(url) info1 = info.replace("null",'"null"') break except IOError: info1 = '{}' n += 1 time.sleep(0.5) if info1 not in : nickname = info1.split('"nickname":') .split(',"hiskin":') .split('"') .split('"') ccid = info1.split('"cuteid":') .split(',"city"') else: nickname = 'None' ccid = 'None' date = Normaltime(Localtime(int(d ))).strftime('%Y-%m-%d') if livetype == 0: template_type = 'ent' elif livetype == 1: template_type = 'game' else: template_type = 'other' j = str(date)+'_'+str(uid) # 按日期、主播汇总 if date == ydate1: if (livetype == 0 and uid in ent_anchor) or (livetype == 1 and uid in game_anchor): if j not in live_detail: live_detail ={} live_detail = datetime.datetime(int(date.split('-') ),int(date.split('-') ),int(date.split('-') ),0,0,0) live_detail = uid live_detail = nickname live_detail = ccid live_detail = 0 live_detail = 0 live_detail = room_id live_detail = template_type live_detail += d end = time.clock() print 'log(%s): Read: %f s' % ('live_detail', (end - start)) # 每个主播的收入明细表：与是否上档没关系 income_detail = {} for d in zb_income: if 
zb_income .get('total_income') 0: start = time.clock() date = zb_income .get('date') room_id = zb_income .get('room_id') template_type = zb_income .get('template_type') uid = zb_income .get('uid') terminal = 'all' n = 0 while (n3) : try: url = 'http://uid.cc.163.com/userprofile?uid=%d' % uid info = geturl(url) info1 = info.replace("null",'"null"') break except IOError: info1 = '{}' n += 1 time.sleep(0.5) if info1 not in : nickname = info1.split('"nickname":') .split(',"hiskin":') .split('"') .split('"') ccid = info1.split('"cuteid":') .split(',"city"') else: nickname = 'None' ccid = 'None' if room_id in official_guild: room_type = 'official_guild' elif room_id in game_guild: room_type = 'game_guild' elif room_id in ent_guild: room_type = 'ent_guild' else: room_type = 'other' k = str(date)+'_'+terminal+'_'+str(uid) # 整体前十的展现 m = str(date)+'_'+template_type+'_'+str(uid) # 模板前十的展现 f = str(date)+'_'+room_type+'_'+str(uid) # 房间类型前十的展现 r = str(date)+'_'+str(room_id)+'_'+str(uid) # 房间前十的展现 j1 = str(date)+'_'+str(room_type)+'_'+template_type+'_'+str(uid) # 模板房间类型的前十展现 j2 = str(date)+'_'+str(room_id)+'_'+template_type+'_'+str(uid) # 模板房间的前十的展现 for i in (k, m, f, r, j1, j2): if i not in income_detail: income_detail = {} income_detail = date income_detail = uid income_detail = nickname income_detail = ccid income_detail = template_type income_detail = room_id income_detail = 'all' for var in ('total_income', 'gz_income', 'sh_income', 'gzxf_income','shxf_income', 'gift_income'):\ income_detail = 0 if i == k: income_detail = 'all' income_detail = 'all' if i == m: income_detail = 'all' income_detail = template_type if i == f: income_detail = room_type income_detail = 'all' if i == r: income_detail = room_id income_detail = 'all' if i == j1: income_detail = room_type income_detail = template_type if i == j2: income_detail = room_id income_detail = template_type if gift_type == 'guizu': income_detail += round(float(pquan+goldcoin)/1000,2) elif gift_type == 'shouhu': income_detail 
+= round(float(pquan+goldcoin)/1000,2) elif gift_type == 'guizu_xf': income_detail += round(float(pquan+goldcoin)/1000,2) elif gift_type == 'shouhu_xf': income_detail += round(float(pquan+goldcoin)/1000,2) elif gift_type == 'comn': income_detail += round(float(pquan+goldcoin)/1000,2) for var in ('total_income', 'gz_income', 'sh_income', 'gzxf_income','shxf_income', 'gift_income'):\ income_detail += zb_income end = time.clock() print 'log(%s): Read: %f s' % ('income_detail', (end - start)) temp6 = {} for i in income_detail.keys(): temp6 = {} for k in income_detail .keys(): if type(income_detail ) != set: temp6 = income_detail detail_result = temp6.values() ''' for i in detail_result: print i ''' for dict in detail_result: mongdb_dm.anchor_income_detail.insert(dict) # 主播分成明细表：与上档表无关 fc_detail = {} for d in zb_fc.keys(): start = time.clock() if zb_fc .get('total_fenc') 0: date = zb_fc uid = int(zb_fc ) template_type = zb_fc .get('template_type') terminal = 'all' n = 0 while (n3): try: url = 'http://uid.cc.163.com/userprofile?uid=%d' % uid info = geturl(url) info1 = info.replace("null",'"null"') break except IOError: info1 = '{}' n += 1 time.sleep(0.5) if info1 not in : nickname = info1.split('"nickname":') .split(',"hiskin":') .split('"') .split('"') ccid = info1.split('"cuteid":') .split(',"city"') else: nickname = 'None' ccid = 'None' k = str(date)+'_'+terminal+'_'+str(uid) # 整体前十的展现 m = str(date)+'_'+template_type+'_'+str(uid) # 模板前十的展现 for i in (k, m): if i not in fc_detail: fc_detail = {} fc_detail = date fc_detail = uid fc_detail = nickname fc_detail = ccid fc_detail = 'all' for var in ('total_fenc','gift_fenc', 'gift_fcday', 'ow_gift_fenc','ow_gift_fcday', 'gz_fenc', 'sh_fenc', 'shxf_fenc'): fc_detail = 0 if i == k: fc_detail = 'all' fc_detail = 'all' if i == m: fc_detail = 'all' fc_detail = template_type for var in ('total_fenc','gift_fenc', 'gift_fcday', 'ow_gift_fenc','ow_gift_fcday', 'gz_fenc', 'sh_fenc', 'shxf_fenc'): fc_detail += round(zb_fc ,2) end = 
time.clock() print 'log(%s): Read: %f s' % ('fc_detail', (end - start)) temp7 = {} for i in fc_detail.keys(): temp7 = {} for k in fc_detail .keys(): if type(fc_detail ) != set: temp7 = fc_detail detail_result2 = temp7.values() ''' for i in detail_result2: print i ''' for dict in detail_result2: mongdb_dm.anchor_fenc_detail.insert(dict)


Fission theory models

accumulation 2015-7-4 21:48

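1.7.4 Random neck-rupture model

To explain the sawtooth dependence of the number of prompt neutrons on fragment mass in the spontaneous fission of 252Cf, Whetstone introduced the concept of random neck fission. Before scission, the two nascent fragments are connected by a fairly long neck, as shown in Figure 1.4, and their volumes are generally unequal. The fissioning nucleus is most likely to break near the centre of the neck, but it can also break elsewhere. The rupture position follows a statistical law and therefore produces a mass distribution, hence the name random neck-rupture model. The excitation energy of each fragment after scission can be calculated, and it determines the number of neutrons each fragment emits. The kinetic energy of a fragment is the sum of its initial pre-scission kinetic energy and the kinetic energy gained by acceleration in the Coulomb field. Likewise, the γ rays released by a fragment can be calculated from its excitation energy and from the angular momentum distribution derived from the level density. Once the pre-scission shape is known, the random neck-rupture model can therefore calculate all post-fission phenomena.

1.7.5 Multimodal random neck-rupture model

Brosa et al. proposed the multimodal random neck-rupture model (also called the Brosa model), which brought substantial progress in the quantitative study of post-fission phenomena. Its distinctive feature is a procedure that determines the scission-point shape from the potential energy surface and then uses that shape to calculate post-fission quantities such as the fragment mass distribution and the mean total kinetic energy distribution. Each channel (that is, one of the fission modes mentioned above) corresponds to one scission configuration and has its own fragment mass distribution, kinetic energy distribution and neutron number distribution. According to this theory, the only quantity that must be determined experimentally is the probability of fission through each channel; all other physical quantities can be obtained from theory. They computed the potential energy surfaces of several nuclei, including 227Ac, 236U, 252Cf and 258Fm, and showed that each of these nuclei has several possible deformation paths from the ground state to fission, i.e. different channels of minimum potential energy, and that the fission channels are not entirely the same in different nuclei [1].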

Category: Fission models

Implementing PLS in SAS

丁兆海4 2015-6-17 07:47

From 《应用多元统计分析》 (Applied Multivariate Statistical Analysis) by 高惠璇 (Gao Huixuan), Chapter 11: partial least squares regression analysis.

/* YYDY1102 */
data d1102;
  input weight waist pulse chins situps jumps @@;
  label weight='Weight' waist='Waist' pulse='Pulse'
        chins='Chin-ups' situps='Sit-ups' jumps='High jump';
  cards;
191 36 50  5 162  60   189 37 52  2 110  60
193 38 58 12 101 101   162 35 62 12 105  37
189 35 46 13 155  58   182 36 56  4 101  42
211 38 56  8 101  38   167 34 60  6 125  40
176 31 74 15 200  40   154 33 56 17 251 250
169 34 50 17 120  38   166 33 52 13 210 115
154 34 64 14 215 105   247 46 50  1  50  50
193 36 46  6  70  31   202 37 62 12 210 120
176 37 54  4  60  25   157 32 52 11 230  80
156 33 54 15 225  73   138 33 68  2 110  43
;
run;

/* Fit with detailed output for each extracted factor */
proc pls data=d1102 details;
  model weight waist pulse = chins situps jumps;
run;

/* Choose the number of factors by leave-one-out cross-validation */
proc pls data=d1102 cv=one;
  model weight waist pulse = chins situps jumps;
run;

/* Two-factor model, printing the regression coefficients */
proc pls data=d1102 nfac=2;
  model weight waist pulse = chins situps jumps / solution;
run;

/* One-factor model, printing the regression coefficients */
proc pls data=d1102 nfac=1;
  model weight waist pulse = chins situps jumps / solution;
run;

/* Same cross-validated fit with ODS graphics switched on */
ods graphics on;
proc pls data=d1102 cv=one;
  model weight waist pulse = chins situps jumps;
run;
ods graphics off;

Category: Partial least squares regression analysis

Statistical methods in finance

accumulation 2015-6-7 23:51

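Descriptive statistics; linear and nonlinear regression methods and tests of their parameters; multivariate statistical methods (including cluster analysis, principal component analysis, factor analysis and canonical correlation analysis); nonparametric estimation; econometric models; time series and parameter tests; cointegration; integration and fractional integration theory; extreme value theory; data envelopment analysis; neural network analysis; state-space models; vector autoregression (VAR) models; autoregressive conditional heteroskedasticity (ARCH) family models; Bayesian methods. Also: the EMD method, the SSA method, support vector machine methods and rough set methods.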

Category: Financial engineering

The DW statistic and autocorrelation

accumulation 2015-5-12 11:29

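The Durbin-Watson test, or D-W test for short, is currently the most commonly used test for autocorrelation, but it applies only to first-order autocorrelation. The statistic is approximately DW ≈ 2(1 - ρ), where ρ is the first-order autocorrelation coefficient of the residuals. Because ρ lies between -1 and 1, we have 0 ≤ DW ≤ 4, and:

DW = 0 <=> ρ = 1, i.e. positive autocorrelation
DW = 4 <=> ρ = -1, i.e. negative autocorrelation
DW = 2 <=> ρ = 0, i.e. no (first-order) autocorrelation

Therefore, when DW is significantly close to 0 or 4, autocorrelation is present; when it is close to 2, there is no (first-order) autocorrelation. Once the probability distribution of the DW statistic is known, the null hypothesis H0 can be tested at a given significance level by locating the statistic relative to the critical values.

If an econometric model is found to exhibit autocorrelation, first check whether the model omits important explanatory variables, then check whether its functional form is appropriate. If the problem remains, its adverse effects can be removed by generalized differencing, iterative methods or generalized least squares.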
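The link between the DW statistic and ρ can be checked numerically. The sketch below is a minimal illustration in plain Python, assuming the usual definition DW = Σ(e_t - e_{t-1})² / Σ e_t². The function names and the two residual series are made up for this example and do not come from the original post.

```python
def durbin_watson(residuals):
    """DW = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    den = sum(e * e for e in residuals)
    if den == 0:
        raise ValueError("all residuals are zero")
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    return num / den


def lag1_autocorrelation(residuals):
    """rho_hat = sum e_t * e_{t-1} / sum e_t^2 (residual mean assumed ~0)."""
    den = sum(e * e for e in residuals)
    num = sum(residuals[t] * residuals[t - 1]
              for t in range(1, len(residuals)))
    return num / den


# Hypothetical residual series, for illustration only.
slow_drift = [1.0, 0.9, 0.8, 0.7, -0.7, -0.8, -0.9, -1.0]     # neighbours alike
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]    # sign flips

for name, e in (("slow_drift", slow_drift), ("alternating", alternating)):
    dw = durbin_watson(e)
    rho = lag1_autocorrelation(e)
    # 2 * (1 - rho) is only an approximation to DW in finite samples.
    print(name, round(dw, 3), round(rho, 3), round(2 * (1 - rho), 3))
```

The slowly drifting series gives a DW well below 2 (positive autocorrelation) and the alternating series a DW well above 2 (negative autocorrelation). Whether either value is significant still has to be judged against the tabulated D-W critical values, which this sketch does not include.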

Category: Financial engineering

Questions about interaction terms

乌小鱼子爱统计 2015-4-28 13:40

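These are my study notes on interaction terms. For details see Xie Yu (谢宇), 《回归分析》 (Regression Analysis), Social Sciences Academic Press (社会科学文献出版社), 2010, Chapter 13.

1. Why include interactions?
Interactions capture conditional effects: the effect of one independent variable on the dependent variable may well depend on the values of other independent variables. For example, the relationship between monthly income and cosmetics spending depends on the consumer's sex. An interaction lets us answer questions such as: is the effect of men's monthly income on cosmetics spending smaller than the effect of women's monthly income? An interaction can be added even when the lower-order terms are not significant, as long as the research hypothesis calls for it.

2. Forms of interaction
2.1 Dummy variable * dummy variable. Example: the effect on log income of sex (male = 0, female = 1) and of having a high school education or above (yes = 1, no = 0). The effect of sex depends on whether the worker has a high school education or above, and the effect of education depends on whether the worker is female.
2.2 Continuous variable * dummy variable. Example: the effect on log income of sex (male = 0, female = 1) and exp (years of work experience). If the coefficient of sex*exp is negative, work experience raises men's income more than women's, i.e. the effect of work experience on income differs by sex.
2.3 Continuous variable * continuous variable: either two different continuous variables or the square of one continuous variable.
2.3.1 exp (years of work experience) * grossd (growth rate of gross industrial output). If the coefficient is negative, the effect of industrial output growth on personal income and the effect of work experience on personal income weaken each other.
2.3.2 exp2 = exp*exp. Variables such as age and work experience affect income nonlinearly: income rises gradually while a person is maturing and falls gradually in old age, giving an inverted U shape. Studies therefore add the square of age or work experience to describe this quadratic nonlinearity.

3. How to assess whether the interaction term matters?
Use a nested-model test. The model without the interaction has residual sum of squares SSe1 with df1 degrees of freedom; the model with the interaction has residual sum of squares SSe2 with df2 degrees of freedom. The null hypothesis is that the partial coefficient of the interaction term equals 0; the alternative is that it does not. Compute F = [(SSe1 - SSe2)/(df1 - df2)] / (SSe2/df2) and compare it with F(0.95, 1, df2).

4. Relationship between the interaction term and its lower-order terms
4.1 Why must the lower-order terms and the interaction enter the model together? Can the lower-order terms be dropped?
They must enter together, and the lower-order terms should not be dropped. We should keep the lower-order terms of an interaction in the model wherever possible; otherwise misleading conclusions are likely (Cleary and Kessler, 1982). A common situation is that the lower-order terms become insignificant once the interaction is added. Should they then be removed? For the derivation see Xie Yu, 《回归分析》, 2010: 244-245. The conclusions are:
(1) If an independent variable is a lower-order term of an interaction, its statistical significance is no basis for keeping it in or dropping it from the regression model, because that significance (or lack of it) can be manufactured by adding a particular constant to the other lower-order term.
(2) Fit the model without the interaction first, then the model with it. When the lower-order terms are significant, one can go on to test whether the interaction built from them has a significant effect on the dependent variable. For the estimate of the interaction to remain consistent, all of its lower-order terms must be in the model.
4.2 How to handle collinearity between the lower-order terms and the interaction?
For an interaction of two continuous variables, centre the variables. Take edu*exp, the interaction of education and work experience, and suppose exp and edu*exp are strongly collinear. (Check the pairwise correlation with pwcorr exp edu*exp. The collinearity command is vif: when the largest vif exceeds 10 and the mean vif exceeds 1, multicollinearity is fairly serious. Run vif only after the regression command has been executed. Time-series regressions also require attention to multicollinearity, especially when the sample is small, e.g. under 100, and to serial correlation; examine the t statistics, R2 (adj-R2), the F statistic and the D.W. value.)
The remedy is centering: subtract the sample mean from each lower-order term, build the interaction from the centred terms, and enter the centred lower-order terms in the regression. That is, edu = edu - mean(edu) and exp = exp - mean(exp), and the model includes exp, edu and (exp)*(edu).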
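The nested-model F test and the centering step can both be sketched in a few lines. The example below is a rough illustration in plain Python rather than the Stata workflow the notes refer to. The F formula is the standard nested-model form F = [(SSe1 - SSe2)/(df1 - df2)] / (SSe2/df2), and all names and numbers (`nested_f`, `center`, `pearson`, the `edu` and `exper` samples, the sums of squares) are hypothetical.

```python
def nested_f(sse_restricted, df_restricted, sse_full, df_full):
    """F = [(SSe1 - SSe2) / (df1 - df2)] / (SSe2 / df2).

    sse_restricted/df_restricted: model without the interaction term.
    sse_full/df_full: model with the interaction term.
    """
    extra_terms = df_restricted - df_full
    if extra_terms <= 0 or df_full <= 0:
        raise ValueError("the full model must use more parameters")
    return ((sse_restricted - sse_full) / extra_terms) / (sse_full / df_full)


def center(values):
    """Subtract the sample mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    cx, cy = center(x), center(y)
    sxy = sum(a * b for a, b in zip(cx, cy))
    sxx = sum(a * a for a in cx)
    syy = sum(b * b for b in cy)
    return sxy / (sxx * syy) ** 0.5


# Made-up sample: years of education and years of work experience.
edu = [9, 12, 12, 16, 16, 18, 9, 12]
exper = [20, 15, 5, 10, 2, 8, 30, 25]

# Interaction built from raw variables vs. from centred variables.
raw_inter = [e * x for e, x in zip(edu, exper)]
edu_c, exper_c = center(edu), center(exper)
cen_inter = [e * x for e, x in zip(edu_c, exper_c)]

r_raw = pearson(exper, raw_inter)      # lower-order term vs. raw interaction
r_cen = pearson(exper_c, cen_inter)    # same, after centering
print("corr(exp, edu*exp) raw:     ", round(r_raw, 3))
print("corr(exp, edu*exp) centred: ", round(r_cen, 3))

# Hypothetical sums of squared residuals from the two nested fits.
f_stat = nested_f(sse_restricted=120.0, df_restricted=97,
                  sse_full=110.0, df_full=96)
print("nested F:", round(f_stat, 3))
```

In this made-up sample the correlation between the lower-order term and the interaction shrinks in absolute value after centering, which is the effect the notes rely on. The F statistic would still need to be compared with the critical value F(0.95, 1, df2) from a table or a statistics package.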

Category: Statistics

Financial econometrics: statistics

accumulation 2015-4-25 13:38

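The minimum-AIC principle is one criterion for judging how good a model is, much like R2 (R-squared). AIC and SC (the Schwarz information criterion) are often used together as criteria for model fit, especially when choosing a lag order. For a VAR (vector autoregression), for example, economic theory often cannot determine the lag order, so the minimum AIC or SC rule is applied: fit VAR models of different orders and analyse the one with the smallest AIC or SC. Both AIC and SC are reported with the estimated model. Besides R2, AIC and SC, other common criteria include Lg (the maximum likelihood rule). These rules are mainly used to choose among different lag orders of the same model.

AIC and BIC are the same kind of indicator and are generally used for model selection, i.e. for comparing models. They differ as follows:

AIC = -2 ln(L) + 2 k (Akaike information criterion, 赤池信息量)
BIC = -2 ln(L) + ln(n) k (Bayesian information criterion, 贝叶斯信息量)
HQ = -2 ln(L) + 2 ln(ln(n)) k (Hannan-Quinn criterion)

Here L is the maximized likelihood, k is the number of estimated parameters and n is the sample size. SC and BIC are two names for the same criterion.

The statistical idea behind all of these statistics is the same: take the fit residuals into account while imposing a "penalty" that grows with the number of explanatory variables. Even so, calling them the same indicator is not quite right, because the strength of the penalty differs. Moreover, these criteria are not only used to select models; they can also be used, for example, to choose a suitable transformation. In those settings they take a different form, i.e. they have many variants. This is why they are also called the AIC criterion, the BIC criterion and so on: each stands for a family of criteria rather than one simple formula.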
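The lag-selection rule described above can be sketched as follows. This is a minimal illustration in plain Python: the log-likelihoods, parameter counts and sample size are invented, and the helper names `information_criteria` and `best_lag` are hypothetical rather than part of any package. Only AIC and BIC (SC) are computed, using the formulas AIC = -2 ln(L) + 2k and BIC = -2 ln(L) + ln(n) k.

```python
import math


def information_criteria(log_likelihood, k, n):
    """Return AIC and BIC (SC) for a fitted model.

    log_likelihood: maximized log-likelihood ln(L)
    k: number of estimated parameters
    n: sample size
    """
    return {
        "AIC": -2.0 * log_likelihood + 2.0 * k,
        "BIC": -2.0 * log_likelihood + math.log(n) * k,
    }


def best_lag(candidates, n, criterion="AIC"):
    """Pick the lag whose model has the smallest value of the criterion.

    candidates maps lag order -> (log_likelihood, number_of_parameters).
    """
    return min(
        candidates,
        key=lambda lag: information_criteria(*candidates[lag], n)[criterion],
    )


# Invented fits of the same model at lag orders 1, 2 and 3.
fits = {
    1: (-250.0, 3),
    2: (-246.0, 5),
    3: (-245.5, 7),
}
n_obs = 100

for lag, (ll, k) in sorted(fits.items()):
    print(lag, information_criteria(ll, k, n_obs))

print("lag chosen by AIC:", best_lag(fits, n_obs, "AIC"))
print("lag chosen by BIC:", best_lag(fits, n_obs, "BIC"))
```

With these invented numbers AIC prefers lag 2 while BIC prefers lag 1, because ln(100) is larger than 2 and BIC therefore penalizes the extra parameters more heavily. This is the difference in penalty strength that the text points out.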

Category: Financial engineering

Building a more complete national economic statistics system

太原张建宏 2015-4-24 18:25

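Since the 16th National Congress of the Communist Party of China, the gross output value of China's construction industry has risen from 1,711.68 billion yuan in 2002 to 11,773.4 billion yuan in 2011. That is an average annual growth rate of more than 20%, and total output in 2011 was nearly seven times its 2002 level. During this period, under a series of central government policies to accelerate the transformation of the growth model, adjust the economic structure and promote steady and fairly rapid development, construction of livelihood projects, infrastructure and environmental works kept accelerating. Railways, highways and airports advanced together, overly rapid real estate investment was effectively controlled, affordable housing and shantytown redevelopment moved ahead, and construction's share of GDP rose steadily. In the first half of 2012 alone, national construction output reached 5,046.0 billion yuan, nearly three times the figure for the whole of 2002.

Construction is itself a pillar industry. It is the production sector engaged in civil engineering, building construction, equipment installation, and engineering survey and design. Its products are the buildings, structures and facilities of factories, mines, railways, bridges, ports, roads, pipelines, housing and public facilities. Its upstream industries are numerous and include steel, cement, brick and tile, building ceramics, flat glass, aluminium processing, chemicals, textiles, hardware and elevators. Its downstream industries fall into three groups: real estate for building construction, local municipal engineering for municipal infrastructure, and the various transport industries for transport infrastructure.

The current situation is as follows. Real estate is in decline because people without homes can only sigh at the price of property; covering living expenses is already all they can manage. The state no longer builds shantytown redevelopment on its former scale. The large-scale reconstruction after the Wenchuan earthquake and the Yushu disaster is over. The provinces' aid projects in Xinjiang are winding down, and new countryside construction has reached its late stage. As a result, construction has contracted sharply. Because land is so scarce, China has also had to cut back on road and bridge building. The contraction of construction has hit its upstream and downstream industries and produced overcapacity in many of them, including steel, cement, building ceramics, flat glass, aluminium processing, chemicals, textiles and elevators. If we rely purely on the market with no government intervention, the outcome will be a large-scale contraction of construction and its related industries. Construction should indeed shrink, but it and its related industries should be scaled back moderately, without seriously damaging the economy as a whole. An appropriate number of water conservancy projects should be selected for construction, and the total volume of construction projects should be controlled so that building proceeds steadily. At the end of 2011 there were 70,414 construction enterprises nationwide, 1,449 fewer than the year before, employing 43.111 million people.

The following comes from a document of the Ministry of Industry and Information Technology (MIIT). In accordance with the "Notice of the Ministry of Industry and Information Technology on Issuing the 2014 Targets and Tasks for Eliminating Backward and Excess Capacity in Industrial Sectors" (工信部产业〔2014〕148号), every province, autonomous region and municipality has broken the 2014 targets down to the enterprise level and published the list of enterprises on its local government portal. The announcement shows that fifteen industrial sectors appear on the list. The numbers of enterprises are:

Ironmaking: 44
Steelmaking: 30
Coke: 44
Ferroalloys: 164
Calcium carbide: 40
Electrolytic aluminium: 7
Copper smelting (including secondary copper): 43
Lead smelting (including secondary lead): 12
Cement (clinker and mills): 381
Flat glass: 15
Papermaking: 221
Leather tanning: 27
Printing and dyeing: 107
Chemical fibre: 4
Lead-acid batteries (plates and assembly): 39

MIIT requires the provinces (regions, municipalities) concerned to take effective measures, to strive to shut down the production lines (equipment) of the listed enterprises by the end of September 2014, and to ensure that they are completely dismantled by the end of 2014 and not transferred to other regions. In accordance with the "Notice on Issuing the Implementation Plan for Assessing the Elimination of Backward Production Capacity" (工信部联产业〔2011〕46号), they must also carry out on-site inspection and acceptance of the enterprises concerned and publish announcements that the tasks have been completed.

My own view is as follows. For construction, we can develop projects that let the industry shrink slowly: redevelopment of old city districts; construction projects in counties being upgraded to cities; replacement or renovation of old public facilities; and county-level infrastructure and construction projects, after which urbanization can proceed. For roads and bridges, we can build urban underground rail and renovate existing roads and bridges. A further measure to deal with the contraction of construction and related industries is to let some older workers retire early. I have listed some of the possible projects and hope readers will offer more suggestions.

The discussion above identifies the main obstacles to China's economic development at this stage: construction and infrastructure building cannot be sustained, and overproduction in some industries makes the economy as a whole unsustainable. To read the state of an industry from a mass of statistical data, the statistical data of individual enterprises must be surveyed and analysed so that problems are found early and measures are taken. China currently uses an accounting system based on the SNA framework. SNA accounting can certainly supply the data needed for national economic accounting, but for the kind of industry-by-industry analysis above it falls somewhat short, whereas a revised MPS accounting system can carry statistical analysis down to the enterprise level. China has already compiled the 《产业经济分类注释》 (annotated classification of industries), so enterprises can be surveyed and analysed by classification code. We can continue to use SNA accounting for macroeconomic analysis and regulation and, on that basis, use the data provided by the revised MPS to survey, analyse and act. In other words, the revised MPS would serve as an auxiliary statistical system for national economic management. Times have changed: large-scale data processing and computation are now fully capable of supporting auxiliary statistics that reach down to individual enterprises, so national economic management can be better targeted and achieve better results.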

Category: Economic views
