西西河

Topic: The Gourd Monk Misjudges the Gourd Case -- 煮酒正熟

                        • Don't get me wrong

                          I was not mad at you, or anyone. Not at all.

                          I also use SAS every day, though not for statistical analysis. The folks on our advanced analytics team use SAS too, and I believe they check normality, equal variance, linearity, etc. by running a residual analysis. They plot the residuals and inspect the distribution by eye as a preliminary check (a QQ plot for checking normality, for example). Then they also refer to more 'scientific' criteria, such as a set of test statistics (W for checking the normality assumption). I don't remember all of those statistics; I learned them at school but have forgotten most of it. With a statistics textbook, though, I could easily show you the statistics used to check those assumptions.

                          You keep saying that we were not using the comparables as Test and Control, but you never really explain why. You just keep saying "not comparable" like that. But why not comparable? Man, you've got to tell us more, and better yet, show us your design of experiment so that we could see clearly the difference between yours and our company's, and what advantages yours has over ours. Can you do me this favor?

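                          • I've already said it, haven't I?

                            One of the principles behind the t-test is comparing a sample mean against the population mean. The two groups in your report were clearly not drawn from the same population.

                            For large samples a normality test is usually unnecessary, but your samples are unusual, so you may need one. You also need a test for homogeneity of variance. Neither of these is something residual analysis can do. Before the t-test you can run a Shapiro-Wilk or Kolmogorov-Smirnov test, and Levene's test or Bartlett's test. Explaining the underlying principles requires drawing pictures; you can look them up in a textbook. If the variances turn out unequal, you are left with the rank-sum test, and the statistical result is of course much weaker.

                            To stress the point: if the samples all came from the same population, then given how large your sample is, none of the above tests would be needed. That is why I said your protocol was not wrong to begin with. Unfortunately, it seems you did not take this special situation into account when you drew the samples.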
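
                            A minimal sketch of the order of operations described in this post: test for equal variance first, then choose between the t-test and the rank-sum test. It uses only the Python standard library, so it covers Levene's test (the median-centred variant) and the rank-sum fallback but not Shapiro-Wilk. The p-value for Levene's W and the rank-sum z-score both rely on large-sample normal approximations, and the function names and data are made up for illustration.

```python
from statistics import NormalDist, mean, median, variance


def levene_w(a, b):
    """Levene's W for two groups, using deviations from each group's median."""
    devs = []
    for group in (a, b):
        centre = median(group)
        devs.append([abs(x - centre) for x in group])
    grand = mean(devs[0] + devs[1])
    between = 0.0
    within = 0.0
    for d in devs:
        d_mean = mean(d)
        between += len(d) * (d_mean - grand) ** 2
        within += sum((x - d_mean) ** 2 for x in d)
    n_total = len(a) + len(b)
    # With two groups the numerator has 1 degree of freedom.
    return (n_total - 2) * between / within


def pooled_t(a, b):
    """Student's two-sample t statistic (assumes equal variances)."""
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / (sp2 * (1 / n1 + 1 / n2)) ** 0.5


def rank_sum_z(a, b):
    """Wilcoxon rank-sum z-score for group a (average ranks, no tie correction)."""
    pooled = sorted((x, i) for i, g in enumerate((a, b)) for x in g)
    rank_sum_a = 0.0
    pos = 0
    while pos < len(pooled):
        end = pos
        while end < len(pooled) and pooled[end][0] == pooled[pos][0]:
            end += 1
        avg_rank = (pos + 1 + end) / 2  # mean of ranks pos+1 .. end
        rank_sum_a += avg_rank * sum(1 for _, i in pooled[pos:end] if i == 0)
        pos = end
    n1, n2 = len(a), len(b)
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    return (rank_sum_a - mu) / sigma


def choose_test(a, b, alpha=0.05):
    """Return (test name, statistic) after checking equality of variance."""
    w = levene_w(a, b)
    # Large-sample approximation: W ~ F(1, N-2), close to chi-square(1) = z**2.
    p_var = 2 * (1 - NormalDist().cdf(w ** 0.5))
    if p_var < alpha:
        return "rank-sum test", rank_sum_z(a, b)
    return "pooled t-test", pooled_t(a, b)


control = list(range(100))          # hypothetical spending values
treated = [3 * x for x in control]  # same shape, three times the spread
print(choose_test(control, treated))
```

                            With samples in the thousands, as in this thread, the normal approximations are reasonable; for small samples an exact F or rank-sum distribution would be the safer choice.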

                            • Still do not agree

                              Why do you say our two groups are not from the same universe? I have already explained several times that it's not the 3000 card users against the other 3000. It's the 5000 (among which there are 3000 card users) against the other 3000. Please tell me why they're not from the same population.

                              As for validating the assumptions, I believe you can go either way. You could check the assumptions before running the test, or you could run the test first and then check the assumptions. In the first approach, if the assumptions do not hold, you need to recreate your testing samples, while in the second approach, if the assumptions do not hold, the quality and accuracy of your conclusion are at risk. It all depends on how seriously the assumptions are violated, and on which assumption is violated. (Normally equal variance is more critical than normality.)

                              I don't know exactly how our statisticians are handling the above problems, but I'm pretty sure they will exercise their professionalism in due course. How do you know they didn't do their job well? My original article didn't say our statisticians had already checked those assumptions, but this should by no means be interpreted as saying they did not check.

                              Finally, that VP does not know the first thing about statistics. So pointing out errors in the statistical analysis in our work does not prove or support the correctness or appropriateness of his decision not to accept our proposal. If the VP were a statistician, your challenge to our statistical analysis would make more sense.

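                              • Hey, Lao Jiu

                                As long as it makes you happy, I'm fine with calling that VP a pig. It just wouldn't do any good; it won't get the VP to do business with you. As for your taking issue with me over this, I think you've gone too far.

                                When you gave the presentation, it was 3000 cardholders against 3000 people who either intended to hold the card or possibly did not. The design you describe now came later. How should I put it... that dents your credibility a bit.

                                How do you know they didn't do their job well? My original article didn't say our statisticians already checked for those assumptions, but this by no means should be interpreted as they did not check.

                                Here you are just arguing for the sake of arguing. My guess is that your protocol has no such step at all, because under normal circumstances it isn't needed, for the reasons given in my previous post. If the check had been run, then judging from your original description and my own experience with statistics, the variances would certainly have turned out unequal.

                                I'm looking at this as a bystander. Someone who has reached that level, who started out in sales and regularly meets with peers in the industry, probably does have some sense of what this card can do. Something else must be shaping her view of your data-mining work this time.

                                As an aside, data mining is hot right now. We use it in medicine too, with plenty of successes and plenty of embarrassing blunders.

                                I'm off to pick up my kid, so let's leave it here.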

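                • I added to my view in the previous post.

                  That post can serve as my answer to this explanation of yours. My concern is not whether a difference really exists, but how you used statistics to arrive at it. Your explanation does not put the doubt to rest. Perhaps we understand statistics somewhat differently (to be clear, there is only one statistics; the industry is not the issue). I think we could both take some time to think it over.

                  Also:

                  Our historical records show that the 3000 people in group T spent only 1-3% more than the 3000 people in group C. So

                  although the two groups are not homogeneous, the difference is small

                  Whether the groups are homogeneous is simply not something you can conclude this way.

                  Our historical records show that the 3000 people in group T spent only 1-3% more than the 3000 people in group C. So

                  although the two groups are not homogeneous, the difference is small.

                  And after the card was introduced, group T's spending exceeded group C's by 28%! Doesn't that make the point?

                  That still does not let you draw the conclusion safely.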

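                  • Statistics itself of course does not change, but its application in different fields

                    may well differ in rigor.

                    I don't know much about the medical field, but perhaps it is fairly rigorous. Industry is probably fairly rigorous too. How the rigor of marketing compares with medicine and industry, I can't say. But when it comes to actually comparing two groups of consumers, you have to pick some criterion to compare on, don't you? Otherwise how would you compare them? The criterion we use is spending, because it is the most objective (more objective than profit). And the whole industry uses this criterion. To say the whole industry has not been applying statistics correctly, well... several hundred thousand practitioners, thousands of PhDs, all following this approach; are they all wrong?

                    Judging from historical data, the spending of group A and group C is roughly the same after adjusting for quantity (the tiny difference can be read as noise), yet after an external intervention was applied to A, group A came out more than 26% higher than group C. You dismiss this with a simple "cannot safely draw the conclusion"; what is the theoretical basis for that in statistics?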

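                    • A single protocol cannot fit every statistical subject

                      To say the whole industry does not understand statistics, well... several hundred thousand practitioners, thousands of PhDs, all following this approach; are they all wrong?

                      That is absolutely not what I meant. I am only addressing the matter at hand.

                      Knowing how to use statistical software is not hard, but understanding the statistical thinking and principles behind it is necessary. A single protocol cannot fit every statistical subject.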

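                      • As I have said many times

                        What we do is what dozens of similar companies in this industry all do. The subjects being analyzed are the same, and the criterion chosen for comparison is the same (always sales amount). Beyond that, it comes down to applying statistical analysis, which means CrystalBall, a statistics package that is also very common in the industry.

                        So questioning our approach basically amounts to questioning how the whole industry has done things for years.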

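                        • The statistical details are surely not like that

                          I think it is better not to start from a fixed frame. Let's discuss the statistics as statistics. I don't know what CrystalBall is, but it obviously does more than t-tests. Nor will software tell you how to select your samples.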

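                    • Because you did not rule out the effect of variance

                      the spending of group A and group C is roughly the same after adjusting for quantity

                      That is a comparison of means; it does not reflect the degree of variation within each group. If within-group variation is large, the difference between the two groups may not be caused by the treatment.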
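
                      A small sketch of why the within-group spread matters, using made-up numbers: both pairs of groups below differ by the same 10 units in their means, but the Welch t statistic shrinks sharply when the values inside each group are widely scattered. The function name and data are illustrative only.

```python
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic: mean difference divided by its standard error."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se


# Hypothetical spending figures; both treated groups sit 10 units above control.
control_tight = [98, 99, 100, 101, 102]
treated_tight = [x + 10 for x in control_tight]
control_wide = [50, 75, 100, 125, 150]
treated_wide = [x + 10 for x in control_wide]

print("tight groups:", welch_t(treated_tight, control_tight))
print("wide groups: ", welch_t(treated_wide, control_wide))
```

                      The mean difference is identical in both cases; only the scatter inside the groups changes, which is exactly what a comparison of means alone cannot show.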


