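Conclusion: the p-value is greater than 0.05, so normality is not rejected and the data can be regarded as coming from a normal population.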
> ks.test(serumdata, "pnorm", …)  # Kolmogorov-Smirnov test for normality
One-sample Kolmogorov-Smirnov test
data: serumdata
D = 0.0701, p-value = 0.7097
alternative hypothesis: two-sided
Warning message:
In ks.test(serumdata, "pnorm", …) : cannot compute correct p-values with ties
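Conclusion: the p-value is greater than 0.05, so normality is not rejected and the data can be regarded as coming from a normal population.
Note: the warning appears because the data contain repeated values (ties). The K-S test assumes a continuous distribution, under which ties should not occur, so with ties the reported p-value is only approximate.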
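To see where the warning comes from, here is a minimal sketch on simulated data. The vector `sim` is a hypothetical stand-in (serumdata itself is not reproduced here), rounded to one decimal place so that repeated values appear. Passing the sample mean and standard deviation to "pnorm" is also an assumption about how the test would be set up.

```r
# Hypothetical stand-in for serumdata: rounding creates repeated values (ties)
set.seed(1)
sim <- round(rnorm(100, mean = 70, sd = 4), 1)

# TRUE if at least one value occurs more than once
has_ties <- anyDuplicated(sim) > 0

# Return the warning text if ks.test() warns, NA otherwise
msg <- tryCatch({
  ks.test(sim, "pnorm", mean(sim), sd(sim))
  NA_character_
}, warning = function(w) conditionMessage(w))

has_ties
msg
```

The exact wording of the warning depends on the R version, but it is triggered by the ties, not by any departure from normality.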
Ex3.5
> y<-c(2,4,3,2,4,7,7,2,2,5,4,5,6,8,5,10,7,12,12,6,6,7,11,6,6,7,9,5,5,10,6,3,10)  # enter the data
> f<-factor(c(rep(1,11),rep(2,10),rep(3,12)))  # grouping factor
> plot(f,y,col="…")  # plot() on a factor draws box plots
> x<-c(2,4,3,2,4,7,7,2,2,5,4)
> y<-c(5,6,8,5,10,7,12,12,6,6)
> z<-c(7,11,6,6,7,9,5,5,10,6,3,10)
> boxplot(x,y,z,names=c("…"))  # boxplot() draws the same box plots
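Conclusion: judging from the box plots, groups 2 and 3 show no clear difference, while group 1 differs clearly from the other two.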
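The visual impression can be backed with numbers. The sketch below reuses the data from this exercise and computes the group medians, plus the five-number summaries that boxplot() would draw. Using the formula interface `y ~ f` and `plot = FALSE` is my choice here, not something the exercise prescribes.

```r
# Data and grouping factor from Ex3.5
y <- c(2,4,3,2,4,7,7,2,2,5,4,
       5,6,8,5,10,7,12,12,6,6,
       7,11,6,6,7,9,5,5,10,6,3,10)
f <- factor(c(rep(1, 11), rep(2, 10), rep(3, 12)))

# Median of each group
group_medians <- tapply(y, f, median)
group_medians

# Box plot statistics per group without opening a graphics device;
# row 3 of $stats holds the medians
bp <- boxplot(y ~ f, plot = FALSE)
bp$stats
```

Groups 2 and 3 share the same median (6.5), while group 1 has a median of 4, which agrees with the conclusion read off the plot.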
Ex3.6
The data set is too large to type in here. A scatter plot should only need plot().
Ex3.7
> studata<-read.table("…")  # read the data
> data.frame(studata)  # convert to a data frame
   V1      V2 V3 V4   V5    V6
1   1   alice  f 13 56.5  84.0
2   2   becka  f 13 65.3  98.0
3   3    gail  f 14 64.3  90.0
4   4   karen  f 12 56.3  77.0
5   5   kathy  f 12 59.8  84.5
6   6    mary  f 15 66.5 112.0
7   7   sandy  f 11 51.3  50.5
8   8  sharon  f 15 62.5 112.5
9   9   tammy  f 14 62.8 102.5
10 10  alfred  m 14 69.0 112.5
11 11    duke  m 14 63.5 102.5
12 12   guido  m 15 67.0 133.0
13 13   james  m 12 57.3  83.0
14 14 jeffery  m 13 62.5  84.0
15 15    john  m 12 59.0  99.5
16 16  philip  m 16 72.0 150.0
17 17  robert  m 12 64.8 128.0
18 18  thomas  m 11 57.5  85.0
19 19 william  m 15 66.5 112.0
> names(studata)<-c("stuno","name","sex","age","height","weight");studata  # name the columns
  stuno  name sex age height weight
1     1 alice   f  13   56.5   84.0
2     2 becka   f  13   65.3   98.0
3     3  gail   f  14   64.3   90.0
...
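> attach(studata)  # attach the data frame so its columns can be referred to by name
> plot(weight~height,col="…")  # scatter plot of weight against height
> coplot(weight~height|sex,col="…")  # weight against height, conditioned on sex
> coplot(weight~height|age,col="…")  # weight against height, conditioned on age
> coplot(weight~height|age+sex,col="…")  # weight against height, conditioned on age and sex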
Ex3.8
> x<-seq(-2,3,0.05)
> y<-seq(-1,7,0.05)
> f<-function(x,y) x^4-2*x^2*y+x^2-2*x*y+2*y^2+4.5*x-4*y+4
> z<-outer(x,y,f)  # outer() evaluates f over the whole x-y grid, which the plots below require
> contour(x,y,z,levels=c(0,1,2,3,4,5,10,15,20,30,40,50,60,80,100),col="…")  # 2-D contour plot
> persp(x,y,z,theta=120,phi=0,expand=0.7,col="…")  # 3-D mesh surface
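Why outer() is needed: contour() and persp() expect a matrix with one row per x value and one column per y value. A small sketch, reusing the definitions from this exercise, shows what outer() builds. The spot checks on individual entries are only there for illustration.

```r
x <- seq(-2, 3, 0.05)
y <- seq(-1, 7, 0.05)
f <- function(x, y) x^4 - 2*x^2*y + x^2 - 2*x*y + 2*y^2 + 4.5*x - 4*y + 4

# z[i, j] holds f(x[i], y[j]); f must be vectorized, which it is here
z <- outer(x, y, f)

dim(z)           # length(x) rows, length(y) columns
z[1, 1]          # value at the grid corner (x = -2, y = -1)
f(x[1], y[1])    # same value computed directly
```

Calling f(x, y) directly would fail to give the grid, because x and y have different lengths and would be paired element by element.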
Ex3.9
> attach(studata)
> cor.test(height,weight)  # Pearson correlation test
Pearson's product-moment correlation
data: height and weight
t = 7.5549, df = 17, p-value = 7.887e-07
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.7044314 0.9523101
sample estimates:
cor
0.8777852
The p-value is far below 0.05, so height and weight are significantly correlated (r = 0.878).
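The t statistic reported by cor.test() can be reproduced from the correlation coefficient alone, using t = r*sqrt(n-2)/sqrt(1-r^2) with n-2 degrees of freedom. The sketch below types in the height and weight columns from the table in Ex3.7 (so that it runs without the original file) and checks the hand computation against cor.test().

```r
# Height and weight columns copied from the Ex3.7 table
height <- c(56.5, 65.3, 64.3, 56.3, 59.8, 66.5, 51.3, 62.5, 62.8, 69.0,
            63.5, 67.0, 57.3, 62.5, 59.0, 72.0, 64.8, 57.5, 66.5)
weight <- c(84.0, 98.0, 90.0, 77.0, 84.5, 112.0, 50.5, 112.5, 102.5, 112.5,
            102.5, 133.0, 83.0, 84.0, 99.5, 150.0, 128.0, 85.0, 112.0)

n <- length(height)
r <- cor(height, weight)

# t statistic and two-sided p-value computed by hand
t_manual <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_manual <- 2 * pt(-abs(t_manual), df = n - 2)

# Same quantities from cor.test()
ct <- cor.test(height, weight)
c(manual = t_manual, cor.test = unname(ct$statistic))
c(manual = p_manual, cor.test = ct$p.value)
```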
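Ex3.10 Ex3.11
The raw data for these two exercises are too large to type in and could not be found online, so they are skipped.
Ex4.2
For the exponential distribution, the maximum likelihood estimate of λ is n/sum(Xi).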
> x<-c(rep(5,365),rep(15,245),rep(25,150),rep(35,100),rep(45,70),rep(55,45),rep(65,25))
> lamda<-length(x)/sum(x);lamda
[1] 0.05
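As a cross-check on the closed form n/sum(Xi), the log-likelihood n*log(λ) - λ*sum(Xi) can be maximized numerically. This is only a sketch: the search interval (1e-4, 1) is my assumption, chosen wide enough to contain the answer.

```r
# Data as entered in Ex4.2
x <- c(rep(5, 365), rep(15, 245), rep(25, 150), rep(35, 100),
       rep(45, 70), rep(55, 45), rep(65, 25))

# Exponential log-likelihood as a function of the rate lambda
loglik <- function(lambda) length(x) * log(lambda) - lambda * sum(x)

# Numerical maximum over an assumed search interval
fit <- optimize(loglik, interval = c(1e-4, 1), maximum = TRUE)

fit$maximum           # numerical estimate
length(x) / sum(x)    # closed-form estimate
```

Both values agree up to the tolerance of optimize().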
Ex4.3
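For the Poisson distribution, P(X=k) = λ^k/k! * e^(-λ).
Its mean and variance are both equal to λ, which here is the average number of E. coli per litre of water. The maximum likelihood estimate of λ is simply the sample mean.
> x<-c(rep(0,17),rep(1,20),rep(2,10),rep(3,2),rep(4,1))
> mean(x)
[1] 1
The average is 1 per litre.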
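A quick way to see how well the fitted Poisson model matches the data is to put the observed counts next to the expected counts under the estimated λ. This comparison is an addition of mine, not part of the exercise.

```r
# Data from Ex4.3
x <- c(rep(0, 17), rep(1, 20), rep(2, 10), rep(3, 2), rep(4, 1))

lambda_hat <- mean(x)                       # estimate of lambda

# Observed counts for k = 0..4 and expected counts n * P(X = k)
observed <- as.vector(table(factor(x, levels = 0:4)))
expected <- length(x) * dpois(0:4, lambda_hat)

data.frame(k = 0:4, observed, expected = round(expected, 2))

# The sample variance should be close to the mean if the model fits
var(x)
```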
Ex4.4
> obj<-function(x){f<-c(-13+x[1]+((5-x[2])*x[2]-2)*x[2],-29+x[1]+((x[2]+1)*x[2]-14)*x[2]);sum(f^2)}  # objective: sum of squared residuals, minimized as an unconstrained optimization problem
> x0<-c(0.5,-2)
> nlm(obj,x0)
$minimum
[1] 48.98425
$estimate
[1] 11.4127791 -0.8968052
$gradient
[1] 1.411401e-08 -1.493206e-07
$code
[1] 1
$iterations
[1] 16
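What nlm() does here: it searches for a point where the sum of squared residuals is smallest, starting from x0. The result depends on the starting point. The sketch below evaluates the same objective at (5, 4), a point I picked because both residuals vanish there, to show that the minimum reported above is only a local one.

```r
# Objective from Ex4.4: sum of squares of two nonlinear residuals
obj <- function(x) {
  f <- c(-13 + x[1] + ((5 - x[2]) * x[2] - 2) * x[2],
         -29 + x[1] + ((x[2] + 1) * x[2] - 14) * x[2])
  sum(f^2)
}

# Minimization from the starting point used in the exercise
fit <- nlm(obj, c(0.5, -2))
fit$minimum      # value reached from this start
fit$estimate     # where it was reached

# Both residuals are exactly zero at (5, 4), so the objective is 0 there
obj(c(5, 4))
```

Since a sum of squares cannot be negative, 0 is the global minimum. Trying several starting values is a cheap way to guard against stopping at a local minimum.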
Ex4.5
> x<-c(54,67,68,78,70,66,67,70,65,69)
> t.test(x)  # t.test() gives the one-sample interval estimate under normality
One Sample t-test
data: x
t = 35.947, df = 9, p-value = 4.938e-11
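alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
63.1585 71.6415
sample estimates:
mean of x
67.4
The point estimate of the mean pulse is 67.4, with a 95% confidence interval of (63.1585, 71.6415).
> t.test(x,alternative="less",mu=72)  # one-sided, one-sample t test
One Sample t-test
data: x
t = -2.4534, df = 9, p-value = 0.01828
alternative hypothesis: true mean is less than 72
95 percent confidence interval:
-Inf 70.83705
sample estimates:
mean of x
67.4
The p-value is below 0.05, so the null hypothesis is rejected: the mean pulse is lower than the normal value of 72.
Key point: how to use t.test(). This example is the one-sample case; both two-sided and one-sided tests are available.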
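The numbers printed by t.test() can be reproduced by hand from the sample mean, the standard error and the t quantiles. The sketch below does this for the two-sided interval and for the one-sided test against 72; the variable names are mine.

```r
# Pulse data from Ex4.5
x <- c(54, 67, 68, 78, 70, 66, 67, 70, 65, 69)

n  <- length(x)
se <- sd(x) / sqrt(n)                       # standard error of the mean

# Two-sided 95% confidence interval: mean +/- t quantile * se
ci_two_sided <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * se

# One-sided test of H0: mu = 72 against H1: mu < 72
t_less <- (mean(x) - 72) / se
p_less <- pt(t_less, df = n - 1)

# Upper limit of the one-sided 95% interval
upper_one_sided <- mean(x) + qt(0.95, df = n - 1) * se

ci_two_sided
c(t = t_less, p = p_less, upper = upper_one_sided)
```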
Ex4.6
> x<-c(140,137,136,140,145,148,140,135,144,141);x
[1] 140 137 136 140 145 148 140 135 144 141
> y<-c(135,118,115,140,128,131,130,115,131,125);y
[1] 135 118 115 140 128 131 130 115 131 125
> t.test(x,y,var.equal=TRUE)
Two Sample t-test
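With var.equal=TRUE, t.test() uses the pooled-variance form of the two-sample test. As a rough sketch of what that means, the statistic can be computed by hand from the two samples and compared with the function's result. The variable names are mine.

```r
# Data from Ex4.6
x <- c(140, 137, 136, 140, 145, 148, 140, 135, 144, 141)
y <- c(135, 118, 115, 140, 128, 131, 130, 115, 131, 125)

nx <- length(x)
ny <- length(y)

# Pooled variance estimate and the resulting t statistic
sp2       <- ((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2)
t_pooled  <- (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))
df_pooled <- nx + ny - 2

# Same test via t.test()
tt <- t.test(x, y, var.equal = TRUE)
c(manual = t_pooled, t.test = unname(tt$statistic))
```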