R: the table and tapply functions
Published: 2019-06-24


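table counts how often each value occurs in the data.

tapply splits a vector into groups defined by a factor and applies a function (for example mean, min or max) to each group.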

> class<-c(1,2,3,2,1,2,1,3)
> c(81,65,72,88,73,91,56,90)->student
> class
[1] 1 2 3 2 1 2 1 3
> factor(class)->class
> tapply(student,class,mean)
       1        2        3
70.00000 81.33333 81.00000
> tapply(student,class,min)
 1  2  3
56 65 72
> tapply(student,class,max)
 1  2  3
81 91 90
> table(class)
class
1 2 3
3 3 2
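Conceptually, tapply splits the data by the factor levels and then applies the function to each piece. A minimal sketch of that idea with base R's split() and sapply(), reusing the numbers from the session above; the names scores and cls are chosen here only to avoid masking base R's class() function:

```r
scores <- c(81, 65, 72, 88, 73, 91, 56, 90)
cls <- factor(c(1, 2, 3, 2, 1, 2, 1, 3))

# tapply groups the scores by class level and averages each group
by_tapply <- tapply(scores, cls, mean)

# The same computation in two explicit steps:
# split into a list of groups, then apply mean to each group
groups <- split(scores, cls)
by_split <- sapply(groups, mean)

print(groups)
print(by_split)

# table() counts observations per level; tapply with length gives the same counts
counts_table <- table(cls)
counts_tapply <- tapply(scores, cls, length)
```

The split/sapply version makes the two steps visible; tapply does both at once and keeps the factor levels as dimnames of the result.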

Apply a Function Over a Ragged Array

Description

Apply a function to each cell of a ragged array, that is to each (non-empty) group of values given by a unique combination of the levels of certain factors.

Usage

tapply(X, INDEX, FUN = NULL, ..., simplify = TRUE)

Arguments

 

X

an atomic object, typically a vector.

INDEX

a list of one or more factors, each of the same length as X. The elements are coerced to factors by as.factor.

FUN

the function to be applied, or NULL. In the case of functions like +, %*%, etc., the function name must be backquoted or quoted. If FUN is NULL, tapply returns a vector which can be used to subscript the multi-way array tapply normally produces.

...

optional arguments to FUN: see the Note section.

simplify

If FALSE, tapply always returns an array of mode "list". If TRUE (the default), then if FUN always returns a scalar, tapply returns an array with the mode of the scalar.
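A small sketch of the FUN and simplify arguments on made-up data (fac and x are illustrative names, not from the original post):

```r
fac <- factor(c("a", "b", "a", "c", "b", "a"))
x <- c(10, 20, 30, 40, 50, 60)

# FUN = NULL: no function is applied; tapply returns, for each element of x,
# the index of the cell (group) that element falls into
cell_index <- tapply(x, fac, NULL)

# Default simplify = TRUE: sum returns one number per group -> numeric array
sums <- tapply(x, fac, sum)

# simplify = FALSE: the same results, but wrapped in an array of mode "list"
sums_list <- tapply(x, fac, sum, simplify = FALSE)

# FUN may also be given as a quoted name; it is looked up with match.fun
sums_by_name <- tapply(x, fac, "sum")

print(cell_index)
print(sums)
print(sums_list)
```

With a single factor, the FUN = NULL result is simply the factor's integer codes.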

Value

If FUN is not NULL, it is passed to match.fun, and hence it can be a function or a symbol or character string naming a function.

When FUN is present, tapply calls FUN for each cell that has any data in it. If FUN returns a single atomic value for each such cell (e.g., functions mean or var) and when simplify is TRUE, tapply returns a multi-way array containing the values, and NA for the empty cells. The array has the same number of dimensions as INDEX has components; the number of levels in a dimension is the number of levels (nlevels()) in the corresponding component of INDEX. Note that if the return value has a class (e.g. an object of class "Date") the class is discarded.

Note that contrary to S, simplify = TRUE always returns an array, possibly 1-dimensional.

If FUN does not return a single atomic value, tapply returns an array of mode list whose components are the values of the individual calls to FUN, i.e., the result is a list with a dim attribute.

When there is an array answer, its dimnames are named by the names of INDEX and are based on the levels of the grouping factors (possibly after coercion).

For a list result, the elements corresponding to empty cells are NULL.
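To see these rules in action, here is a sketch with invented scores and two grouping factors, one of which declares a level ("z") that never occurs:

```r
score <- c(5, 7, 6, 9, 4)
sex   <- factor(c("F", "M", "F", "M", "F"))
# Declare a level that never occurs, so one cell is guaranteed to be empty
grp   <- factor(c("x", "x", "y", "y", "x"), levels = c("x", "y", "z"))

# Two grouping factors -> a 2-dimensional array (sex by grp).
# The dimnames take their names from the names of the INDEX list,
# and cells with no data (the whole "z" column) are filled with NA.
means <- tapply(score, list(sex = sex, grp = grp), mean)
print(means)

# range returns two values per cell, so the result is an array of mode list;
# the empty cell holds NULL instead of NA
ranges <- tapply(score, grp, range)
print(ranges)
```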

Note

Optional arguments to FUN supplied by the ... argument are not divided into cells. It is therefore inappropriate for FUN to expect additional arguments with the same length as X.
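Because of this, a per-element argument such as a weight vector cannot simply be passed through `...`. A sketch with made-up data; grouping the element positions is one possible workaround, not something tapply itself provides:

```r
value  <- c(2, 4, 6, 8)
weight <- c(1, 3, 1, 1)
g      <- factor(c("a", "a", "b", "b"))

# Scalar extras are fine: na.rm = TRUE is passed unchanged to every call of mean
with_na <- tapply(c(2, NA, 6, 8), g, mean, na.rm = TRUE)

# Per-element extras are NOT split: tapply(value, g, weighted.mean, w = weight)
# would hand the full 4-element weight vector to each 2-element group and fail.
# Workaround: group the element positions, then index both vectors with them.
wmeans <- tapply(seq_along(value), g, function(i) weighted.mean(value[i], weight[i]))

print(with_na)
print(wmeans)
```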

See Also

the convenience functions by and aggregate (using tapply); apply, lapply with its versions sapply and mapply.
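For comparison, a sketch of the class means from the session above computed with aggregate(), which returns a data frame rather than an array (the data frame df is built here from the same numbers):

```r
df <- data.frame(
  class   = factor(c(1, 2, 3, 2, 1, 2, 1, 3)),
  student = c(81, 65, 72, 88, 73, 91, 56, 90)
)

# tapply: a named 1-d array, one value per class
t_res <- tapply(df$student, df$class, mean)

# aggregate: a data frame with one row per class (columns "class" and "student")
a_res <- aggregate(student ~ class, data = df, FUN = mean)

print(t_res)
print(a_res)
```

The array form is convenient for quick lookups by level; the data-frame form is easier to merge or write out.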

Examples

require(stats)
groups <- as.factor(rbinom(32, n = 5, prob = 0.4))
tapply(groups, groups, length) #- is almost the same as
table(groups)

## contingency table from data.frame : array with named dimnames
tapply(warpbreaks$breaks, warpbreaks[,-1], sum)
tapply(warpbreaks$breaks, warpbreaks[, 3, drop = FALSE], sum)

n <- 17; fac <- factor(rep(1:3, length = n), levels = 1:5)
table(fac)
tapply(1:n, fac, sum)
tapply(1:n, fac, sum, simplify = FALSE)
tapply(1:n, fac, range)
tapply(1:n, fac, quantile)

## example of ... argument: find quarterly means
tapply(presidents, cycle(presidents), mean, na.rm = TRUE)

ind <- list(c(1, 2, 2), c("A", "A", "B"))
table(ind)
tapply(1:3, ind) #-> the split vector
tapply(1:3, ind, sum)
Problem:

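There are tens of thousands of records in two columns: one column holds names (column A) and the other holds values (column x). A name can correspond to several values, and a value can belong to several names. The data look like this: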

A1 x1
A2 x2
A3 x3
A4 x4
A1 x5
A2 x3
A5 x6
A1 x7
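Desired result: make the names in column A unique and list every value that belongs to each name, processing the whole file in one batch: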
A1 x1 x5 x7
A2 x2 x3
A3 x3
A4 x4
A5 x6

Solution 1: Perl

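use strict;
use warnings;

my %hash;    # name => [all values seen for that name]
open my $out, '>', 'lines.txt' or die "Cannot open lines.txt: $!";
while (<DATA>) {
    chomp;
    next unless /\S/;                  # skip blank lines
    my ($name, $value) = split ' ';    # split on whitespace, ignoring leading blanks
    push @{ $hash{$name} }, $value;
}
# one line per unique name: name<TAB>value1 value2 ...
foreach my $key (sort keys %hash) {
    print {$out} "$key\t@{ $hash{$key} }\n";
}
close $out;
__DATA__
A1 x1
A2 x2
A3 x3
A4 x4
A1 x5
A2 x3
A5 x6
A1 x7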
Solution 2: R
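d = read.table("data.txt")     # column 1: name, column 2: value; no header row
tapply(d[,2], d[,1], print)    # print is called once per name and shows that name's values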
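Using print as FUN only displays each group as a side effect. A sketch that builds the same name<TAB>values lines as the Perl script by pasting each group's values together; the inline text= data stands in for data.txt and is assumed to be whitespace-separated with no header row:

```r
# Sample data inline; in practice this would be read.table("data.txt")
d <- read.table(text = "
A1 x1
A2 x2
A3 x3
A4 x4
A1 x5
A2 x3
A5 x6
A1 x7
", stringsAsFactors = FALSE)

# One string per unique name: all of its values joined by single spaces
joined <- tapply(d[, 2], d[, 1], paste, collapse = " ")

# "name<TAB>values" lines, in sorted name order (the factor levels), like the Perl version
out_lines <- paste(names(joined), joined, sep = "\t")
writeLines(out_lines, "lines.txt")
```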

Repost URL: http://dqyno.baihongyu.com/
