
Data structures

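Every R object has two intrinsic attributes: type and length. The type is the basic kind of the object's elements, and there are four main kinds:

  1. Numeric: integer or double-precision real (R has no separate single-precision type)

  2. Character

  3. Complex

  4. Logical (FALSE, TRUE, or NA)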

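mode(), class(), and length() are commonly used to inspect an object's type and length.

Note: mode describes the basic type of an object as R stores it. It reports both integer and double as "numeric", while typeof() gives the finer internal storage type.

Atomic modes (the basic data types):

numeric (integer/double), complex, character, and logical

Recursive objects:

'list' or 'function'

class is a more abstract type, and can be thought of as the kind of data structure the object represents.

It is mainly used by generic functions to decide which method to call for a given argument. This is dispatch on the class of the argument, so it is closer to method overriding than to generics in Java.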
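As a minimal sketch of how mode, storage type, class, and length differ, the following compares a few objects. The variable names x, y, and f are only illustrative.

```r
x <- 1:3                        # integer vector
y <- c(1.5, 2, 3)               # double vector
f <- factor(c("a", "b", "a"))   # factor, stored as integer codes

mode(x); typeof(x); class(x)    # "numeric", "integer", "integer"
mode(y); typeof(y); class(y)    # "numeric", "double", "numeric"
mode(f); class(f); length(f)    # "numeric", "factor", 3

mode(list(1, "a"))              # "list" (a recursive object)
mode(sum)                       # "function"
```

The factor shows why mode and class are worth keeping apart: its mode is numeric because of how it is stored, but generic functions such as print() and summary() treat it according to its class.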

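Objects

There are seven common kinds of object: vectors, factors, arrays, matrices (a special case of arrays), data frames, time series (ts), and lists.

Among the basic operations, xor() is the logical exclusive-or function: a ⊕ b = (¬a ∧ b) ∨ (a ∧ ¬b).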
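The identity above can be checked directly on all four combinations of truth values. This is only a small sketch, and the vector names a and b are illustrative.

```r
a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)

xor(a, b)                 # FALSE TRUE TRUE FALSE
(!a & b) | (a & !b)       # the formula written out element by element

# Both expressions agree on every combination
identical(xor(a, b), (!a & b) | (a & !b))   # TRUE
```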

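Inspecting objects

ls() returns the names of the objects in an environment. The pattern argument is a regular expression that restricts the result to matching names, and ^ anchors the match to the start of the name: ls(pat = "^m")

ls.str() shows the structure of every object in the environment. The max.level option keeps the output short by limiting how much nesting is shown. In recent versions of R it belongs to the print method, so it is written as print(ls.str(), max.level = -1).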
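To try ls() without depending on whatever happens to be in the workspace, one option is a scratch environment. This is a sketch only: the environment e and the object names in it are made up for illustration.

```r
e <- new.env()                      # hypothetical scratch environment
assign("m1", 1:3, envir = e)
assign("model", "lm", envir = e)
assign("n", 10, envir = e)

ls(e)                               # "m1" "model" "n"
ls(e, pattern = "^m")               # only names starting with m: "m1" "model"
ls.str(envir = e)                   # one line of structure per object
```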

Vectors

seq(from = , to = , by = ((to - from)/(length.out - 1)),length.out = NULL)

seq(from = 2, to = 5, by = 0.5)
seq(from = 2, to = 5, length.out = 7)

rep(x, times = 1, length.out = NA, each = 1)

times: the number of times x is repeated as a whole; each: the number of times each element of x is repeated.

rep(2:5, times = 2)
rep(2:5, each = 2)
rep(2:5, times = 2, each = 2)
rep(2:5, c(2,3,4,5)) # each element of 2:5 is repeated by the matching count in c(2,3,4,5)

sequence(nvec): for each element n of nvec, generates 1:n and concatenates the results

sequence(5:10)
sequence(10:5)
sequence(c(3,2,4))
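One way to read the sequence() calls above is as a concatenation of 1:n for each n. The sketch below spells that out with seq_len(); the name nvec is illustrative.

```r
nvec <- c(3, 2, 4)

sequence(nvec)                   # 1 2 3 1 2 1 2 3 4
unlist(lapply(nvec, seq_len))    # the same values, built piece by piece
```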

Character strings

paste (..., sep = " ", collapse = NULL)

paste(c("X", "Y"), 1:10, sep = "")
paste(c("X", "Y", "Z"), 1:5, sep = "")
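The calls above rely on recycling: the shorter vector is reused until the longer one is exhausted. A small sketch of the difference between sep and collapse follows; paste0() is a shorthand for sep = "".

```r
paste(c("X", "Y"), 1:4, sep = "")                   # "X1" "Y2" "X3" "Y4"
paste(c("X", "Y"), 1:4, sep = "", collapse = "+")   # "X1+Y2+X3+Y4"
paste0("X", 1:3)                                    # "X1" "X2" "X3"
```

sep joins the pieces of each element, while collapse joins the resulting elements into a single string.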

Factors

factor(x = character(), levels, labels = levels, ordered = is.ordered(x) )

factor(x = LETTERS[1:3], levels = c("C", "A", "B"))
factor(x = LETTERS[1:3], labels = 1:3)
factor(x = LETTERS[1:3], labels = c(3, 2, 1))
factor(x = LETTERS[1:3],
       levels = c("C", "A", "B"), labels = c(3, 2, 1))
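The difference between levels and labels is easier to see by looking at the integer codes behind the factor. This is a sketch with illustrative names x, f1, and f2.

```r
x <- c("A", "B", "C")

# levels fixes the order of the categories; the values themselves are unchanged
f1 <- factor(x, levels = c("C", "A", "B"))
levels(f1)         # "C" "A" "B"
as.integer(f1)     # 2 3 1  (A is the 2nd level, B the 3rd, C the 1st)

# labels renames the levels position by position: C -> "3", A -> "2", B -> "1"
f2 <- factor(x, levels = c("C", "A", "B"), labels = c(3, 2, 1))
levels(f2)         # "3" "2" "1"
as.character(f2)   # "2" "1" "3"
```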

gl(n, k, length = n*k, labels = seq_len(n), ordered = FALSE)

gl(2, 3, labels = seq(2))
gl(2, 3, length = 10, labels = seq(3))

Arrays and matrices

  1. array(data, dim, dimnames)

# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
column.names <- paste0("COL", seq(3)) # same as c("COL1","COL2","COL3")
row.names <- c("ROW1","ROW2","ROW3")
matrix.names <- c("Matrix1","Matrix2")

# Take these vectors as input to the array.
result <- array(c(vector1,vector2),dim = c(3,3,2),dimnames = list(row.names,
column.names, matrix.names))

# Print the third row of the second matrix of the array.
print(result[3,,2])

# Print the element in the 1st row and 3rd column of the 1st matrix.
print(result[1,3,1])

# Print the 2nd Matrix.
print(result[,,2])
  2. Operating on matrices: apply() and sweep(). The example here uses lapply() to index a list of matrices.
A <- matrix(seq(9), ncol=3)
B <- matrix(4:15, ncol=3)
C <- matrix(rep(c(8,9,10),2), ncol=2)
# Create a list of matrices
MyList <- list(A,B,C)

# Extract the 2nd column from `MyList` with the selection operator `[` with `lapply()`
lapply(MyList,"[", , 2)

# Extract the 1st row from `MyList`
lapply(MyList,"[", 1, )
  3. sweep(): sweeping a summary statistic out of an array
require(stats) # for median
med.att <- apply(attitude, 2, median)
sweep(data.matrix(attitude), 2, med.att) # subtract the column medians

## More sweeping:
A <- array(1:24, dim = 4:2)
## no warnings in normal use
sweep(A, 1, 5);sweep(A, 2, 5)
(A.min <- apply(A, 1, min)) # == 1:4
sweep(A, 1, A.min)
sweep(A, 1:2, apply(A, 1:2, median))

## warnings when mismatch
sweep(A, 1, 1:3) # STATS does not recycle
sweep(A, 1, 6:1) # STATS is longer

## exact recycling:
sweep(A, 1, 1:2) # no warning
sweep(A, 1, as.array(1:2)) # warning
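For a smaller picture of apply() and sweep() working together, here is a sketch on a 2 x 3 matrix. The names m, col.means, and centered are illustrative.

```r
m <- matrix(1:6, nrow = 2)           # 2 x 3 matrix, filled column by column

apply(m, 1, sum)                     # margin 1 = rows: 9 12
apply(m, 2, max)                     # margin 2 = columns: 2 4 6

col.means <- apply(m, 2, mean)       # 1.5 3.5 5.5
centered <- sweep(m, 2, col.means)   # subtract each column's mean (FUN defaults to "-")
centered
```

After sweeping, every column of centered has mean zero, which is the same idea as the column-median example on the attitude data.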

Data frames

data.frame()
Functions worth knowing: rownames(), colnames(), attach(), and with()

attach(Puromycin)
xtabs(~state + conc, data = Puromycin)
summary(Puromycin)
pairs(Puromycin, panel = panel.smooth)
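As a sketch of the other functions named in this section, the following builds a tiny data frame and uses with() to evaluate an expression inside it, which avoids the need for attach(). The data frame df and its columns are made up for illustration.

```r
df <- data.frame(id = 1:3, score = c(90, 75, 82))   # hypothetical data
rownames(df) <- c("a", "b", "c")

colnames(df)              # "id" "score"
df["b", "score"]          # 75, indexed by row name and column name
with(df, mean(score))     # columns are visible by name inside with()
```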

Lists
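A list is the recursive object type mentioned earlier: its components can have different types and lengths. The following is only a minimal sketch, and the list person and its components are hypothetical.

```r
person <- list(name = "Ann", scores = c(90, 85), passed = TRUE)   # hypothetical

mode(person)             # "list"
length(person)           # 3 components
person$name              # "Ann"
person[["scores"]][2]    # 85
class(person["name"])    # "list": single brackets return a sub-list
```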

Time series

ts(data = NA, start = 1, end = numeric(), frequency = 1,deltat = 1, ts.eps = getOption("ts.eps"), class = , names = )

A <- ts(1:10, start = 1959)
ts(1:47, frequency = 12, start = c(1959, 2))
ts(1:47, frequency = 4, start = c(1959, 2))

ts(matrix(rpois(36, 5), 12, 3), start = c(1961,1),frequency = 12)
ts(matrix(rpois(36, 5)), start = c(1961,1),frequency = 12)
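In the calls above, start = c(year, period) combined with frequency determines the time attached to each observation. A short sketch with a quarterly series shows this; the name q is illustrative.

```r
q <- ts(1:8, frequency = 4, start = c(1959, 2))   # quarterly, starting 1959 Q2

start(q)        # 1959 2
end(q)          # 1961 1
frequency(q)    # 4

# Keep only the four quarters of 1960
window(q, start = c(1960, 1), end = c(1960, 4))
```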

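Common statistical functions

  1. mad(): median absolute deviation

MAD = median(|X_i - median(X)|), a robust measure of spread. By default R's mad() multiplies this value by the constant 1.4826; use mad(x, constant = 1) for the raw value.

For details, see: https://blog.csdn.net/horses/article/details/78749485

  2. Median, variance, standard deviation, range, interquartile range, quantiles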
x <- seq(10)
median(x)
var(x)
sd(x)
range(x)
IQR(x)
quantile(x)
  3. Product and sum (a single result)
x <- seq(5)
prod(x)
sum(x)
  4. Cumulative sum, product, maximum, and minimum (one value per element)
x <- seq(5)
y <- c(3, 5, 2, 6, 1, 7)
cumsum(x)
cumprod(x)
cummax(y)
cummin(y)
  5. Ordering: rev(), sort(), order(), rank()
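(x <- sample(seq(20), 5, replace = FALSE))
rev(x) # reverse the order of the elements
sort(x) # sort in increasing order
rank(x) # rank of each element within x
order(x) # indices of x that would put it in increasing order

x[order(x)] # same as sort(x)
sort(x)[rank(x)] # recovers the original x (there are no ties here)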
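The MAD definition given earlier in this section can be checked by hand. The sketch below uses a small made-up vector with one outlier to show why the statistic is called robust.

```r
x <- c(1, 2, 3, 4, 100)          # hypothetical data with one outlier

median(abs(x - median(x)))       # 1, computed straight from the definition
mad(x, constant = 1)             # 1, the same raw value
mad(x)                           # 1.4826, scaled by the default constant
sd(x)                            # far larger, because the outlier dominates
```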

A trick

This is an unrelated trick that I remember using while doing machine learning in R.

?substitute
deparse(substitute(series))
?deparse # converts an expression to a character string
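A typical use of this idiom is to recover the name of the variable that was passed to a function, for example to label a plot or a message. The helper describe() below is hypothetical and only meant as a sketch.

```r
# Hypothetical helper: reports the name of its argument as the caller wrote it
describe <- function(series) {
  label <- deparse(substitute(series))   # unevaluated argument -> string
  paste(label, "has", length(series), "values")
}

my_data <- c(3, 1, 4)
describe(my_data)    # "my_data has 3 values"
```

substitute() captures the argument before it is evaluated, and deparse() turns that captured expression into text.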