Data structures

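Every R object has two intrinsic attributes: its mode (type) and its length. The mode is the basic kind of the object's elements, and there are four main ones:

Numeric: integer or double precision (R has no separate single-precision type)
Character
Complex
Logical (FALSE, TRUE, or NA)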

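Use mode(), class(), and length() to get an object's type and length.

Note: mode describes the basic type of an object's elements. The internal storage type is reported by typeof() or storage.mode(); for example, an integer vector has mode "numeric" but storage type "integer".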
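As a rough illustration of how these functions differ, the sketch below queries a few values. The variable names are made up for this example only.

```r
# Illustrative values; the names are chosen only for this example.
i <- 1:3                      # integer vector
d <- c(1.5, 2.5)              # double vector
s <- "abc"                    # character vector holding one string
m <- matrix(1:6, nrow = 2)    # 2 x 3 integer matrix

mode(i)      # "numeric"
typeof(i)    # "integer" (the internal storage type)
mode(d)      # "numeric"
typeof(d)    # "double"
mode(s)      # "character"
length(s)    # 1: one string, not three characters
mode(m)      # "numeric"
length(m)    # 6: the number of elements
class(m)     # includes "matrix" (and also "array" in R >= 4.0.0)
```

The point to notice is that mode() groups integer and double together as "numeric", while typeof() tells them apart.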

Atomic modes (basic data types):

numeric (integer/double), complex, character, and logical

Recursive objects:

'list' or 'function'

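class: an abstract type, which can also be thought of as a kind of data structure.

It is mainly used by generic functions to decide which method to run for a given argument. (This is method dispatch, closer to overloading than to generics in Java.)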
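A minimal sketch of how a generic function uses the class attribute, assuming the S3 system. The generic describe() and the class "temperature" are hypothetical names invented for this illustration.

```r
# A hypothetical generic and class, defined only for this illustration.
describe <- function(x, ...) UseMethod("describe")

# Fallback method, used when no class-specific method exists.
describe.default <- function(x, ...) "some object"

# Method chosen when the argument has class "temperature".
describe.temperature <- function(x, ...) {
  paste0(unclass(x), " degrees")
}

t1 <- structure(21.5, class = "temperature")

describe(t1)     # "21.5 degrees": dispatches to describe.temperature
describe(1:3)    # "some object": falls back to describe.default
```

Built-in generics such as print() and summary() select their methods in the same way.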

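Objects

There are seven common kinds of object: vectors, factors, arrays, matrices (a special case of arrays), data frames, time series (ts), and lists.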

Among the basic operations, xor() is the logical exclusive-or function: a ⊕ b = (¬a ∧ b) ∨ (a ∧ ¬b).
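The identity can be checked directly on all four combinations of truth values; a and b below are just example vectors.

```r
a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)

xor(a, b)              # FALSE TRUE TRUE FALSE
(!a & b) | (a & !b)    # the same result, written out from the formula

identical(xor(a, b), (!a & b) | (a & !b))   # TRUE
```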

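Inspecting objects

ls() returns the names of the objects in an environment. The pattern argument keeps only names that match a regular expression, so a plain letter matches names containing it and ^ anchors the match at the first character: ls(pat = "^m")

ls.str() shows the details of all objects in memory; max.level = -1 keeps the output from getting too long.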
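A small sketch of both functions, run against a fresh environment so the result does not depend on what is in the current session. The object names m1, m2, and n are placeholders.

```r
# Work in a fresh environment so the result is reproducible.
e <- new.env()
assign("m1", 1:3, envir = e)
assign("m2", "text", envir = e)
assign("n", 10, envir = e)

ls(e)                     # "m1" "m2" "n"
ls(e, pattern = "m")      # names containing "m": "m1" "m2"
ls(e, pattern = "^n")     # names starting with "n": "n"
ls.str(envir = e)         # one str()-style summary line per object
```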

Vectors

seq(from = , to = , by = ((to - from)/(length.out - 1)),length.out = NULL)

seq(from = 2, to = 5, by = 0.5)
seq(from = 2, to = 5, length.out = 7)

rep(x, times = 1, length.out = NA, each = 1)

times: the number of times x is repeated as a whole; each: the number of times each element of x is repeated

rep(2:5, times = 2)
rep(2:5, each = 2)
rep(2:5, times = 2, each = 2)
rep(2:5, c(2,3,4,5))  # each element of 2:5 is repeated the corresponding number of times in c(2,3,4,5)

sequence()

sequence(5:10)
sequence(10:5)
sequence(c(3,2,4))
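For each element n of its argument, sequence() generates 1, 2, ..., n and concatenates the pieces. The second expression below is only an equivalent construction shown for comparison, not how sequence() is implemented.

```r
sequence(c(3, 2, 4))
# 1 2 3 1 2 1 2 3 4

# An equivalent construction, spelled out for illustration:
unlist(lapply(c(3, 2, 4), seq_len))
# 1 2 3 1 2 1 2 3 4
```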

Character strings

paste (..., sep = " ", collapse = NULL)

paste(c("X", "Y"), 1:10, sep = "")
paste(c("X", "Y", "Z"), 1:5, sep = "")

Factors

factor(x = character(), levels, labels = levels, ordered = is.ordered(x) )

factor(x = LETTERS[1:3], levels = c("C", "A", "B"))
factor(x = LETTERS[1:3], labels = 1:3)
factor(x = LETTERS[1:3], labels = c(3, 2, 1))
factor(x = LETTERS[1:3], 
levels = c("C", "A", "B"), labels  = c(3, 2, 1))
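The difference between levels and labels is easy to miss in the calls above. A brief sketch, with x, f1, and f2 as example names: levels fixes the order of the categories, while labels renames them position by position.

```r
x <- LETTERS[1:3]    # "A" "B" "C"

f1 <- factor(x, levels = c("C", "A", "B"))
levels(f1)           # "C" "A" "B": the order of the levels changes
as.integer(f1)       # 2 3 1: the integer codes follow the new order

f2 <- factor(x, levels = c("C", "A", "B"), labels = c(3, 2, 1))
levels(f2)           # "3" "2" "1": each level is renamed by its label
as.character(f2)     # "2" "1" "3": A -> 2, B -> 1, C -> 3
```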

gl(n, k, length = n*k, labels = seq_len(n), ordered = FALSE)

gl(2, 3, labels = seq(2))
gl(2, 3, length = 10, labels = seq(3))

Arrays and matrices

array(data, dim, dimnames)


# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
column.names <- c("COL1","COL2","COL3")
column.names <- paste0("COL", seq(3))  # same names, built programmatically
row.names <- c("ROW1","ROW2","ROW3")
matrix.names <- c("Matrix1","Matrix2")

# Take these vectors as input to the array.
result <- array(c(vector1,vector2),dim = c(3,3,2),dimnames = list(row.names,
   column.names, matrix.names))

# Print the third row of the second matrix of the array.
print(result[3,,2])

# Print the element in the 1st row and 3rd column of the 1st matrix.
print(result[1,3,1])

# Print the 2nd Matrix.
print(result[,,2])

Operating on matrices: apply(), lapply(), and sweep()

A <- matrix(seq(9), ncol=3)
B <- matrix(4:15, ncol=3)
C <- matrix(rep(c(8,9,10),2), ncol=2)
# Create a list of matrices
MyList <- list(A,B,C)

# Extract the 2nd column from `MyList` with the selection operator `[` with `lapply()`
lapply(MyList,"[", , 2)

# Extract the 1st row from `MyList`
lapply(MyList,"[", 1, )
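Since apply() is mentioned but not shown, here is a minimal sketch on the same 3 x 3 matrix. The second argument selects the margin: 1 for rows, 2 for columns.

```r
A <- matrix(seq(9), ncol = 3)    # columns are 1:3, 4:6, 7:9

apply(A, 1, sum)      # row sums:    12 15 18
apply(A, 2, sum)      # column sums:  6 15 24
apply(A, 2, range)    # a 2 x 3 matrix: min and max of each column
```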

Sweeping data with sweep()

require(stats) # for median
med.att <- apply(attitude, 2, median)
sweep(data.matrix(attitude), 2, med.att)  # subtract the column medians

## More sweeping:
A <- array(1:24, dim = 4:2)
## no warnings in normal use
sweep(A, 1, 5);sweep(A, 2, 5)
(A.min <- apply(A, 1, min))  # == 1:4
sweep(A, 1, A.min)
sweep(A, 1:2, apply(A, 1:2, median))

## warnings when mismatch
sweep(A, 1, 1:3)  # STATS does not recycle
sweep(A, 1, 6:1)  # STATS is longer

## exact recycling:
sweep(A, 1, 1:2)  # no warning
sweep(A, 1, as.array(1:2))  # warning
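A smaller sketch may make the idea easier to see: sweep() combines a summary statistic with every row or column of an array, subtracting by default. The names M, col.means, and centered are illustrative.

```r
M <- matrix(1:6, nrow = 2)            # columns are (1,2), (3,4), (5,6)

col.means <- apply(M, 2, mean)        # 1.5 3.5 5.5
centered <- sweep(M, 2, col.means)    # subtract each column's mean (FUN = "-" by default)
centered
colMeans(centered)                    # 0 0 0

# Another function can be supplied through FUN:
scaled <- sweep(M, 1, c(10, 100), FUN = "*")   # row 1 times 10, row 2 times 100
scaled
```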

Data frames

data.frame()
Note the functions rownames(), colnames(), attach(), and with().

attach(Puromycin)
xtabs(~state + conc, data = Puromycin)
summary(Puromycin)
pairs(Puromycin, panel = panel.smooth)
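As a sketch of the helper functions listed above, using the same built-in Puromycin data set: with() evaluates an expression inside a data frame without changing the search path, which is often a tidier alternative to attach().

```r
colnames(Puromycin)    # "conc" "rate" "state"

# with() looks up rate and state inside the data frame.
by.state <- with(Puromycin, tapply(rate, state, mean))
by.state

# The same computation with explicit $ indexing:
tapply(Puromycin$rate, Puromycin$state, mean)

# If attach() is used, pair it with detach() to restore the search path.
attach(Puromycin)
mean(conc)
detach(Puromycin)
```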

Lists

(Omitted.)

Time series

ts(data = NA, start = 1, end = numeric(), frequency = 1,deltat = 1, ts.eps = getOption("ts.eps"), class = , names = )

A <- ts(1:10, start = 1959)
ts(1:47, frequency = 12, start = c(1959, 2))
ts(1:47, frequency = 4, start = c(1959, 2))

ts(matrix(rpois(36, 5), 12, 3), start = c(1961,1),frequency = 12)
ts(matrix(rpois(36, 5)), start = c(1961,1),frequency = 12)
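To see how start and frequency fit together, the sketch below builds a short quarterly series and inspects it. The name q is just an example.

```r
# Six quarterly observations starting in the 2nd quarter of 1959.
q <- ts(1:6, frequency = 4, start = c(1959, 2))

start(q)        # 1959 2
end(q)          # 1960 3
frequency(q)    # 4

# window() extracts a sub-series by time rather than by index.
window(q, start = c(1960, 1))    # the observations 4, 5, 6
```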

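Common statistical functions

mad(): median absolute deviation

MAD = median(|X_i − median(X)|), a robust measure of spread. Note that R's mad() multiplies this value by constant = 1.4826 by default.

For details, see: https://blog.csdn.net/horses/article/details/78749485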
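A short worked example of the formula, using a made-up vector with one extreme value. It also shows the scaling constant that mad() applies by default.

```r
x <- c(1, 2, 3, 4, 100)          # one extreme value

median(abs(x - median(x)))       # the raw MAD from the formula: 1
mad(x, constant = 1)             # the same value via mad()
mad(x)                           # default scaling by 1.4826: 1.4826
sd(x)                            # far larger, pulled up by the outlier
```

The outlier barely affects the MAD, which is what makes it robust compared with the standard deviation.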

Median, variance, standard deviation, range, interquartile range, quantiles

x <- seq(10)
median(x)
var(x)
sd(x)
range(x)
IQR(x)
quantile(x)

Product and sum (final result only)

x <- seq(5)
prod(x)
sum(x)

Cumulative sum, product, minimum, and maximum (running value at each element)

x <- seq(5)
y <- c(3, 5, 2, 6, 1, 7)
cumsum(x)
cumprod(x)
cummax(y)
cummin(y)

Ordering: rev(), sort(), order(), rank()

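(x <- sample(seq(20), 5, replace = FALSE))
rev(x)  # reverse the order
sort(x)  # sort in ascending order
rank(x)  # the position each element takes after sorting
order(x) # the original positions of the elements, in sorted order

x[order(x)]       # same as sort(x)
sort(x)[rank(x)]  # recovers x when there are no ties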
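Because sample() gives a different x on every run, a fixed vector makes the relationship between order() and rank() easier to follow. The values below are arbitrary.

```r
x <- c(30, 10, 20)

sort(x)     # 10 20 30
order(x)    # 2 3 1: the smallest value is at position 2, then 3, then 1
rank(x)     # 3 1 2: 30 is the 3rd smallest, 10 the 1st, 20 the 2nd

x[order(x)]         # 10 20 30, same as sort(x)
sort(x)[rank(x)]    # 30 10 20, recovers x when there are no ties
```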

A trick

This is a stray trick that I remember using when doing machine learning in R.

?substitute
deparse(substitute(series))
?deparse  # converts an expression to a character string
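The idiom is normally used inside a function, where it recovers the text of the argument the caller supplied, for example to label a plot. A minimal sketch, with arg_name() and sales as hypothetical names:

```r
# Hypothetical helper: returns the caller's argument as a string.
arg_name <- function(series) {
  deparse(substitute(series))
}

sales <- c(3, 5, 8)
arg_name(sales)        # "sales"
arg_name(sales * 2)    # "sales * 2"
```

substitute() captures the unevaluated expression, and deparse() turns that expression into a character string.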