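Drop, replace, and fill missing values in a data.frame
Description
A set of tools for dealing with missing values in data.frames. It can drop, replace, or fill missing values (with the next or previous observation), or delete entries according to their missing values.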
Usage
drop_na_dt(.data, ...)
replace_na_dt(.data, ..., to)
delete_na_cols(.data, prop = NULL, n = NULL)
delete_na_rows(.data, prop = NULL, n = NULL)
fill_na_dt(.data, ..., direction = "down")
shift_fill(x, direction = "down")
Arguments
| Argument | Description |
|---|---|
| .data | A data.frame. |
| ... | Columns to be replaced or filled. If not specified, all columns are used. |
| to | The value that NAs should be replaced with. |
| prop | Columns (or rows) whose proportion of NAs is greater than or equal to `prop` are deleted. |
| n | Columns (or rows) whose number of NAs is greater than or equal to `n` are deleted. |
| direction | Direction in which to fill missing values. Currently either "down" (the default) or "up". |
| x | A vector with missing values to be filled. |
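Details
drop_na_dt drops the entries that have NAs in the specified columns.
fill_na_dt fills NAs with the previous ("down") or next ("up") observation. These are also known as last observation carried forward (LOCF) and next observation carried backward (NOCB).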
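delete_na_cols deletes the columns whose proportion of NAs is greater than or equal to `prop`, or whose number of NAs is greater than or equal to `n`. delete_na_rows works the same way, but on rows.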
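The threshold rule described above can be illustrated with a small base R sketch. This is not how tidyfst implements these functions; it only shows the arithmetic behind `prop` and `n`. The names `na_prop_cols`, `na_count_rows`, `kept_cols`, and `kept_rows` are hypothetical and introduced here for illustration.

```r
# Illustration only: base R, not tidyfst's implementation.
x <- data.frame(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5), z = rep(NA, 4))

# Proportion of NAs in each column: x = 0.25, y = 0.5, z = 1
na_prop_cols <- colMeans(is.na(x))

# Keep only columns whose NA proportion is below 0.5,
# i.e. delete those with a proportion greater than or equal to 0.5
kept_cols <- x[, na_prop_cols < 0.5, drop = FALSE]

# Number of NAs in each row: 2, 2, 2, 1
na_count_rows <- rowSums(is.na(x))

# Keep only rows with fewer than 2 NAs,
# i.e. delete those with 2 or more
kept_rows <- x[na_count_rows < 2, , drop = FALSE]
```

With this data, only column `x` and only the fourth row survive, because the comparison is "greater than or equal to", so a column sitting exactly on the threshold is removed.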
shift_fill fills the missing values in a vector, in the direction given by `direction`.
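To make LOCF and NOCB concrete, here is a minimal base R sketch of both directions on a vector. It is an assumption-laden illustration, not the code behind shift_fill or fill_na_dt; the helper names `locf` and `nocb` are hypothetical.

```r
# Illustration only: base R, not tidyfst's implementation.

# Last observation carried forward ("down")
locf <- function(x) {
  # Position of each non-missing value, 0 where the value is missing
  idx <- ifelse(is.na(x), 0L, seq_along(x))
  # Running maximum gives the position of the most recent non-missing value
  idx <- cummax(idx)
  # Leading NAs have no earlier observation, so they stay missing
  idx[idx == 0] <- NA
  x[idx]
}

# Next observation carried backward ("up"): LOCF on the reversed vector
nocb <- function(x) rev(locf(rev(x)))

y <- c("a", NA, "b", NA, "c")
locf(y)  # "a" "a" "b" "b" "c"
nocb(y)  # "a" "b" "b" "c" "c"
```

Values with no observation on the filling side (a leading NA for "down", a trailing NA for "up") remain NA in this sketch.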
Examples

    library(tidyfst)

    df <- data.table(x = c(1, 2, NA), y = c("a", NA, "b"))

    df %>% drop_na_dt()
    df %>% drop_na_dt(x)
    df %>% drop_na_dt(y)
    df %>% drop_na_dt(x, y)

    df %>% replace_na_dt(to = 0)
    df %>% replace_na_dt(x, to = 0)
    df %>% replace_na_dt(y, to = 0)
    df %>% replace_na_dt(x, y, to = 0)

    df %>% fill_na_dt(x)
    df %>% fill_na_dt() # not specified, fill all columns
    df %>% fill_na_dt(y, direction = "up")

    x = data.frame(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5), z = rep(NA, 4))
    x
    x %>% delete_na_cols() # delete the columns that are entirely missing
    x %>% delete_na_cols(prop = 0.75)
    x %>% delete_na_cols(prop = 0.5)
    x %>% delete_na_cols(prop = 0.24)
    x %>% delete_na_cols(n = 2)

    x %>% delete_na_rows(prop = 0.6)
    x %>% delete_na_rows(n = 2)

    # shift_fill
    y = c("a", NA, "b", NA, "c")
    shift_fill(y) # same as shift_fill(y, "down")
    shift_fill(y, "up")
