Dynamically create multiple subsets based on unique column values

I have data with a timestamp column as shown here

   v1 v2      v3                       v4  v5
   1  apple   2/20/2015  12:09:19 AM  100  98 
   2  pear    2/19/2015  12:09:16 AM   98  97
   3  apple   2/19/2015  12:09:17 AM   NA  80
   4  apple   2/17/2015  12:09:11 AM   78  75
   5  pear    2/20/2015  12:09:12 AM   50  62
   6  cherry  2/21/2015  12:09:13 AM   75  75
   7  apple   2/20/2015  12:09:14 AM   75  75

I want to determine if an entry occurred for each fruit type in each day. Both file-size and number of fruit types are large.

First for each fruit type I will want to dynamically return the subset e.g. for apple

   v1 v2      v3                       v4  v5
   1  apple   2/20/2015  12:09:19 AM  100  98 
   3  apple   2/19/2015  12:09:17 AM   NA  80
   4  apple   2/17/2015  12:09:11 AM   78  75
   7  apple   2/20/2015  12:09:14 AM   75  75

Then for each fruit type, I am looking to count if any entry occurred in a day (e.g. yes or no or 0 or 1 as below) e.g. for apple

   v2      v3          sign
   apple   2/17/2015   1
   apple   2/18/2015   0
   apple   2/19/2015   1
   apple   2/20/2015   1 
   apple   2/20/2015   1

I am new to R and any guidance is helpful. I am currently using unique(df$v2) but am getting stuck on hash or assign naming.

Updated: 2023-03-21 14:03

Best answer

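To return a subset (d here is the data frame that the question calls df):

ap <- subset(d, v2 == "apple")

Then, I think, the following will give you what you want for apple. First, recode v3 as a date. Do this on d and take the subset again, so that ap carries the Date column. The years in the sample data have four digits, so the format needs %Y rather than %y.

d$v3 <- as.Date(d$v3, format = "%m/%d/%Y")
ap <- subset(d, v2 == "apple")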

Next, create a sequence of dates over the range you want as a data frame, with sign initially set to 0 for every date, and merge it in.

dates <- data.frame(v3 = seq.Date(
                     from = as.Date("2/17/15", format = "%m/%d/%y"), 
                     to = as.Date("2/21/15", format = "%m/%d/%y"),
                     by = "days"),
                sign = 0)

ap <- merge(ap, dates, all = TRUE, by = "v3")

Finally, recode sign to 1 wherever valid data is present.

ap$sign <- ifelse(!is.na(ap$v4)|!is.na(ap$v5), 1, ap$sign)
ap
          v3    v2  v4 v5 sign
 1 2015-02-17 apple  78 75    1
 2 2015-02-18  <NA>  NA NA    0
 3 2015-02-19 apple  NA 80    1
 4 2015-02-20 apple 100 98    1
 5 2015-02-20 apple  75 75    1
 6 2015-02-21  <NA>  NA NA    0
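
The steps above can be put together as a self-contained sketch. The data frame below is rebuilt by hand from the table in the question, and the date range is taken from the data with min() and max() instead of being typed in; both are assumptions for illustration, not part of the original answer.

```r
# Sample data from the question (v3 holds the timestamp as text).
d <- data.frame(
  v1 = 1:7,
  v2 = c("apple", "pear", "apple", "apple", "pear", "cherry", "apple"),
  v3 = c("2/20/2015 12:09:19 AM", "2/19/2015 12:09:16 AM",
         "2/19/2015 12:09:17 AM", "2/17/2015 12:09:11 AM",
         "2/20/2015 12:09:12 AM", "2/21/2015 12:09:13 AM",
         "2/20/2015 12:09:14 AM"),
  v4 = c(100, 98, NA, 78, 50, 75, 75),
  v5 = c(98, 97, 80, 75, 62, 75, 75),
  stringsAsFactors = FALSE
)

# Keep only the date part; %Y reads the four-digit year.
d$v3 <- as.Date(d$v3, format = "%m/%d/%Y")

# Every day between the first and last observed date, flagged 0 by default.
dates <- data.frame(v3 = seq(min(d$v3), max(d$v3), by = "day"), sign = 0)

# Merge the apple rows onto the full calendar, then flag days with data.
ap <- merge(subset(d, v2 == "apple"), dates, by = "v3", all = TRUE)
ap$sign <- ifelse(!is.na(ap$v4) | !is.na(ap$v5), 1, ap$sign)
ap[, c("v3", "v2", "sign")]
```

Days with more than one apple entry still appear once per entry, as in the output above.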

You can do all of these steps at once by first splitting the data frame and then looping over the resulting list.

splt <- split(d, d$v2)
splt <- lapply(seq_along(splt), function(i) merge(splt[[i]], dates, by = "v3", all = TRUE))
lapply(splt, function(x) {
    x$sign <- ifelse(!is.na(x$v4)|!is.na(x$v5), 1, x$sign)
    x
})

[[1]]
          v3    v2  v4 v5 sign
1 2015-02-17 apple  78 75    1
2 2015-02-18  <NA>  NA NA    0
3 2015-02-19 apple  NA 80    1
4 2015-02-20 apple 100 98    1
5 2015-02-20 apple  75 75    1
6 2015-02-21  <NA>  NA NA    0

[[2]]
          v3     v2 v4 v5 sign
1 2015-02-17   <NA> NA NA    0
2 2015-02-18   <NA> NA NA    0
3 2015-02-19   <NA> NA NA    0
4 2015-02-20   <NA> NA NA    0
5 2015-02-21 cherry 75 75    1

[[3]]
          v3   v2 v4 v5 sign
1 2015-02-17 <NA> NA NA    0
2 2015-02-18 <NA> NA NA    0
3 2015-02-19 pear 98 97    1
4 2015-02-20 pear 50 62    1
5 2015-02-21 <NA> NA NA    0
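
If one row per fruit and day is enough, a variation on the split approach is sketched below. It is an assumption-laden alternative, not the method above: it treats any row as an entry (it does not check v4 or v5), it uses hand-built sample data, and by_fruit and days are illustrative names. Calling lapply directly on the split list keeps the fruit names, so each subset can be picked out by name.

```r
# Minimal sample: fruit type and the day of each entry.
d <- data.frame(
  v2 = c("apple", "pear", "apple", "apple", "pear", "cherry", "apple"),
  v3 = as.Date(c("2015-02-20", "2015-02-19", "2015-02-19", "2015-02-17",
                 "2015-02-20", "2015-02-21", "2015-02-20")),
  stringsAsFactors = FALSE
)

# Full calendar covering the observed range.
days <- seq(min(d$v3), max(d$v3), by = "day")

# One data frame per fruit, one row per day, sign = 1 if any entry that day.
by_fruit <- lapply(split(d, d$v2), function(x) {
  data.frame(v2 = x$v2[1], v3 = days, sign = as.integer(days %in% x$v3))
})

names(by_fruit)   # fruit names are kept as list names
by_fruit$apple    # the per-day table for apple
```

Because the result is a named list, there is no need for assign() or a hash to create one variable per fruit.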

Edit

I should also mention that if what you want is the number of entries per fruit per day, a simpler way is to use dplyr, as follows:

library(dplyr)
d %>% 
    group_by(v2, v3) %>% 
    summarize(n = n())

      v2         v3     n
   <chr>     <date> <int>
1  apple 2015-02-17     1
2  apple 2015-02-19     1
3  apple 2015-02-20     2
4 cherry 2015-02-21     1
5   pear 2015-02-19     1
6   pear 2015-02-20     1

However, that doesn't look like what you want, which is why I took the approach I did.


I ended up using xtabs as below.

xtabs(~v3+v2,data=df)

This gave the count per v2 item for each date; I then replaced values greater than 0 with 1.
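
A minimal sketch of that xtabs route follows, on hand-built sample data. The day factor is an addition for illustration: xtabs only tabulates dates that occur in the data, so turning the date into a factor whose levels cover the whole range is one way to keep empty days such as 2015-02-18 in the table.

```r
# Minimal sample: fruit type and the day of each entry.
df <- data.frame(
  v2 = c("apple", "pear", "apple", "apple", "pear", "cherry", "apple"),
  v3 = as.Date(c("2015-02-20", "2015-02-19", "2015-02-19", "2015-02-17",
                 "2015-02-20", "2015-02-21", "2015-02-20")),
  stringsAsFactors = FALSE
)

# Factor with one level per calendar day, so days without entries are kept.
days <- seq(min(df$v3), max(df$v3), by = "day")
df$day <- factor(as.character(df$v3), levels = as.character(days))

# Count of entries per day and fruit.
counts <- xtabs(~ day + v2, data = df)

# Turn counts into a 0/1 presence table.
presence <- counts
presence[presence > 0] <- 1
presence
```

Rows are days and columns are fruit types, so a single table answers the question for every fruit at once.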
