Discard rows with non-consecutive numbers in Google BigQuery Standard SQL

I have a dataset with a field containing a repeating sequence of numbers (1 - 5 for brevity), and some rows with a number that is out of sequence:

1
2
3
5 -- out of sequence, need to discard
4
5
1 -- sequence starts over
2
3
4
...

How do I discard rows that are out of sequence? Thanks!

UPDATE: There is another column that specifies the ordering.

UPDATE2: Dataset for testing:

WITH t AS (
  SELECT * FROM
  UNNEST([
    STRUCT(1 AS id, 1 AS n),
    (2, 2),
    (3, 3),
    (4, 5),
    (5, 4),
    (6, 5),
    (7, 1),
    (8, 2),
    (9, 3),
    (10, 4)
  ])
)
SELECT * FROM t ORDER BY id

UPDATE3: There can be many numbers out of order (from the same 1 - 5 range). The sequence always starts with 1, and all numbers are present, except in the last round, which can be incomplete (it can end earlier; see the test set). The out-of-order numbers are like "noise" that needs to be removed.
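The rules in UPDATE3 can be stated as a simple check on the n values taken in id order: the first value is 1, and each later value is either the previous one plus 1 or, right after a complete round ending in 5, a fresh 1. The sketch below is a hypothetical helper written in JavaScript (the language of the UDF in the answer), not part of the original question; the name `isCleanSequence` and the `maxN` parameter (defaulting to 5, as in the example) are assumptions. It can be used to verify that a cleaned result contains no remaining noise.

```javascript
// Hypothetical helper (not from the original question): checks that a list of
// n values, already ordered by id, follows the UPDATE3 rules:
//   * it starts with 1,
//   * every round runs 1, 2, ..., maxN without gaps,
//   * only the final round may stop early.
function isCleanSequence(ns, maxN = 5) {
  for (let i = 0; i < ns.length; i++) {
    // Expected value: 1 at the very start or right after a completed round,
    // otherwise the previous value plus one.
    const expected = i === 0 || ns[i - 1] === maxN ? 1 : ns[i - 1] + 1;
    if (ns[i] !== expected) {
      return false;
    }
  }
  return true;
}

// n values of the test dataset, in id order.
const raw = [1, 2, 3, 5, 4, 5, 1, 2, 3, 4];
console.log(isCleanSequence(raw)); // prints false (the 5 at id 4 is noise)
console.log(isCleanSequence([1, 2, 3, 4, 5, 1, 2, 3, 4])); // prints true
```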


Original: https://stackoverflow.com/questions/48866149
Updated: 2021-07-19 20:07

Accepted answer


Below is a solution for BigQuery Standard SQL that uses a JavaScript UDF.
It returns every sequence found that starts with 1, together with the consecutive numbers that follow it.

#standardSQL
CREATE TEMPORARY FUNCTION extract_sequence(arr ARRAY<STRUCT<id INT64, n INT64>>)
RETURNS ARRAY<STRUCT<id INT64, n INT64>>
LANGUAGE js AS """
  // Greedily keep the rows whose n is the next expected value (1, 2, ..., 5);
  // every other row is treated as noise and skipped.
  var target = [1, 2, 3, 4, 5];
  var result = [];
  var j = 0;
  for (var i = 0; i < arr.length; i++) {
    if (arr[i].n == target[j]) {
      result.push({id: arr[i].id, n: arr[i].n});
      j++;
    }
  }
  return result;
""";
WITH t AS (
  SELECT *
  FROM UNNEST([
    STRUCT(1 AS id, 1 AS n), (2, 2), (3, 3), (4, 5), (5, 4), (6, 5), (7, 1), (8, 2), (9, 3), (10, 4)
  ])
)
SELECT elem.id, elem.n, grp
FROM (
  -- Collect each group's rows in id order and keep only the in-sequence ones
  SELECT grp, extract_sequence(ARRAY_AGG(STRUCT(id, n) ORDER BY id)) arr
  FROM (
    -- Every row with n = 1 starts a new group: grp is the running count of 1s
    SELECT id, n, COUNTIF(n = 1) OVER(ORDER BY id) grp
    FROM t
  )
  GROUP BY grp
), UNNEST(arr) elem
ORDER BY id

The result is as expected:

Row id  n   grp  
1   1   1   1    
2   2   2   1    
3   3   3   1    
4   5   4   1    
5   6   5   1    
6   7   1   2    
7   8   2   2    
8   9   3   2    
9   10  4   2      
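To experiment with the same logic outside BigQuery, the following is a minimal sketch in plain JavaScript that emulates the query: it assigns groups the way `COUNTIF(n = 1) OVER(ORDER BY id)` does and then applies the same greedy matching as `extract_sequence`. The function names `extractSequence` and `discardOutOfSequence` and the `maxN` parameter are hypothetical, and rows are assumed to be plain `{ id, n }` objects with numeric fields.

```javascript
// Hypothetical local emulation (plain JavaScript, no BigQuery) of the query.

// Same greedy matching as the extract_sequence UDF: walk a group in id order
// and keep a row only when its n equals the next expected value 1, 2, ..., maxN.
function extractSequence(group, maxN = 5) {
  const result = [];
  let expected = 1;
  for (const row of group) {
    if (expected <= maxN && row.n === expected) {
      result.push(row);
      expected++;
    }
  }
  return result;
}

// Mirrors COUNTIF(n = 1) OVER (ORDER BY id): every row with n = 1 opens a new
// group, and the group number is the running count of such rows.
function discardOutOfSequence(rows, maxN = 5) {
  const sorted = [...rows].sort((a, b) => a.id - b.id);
  const groups = new Map(); // grp -> rows of that group, in id order
  let grp = 0;
  for (const row of sorted) {
    if (row.n === 1) grp++;
    if (!groups.has(grp)) groups.set(grp, []);
    groups.get(grp).push(row);
  }
  // Map preserves insertion order, so the output stays ordered by id.
  const kept = [];
  for (const [g, group] of groups) {
    for (const row of extractSequence(group, maxN)) {
      kept.push({ id: row.id, n: row.n, grp: g });
    }
  }
  return kept;
}

// The test dataset from the question.
const t = [
  { id: 1, n: 1 }, { id: 2, n: 2 }, { id: 3, n: 3 }, { id: 4, n: 5 }, { id: 5, n: 4 },
  { id: 6, n: 5 }, { id: 7, n: 1 }, { id: 8, n: 2 }, { id: 9, n: 3 }, { id: 10, n: 4 },
];
console.log(discardOutOfSequence(t).map((r) => r.id).join(", ")); // prints 1, 2, 3, 5, 6, 7, 8, 9, 10
```

Run with Node.js, it prints the ids kept in the result table. One consequence of this grouping (in the SQL as well as in the sketch) is that every row with n = 1 opens a new group, so a noisy 1 in the middle of a round could cut that round short.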

Hopefully you can adapt this to your specific case.
