Check if an element exists in MongoDB

I want to check whether an array field exists in a MongoDB document. So far I do this check with an if statement while iterating over the cursor, but I suspect it slows the query down. My code so far:

EDIT:

import scipy.io
from scipy import sparse

# input_file, output_file and collection (a pymongo collection) are defined elsewhere.
with open(input_file) as f:
    lines = [line.rstrip() for line in f]

print(len(lines))

# Map each id to its first position once, instead of calling lines.index() repeatedly.
position = {}
for i, line in enumerate(lines):
    position.setdefault(line, i)

matrix = sparse.lil_matrix((len(lines), len(lines)))

no_row = 0
for item in lines:
    # Find documents whose uid appears in lines and that contain a follower list.
    for doc in collection.find({"_id.uid": int(item)}):
        followers = doc.get('list_followers')
        if followers is None:
            continue
        print(doc['_id']['uid'], len(followers))
        for follower in followers:
            # lines holds strings; if list_followers stores integers,
            # look up str(follower) instead.
            col = position.get(follower)
            if col is not None:
                matrix[no_row, col] = 1
        no_row += 1
        print(no_row)

scipy.io.mmwrite(output_file, matrix, field='integer')

In the end I discovered that the delay was caused by the creation of the sparse.lil_matrix.
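
Since the delay was traced to the matrix, one option is to collect the non-zero coordinates in plain Python first and build the sparse matrix in a single step. The sketch below is only an illustration under assumptions: `follower_edges` and `to_sparse` are hypothetical helper names, the follower lists are assumed to be already loaded into a dict keyed by user id, and each row is the position of the user in the id list (unlike the running `no_row` counter in the question). Whether this removes the delay observed above has not been measured.

```python
def follower_edges(ids, followers_by_id):
    """Return sorted (row, col) pairs: row is a user's position, col a follower's position."""
    position = {}
    for i, uid in enumerate(ids):
        position.setdefault(uid, i)  # first occurrence wins, like list.index

    edges = set()  # a set drops duplicate pairs, so every entry stays at 1
    for uid, followers in followers_by_id.items():
        row = position.get(uid)
        if row is None or not followers:
            continue  # unknown user, or missing/empty follower list
        for follower in followers:
            col = position.get(follower)
            if col is not None:
                edges.add((row, col))
    return sorted(edges)


def to_sparse(edges, n):
    """Build an n x n COO matrix with a 1 at every (row, col) pair."""
    from scipy import sparse  # imported here so follower_edges works without SciPy

    rows = [r for r, _ in edges]
    cols = [c for _, c in edges]
    data = [1] * len(edges)
    return sparse.coo_matrix((data, (rows, cols)), shape=(n, n), dtype=int)


if __name__ == "__main__":
    ids = ["10", "20", "30"]
    followers_by_id = {"10": ["20", "30", "99"], "20": None, "30": ["10"]}
    print(follower_edges(ids, followers_by_id))  # [(0, 1), (0, 2), (2, 0)]
```

The matrix returned by `to_sparse` can be passed to scipy.io.mmwrite in the same way as the lil_matrix in the question.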


Source: https://stackoverflow.com/questions/24670670
Updated: 2020-03-26 12:03

Best answer

The nearest thing I can think of is to implement a sparse index and query a little differently. I'll construct a sample to demonstrate:

{ "a" : 1 }
{ "a" : 1, "b" : [ ] }
{ "a" : 1 }
{ "a" : 1, "b" : [ ] }
{ "b" : [ 1, 2, 3 ] }

Essentially, what you seem to be asking for is to get just that last document as a match without scanning everything. This is where a different query and a sparse index help. First the query:

db.collection.find({ "b.0": { "$exists": 1 } })

This returns only one item, as that is the only existing array with some content at its first index position. Now the index:

db.collection.createIndex({ "b": 1 },{ "sparse": true })

But because of the nature of the query, we have to .hint() the index:

db.collection.find({ "b.0": { "$exists": 1 } }).hint({ "b": 1 }).explain()

That returns the one matching document and only considers the three documents that actually have an array.
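
Because the question uses Python, the same idea can be sketched with pymongo. This is a minimal sketch, not a tested setup: `non_empty_array_filter`, `find_non_empty` and `matches_locally` are hypothetical helper names, `collection` is assumed to be a pymongo Collection, and `matches_locally` only imitates the filter for plain list values so the sample documents can be checked without a server.

```python
def non_empty_array_filter(field):
    """Filter matching documents where `field` is an array with at least one element."""
    return {field + ".0": {"$exists": True}}


def find_non_empty(collection, field):
    """Create a sparse index on `field`, then query with a hint so that index is used.

    `collection` is assumed to be a pymongo Collection.
    """
    collection.create_index([(field, 1)], sparse=True)
    return collection.find(non_empty_array_filter(field)).hint([(field, 1)])


def matches_locally(doc, field):
    """Pure-Python imitation of the filter for plain list fields, for illustration only."""
    value = doc.get(field)
    return isinstance(value, list) and len(value) > 0


if __name__ == "__main__":
    sample = [
        {"a": 1},
        {"a": 1, "b": []},
        {"a": 1},
        {"a": 1, "b": []},
        {"b": [1, 2, 3]},
    ]
    print([doc for doc in sample if matches_locally(doc, "b")])  # [{'b': [1, 2, 3]}]
```

For the collection in the question, the field would presumably be list_followers, so the None check inside the loop could move into the query itself.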

2014-07-10
