Reading notes on the Solr source code

2019-03-27 01:14 | Source: Internet

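Solr source code
1 org.apache.solr.common: the basic classes and objects
2 org.apache.solr.common.params: classes for storing and reading parameters, split into many classes by parameter type
  (1) AnalysisParams, along with the map-backed classes: ModifiableSolrParams (LinkedHashMap), RequiredSolrParams, SolrQuery
  (2) CommonParams and others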
3 org.apache.solr.analysis
  (1) tokenizerFactory (BaseTokenizerFactory): can tokenize by character count (NGramTokenizerFactory), regular expression, tag, keyword,
 character, Russian, tree structure (TrieTokenizerFactory), or whitespace
  (2) BaseCharFilterFactory
  (3) BaseTokenFilterFactory: filters by language, stop words, metadata (payload), phonetics (? DoubleMetaphoneFilterFactory),
 hyphenated words (HyphenatedWordsFilterFactory), wildcard, or synonym
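
The "by character count" tokenization mentioned in (1) can be illustrated without Lucene. This is a minimal sketch in plain Java, not the NGramTokenizerFactory implementation: the class NGramSketch and its method are hypothetical names, and the output order (by start offset, then by length) is an assumption made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not a Solr or Lucene class.
class NGramSketch {
    // Returns every substring of text whose length lies between minGram and maxGram.
    static List<String> ngrams(String text, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            // Grow the gram from minGram up to maxGram, stopping at the end of the text.
            for (int len = minGram; len <= maxGram && start + len <= text.length(); len++) {
                grams.add(text.substring(start, start + len));
            }
        }
        return grams;
    }
}
```

For example, NGramSketch.ngrams("solr", 1, 2) yields s, so, o, ol, l, lr, r.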
4 org.apache.solr.core
   (1) AbstractSolrEventListener, which includes QuerySenderListener; the newSearcher warm-up lives there
   (2) Core initialization: CoreContainer [n CoreDescriptors];
   SolrCore holds the search-related pieces
   get/set: ResponseHeader, Plugins, booleanQueryMaxClauseCount, getSearcher
   initialize/register: DeletionPolicy, Listeners, searcher, IndexReader (writer), Index, HighLighter
   (3) DirectoryFactory, IndexReaderFactory
   (4) Retention policy for index commit points: IndexDeletionPolicyWrapper
   (5) JmxMonitoredMap extends ConcurrentHashMap (a highly concurrent hash map with multiple locks)
   (6) RequestHandlers (can read solrconfig.xml and register the appropriate handlers),
 including LazyRequestHandlerWrapper, which initializes a lazy-loaded requestHandler only the first time that requestHandler is called
   (7) SolrResourceLoader, which includes ClassLoader and getLines
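
The lazy-loading idea behind LazyRequestHandlerWrapper in (6) can be sketched as follows. This is only an illustration of the pattern under stated assumptions, not Solr's code: SketchHandler and LazyHandler are hypothetical names, and the real wrapper works against Solr's own request handler types.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for a request handler; not Solr's interface.
interface SketchHandler {
    String handle(String request);
}

// Holds a factory and builds the real handler only when the first request arrives.
class LazyHandler implements SketchHandler {
    private final Supplier<SketchHandler> factory;
    private SketchHandler delegate; // stays null until the first call to handle()

    LazyHandler(Supplier<SketchHandler> factory) {
        this.factory = factory;
    }

    synchronized boolean isInitialized() {
        return delegate != null;
    }

    @Override
    public synchronized String handle(String request) {
        if (delegate == null) {
            delegate = factory.get(); // the expensive initialization happens here, once
        }
        return delegate.handle(request);
    }
}
```

Registering the wrapper is cheap at startup; the cost of building the wrapped handler is paid by whichever request reaches it first.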

5 org.apache.solr.handler and org.apache.solr.handler.admin
   (1)SnapShooter
   (2) ContentStreamLoader: covers the XML and CSV ways to update, delete, and read data; there is also an update handler that uses the JavaBin format
   (3)RequestHandlerBase
 SearchHandler (@dismax, relevance ranking;
     @adds each feature's parameters and their handling as components, e.g. highlighting, facet, mlt, query, stats, debug;
     @shard is just a string like 'localhost:8080/complex/';
     @ShardResponse)
 AnalysisRequestHandlerBase (processes the request XML and returns a NamedList
        @analyzeTokenStream analyzes the given TokenStream, collecting the Tokens it produces;
        @convertTokensToNamedLists;
        @AnalysisContext class)
 AnalysisRequestHandler (@processContent for tokenizing a doc;
       @readDoc)
   (4) CoreAdminHandler: handleRequestBody is the entry point; depending on each core's status it loads, renames, or deletes cores, and so on
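   (5) MoreLikeThisHandler (@getMoreLikeThis calls mlt.like (a Lucene function) to fetch the most similar documents for the req and add them to the response. Given a doc, it first gets the TermFreqVector (the tf) of each field and adds it to TermFrequencies;
  it then iterates over every term in TermFrequencies, taking its tf and its largest df across the specified fields, computes idf from the df and the current number of indexed documents, computes score = tf * idf for each term, and pushes it onto a PriorityQueue;
  finally it takes a number of terms (maxQueryTerm) in descending score order, builds a BooleanQuery from them, runs one search with that query, and keeps the N highest-scoring documents.)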
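   (6) PluginInfoHandler: fetches and displays the QUERY HANDLERS, UPDATE HANDLERS, CACHE, HIGHLIGHTING, and similar information loaded under each core
   (7) ReplicationHandler: provides the API that slaves use to replicate from the master; sets up replication checksums, prepares variables before replication (e.g. whether to commit or optimize), and handles the follow-up actions after replication
      (@Adler32 copy with checksum
      @getReplicationDetails showing statistics and progress information
      @FileStream class, a class with checksums)
   (8) ShowFileRequestHandler: reads conf files over the web (marking a file as hidden prevents it from being accessed)
   (9) SpellCheckerRequestHandler: splits the string on whitespace and, according to the SolrQueryRequest parameters extendedResults, cmd rebuild, accuracy, suggestionCount, restrictToField, and onlyMorePopular,
     selectively adds information about each resulting word, e.g. term frequency, suggestion words
   (10) SystemInfoHandler: contains the systemInfo of the core, JVM, and Lucene
   (11) ThreadDumpHandler: thread statistics: current, peak, daemon
   (12) AdminHandlers: registers all admin handlers (LukeRequestHandler, SystemInfoHandler, PluginInfoHandler, ThreadDumpHandler, ShowFileRequestHandler)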
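
The term selection described under MoreLikeThisHandler in (5) can be sketched in plain Java. This is a simplified illustration, not the Lucene MoreLikeThis code: MltTermSketch and its methods are hypothetical names, the inputs are plain maps rather than term vectors, and the idf formula log(numDocs / (df + 1)) + 1 is an assumption in the classic Lucene style.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper for illustration; not part of Solr or Lucene.
class MltTermSketch {
    static final class ScoredTerm {
        final String term;
        final double score;
        ScoredTerm(String term, double score) { this.term = term; this.score = score; }
    }

    // Assumed idf formula: log(numDocs / (df + 1)) + 1.
    static double idf(int df, int numDocs) {
        return Math.log((double) numDocs / (df + 1)) + 1.0;
    }

    // tf: term -> frequency in the source doc; maxDf: term -> largest df over the chosen fields.
    // Returns up to maxQueryTerms terms, highest tf * idf first.
    static List<String> topTerms(Map<String, Integer> tf, Map<String, Integer> maxDf,
                                 int numDocs, int maxQueryTerms) {
        // Highest score first; ties are broken alphabetically so the result is deterministic.
        PriorityQueue<ScoredTerm> queue = new PriorityQueue<>((a, b) -> {
            int byScore = Double.compare(b.score, a.score);
            return byScore != 0 ? byScore : a.term.compareTo(b.term);
        });
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            int df = maxDf.getOrDefault(e.getKey(), 0);
            queue.add(new ScoredTerm(e.getKey(), e.getValue() * idf(df, numDocs)));
        }
        List<String> top = new ArrayList<>();
        while (!queue.isEmpty() && top.size() < maxQueryTerms) {
            top.add(queue.poll().term);
        }
        return top;
    }
}
```

In the handler, the selected terms would then be combined into a BooleanQuery and searched; the sketch stops at the term list.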
6 org.apache.solr.handler.component
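  (1) SearchComponent: the base class; most subclasses handle merging documents from multiple shards, using process for single-node search and distributedProcess for distributed search
  (2) DebugComponent (adds debugging information to a request)
  (3) FacetComponent (@countFacets; class DistribFieldFacet is used to go through each facet.field, adding results from this shard; includes the refine operation
    @facet_fields or facet_queries)
  (4) HighlightComponent (@usePhraseHighlighter highlights only exact phrase matches; @highlightMultiTerm highlights fuzzy matches, used with usePhraseHighlighter=false)
  (5) QueryComponent: the query class; handles URL parameters, fetching the result set, and so on
  (6) QueryElevationComponent: the priority (elevation) class; elevate.xml lists the doc ids to show at the top and the ones to exclude
  (7) SpellCheckComponent: spell checking and matching; inform runs when Tomcat starts and loads the spellcheck dictionary; both the dictionary and the converter have defaults (use case: "did you mean this keyword?")
  (8) StatsComponent: gets facet stats according to field type
  (9) TermsComponent: implements auto-suggest; returns TermEnum information; rb.req.getSearcher().getReader().terms gets the matching term enum, including term frequencies and the like
  (10) TermVectorComponent: returns term vectors for the documents, including tv, tf, offsets, position, df, tf-idf, etc.; TVMapper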
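
The auto-suggest behaviour noted for TermsComponent in (9) amounts to walking a sorted term dictionary from a prefix. The sketch below imitates that with a sorted map in plain Java; it is an assumption-laden illustration, not the component's code. TermSuggestSketch is a hypothetical name, and the map of term to document frequency stands in for the index's term enumeration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

// Hypothetical helper for illustration; not part of Solr or Lucene.
class TermSuggestSketch {
    // terms: sorted term -> document frequency. Returns up to limit entries that start with prefix.
    static List<String> suggest(NavigableMap<String, Integer> terms, String prefix, int limit) {
        List<String> out = new ArrayList<>();
        // Seek to the first term >= prefix, then scan forward while the prefix still matches.
        for (Map.Entry<String, Integer> e : terms.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix) || out.size() >= limit) {
                break;
            }
            out.add(e.getKey() + " (" + e.getValue() + ")");
        }
        return out;
    }
}
```

Because the terms are sorted, the scan can stop at the first term that no longer matches the prefix instead of visiting the whole dictionary.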

7 org.apache.solr.highlight
  DefaultSolrHighlighter getPhraseHighlighter getSpanQueryScorer getFormatter getFragmenter
 A simple sequence:
 TermQuery query = new TermQuery(new Term("field", "textFragment"));
 Scorer scorer = new QueryScorer(query);      // QueryScorer is the built-in scorer
 Highlighter highlighter = new Highlighter(scorer);
 TokenStream tokenStream = new SimpleAnalyzer().tokenStream("field", new StringReader(text)); // produced by the analyzer; carries the start and end offsets of the parts of the text to highlight
 System.out.println(highlighter.getBestFragment(tokenStream, text));  // a Fragmenter splits the original text into fragments

Reposted from: http://www.cnblogs.com/ai464068163/archive/2012/03/06/2382209
