[Heritrix/crawler-beans.cxml]DISPOSITION CHAIN-dispositionProcessors
212.Heritrix_configuration_files/05. PROCESSING CHAINS 2016. 8. 1. 17:37
DispositionProcessors
- The list of Processors that the DispositionChain runs, in the order given.
Default configuration
<!-- now, processors are assembled into ordered DispositionChain bean -->
<bean id="dispositionProcessors" class="org.archive.modules.DispositionChain">
  <property name="processors">
    <list>
      <!-- write to aggregate archival files... -->
      <ref bean="warcWriter"/>
      <!-- ...send each outlink candidate URI to CandidateChain,
           and enqueue those ACCEPTed to the frontier... -->
      <ref bean="candidates"/>
      <!-- ...then update stats, shared-structures, frontier decisions -->
      <ref bean="disposition"/>
      <!-- <ref bean="rescheduler" /> -->
    </list>
  </property>
</bean>
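- [How processing proceeds]
- The crawled data is written to WARC files. -> warcWriter
- Each discovered outlink is sent to the CandidateChain, and those that are ACCEPTed are enqueued to the frontier. -> candidates
- With this URI's processing finished, stats, shared structures, and frontier decisions are updated. -> disposition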
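The default list also carries a fourth entry, rescheduler, that is commented out. As a minimal sketch, enabling it would presumably mean uncommenting that reference so it runs after disposition. This assumes a bean with id "rescheduler" is defined elsewhere in crawler-beans.cxml; its class and properties are not shown in this post, so nothing about its behavior is claimed here.

```xml
<bean id="dispositionProcessors" class="org.archive.modules.DispositionChain">
  <property name="processors">
    <list>
      <ref bean="warcWriter"/>
      <ref bean="candidates"/>
      <ref bean="disposition"/>
      <!-- assumption: a bean with id "rescheduler" exists elsewhere in this file -->
      <ref bean="rescheduler"/>
    </list>
  </property>
</bean>
```

Because the list is ordered, the position of each ref determines when that processor runs within the chain.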