Configuring ZipNumCluster

Lauren Ko edited this page Aug 27, 2019 · 7 revisions

OpenWayback Advanced configuration

For information on the ZipNum format, see https://web.archive.org/web/20160804001009/http://aaron.blog.archive.org/2013/05/28/zipnum-and-cdx-cluster-merging/

Enable and edit CDXCollection.xml as follows:

    <property name="resourceIndex">
      <bean class="org.archive.wayback.resourceindex.LocalResourceIndex">
        <property name="canonicalizer" ref="waybackCanonicalizer" />
        <property name="source">
          <bean class="org.archive.wayback.resourceindex.ZipNumClusterSearchResultSource">
            <property name="cluster">
              <bean class="org.archive.format.gzip.zipnum.ZipNumCluster">
                <property name="summaryFile" value="/<PATH-TO-SUMMARYFILE>" />
                <property name="locFile" value="/<PATH-TO-LOCFILE>" />
              </bean>
            </property>
            <property name="params">
              <bean class="org.archive.format.gzip.zipnum.ZipNumParams" />
            </property>
          </bean>
        </property>
        <property name="maxRecords" value="100000" />
        <property name="dedupeRecords" value="true" />
      </bean>
    </property>

**Summary file format**

The summary file consists of four tab-separated columns:

  1. The first line of each chunk
  2. Chunk name (or shard name)
  3. Offset: the starting byte-offset of the chunk
  4. Length: the length of the chunk
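
The four columns above can be read with a short script. The following is a minimal Python sketch, not part of OpenWayback itself. It assumes that each chunk is an independently gzip-compressed block of text lines (as the linked post describes) and that offset and length are byte counts within the shard. The names `SummaryEntry`, `parse_summary_line`, and `read_chunk`, the shard name `part-00000`, and the sample CDX-like lines are all hypothetical and used only for illustration.

```python
import gzip
import io
from typing import List, NamedTuple


class SummaryEntry(NamedTuple):
    first_line: str  # column 1: first line of the chunk
    shard: str       # column 2: chunk (shard) name
    offset: int      # column 3: starting byte offset of the chunk
    length: int      # column 4: length of the chunk in bytes


def parse_summary_line(line: str) -> SummaryEntry:
    """Split one tab-separated summary line into its four columns."""
    first_line, shard, offset, length = line.rstrip("\n").split("\t")
    return SummaryEntry(first_line, shard, int(offset), int(length))


def read_chunk(shard_file, entry: SummaryEntry) -> List[str]:
    """Read one chunk by byte range and decompress it (assumes a gzip block)."""
    shard_file.seek(entry.offset)
    raw = shard_file.read(entry.length)
    return gzip.decompress(raw).decode("utf-8").splitlines()


# Build a tiny two-chunk shard in memory so the sketch runs without real data.
chunks = [
    ["com,example)/ 20190101000000 a", "com,example)/a 20190101000000 b"],
    ["org,example)/ 20190101000000 c"],
]
shard = io.BytesIO()
summary_lines = []
for lines in chunks:
    blob = gzip.compress(("\n".join(lines) + "\n").encode("utf-8"))
    # Record the chunk's first line, shard name, offset, and length.
    summary_lines.append(f"{lines[0]}\tpart-00000\t{shard.tell()}\t{len(blob)}")
    shard.write(blob)

entries = [parse_summary_line(line) for line in summary_lines]
second_chunk = read_chunk(shard, entries[1])
```

Because the summary records the first line of every chunk, a reader can locate the right chunk from the summary alone and then fetch only that byte range from the shard.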

**Loc file format**

The loc file consists of two tab-separated columns:

  1. Chunk name (or shard name)
  2. Chunk URL: e.g. hdfs://url or http://url
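
As an illustration of the two-column layout, the following minimal Python sketch loads a loc file into a mapping from chunk name to URL. It assumes one location per line, exactly as listed above. The function name `parse_loc_file`, the shard names, and the sample URLs are hypothetical.

```python
from typing import Dict


def parse_loc_file(text: str) -> Dict[str, str]:
    """Map each chunk (shard) name to its URL, one tab-separated pair per line."""
    locations = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        shard, url = line.split("\t", 1)
        locations[shard] = url.strip()
    return locations


# Hypothetical sample content; real paths depend on your deployment.
sample = (
    "part-00000\thdfs://namenode/cdx/part-00000.gz\n"
    "part-00001\thttp://example.org/cdx/part-00001.gz\n"
)
locations = parse_loc_file(sample)
```

The chunk name in the first column is the same name that appears in the second column of the summary file, so it acts as the key that joins the two files.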

For more information on how to generate the summary file using Hadoop, see the link at the top of this page.
