public class DeltaIndex
extends java.lang.Object
The index can be passed a result buffer, and output an instruction sequence
that transforms the source buffer used by the index into the result buffer.
The instruction sequence can be executed by
BinaryDelta
to recreate the
result buffer.
An index stores the entire contents of the source buffer, but also a table of
block identities mapped to locations where the block appears in the source
buffer. The mapping table uses 12 bytes for every 16 bytes of source buffer,
and is therefore ~75% of the source buffer size. The overall index is ~1.75x
the size of the source buffer. This relationship holds for any JVM, as only a
constant number of objects are allocated per index. Callers can use the
method getIndexSize()
to obtain a reasonably accurate estimate of
the complete heap space used by this index.
A DeltaIndex
is thread-safe. Concurrent threads can use the same
index to encode delta instructions for different result buffers.
Modifier and Type | Field and Description |
---|---|
(package private) static int |
BLKSZ
Number of bytes in a block.
|
private long[] |
entries
Pairs of block hash value and
src offsets. |
private static int |
MAX_CHAIN_LENGTH
Maximum number of positions to consider for a given content hash.
|
private byte[] |
src
Original source file that we indexed.
|
private static int[] |
T |
private int[] |
table
Pointers into the
entries table, indexed by block hash. |
private int |
tableMask
Mask to make block hashes into an array index for
table . |
private static int[] |
U |
Constructor and Description |
---|
DeltaIndex(byte[] sourceBuffer)
Construct an index from the source file.
|
Modifier and Type | Method and Description |
---|---|
private void |
copyEntries(DeltaIndexScanner scan) |
private int |
countEntries(DeltaIndexScanner scan) |
void |
encode(java.io.OutputStream out,
byte[] res)
Generate a delta sequence to recreate the result buffer.
|
boolean |
encode(java.io.OutputStream out,
byte[] res,
int deltaSizeLimit)
Generate a delta sequence to recreate the result buffer.
|
static long |
estimateIndexSize(int sourceLength)
Estimate the size of an index for a given source.
|
private static int |
fwdmatch(byte[] res,
int resPtr,
byte[] src,
int srcPtr) |
long |
getIndexSize()
Get an estimate of the memory required by this index.
|
long |
getSourceSize()
Get size of the source buffer this index has scanned.
|
(package private) static int |
hashBlock(byte[] raw,
int ptr) |
private static int |
keyOf(long ent) |
private static int |
negmatch(byte[] res,
int resPtr,
byte[] src,
int srcPtr,
int limit) |
private DeltaEncoder |
newEncoder(java.io.OutputStream out,
long resSize,
int limit) |
private static long |
sizeOf(byte[] b) |
private static long |
sizeOf(int[] b) |
private static long |
sizeOf(long[] b) |
private static int |
sizeOfArray(int entSize,
int len) |
private static int |
step(int hash,
byte toRemove,
byte toAdd) |
java.lang.String |
toString() |
private static int |
valOf(long ent) |
static final int BLKSZ
private static final int MAX_CHAIN_LENGTH
All positions with the same content hash are stored into a single chain. The chain size is capped to ensure delta encoding stays linear time at O(len_src + len_dst) rather than quadratic at O(len_src * len_dst).
private final byte[] src
private final int[] table
entries
table, indexed by block hash.
A block hash is masked with tableMask
to become the array index
of this table. The value stored here is the first index within
entries
that starts the consecutive list of blocks with that
same masked hash. If there are no matching blocks, 0 is stored instead.
Note that this table is always a power of 2 in size, to support fast normalization of a block hash into an array index.
private final long[] entries
src
offsets.
The very first entry in this table at index 0 is always empty, this is to
allow fast evaluation when table
has no values under any given
slot. Remaining entries are pairs of integers, with the upper 32 bits
holding the block hash and the lower 32 bits holding the source offset.
private final int tableMask
table
.private static final int[] T
private static final int[] U
public DeltaIndex(byte[] sourceBuffer)
sourceBuffer
- the source file's raw contents. The buffer will be held by the
index instance to facilitate matching, and therefore must not
be modified by the caller.public static long estimateIndexSize(int sourceLength)
This is roughly a worst-case estimate. The actual index may be smaller.
sourceLength
- length of the source, in bytes.1.75 * sourceLength
.private int countEntries(DeltaIndexScanner scan)
private void copyEntries(DeltaIndexScanner scan)
public long getSourceSize()
public long getIndexSize()
getSourceSize()
, as well as a rough approximation of JVM
object overheads.private static long sizeOf(byte[] b)
private static long sizeOf(int[] b)
private static long sizeOf(long[] b)
private static int sizeOfArray(int entSize, int len)
public void encode(java.io.OutputStream out, byte[] res) throws java.io.IOException
There is no limit on the size of the delta sequence created. This is the
same as encode(out, res, 0)
.
out
- stream to receive the delta instructions that can transform
this index's source buffer into res
. This stream
should be buffered, as instructions are written directly to it
in small bursts.res
- the desired result buffer. The generated instructions will
recreate this buffer when applied to the source buffer stored
within this index.java.io.IOException
- the output stream refused to write the instructions.public boolean encode(java.io.OutputStream out, byte[] res, int deltaSizeLimit) throws java.io.IOException
out
- stream to receive the delta instructions that can transform
this index's source buffer into res
. This stream
should be buffered, as instructions are written directly to it
in small bursts. If the caller might need to discard the
instructions (such as when deltaSizeLimit would be exceeded)
the caller is responsible for discarding or rewinding the
stream when this method returns false.res
- the desired result buffer. The generated instructions will
recreate this buffer when applied to the source buffer stored
within this index.deltaSizeLimit
- maximum number of bytes that the delta instructions can
occupy. If the generated instructions would be longer than
this amount, this method returns false. If 0, there is no
limit on the length of delta created.java.io.IOException
- the output stream refused to write the instructions.private DeltaEncoder newEncoder(java.io.OutputStream out, long resSize, int limit) throws java.io.IOException
java.io.IOException
private static int fwdmatch(byte[] res, int resPtr, byte[] src, int srcPtr)
private static int negmatch(byte[] res, int resPtr, byte[] src, int srcPtr, int limit)
public java.lang.String toString()
toString
in class java.lang.Object
static int hashBlock(byte[] raw, int ptr)
private static int step(int hash, byte toRemove, byte toAdd)
private static int keyOf(long ent)
private static int valOf(long ent)