II.
BLOCKCHAIN SYSTEM SETTINGS
We define Blockchain as a distributed system of machines,
each running the same program consisting of executable
statements.
We represent this distributed system by a set of
n state machines called nodes that communicate with each
other.
Every node can communicate with its neighbors either
by exchanging messages,
or using shared memory.
When using
shared memory,
all nodes must know one another,
forming
a complete undirected graph of nodes.
When using message
passing,
each node has to be connected to at least one other
node,
and a (reliable,
non-Byzantine) path must exist between
every two nodes,
forming an undirected tree graph.
Communication by writing in,
and reading from,
the shared
memory usually fits systems where the nodes are geograph[1]ically close to each other.
A message-passing distributed
model fits both nodes that are located close to each other
and wide-area distributed systems such as communication
networks.
In a message-passing model nodes communicate
by sending and receiving messages over the wire.
In such a
system,
network latency can have a big influence over the
synchronization model of the system.
This has been examined
to some extent by Garay,
Kiayias and Leonardos [4].
We will
further elaborate how it affects the Blockchain,
taking into
account the abbreviation times.
III.
PRELIMINARY
It has been proved that for any distributed system of size
n,
for consensus to be reached,
no more than n3 nodes can be
faulty [3] .
Faulty processes can exhibit a number of behaviors,
such as presenting different messages to different nodes,
dropping messages,
and altering messages on transit.
However,
Blockchain,
which is also a distributed system,
does not have
these limitations.
The reason lies in the mining process,
and
in the ability to verify the correctness of newly published
blocks.
In order to distribute two different but legitimate
blocks,
a node will have to beat the entire network twice in
a row thus possessing an unrealistic amount of computational
power.
Even if a node manages to ditribute two different but
legitimate blocks,
causing a fork in the network,
eventually
when more blocks are added the network will converge to
a single consensus chain,
based on the difficulty of all the
blocks.
Once a block is published,
all other nodes can verify
the transactions in it are valid,
and that enough work has been
put into creating this block.
This guarantees that messages
accepted in the Blockchain are legitimate.
IV.
MESSAGE PASSING BLOCKCHAIN ABBREVIATION
Overview.
Message Passing Blockchain is a type of
Blockchain where nodes send each other messages.
This
Blockchain is usually cryptocurrency-based,
but other use[1]cases have also been demonstrated,
such as voting systems
[6].
In this architecture,
not every node has to know every
other node but there must exist a channel between each node
to at least one other node.
Nodes joining the network have a
number of ways of discovering their peers.
Over the years,
IRC (Internet Relay Chat) channels have been used to locate
IP addresses of running nodes.
When joining IRC channels,
the client receives messages broadcasted from other clients
already subscribed to that channel.
This method has been
abandoned in favor of DNS seeds and hard-coded IP lists.
The DNS seeds resolve to a group of IP addresses which are
known to be stable.
If a node is experiencing a DNS failure
it will try the hard-coded IP addresses.
We will address the common situation where an asset-based
Blockchain (such as cryptocurrency) uses the message-passing
model for communication.
In this scheme,
every participant
has an account which contains its balance of assets (for
example,
number of coins).
We will use this feature in the
abbreviation of the chain,
as capturing the balances of all
participants at a certain point in time can serve as a reference
point for all future transactions.
This point in time can
become the new “beginning of time”,
or the new Genesis
block.
Abbreviation.
Let B be a Blockchain consisting of blocks
B1,
.
.
.
,
Bn,
where B1 is the genesis block.
Abbreviating this
chain is replacing it with a new genesis block Bg which
summarizes the account balances of all participants.
Network Latency.
In our context,
we define network latency
as the maximal amount of time between a block publication
and the last node in the network receiving this block.
The network latency has a direct effect on the Blockchain
abbreviation,
since it might influence whether an abbreviated
Blockchain gets accepted by the vast majority or not.
If a
new block is mined faster than the time is takes to abbreviate
a chain it might lead to a situation where abbreviated chains
never get accepted.
This might happen because by the time
the new abbreviated chain reaches everybody it will not be
up to date as other nodes would have received the next block
which is not included in the abbreviation.
We present two
lemmas in connection with network latency.
The first lemma
deals with the uniform latency case,
in which the rate of
information propagation is at least a linear function of the
elapsing time.
Namely,
there is a function of time (since
a message is sent) to the percentage of system nodes that
receive the message.
This is similar to what happens in a
network of nodes spread out across a single LAN,
in a small
geographical location such as a university campus.
The second
lemma deals with non-uniform network latency which defines
an unpredictable rate of propagation through the network.
Namely,
there is no information on the portion of the nodes
that received the message following the transmission of the
message,
but there exists a bound on the time it takes for
all nodes to receive the message.
This is more relevant in a
network spread out across a large geographical area (WAN),
such as the Ethereum network.
Uniform Network Latency.
Uniform Latency is defined as
uniform propagation through the network.
This means that
a message propagates through the network in a predictable
manner.
Formally let l be the network latency in a uniformly
latent network of size s,
and let x be a constant in the range
0 .
.
.
1.
Suppose a block is published in this network.
After x·l
time,
at least x·s nodes in the network have received the block.
Lemma IV.
1.
Assuming uniform latency through the network,
abbreviation of the Blockchain must take less time than the
time it takes to mine a new block.
Otherwise,
the abbreviated
chain may never be accepted and the network may continue
to mine on the original chain.
Proof Sketch.
Assume that the lemma is not true.
For the sake
of simplicity,
assume that mining takes a constant time of x
and abbreviating a chain takes a constant time of y>x.
Suppose a block b1 is published at time t1 and assume a
best case scenario where node n1 starts abbreviating the chain
(including b1) at time t1.
Now suppose a block b2 is published
at time t2 = t1 + x.
n1 is not done abbreviating yet,
it will
finish at time t3 = t1 + y>t2.
b2 will start propagating
through the network before the new abbreviated Blockchain
is released.
Uniform latency ensures a vast majority of the
nodes will receive the block b2 before receiving the new chain.
When the new chain will eventually reach everyone,
it will be
discarded by the vast majority because it will not be up to date
(it will not include transactions from b2).
Nodes will continue
mining on the original chain including the newly published
block b2.
So y cannot be larger than x for abbreviation to
work.
Non-Uniform Network Latency.
Non-Uniform Latency is
defined as unpredictable propagation through the network.
This means that there is no guarantee on the percentage of
nodes that receive a message after a certain amount of time.
Formally,
let l be the network latency in a non-uniformly
latent network of size s.
Suppose a message is transmitted
in this network.
After m time passes,
where m<l,
there
is no guarantee on the amount of nodes that have received
the message.
Note,
that when l time has passed,
the entire
network has received the message.
Lemma IV.
2.
Let l be the network latency in a non-uniformly
latent network.
Let t be the time it takes to mine a block
and t>l.
Then Abbreviation time must take less than t − l.
Otherwise,
the abbreviated chain may never be accepted and
the network may continue to mine on the original chain.
Proof Sketch.
Assume that the lemma is not true.
For the sake
of simplicity,
assume that abbreviating a chain takes a constant
time of y where t>y>t−l.
Suppose a block b1 is published
at time t1 and assume a best case scenario where node n1 starts
abbreviating the chain (including b1) at time t1.
Now suppose
a block b2 is published at time t2 = t1 +t.
n1 is already done
abbreviating at this point but the abbreviated chain has not
reached all nodes yet because the network latency l ensures it
will reach all nodes at time t1+y+l.
Notice that the following
holds: t1 + y + l>t1 + t.
Since there is no guarantee on the
propagation rate of b1,
it is possible that more than half of the
nodes receive it by the time the abbreviated chain reaches half
of the network.
In that case,
the vast majority of the network is
going to mine on the new block b2 and not on the abbreviated
chain.
So abbreviation time must take less than t − l.
A.
Blockchain protocol changes.
Currently the rule of Proof-of-Work in different Blockchain
platforms is that if a node encounters multiple chains it must
mine on the one which required the most computational power
to produce (generally the longest one).
If there are several
chains with equal length the node may choose arbitrarily on
which chain to mine and eventually the network will settle
on a single longest chain which everyone will adopt.
This
rule conflicts with Blockchain abbreviation since abbreviating
means shortening the chain.
So how will nodes know the
shorter chain needs to be adopted? The solution to this is
modifying the difficulty of the genesis block to be the sum
of difficulties of all blocks summarized by the abbreviation.
If we are abbreviating a chain of three blocks A → B → C,
each of difficulty 10,
we create a new chain consisting of
one genesis block G of difficulty 30.
This difficulty is only
symbolic,
since it does not represent the amount of work
spent on mining this new genesis block,
but the amount of
work spent on all blocks summarized by this genesis block.
The protocol must be changed so nodes do not verify the
hash of the genesis against its difficulty.
The genesis hash
will not match the claimed difficulty.
Another minor change
is that participants must allow the genesis block to increase
in size beyond the configured limits (usually Blockchain
platforms limit the size of published blocks).
The Blockchain protocol needs to change so everyone in
the network must accept an abbreviated Blockchain once such
a chain is available.
Otherwise,
the system is susceptible to
attack from Byzantine nodes,
which can generate much more
revenue than they are supposed to.
Claim IV.
3.
Formally,
consider three groups of nodes mining
together on a Blockchain.
Group F (Fast),
adopt an abbre[1]viated chain once it is released.
Group S (Slow),
want to
keep the transaction history,
so they continue to mine on the
unabbreviated chain for a few blocks,
in hopes of taking over
the abbreviated chain.
Eventually,
they comply and switch.
Group B (Byzantine),
abbreviate the chain after every new
block,
and try to mine on it.
The computational power of
the three groups F,
S and B are f,
s and b respectively.
If group B is large enough that together with another group
they constitute a vast majority,
group B can generate more
revenue than they are supposed to.
Proof Sketch.
Formally,
consider three groups of nodes.
Group F (Fast),
adopt an abbreviated chain once it is released.
Group S (Slow),
want to keep the transaction history,
so they
continue to mine on the unabbreviated chain for a few blocks,
in hopes of taking over the abbreviated chain.
Eventually,
they comply and switch.
Group B (Byzantine),
abbreviates
the chain after every new block,
and tries to mine on it.
The computational power of the three groups F,
S and B
are f,
s and b respectively.
Suppose these inequalities hold:
s>b>f and s<f + b.
Here we assume group B is the
middle group in size,
but the lemma is true even if it is the
smallest group in the network.
Since the group s does not
adopt abbreviated chains right away,
they never mine on the
chain the vast majority is mining on.
The groups B and F
constitute a vast majority and they get all the revenue.
The
fair share of the Byzantine group is b+sb+f .
However since
they split the revenue with the Fast group their actual revenue
is b b+f .
Because b>f,
the Byzantines have actually majority
control though they do not form a vast majority in the entire
network.
Besides generating more revenue than they should,
the B group can even drop blocks mined by the F group
and publish only their own blocks.
Since they are bigger in
size than the F group they are guaranteed to take over any
chain of any length published by group F.
For this reason,
an
abbreviated Blockchain needs to be accepted by all honest
nodes.
We suggest including a (strong) cryptographic hash
(e.
g.
SHA-3) result of the entire abbreviated blocks in the
new genesis block,
as a proof for the original block values.
Thus,
ensuring proof of continuation.
Ethereum based implementation.
As a proof of concept,
we implemented the above schema using the open source
Ethereum project,
which is the code driving the Ether digital
coin.
Obviously,
we could not mine on the public branch since
we wanted to test the abbreviation and also guarantee short
mining time.
We used Ethereum’s private chain functionality
to create our own chain.
We set the difficulty to be very low
to guarantee successful mining within a short period of time
(a few seconds).
Next,
we created two virtual machines that
would act as mining nodes (e.
g.
,
peers,
as described by the
Ethereum protocol) and started mining on both.
Once enough
blocks have been mined,
we stopped the mining process on
both virtual machines,
and planted a new Genesis block on
each machine.
The new Genesis block was similar to the
default one given by the Ethereum project,
but it also had
the balances of the two nodes as captured the moment they
stopped mining.
This was possible due to the new ‘alloc’
directive defined in Ethereum,
which cannot be found in all
cryptocurrency projects yet.
Next,
we manually deleted the
chain directory from both machines (effectively deleting all
blocks from the Blockchain) and started mining again.
We
verified that both machines continued to generate revenue on
top of their preallocated balances and that the block count
started from the beginning again.
V.
SHARED MEMORY BLOCKCHAIN ABBREVIATION
Overview.
Shared Memory Blockchain is a type of Blockchain
where nodes connect to one another to maintain Blockchain
consistency,
rather than send each other blocks they have
mined.
This Blockchain is not cryptocurrency-based,
but
deals with publishing records (or arbitrary data) in a public
ledger,
therefore no accounts or wallets are required.
This is
crucial since when we introduce Blockchain abbreviation,
the
lack of miner state will greatly affect the procedure.
Assumptions:
• Fixed pool of miners: A pool of fixed size where every[1]one knows everyone.
• Honest Majority: Byzantine nodes can comprise up to
half of the entire network.
• No miner state: No record is kept of each miner,
such
as balance,
wallets,
history of transactions,
etc.
• Abbreviation is decided by majority vote: Byzantine
nodes are included,
meaning that if a group of honest
nodes and a group of Byzantine nodes together constitute
a vast majority,
their vote to abbreviate the chain or not
will be decisive.
• No down time: Nodes do not disconnect from the net[1]work.
Genesis Block.
The Genesis block in this Blockchain is
simply the earliest block ever mined.
If a certain prefix of the
Blockchain is deleted then the Genesis block is simply the
earliest one that exists.
Mining.
Nodes continue to mine as before,
where they need
to find a hash smaller than a certain threshold in order to be
able to publish their data.
Blockchain Structure.
The Blockchain is a directory structure
stored on each node separately.
Each miner has a user account
on every other nodes’ machine.
All miners must belong to the
same UNIX group “miners”.
The structure is as follows:
• A general Blockchain directory – owner root,
group
miners,
UNIX permissions: 770
• A directory for each block – owner root,
group miners,
UNIX permissions 770
• A directory for each miner – owner minerX,
group miners,
UNIX permissions 740
• A file confirming validation of this block – owner minerX,
group miners,
UNIX permissions 700
(a) Node1
Blockchain
B1 B2 B3 B4
Node1
Confirm
Node2
Confirm
Node3
Confirm
Data
(b) Node2
Blockchain
B1 B2 B3 B4
Node1
Confirm
Node2
Confirm
Node3
Confirm
Data
(c) Node3
Blockchain
B1 B2 B3 B4
Node1
Confirm
Node2
Confirm
Node3
Confirm
Data
Figure 2: A Shared Memory Blockchain with three nodes.
A
total of four blocks have been mined.
Dashed boxes are files.
All other boxes are directories.
The structure makes sure the Blockchain at each machine
is open for all users belonging to group “miners”,
but no user
can interfere with another user’s validation due to separation
of directories between users,
and Read privileges only given
to the group.
Block.
A block is a single unit of data in the Blockchain.
Usually it is complex and contains lots of fields but for our
purpose this structure is enough:
• Serial Number
• Miner id
• Previous Block hash
• Nonce
• Data
The Genesis Block is the same,
except that it may contain
arbitrary data in the Previous Block hash,
since there is no
previous block.
Abbreviation.
Let B be a Blockchain consisting of blocks
B1,
.
.
.
,
Bn.
Abbreviating this chain at abbreviation point t is
cutting the prefix of B1,
.
.
.
,
Bt from this chain to produce the
new abbreviated chain: Bt+1,
.
.
.
,
Bn.
Confirmation.
Nodes regularly connect to all other nodes,
and
write confirmation files into each block in their Blockchains.
They are used when nodes want to vote for Blockchain abbre[1]viation and delete their files from a particular block.
Nodes
confirm blocks sequentially,
so block x will be confirmed
before block x + 1.
Nodes constantly look if their chains need to be abbreviated.
Consider a network of size n.
Each node searches its chain
for a maximum block x,
such that block x has less than n2 confirmations,
while block x + 1 has more than n2
confirmations.
This ensures that the missing confirmations in
block x were indeed erased and not just delayed by network
latency.
Once such a block is found,
all blocks preceding and
including x can be erased.
No further changes are required
since the Previous Block Hash of the now new genesis block
x + 1 is unchanged and so the hash of the block x remains
also unchanged.
So block x + 2 Previous Block Hash is still
correct and this propagates to the rest of the blocks.
A.
Blockchain Consistency Among Byzantine Nodes
Byzantine nodes have been a part of Blockchain research
from the very beginning.
Having to decide on a single,
main chain is similar to the Byzantine Generals problem [5],
where a group of army generals must decide on a strategy in
the presence of traitors.
The assumption of honest majority
has played an important part of mitigating attacks from
Byzantines.
In particular,
nothing can effect the Blockchain if
only a minority of the network deviate from the protocol.
As
long as the vast majority continue to mine and publish blocks
as always,
they will control the network.
The most prominent
attack from malicious nodes is selfish mining,
which was
detailed in a paper by Eyal and Sirer [2].
But this attack did
not have an effect on the consistency of the Blockchain,
only
on the revenue of the attackers.
By introducing abbreviation,
we also introduce weak points which can undermine the
consistency of the Blockchain.
If a minority of honest miners want to abbreviate the chain
at a certain point,
they delete their confirmations from the
corresponding block in all nodes.
Suppose there exists a
group of Byzantine nodes which delete their confirmations
from the same block as the honest nodes but only from a
part of the entire network.
If the honest nodes which voted
for abbreviation and the Byzantine nodes together constitute
a vast majority this may lead to a situation where a part of
the network mine on an abbreviated chain while another does
not.
There needs to be a mechanism for converging to one
single chain when such inconsistencies occur.
In Algorithm
A1,
we show how nodes keep track of whether they are out
of synchronization with the majority of the network.
Let n denote an arbitrary honest node in the network.
While connecting to other nodes to write confirmations,
n
counts how many nodes in the network mine on a chain
which starts with a different Genesis block.
n also saves all
the last blocks from all the chains in the network.
If more
than half of the nodes have a different Genesis block than n,
and the median of all last blocks in the network is ahead of
the last block of n by a predetermined value,
then the node
assumes it has fallen behind on its chain,
and needs to catch
up with the majority.
This predetermined value depends on
the specific Blockchain platform.
For example in Bitcoin,
it is
generally accepted to wait for six blocks before considering a
transaction to be confirmed.
Probabilistic analysis has shown
that the chances for a chain to be overtaken after it has
advanced six blocks over a given block are negligible.
This
value of six can be adopted for our purposes too.
We look at the median of all last blocks since that way all
the Byzantine nodes cannot declare they have an extremely
high or low serial number in the last block and manipulate
the network into making decisions based on that.
When the
median block is significantly ahead of our block it is safe
to say that the vast majority is mining on a different chain,
since our chain is not catching up with it.
Next,
we present a
lemma which shows that Byzantine nodes cannot undermine
the consistency of the Blockchain.
Lemma V.
1.
Let n be a network of miners,
divided into
two groups with mining power g1 and g2.
Without loss of
generality,
suppose g1 = g2 · y,
and y > 1.
Let us assume
they mine on two separate chains starting at length 0.
After
sufficient time passes,
g1 will have x more blocks than g2,
for
any x > 0.
Proof Sketch.
Since g1 has more computing power,
over time
they will commit more blocks than g2 by a factor of y.
After
a certain period of time g1 will have b1 · y blocks on its chain
while g2 will have b1.
Thus b1 · y − b1 = x and this yields
b1 = yx−1 .
So after g2 mines xy−1 blocks it will be x blocks
behind g1.
Lemma V.
2.
A Byzantine group of nodes z cannot undermine
the consistency of the Blockchain by erasing confirmations
from a subgroup of nodes t of the whole network n,
t<n.
Proof Sketch.
Since the group z is less than half of the
entire network (by the honest majority assumption),
they can
only succeed in making nodes abbreviate their chain if more
legitimate nodes also vote for abbreviation.
Let l be a group
of legitimate nodes that want to abbreviate the chain at some
block b1.
Assume that l < n2 and z < n2 but l + z > n2 .
Since l are honest they erase their confirmations at block b1
in the entire network but z erase their confirmations only from
a group of nodes t.
Case 1: t > n2 – the minority is mining on the original long
chain.
In this case more than half of the network abbreviates their
chain so the rest of the network falls behind and continue to
mine on the original chain.
Let m denote the minority group.
By Algorithm A1,
the group m will continue to mine until
the median of the serials of the last blocks in the network
is ahead of their chain by a predetermined x blocks.
This is
bound to happen by the previous lemma.
After that happens,
m will switch chains to a longer chain.
This chain is going
to be verified by Proof-of-Work Algorithm A2 so Byzantine
nodes cannot cheat nodes into adopting inconsistent chains.
Case 2: t < n2 – the minority is mining on the new
abbreviated chain.
In this case less than half of the network abbreviates their
chain so this group fall behind and mine on a separate chain.
Let m denote the minority group.
By Algorithm A1,
the
group m will continue to mine until the median of the serials
of the last blocks in the network is ahead of their chain by
a predetermined x blocks.
This is bound to happen by the
previous lemma.
After that happens,
m will switch chains to
a longer chain.
This chain is going to be verified by Proof-of[1]Work (Algorithm A2) so Byzantine nodes cannot cheat nodes
into adopting inconsistent chains.
Implementation over Unix file system.
As a proof of con[1]cept,
we implemented a complete Shared Memory Blockchain
framework using the Java language.
We used three small
virtual machines,
each running a mining node written in
Java and an SSH server for communication between the
nodes.
Each virtual machine contained a user group called
“miners”,
and three user accounts,
one for each node.
We
created a directory structure in each machine,
similar to that
described above,
where every block in the chain corresponds
to a directory.
In each such directory there are three more
directories,
one for each node,
and a file containing the data
in that block.
The data published by nodes was an arbitrary
256 bit string generated by a random function.
For mining we
implemented a scheme similar to other Blockchain systems,
where nodes have to compute a hash smaller than a certain
value in order for it to be valid.
Periodically,
each node would
connect to every other node,
and write confirmation files to
their designated directory.
During this communication,
if a
machine notices that another machine has a more advanced
Blockchain it will copy it,
after verification,
and resume
normally.
For maintaining consistency,
we used Algorithm
A1,
and ran some manual tests by erasing confirmation files.
All tests showed that all nodes eventually converged to a single
Blockchain,
and continued to mine normally.
ACKNOWLEDGEMENT
We thank Yossi Gilad and Sergio Rajsbaum for comments