Theory – Himitsu project

Elliptic curves and how to use them (part four)

One of the basic parameters of processors is the word length of a single arithmetic operation. This value can range from 8 bits for simple micro-controllers through 32 bits for 80386 derivatives to the most popular 64-bit length today. In addition, processors may have extended instruction sets (such as MMX, SSE, etc.) that can extend the processor’s capabilities to 128-bit (or larger) words.
The word lengths provided are therefore too short for the implementation of cryptographic algorithms. For RSA, the suggested (in terms of security) key length is today (year 2019) 4096 bits, which is many times more than standard processors provide. For elliptic curves, the length of the prime number in the suggested curve is 521 bits (no, it’s not an error: $2^{521}-1$ is one of Mersenne prime numbers).
So, if we want to create an implementation of elliptic curves, we must have implementations of large integer numbers. We can use an existing one (mathematical libraries usually implement not only ‘large numbers’ but also the elliptic curves themselves) or write it yourself (it is enough to implement addition, subtraction, multiplication, modulo and division). The presented examples are written in Python, which has built-in ‘large numbers’ arithmetic, but when writing an implementation in a typical embedded we do not have this comfort – we have to implement everything in C / C ++.
Let’s check the performance of the implementation presented in the previous part, but let’s use the more demanding curve:

#3.3.  521-Bit Random ECP Group
print "ECPoint 521-Bit"
p = (2**521)-1
tv = FieldElement(p, 1)
neutral = ECPoint(p, -3, 0x0051953EB9618E1C9A1F929A21A0B68540EEA2DA725B99B315F3B8B489918EF109E156193951EC7E937B1652C0BD3BB1BF073573DF883D2C34F1EF451FD46B503F00)
gx = FieldElement.factoryMethod(tv, 0x00C6858E06B70404E9CD9E3ECB662395B4429C648139053FB521F828AF606B4D3DBAA14B5E77EFE75928FE1DC127A2FFA8DE3348B3C1856A429BF97E7E31C2E5BD66)
gy = FieldElement.factoryMethod(tv, 0x011839296A789A3BC0045C8A5FB42C7D1BD998F54449579B446817AFBD17273E662C97EE72995EF42640C550B9013FAD0761353C7086A272C24088BE94769FD16650)
g = ECPoint.factoryMethod(neutral, gx, gy)

#8.3.  521-Bit Random ECP Group
i = 0x0037ADE9319A89F4DABDB3EF411AACCCA5123C61ACAB57B5393DCE47608172A095AA85A30FE1C2952C6771D937BA9777F5957B2639BAB072462F68C27A57382D4A52
r = 0x0145BA99A847AF43793FDD0E872E7CDFA16BE30FDC780F97BCCC3F078380201E9C677D600B343757A3BDBF2A3163E4C2F869CCA7458AA4A4EFFC311F5CB151685EB9
start = monotonic.time.time()
gi = g*i
stop = monotonic.time.time()
print "Time: %s"%(stop-start)
start = monotonic.time.time()
gr = g*r
stop = monotonic.time.time()

ECPoint 521-Bit
Time: 0.146764039993
Time: 0.149250984192

So, one point multiplication on one Intel G4560 3.50GHz core amounted to over 400 ms, and at the time of establishing the connection you need to perform at least 4 multiplications (key negotiation and authentication), i.e. mathematical calculations alone would take almost 2 seconds. For processors whose speeds are measured not in GHz, but in MHz, the results would be measured in minutes, which is completely unacceptable. So we can try to optimize several implementation elements:

large numbers arithmetic,
operations in the field of $F_p$ ,
operations in the EC group ( $F_p$ ),
EC point multiplication ( $F_p$ ).

We can optimize the arithmetic of large numbers by standard in terms of computational complexity, conditional jumps, data copy elimination, etc. In our examples we assume for the purposes of our examples that in Python this implementation is well optimized.
Operations in the field of $F_p$ use the arithmetic of large numbers with additional requirements, which results in the need to use operations: modulo and element inversion algorithm. Division and modulo are the most demanding operations for the processor (division is at least 30 times slower than adding). In the implementation of part 3, each operation required the use of modulo. In addition, one could replace modulo operation by comparing to the number p and subtraction, but the conditional instruction would break the instruction piping in the processor, so that despite the prediction of jumps, the gain could not be large. For multiplication, however, there is a way to get rid of the expensive modulo operation – Montgomery’s reduction algorithm.
To be able to use the Montgomery algorithm, we must first calculate the residual $R=2^k$ which is the smallest approximation greater than p ( $2^{k-1} \; \leq \; p \; \lt \; 2^k$ ).
The operation of converting the number to a residuum form looks like this:

$a' \equiv a\ast R \;(mod\; p)$

The operation of the return conversion from the residuum form looks like this:

$a \equiv a' \ast R^{-1}\;(mod\;p)$

The operation of adding numbers in the form of residuum looks identical to the usual addition in $F_p$ :

$c' \equiv a' + b'\;(mod\;p)$

because:

$a'+b' \equiv a \ast R + b \ast R \equiv (a+b) \ast R \equiv c \ast R \equiv c' \;(mod\;p)$

The operation of multiplying numbers in the form of residuum looks as follows:

$c' \equiv a \ast b \ast R^{-1} \;(mod\;p)$

because:

$a'\ast b' \equiv a \ast R \ast b \ast R \equiv (a \ast b) \ast R^2 \equiv c \ast R^2 \equiv c' \ast R \;(mod\;p)$

However, the above multiplication formula is much less optimal than multiplying in $F_p$ . Therefore, the Montgomery algorithm is used to optimize the multiplication of numbers in the form of residuum, for which it is necessary to calculate $inv_p = \frac{R \ast R^{-1} - 1}{ p }$

The algorithm looks like this:

$T=a' \ast b'$
$m= ((T \; mod \;R) \ast inv_p \; )mod \; R$
$t=\frac{T+m \ast p}{R}$
If $t\geq p$
then $c'=t-p$
else $c'=t$

In the above algorithm we have 3 demanding operations: twice modulo and once dividing. However, the basis for modulo is the number $R=2^k$ , so we can simplify these operations:

$c \; mod \; R=c\wedge(R^{-1})$ [bit mask]
$\frac{c}{R}=c\gg k$ [shifting bits to the right]

So our implementation of $F_p$ arithmetic using the Montgomery algorithm might look like this:

#!/usr/bin/python
#...
class FieldElement:
    def __init__(self, p, rvalue, Rmask, k, invR, invp):
        self.p = p
        self.rvalue = rvalue
        self.Rmask = Rmask
        self.k = k
        self.invR = invR
        self.invp = invp
        
    def __str__(self):
        if self.rvalue < 0:
            return "-0x%x"%(self.convertFromResiduum()*(-1))
        return "0x%x"%self.convertFromResiduum()
    
    def __add__(self, other):
        if type(other) == int or type(other) == long:
            return self.add((other << self.k)%p)
        else:
            return self.add(other.rvalue)
    
    def add(self, rvalue):
        rvalue += self.rvalue
        if rvalue >= self.p:
            rvalue -= self.p
        return FieldElement(self.p, rvalue, self.Rmask, self.k, self.invR, self.invp)
    
    def __sub__(self, other):
        if type(other) == int or type(other) == long:
            return self.sub((other << self.k)%p)
        else:
            return self.sub(other.rvalue)
    
    def sub(self, rvalue):
        res = self.rvalue
        if rvalue > res:
            res += self.p
        res -= rvalue
        return FieldElement(self.p, res, self.Rmask, self.k, self.invR, self.invp)
    
    def __mul__(self, other):
        if type(other) == int or type(other) == long:
            return self.mul((other << self.k)%p)
        else:
            return self.mul(other.rvalue)
    
    def mul(self, rvalue):
        T = rvalue * self.rvalue
        #m = ( (T & self.Rmask) * self.invp) & self.Rmask
        m = ( T * self.invp) & self.Rmask
        t = (T + m * self.p) >> self.k
        if t >= self.p:
            rvalue = t - self.p
        else:
            rvalue = t
        return FieldElement(self.p, rvalue, self.Rmask, self.k, self.invR, self.invp)
    
    def __eq__(self, other):
        return (self.rvalue == other.rvalue)
    
    def __ne__(self, other):
        return (self.rvalue != other.rvalue)

    def convertFromResiduum(self):
        return ((self.rvalue*self.invR)%self.p)
    
    @staticmethod
    def createMethod(value, p):
        R = 1
        k = 0
        while R <= p:
            R <<= 1
            k += 1
        invR = invers(R, p)
        invp = ((R * invR) - 1) / p
        rvalue = (value * R)%p
        return FieldElement(p, rvalue, R - 1, k, invR, invp)
    
    @staticmethod
    def factoryMethod(otherFieldElement, value):
        rvalue = (value << otherFieldElement.k)%otherFieldElement.p
        return FieldElement(otherFieldElement.p, rvalue, otherFieldElement.Rmask, otherFieldElement.k, otherFieldElement.invR, otherFieldElement.invp)
#...

ECPoint 521-Bit
Time: 0.161954164505
Time: 0.163055896759

And … it’s worse 🙂
My first version of this code was even slower. Therefore, the above optimization makes sense only in implementations of large numbers in which operations on bits (shifting, masking) are efficient – it seems that in Python these operations are not. So you have to take my word for it that in C / C ++ implementation the above algorithm noticeably speeds up operations on the $F_p$ field.
In the next part there will be presented optimization, which will significantly accelerate the operations on the curves: Jacobian coordinates.

ec_part4 (full code without optimization)
ec_part4m (full code with Montgomery multiplication)

Elliptic curves and how to use them (part three)

Go to part two

Modern popular processors are able to accurately and efficiently handle integer operations. Therefore, when implementing actions on elliptic curves, it is worth using this property. The type of elliptic curve that is best suited for use in a software solution will be discussed below.

Let’s choose a certain number p, which is a prime number. The bigger this number is the better safety will be assured. For ease of understanding, I suggest choosing number 7 for our next examples.

Let’s create an algebraic structure consisting of a set of numbers {0, p-1} and an operation $+_p$ such that for a $+_p$ b = (a + b) modulo p. We see that the operation $+_p$ is well defined – modulo p always returns an unequivocal result in the range {0, p-1}. Let’s check now whether this structure is a group. Operation $+_p$ is associative:
(a $+_p$ b) $+_p$ c = ((a $+_p$ b) + c) modulo p = (((a + b) modulo p) + c) modulo p = (a + b + c) modulo p = (a + ((b + c) modulo p)) modulo p = (a + (b $+_p$ c)) modulo p = a $+_p$ (b $+_p$ c).

The neutral element (as in the normal addition) is 0:
0 $+_p$ a = (0 + a) modulo p = (a) modulo p = a

Every element have got also an inverse element:
$a+_p\;\;a^{-1}$ = (a + (p – a)) modulo p = (p) modulo = 0

The operation is also commutative:
a $+_p$ b = (a + b) modulo p = (b + a) modulo p = b $+_p$ a

So the structure ({0, p-1}, $+_p$ ) is an abelian group.

Let’s add the second operation $*_p$ (a $*_p$ b = (a * b) modulo p) to this structure and see if the structure is a field. The first of the three conditions is met above (({0, p-1}, $+_p$ ) is a group). Let’s check if ({1, p-1}, $*_p$ ) is also a group.

The $*_p$ operation is associative, which can be proved exactly as for $+_p$ .
The neutral element (as in normal multiplication) is 1:
1 $*_p$ a = (1 * a) modulo p = (a) modulo p = a

Every element have got also an inverse element:
1 = $a\;*_p\;\;\;a^{-1}$ = ( $a\;*\;a^{-1}$ ) modulo p, so from modulo we have $a\;*\;a^{-1}$ = l * p + 1 – it’s linear diophantine equation , so it has a solution because GCD(a, p) = 1 (greatest common divisor).

The operation is also commutative:
a $*_p$ b = (a * b) modulo p = (b * a) modulo p = b $*_p$ a

So the structure ({1, p-1}, $*_p$ ) is also an abelian group.

The last requirement is that operation $*_p$ should be distributive over $+_p$ : (a $+_p$ b) $*_p$ c = ((a + b) modulo p) $*_p$ c = ((a + b) * c) modulo p = (a * c + b * c) modulo p = (a * c modulo p) $+_p$ (b * c modulo p) = a $*_p$ c $+_p$ b $*_p$ c

So we can conclude that the structure Fp = ({0, p-1}, $+_p$ , $*_p$ ) is a field.

The inverse element of $*_p$ operation can be calculated using the diophantic equation, but it is easier to do so using the modified Euclid’s algorithm to calculate the GCD between a and p:

#!/usr/bin/python

def invers(a, p):
    #this only work when p is prime!
    i = p
    j = a
    y2 = 0
    y1 = 1
    while(j > 0):
        q = i/j
        r = i - (j * q)
        y = y2 - (y1 * q)
        i = j
        j = r
        y2 = y1
        y1 = y
    return (y2 % p)

A simple implementation of operations in the Fp field could look like below:

#!/usr/bin/python

...

class FieldElement:
    def __init__(self, p, value):
        if(value >= p):
            raise ValueError('value %s outside field (p=%s)'%(value, p))
        self.p = p
        self.value = value
        
    def __str__(self):
        if self.value < 0:
            return "-0x%x"%(self.value*(-1))
        return "0x%x"%self.value
    
    def __add__(self, other):
        if type(other) == int or type(other) == long:
            return FieldElement(self.p, ((self.value + other) % self.p) )
        else:
            return FieldElement(self.p, ((self.value + other.value) % self.p) )
    
    def __sub__(self, other):
        if type(other) == int or type(other) == long:
            return FieldElement(self.p, ((self.value - other) % self.p) )
        else:
            return FieldElement(self.p, ((self.value - other.value) % self.p) )
    
    def __mul__(self, other):
        if type(other) == int or type(other) == long:
            return FieldElement(self.p, ((self.value * other) % self.p) )
        else:
            return FieldElement(self.p, ((self.value * other.value) % self.p) )
    
    def __truediv__(self, other):
        if type(other) == int or type(other) == long:
            inv = invers(other, self.p)
        else:
            inv = invers(other.value, self.p)
        return FieldElement(self.p, ((self.value * inv) % self.p) )
    
    def __eq__(self, other):
        return (self.value == other.value)
    
    def __ne__(self, other):
        return (self.value != other.value)
    
    @staticmethod
    def factoryMethod(otherFieldElement, value):
        return FieldElement(otherFieldElement.p, value % otherFieldElement.p)

Let’s see how we can use the Fp field to create elliptic curves. At first glance, the EC formula looks identical:
$y^2 = x^3 \;+_ p\;\; ax\;+_p\;\;b$
however, in this case we have used elements and operations from the field Fp instead of the field of real numbers. The curve here must meet an additional condition: (4 * $a^3$ + 27 * $b^2$ ) modulo p = 0.

The curves above the Fp fields are less intuitive than the curves above the R body, as shown in the following graphs:

In general, the points of the EC(Fp) curve appear to be “randomly” arranged with some symmetry. Points do not lie at every x position. Usually there are two points on one x position excluding points whose position y = 0 (in the example above it is point (1, 0), which, however, is in symmetry with itself – theoretically it is also point (1,7)).

The operations on the EC(Fp) curve look quite identical to those on the EC(R) curve with the difference that + * opeations are replaced by $+_p$ and $*_p$ .

And so point negation looks like (the only major difference):
$-A = (x_a,\;(-y_a)(mod \;p))$

Adding two points ( $A+B=C$ , $A\neq B$ , $A\neq -B$ ):

$s=(y_a-y_b)\ast(x_a-x_b)^{-1}\;\;\;(mod p)$
$x_c=s^2-x_a-x_b\;\;\;(mod p)$
$y_c=-y_a+s(x_a-x_b)\;\;\;(mod p)$

Doubling the point ( $2A=C$ , $y_p\neq 0$ ):

$s=(3x^2_a+a){\ast}(2y_a)^{-1}\;\;\;(mod p)$
$x_c=s^2-2x_a\;\;\;(mod p))$
$y_c=-y_a+s(x_a-x_c)\;\;\;(mod p)$

Special cases are also identical:
Doubling $A=(x_a,0)$ gives neutral point (point in infinity).
Adding negation $A+(-A)$ gives neutral point.
Adding $A$ to neutral point gives point $A$ .

The last needed element of EC(Fp) arithmetic is point multiple. Most often it is called point multiplication, but this operation is more understood here as exponentiation in the additive group version.
Traditional exponentiation is the number of multiplications of a given number that we must do to obtain the result of exponentiation. And so if we have $3^4$ then we must multiply the number three four times: 3 * 3 * 3 * 3 = 81. But what if we have to calculate $3^{113}$ ? Of course, we can do 3 * 3 * … * 3 – 112 multiplications (and each multiplication operation costs some computational power), or write $3^{113}$ as $3 * (((((3 * (3 * 3 ^ 2) ^ 2 ) ^ 2) ^ 2) ^ 2) ^ 2$ , where we have only 9 multiplications.

And so, if we have a certain point on EC and want to multiply it by 117, then instead of 116 additions, we can write this number to binary form 1110101 and moving from the least significant bit follow the steps of a simple algorithm:

For the start the base point B equals the multiplied point P, and the ‘result’ point R equals the neutral point.
If the considered bit is one, add point B to point R.
If this is not the last bit, double the B point and go to step 2 considering the next bit.

The result of the algorithm is point R. In our example for 117 * P, the algorithm will perform only 5 additions and 6 doubles.

Using the previous Fp implementation, we can now also implement EC(Fp) arithmetic:

#!/usr/bin/python

import copy

...

class ECPoint:
    def __init__(self, p, a, b):
        testec = (4*a*a*a) + (27*b*b)
        if(testec %p == 0):
            raise ValueError('Bad EC')
        self.a = FieldElement(p, a)
        self.b = FieldElement(p, b)
        self.neutral = True
    
    def __str__(self):
        if self.neutral:
            return "(neutral)"
        return "(0x%x,0x%x)"%(self.x.value, self.y.value)
    
    @staticmethod
    def testOnCurve(a, b, x, y):
        test_left = y*y
        test_right = x*x*x+a*x + b
        if test_left.value == test_right.value:
            return True
        return False
    
    def testOnSelfCurve(self):
        if self.neutral:
            return True
        return ECPoint.testOnCurve(self.a, self.b, self.x, self.y)
    
    @staticmethod
    def factoryMethod(otherECPoint, x, y):
        if not ECPoint.testOnCurve(otherECPoint.a, otherECPoint.b, x, y):
            raise ValueError('Point not on EC')
        res = copy.deepcopy(otherECPoint)
        res.x = x
        res.y = y
        res.neutral = False
        return res
    
    def double(self):
        if self.y.value == 0:
            return ECPoint(self.a.p, self.a.value, self.b.value) #return neutral
        s1 = (self.x*self.x*3) + self.a
        s2 = self.y*2
        s2 = invers(s2.value, s2.p)
        s = s1*s2
        
        
        x = (s*s) - (self.x*2)
        y = s*(self.x - x) - self.y
        return ECPoint.factoryMethod(self, x, y)
    
    def __add__(self, other):
        if self.neutral:
            return copy.deepcopy(other)
        elif other.neutral:
            return copy.deepcopy(self)
        elif other.x == self.x and other.y == self.y:
            return self.double()
        
        s1 = self.y - other.y
        s2 = self.x - other.x
        s2 = invers(s2.value, s2.p)
        s = s1*s2
        
        x = s*s - self.x - other.x
        y = s*(self.x - x) - self.y
        return ECPoint.factoryMethod(self, x, y)

    def __mul__(self, other):
        if type(other) == FieldElement:
            other = other.value
        elif type(other) != int and type(other) != long:
            raise ValueError("Can't multiply by %s"%(type(other)))
        
        res = ECPoint(self.a.p, self.a.value, self.b.value) #start with neutral
        base = copy.deepcopy(self)
        mask = 1
        while True:
            if mask & other:
                res += base
            mask <<= 1
            if mask > other:
                break
            base += base
        return res

We must be sure that the implementation we created is correct. The easiest way to verify it is to select a point and perform some operation on it (e.g. doubling it, then adding it to the doubling result) – the obtained point should be a point that also lies on the curve (satisfies the curve equation). This is a test to verify basic errors, but it is not enough.
The main verification tool are test vectors. For example, in the document rfc5903 in section 8.1 we have a test vector for a 256 bit curve (from section 3.1). if we take the point ‘g’ (from 3.1) and multiply it by ‘i’ (from 8.1), we should get ‘gix’ and ‘giy’ values:

#!/usr/bin/python

...

p = (2**256)-(2**224)+(2**192)+(2**96)-1
tv = FieldElement(p, 1)

neutral = ECPoint(p, -3, 0x5AC635D8AA3A93E7B3EBBD55769886BC651D06B0CC53B0F63BCE3C3E27D2604B)

gx = FieldElement.factoryMethod(tv, 0x6B17D1F2E12C4247F8BCE6E563A440F277037D812DEB33A0F4A13945D898C296)
gy = FieldElement.factoryMethod(tv, 0x4FE342E2FE1A7F9B8EE7EB4A7C0F9E162BCE33576B315ECECBB6406837BF51F5)

g = ECPoint.factoryMethod(neutral, gx, gy)

i = 0xC88F01F510D9AC3F70A292DAA2316DE544E9AAB8AFE84049C62A9C57862D1433
r = 0xC6EF9C5D78AE012A011164ACB397CE2088685D8F06BF9BE0B283AB46476BEE53
gi = g*i
gr = g*r
print "gi(%s)=%s"%(gi.testOnSelfCurve(), gi)
print "gr(%s)=%s"%(gr.testOnSelfCurve(), gr)
gir = gi*r
gri = gr*i
print "gir(%s)=%s"%(gir.testOnSelfCurve(), gir)
print "gri(%s)=%s"%(gri.testOnSelfCurve(), gri)

gi(True)=(0xdad0b65394221cf9b051e1feca5787d098dfe637fc90b9ef945d0c3772581180,0x5271a0461cdb8252d61f1c456fa3e59ab1f45b33accf5f58389e0577b8990bb3)
gr(True)=(0xd12dfb5289c8d4f81208b70270398c342296970a0bccb74c736fc7554494bf63,0x56fbf3ca366cc23e8157854c13c58d6aac23f046ada30f8353e74f33039872ab)
gir(True)=(0xd6840f6b42f6edafd13116e0e12565202fef8e9ece7dce03812464d04b9442de,0x522bde0af0d8585b8def9c183b5ae38f50235206a8674ecb5d98edb20eb153a2)
gri(True)=(0xd6840f6b42f6edafd13116e0e12565202fef8e9ece7dce03812464d04b9442de,0x522bde0af0d8585b8def9c183b5ae38f50235206a8674ecb5d98edb20eb153a2)

And we quickly proved that the presented implementation is correct.

In the next part we will deal with the optimization of this curve arithmetic implementation.

ec_part3 (full code)

go to part 4

Elliptic curves and how to use them (part two)

Go to part one

There should be presented a bit simplistic theory from the so-called “algebraic structures” before showing method of implementing elliptic curves on digital devices.
So what is this algebraic structure? This is a non-empty set with at least one operation in this set: $S= \{ A, \ast_{1}, \ast_{2}, ... \}$ . What can contain such a set? Numbers, colors, cities etc: $A= \{ a_{1}, a_{2}, a_{3}, ...} \}$ . What can this operation be like? Addition, multiplication, but also returning a city lying between two selected cities: $\ast_{i}: a_{x} \times a_{y} \rightarrow a_{z}$ .

For the needs of EC, we will consider structures with finite sets with two-arguments operation or two two-arguments operations: $S= \{ A, \ast \}$ or $S= \{ A, \ast_{1}, \ast_{2} \}$ .

If the structure $S= \{ A, \ast \}$ meets the following requirements:

operation $\ast$ is associative: $a_{1} \ast a_{2}\) \ast a_{3} = a_{1} \ast \( a_{2} \ast a_{3}$
set A has a identity element e for operation $\ast$ : $a_{i} \ast e = e \ast a_{i} = a_{i}$
each element $a_{i}$ from set A have it’s inverse element $a_{j}$ : $a_{i} \ast a_{j} = a_{j} * a_{i} = e$

we can call this kind of algebraic structure a group.
Additionally, if the $\ast$ operation is commutative for each $a_{i}, a_{j}$ : $a_{i} \ast a_{j} = a_{j} * a_{i}$ , then this group is called an commutative or abelian group.

There are two ways of group’s write:

additive: G = {A, +}, where the identity element is presented as ‘0’ and the inverse element of a as -a (like in ‘traditional’ adding)
multiplicative: G = {A, $\ast$ }, where the identity element is presented as ‘1’, and the inverse element of a as $a ^{-1}$ (like in ‘traditional’ multiplication).

If element A which is not a identity element is selected from the set A, which will be used to the following operations: $a_{g} \ast a_{g} = a_{1}, a_{g} \ast a_{1} = a_{2}, a_{g} \ast a_{2} = a_{3}, ..., a_{g} \ast a_{n} = e$ , it creates a subset $H = \{e, a_{g}, a_{1}, a_{2}, a_{3}, ..., a_{n} \}$ ( $H\subseteq G$ – H is subset of G) which with $\ast$ operation is a subgroup that is called a cyclic group, and $a_g$ is a generator of this group. The set H can be also two-element $\{e, a_g\}$ , and exactly a set G (H = G). So the whole group G is a cyclic group, if there is at least one element $a_g \in G$ , from which any other element of the group can be created: $a_i = i \ast a_g$ (additive write) or $a_i = a_g^i$ (multiplicative write).
If a set of all points lying on a certain elliptical curve is taken and as a set EC and addition of two of these points will be marked as + operation (described in the previous section), then G = {EC, +} is an abelian group. A point g is also selected for creating a cyclic sub-group H = {ECH, +}.
And that’s all to show how elliptic curves work as a theory, however, this is not enough yet to show the way the EC is implemented.

If the structure $S=\{A, \ast_1, \ast_2\}$ meets the following requirements:

structure $S_1=\{A, \ast_1\}$ is an abelian group (commutative)
structure $S_2=\{A/ \{0\}, \ast_2\}$ (without a identity element of group $S_1$ ) is also an abelian group (commutative)
operation $\ast_2$ is distributive over $\ast_1$ : $(a_i \ast_1 a_j) \ast_2 a_k = (a_i \ast_2 a_k)\ast_1(a_j\ast_2 a_k)$ and $a_i \ast_2 (a_j \ast_1 a_k) = (a_i \ast_2 a_j)\ast_1(a_i\ast_2 a_k)$

this is the structure we call the field and, for simplicity, we write K = {A, +, $\ast$ }.

Let’s now look back at the EC chart from the previous section:

The graph consists of two X, Y coordinates on which the real numbers are located. Real numbers can be added and multiplied. If an algebraic structure consisting of an infinite set of real numbers, additions and multiplication is constructed, it will turn out that such a structure is a field. Thus, it can be seen that the EC lies on a dimensional space over field.

We know that the field of real numbers is not suitable for implementation on digital devices due to its infinite number of elements and the need for infinite accuracy of calculations. So what field can we use? It turns out that there are two types of finite fields that are ideally suited for the software and hardware implementation. In the nearest parts, we will deal with fields for software implementation.

Go to part three

Elliptic curves and how to use them (part one)

“Elliptic curves” is a dangerous name that suggests an extremely complicated mathematical creation. However, only with a bit of theory and without going into too much detail, you can understand them enough to use them and their interesting properties.

At the beginning, we will simplify the theory so that it resembles what we learned in high school. Let’s look at the following formula:
$y^2 = x^3 -2x + \frac{1}{2}$
it creates a quite interesting plot:

Thus, we can see that the elliptical curve is a set of points that meet the equation, which for real numbers looks like: $y^2 = x^3 +ax +b$ . In the above example, we assumed that a = -2 and b = $\frac{1}{2}$ , but we can not use here any numbers – they must meet the additional condition: $4a^3+27b^2 \neq 0$ .
Is this everything? Well, no. The points on the elliptical curve are characterized by the fact that they can be “added” as well as adding “normal” numbers (just like 1+2=3). Let’s take the example of two points P = $\left ( 0, \frac{\sqrt{2}}{2} \right )$ , Q = $\left ( 2, \frac{3\sqrt{2}}{2} \right )$ and let’s add them together to receive a R point. To do this, we must simply draw a straight line that crosses these two points. It turns out that such a line will cut exactly one more point $\left ( -\frac{3}{2}, -\frac{\sqrt{2}}{4} \right )$ , as in the figure below:

But wait, why -R? We want to find R! Let’s look at the reverse situation, when we are “adding” P and -R points to each other, we get a Q point back, instead of -Q (like 1+(-3) don’t equals 2, but -2). Well, in order to correctly “add” two points we must do second step: negate the received -R point. We only have to draw next line, crossing received point and parallel to the Y axis. This line will cut one more point, which is searched negation.
And what about the points lying on the X axis? From the plot it is clearly visible that such a line will not cut another point. Well, such points are their negation like -0 is also 0 (but note, they are not points alike zero, but more on that later).

The above plot clearly shows that for the points P, Q, R the rules of addition and subtraction are met by following the described two-step method: P+Q=R, R-Q=P, R-P=Q, -P-Q=-R etc.

And what about adding one point to itself (eg P+P)? The procedure looks similar, the only difference is the method of determining the first line (we do not have two points, only one). This line must cross this point and be tangent to the plot at this point. Why? Imagine that the selected point has a neighbor on the plot, which is located as close as possible to the selected point (mathematicians would use here the expression ‘+epsilon’, which is the smallest possible, but larger than zero). So we have two points and we can lead the line. Both points on the plot are very, very, very close to each other, so if we enlarge the plot in this place, it will look like the plot between these points is a straight line, so the first line of adding these points “contacts” the plot line.
The tangent line to the plot at the doubled point (P + P = 2P) will cross with one more point, which will be -2P.

And what about (again) the points that lie on the X axis. They also now have to be treated separately because the tangent lines to the plot at these points are parallel to the Y axis, so they will never cross with the plot at another point. If such a line never crosses the plot in another place, it means that it will cut with it at a point in infinity! Well, the elliptical curve has such a special point O, which “is in infinity”. This is the point that is treated with alike zero, so we will also get it by trying to add points P and -P to each other. It can not be marked on the plot, but you can add another point to it, and the result will be added point (O + P = P because the point O is alike zero).

Drawing plot intuitive, but impractical and inaccurate. Therefore, all the above actions on points of the elliptical curve can be written with formulas.

Addition of two points P+Q = R, where:

$P\neq Q$
$P\neq -Q$
$P=\left (x_{p}, y_{p}\right )$ , $Q=\left (x_{q}, y_{q}\right )$ , $R=\left (x_{r}, y_{r}\right )$

looks as follows:

$s=\frac{y_p-y_q}{x_p-x_q}$
$x_r=s ^2 - x_p - x_q$
$y_r=-y_p+s\left( x_p-x_r\right)$

Doubling the point P + P = 2P = T, where:

$P=\left(x_p,y_p\right)$ , $T=\left(x_t,y_t\right)$
$y_p \neq 0$
$y^2=x^3+ax+b$

looks as follows:

$s=\frac{{3x_{{p}}}^{2} + a}{2y_p}$
$x_t=s^2-2x_p$
$y_t=-y_p+s\left(x_p-x_t\right)$

In addition, special cases:
Doubling the point $P=\left(x_p,y_p\right)$ , where $y_p=0$ : P+P=O
Adding negation P+(- P)=O
Adding a point in infinity: P+O=P

This is the end of the high school level theory. The above formulas are easy for humans, but impractical for a computer because they use real numbers. Let’s look at the previously used number $\frac{\sqrt{2}}{2}$ , which is approximately 0.7071067811865476… The computer mainly uses fixed-point numbers but can simulate real numbers using floating-point numbers, which, however, can only be approximations of numbers that can have infinitely many decimal places (floating point type ‘float’ can store something like a dozen or type ‘double’ dozens of decimal places). However, such inaccuracy is unacceptable from the point of view of cryptography, therefore the above elliptical curves of real numbers are not used in IT.

The next part will show what curves and how they are used in digital cryptographic protocols.

Go to part two

Protocol properties

The protocol for establishing a secure connection should provide the following services:

confidentiality
authentication
undeniability
integrity
negotiation of the session key
perfect forward secrecy

Confidentiality

It is the protection of information transmitted in an public channel by preventing it from being known by an unauthorized person. This protection is realized by converting (encrypting) this information with the ‘key’ (that is, additional information known only to the persons authorized to read this information). If the ciphertext (encrypted information) gets into unauthorized person’s hands, then this person won’t be able to decrypt it (decoding of the ciphertext) because he does not have a key.
It must be assumed that ciphertexts transmitted in an public channel are ALWAYS eavesdropped.

A separate issue is the distribution of keys, which must be carried out in such a way that the key has only authorized persons. Generally, secure key distribution can look like this:

in symmetrical cryptography through another secure channel (for example a personal meeting, diplomatic mail, a channel of a trusted provider who will not be interested in ciphertext’s content)
in asymmetric cryptography keys are partially transmitted in an explicit channel in such a way that getting the full keys through the eavesdropper was “computationally difficult”
in quantum cryptography keys are sent directly through a public channel, eavesdropping of the key is immediately detected.

Information encryption is carried out using ciphers operating in a specific mode (ECB, CBC, counter mode, etc.). Being created protocol will probably use AES-Twofish-Serpent cascade working in counter mode (as in Truecrypt, but counter mode will be simplified, and thus more robust)

Authentication

This is proving own identity to other side. It is a protection against the impersonation for someone else. If you receive data from an public channel, you can never be sure of the real sender – you should assume that transmission in the public channel AT EVERY TIME can be changed or intercepted (for example in”Man in the middle” attack).

Authentication allows to “gives credence” to received data. It is usually realized by:

data signing – in asymmetric cryptography, data is signed by a private key, and the signature obtained is verified with a public key (but we must be sure that this public key belongs to the sender); you can also use the symmetric cryptography authentication function (hmac)
encrypting data in such a way that it can be decrypted only by the recipient (that is, the confidentiality service can theoretically also serve as authentication: in symmetric cryptography we must be sure that only sender and receiver are having the key; in asymmetric encryption we encrypt data with the public key of the recipient – only the recipient will be able to decrypt the ciphertext with his private key).

Distribution of public keys from the authentication’s point of view (making sure that the public key definitely belongs to a given person) may look like this:

transmission via another secure channel – à la PGP model (in this way Mr. Snowden wanted to communicate with journalists at the beginning of his activity)
authentication by a certificate issued for a given public key by a third party, trusted for both parties (authentication center – CA)
In the being created protocol, the PGP model will be used with ECDSA algorithm (the default curve will probably be the P-521 elliptic curve, but it may also be possible to use own curves over the Fp field)

undeniability

This is protection against others party’s denial or negation of active participation in done activities, which takes place in the following cases:

the other party wants to deceive us by saying that the negotiated connection is a manipulated older negotiation (this scenario is difficult to implement and not practical but not impossible)
the other party didn’t participate in negotiation, which was reproduced by a third party older negotiation (the attacker may for some reason forcibly forge a secure connection between us and him, although he will not be able to generate correct ciphertexts, he will be able to send a prepared “garbage” or fragments of eavesdropped earlier transmission)

Authentication alone may not be enough, and especially the latter case can be quite dangerous if the negotiation protocol is too simple.

A little fairy-tale example: Secret military base. At night, an unexpected convoy comes to its gates. They claim that it is “urgent” and base have to quickly contact the Ministry. The base is connected to the Ministry by a direct transmission channel with a defective secure protocol. The base commander calls the ministry, the secured connection is negotiated and link is created (terminal shows that the connection is “safe”). There are strange squeaks in the receiver, but before the base commander can ask for the third time “Hello?” he hears the voice of a familiar general “… this convoy, it is urgent! Open them immediately, this order! …” after which there are only squeaks. The base commander may think at the moment that this convoy is really something urgent, and the communication line with the ministry must be little broken – it is better to let the convoy and report a communication failure in the morning. Unfortunately, the commander of the base did not connect with the Ministry, but with a hostile attachment, who plugged themselves in a badly secured communication line somewhere in nearest forest. The General actually issued such an order in a conversation with the base, but it was half a year ago, and the commander didn’t recall this conversation in time.

How the negotiation protocol should be secured from similar cases:

a temporary session key generated for each new session (it is not possible to use a previously recorded other transmission)
using the unique challenge for each new session – in this case manipulation attempt will be detected already at the final stage of the negotiations.

The being created protocol will have above safeguards ensuring undeniability, because it should be assumed that in public channel the transmission can be AT ANY TIME intercepted and manipulated.

integrity

This is protection against data misrepresentation. It is not a cryptographic service. To done it sometimes a parity bit is enough, sometimes a checksum (for example, crc), sometimes a cryptographic hash function. However, these are only safeguards against unintentional changes arising during data transport or storage. Cryptographic protection against intentional change could be done by the signature of data with a private key or the hmac function – the attacker does not know the key, so he can not change the data in such a way that it will not be detected (in previous mechanisms he could recalculate the checksum or hash).

When sending data through an public channel, it must be assumed that data AT ANY TIME can be manipulated. In the being created protocol, data integrity against unintentional changes will be ensured by the lower layer transport protocols. However, in the negotiation phase indirect detection of intentional changes will be ensured (through the other cryptographic services mentioned here). As for the transmission phase, there will probably be no such protection – manipulations will be immediately detected by the user (incorrect ciphertext decryption). Entering cryptographic checksums before encryption would generate vulnerability (like for example, in the protocol for GSM call encryption – see A5 breaking history). The only thing that needs to be assured here is that ciphertext traffic cannot be repeated in the same session.

negotiation of the session key

To obtain a secure, encrypted connection, both parties must have a key of the same value. This can be done in the following ways:

the (symmetric) key has been previously agreed between the parties (another secure channel) and is used for each connection between these pages (pre-shared key)
initiator of a new connection generates randomly a session key, then sends it in an encrypted form – encrypting it with either pre-shared symmetric key or an asymmetric public key of remote party
sides using asymmetric cryptography send themselves “prekey” data to the calculation of a common secret (for example using the Diffie-Hellman or Menezes-Qu-Vanstone algorithms), based on which a key will be created by key derivation function (KDF). Negotiation in this case is protected by the fact that obtaining a common secret using only eavesdropping “semi-products” is difficult to calculate (you can do it, but you need such computing power that using currently available means it is impractical in time – for example, it needs 100 billion years).
The being created protocol will use the ECDH algorithm for elliptical curves over Fp field probably with the prf-hmac-sha3 KDF.

perfect forward secrecy

It can not be assumed that the cryptographic mechanisms are perfect. With each past year there are more and more attacks created on used algorithms and protocols. The available computing power is increasing all time. The algorithms used 50 years ago (in the ‘sixties’) today would easily be disarmed both with the brute force of an ordinary PC commuter and with the possibility of ‘modern’ cryptanalysis (which was not known at that time and their existence was not even suspected). This situation can be compared to the inventors of knight armor who did not foresee that it can not protect from machine gun bullets.
Therefore, in secure communication protocols it should be foreseen that some of the components used may be compromised. And as it must be assumed that communication in the public channel is ALWAYS recorded and remembered, so breaking part of it must have the least impact on the security of the communication history.

Let’s look at the case of encrypted transmission using the pre-shared key. Two pages using the same key for each session for a long time risk that the attacker, having enough time for the computing power, will eventually break the key (or simply overhear it or break to system and copy it). Having the key and all recorded transmissions, the attacker can easily decipher them and get to know content.

Perfect forward secrecy is a protocol property that prevents historical transmission from being revealed based on a future successful attack. It can be achieved by using session keys generated and distributed independently from other owned keys (whose role will be limited to supporting other services such as authentication). An attacker in this situation has two options to choose from:

attack the key possessed by one party (if successful, it can steal its identity, but will not have access to previously recorded connections content)
attack the session key of one of the connections (in the case of success, the connection content is revealed, but access to other connections content will be still impossible).

The designed protocol will provide perfect forward secrecy by generating a random temporary key pairs to compute a session key using the session key negotiation service.