You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
 
 
 
 
 

483 lines
21 KiB

  1. July 13, 2022:
  2. Fix a security bug and an issue.
  3. Point::steg_encode was leaving the 24 high bits of the buffer as zero.
  4. It also ignored the size parameter. The size parameter has now been removed,
  5. the zeros fixed and a test added to make sure that it is fixed.
  6. Per https://github.com/MystenLabs/ed25519-unsafe-libs, deprecate eddsa signing
  7. with separate pubkey and privkey input. Instead decaf_ed*_keypair_sign.
  8. Release v1.0.2.
  9. October 10, 2020:
  10. A paper by Konstantinos Chalkias, François Garillot, and Valeria
  11. Nikolaenko, to be found at:
  12. https://eprint.iacr.org/2020/1244.pdf
  13. discusses malleability in EdDSA implementations. Their test
  14. vectors reveal unintentional malleability in libdecaf's version
  15. of EdDSA verify, in violation of RFC 8032. With this malleability,
  16. an attacker could modify an existing valid signature to create a
  17. new signature that is still valid, but only for the same message.
  18. Release v1.0.1, correcting this flaw.
  19. July 12, 2018:
  20. Release 1.0 with Johan Pascal's build scripts.
  21. October 13, 2017:
  22. OK, back to preparations for 1.0, today with major changes.
  23. Another group (Isis Lovecruft and Henry de Valence) implemented
  24. Decaf for Ed25519, whereas this code was implemented for IsoEd25519.
  25. These curves are isogenous, but not exactly the same, so the
  26. encodings all came out differently.
  27. To harmonize these two so that there is only one implementation
  28. for Ed25519, we've hammered out a compromise implementation called
  29. Ristretto. This is different from the old Decaf encoding in two
  30. major ways:
  31. * It checks the sign of x on Ed25519 instead of IsoEd25519.
  32. * It considers a number "negative" if its low bit is set,
  33. instead of its high bit.
  34. To avoid extra branches in the code, Ed448Goldilocks is also
  35. getting these changes to match Ristretto.
  36. The C++ class is also renamed to Ristretto, but IsoEd25519 is a
  37. synonym for that class.
  38. We might need to check the high bit again instead of low bit if
  39. E-521 is ever implemented, but I'll special case it then.
  40. April 22, 2017:
  41. Remove STROBE in preparation for 1.0 release. STROBE has its own
  42. repo now at https://strobe.sourceforge.io. I might re-integrate
  43. it into Decaf once I have produced a version that matches the
  44. STROBE v1 spec, but it's just confusing to keep v0.2 in here.
  45. Change x{25519,448}_generate_key to _derive_public_key.
  46. January 15, 2016:
  47. Lots of changes since the last entry in HISTORY.TXT.
  48. Pushing eventually toward a 1.0 release, at least for the curves
  49. themselves (i.e. not for STROBE), still a fair amount of stuff to
  50. do.
  51. I have pretty much all the functions I want implemented, except
  52. that maybe there should be a compatibility mode for whatever CFRG
  53. decides the real life format should be.
  54. The library now supports multiple curves at once. A decaffeinated
  55. curve isogenous to Curve25519 is now supported, but not especially
  56. fast. This is all still a little rough around the edges. To make
  57. it work in a sane way, most of the headers are generated using
  58. Python templates. Probably those should be turned back into .h
  59. files for syntax hilighting purposes; the code generation system
  60. in general needs quite a tuneup.
  61. The plus side is that this reduces the source code size, especially
  62. for supporting many curves over many fields.
  63. Currently the code only kind of halfway works on ARM, and not as
  64. fast as it used to (on NEON anyway), by maybe 15-20%. I'm
  65. investigating why. It's about as fast as it used to be on x86,
  66. maybe a hair slower.
  67. Montgomery ladder is currently out. Putting it back in might help
  68. pin down the ARM NEON performance regression.
  69. The BAT is currently broken.
  70. Tracking at 55 TODO items, about half of which are important-ish.
  71. Source code size is currently 12.8k wc-lines, including tests and
  72. old fields (p480 and p521). I'm still trying to get that down, but
  73. with things like 600 lines of NEON f_impl.c, that's not an easy task.
  74. April 23, 2015:
  75. Removed the original Goldilocks code; Decaf now stands on its own.
  76. This cuts the source code approximately in half, to a still-large
  77. 13.7k wc-lines. (Most of these lines are in the arch-specific
  78. field implementations.)
  79. Note that the decaf_crypto routines are not intended to set
  80. standards. They should be secure, but they're intended more as
  81. examples of how the core ECC library could be used.
  82. The SHAKE stuff is also mostly an experiment, particularly the
  83. STROBE protocol/mode stuff. This is all fine, because the ECC
  84. library itself is the core, and doesn't require the SHAKE stuff.
  85. (Except for the C++ header, which should probably also be factored
  86. so that it doesn't need the SHAKE stuff.)
  87. I've started work on making a Decaf BAT, but not done yet.
  88. I haven't ripped out all old multi-field code, because I intend
  89. to add support for other fields eventually. Maybe properly this
  90. time, instead of with a million compile flags like the original.
  91. March 23, 2015:
  92. I've been fleshing out Decaf, and hopefully the API is somewhere
  93. near final. I will probably move a few things around and add a
  94. scalar inversion command (for AugPAKE and such).
  95. I've built a "decaf_fast" implementation which is about as fast as
  96. Goldilocks, except that verification still isn't as fast, because
  97. it needs a precomputed wNAF table which I haven't implemented yet.
  98. Precomputation is noticeably faster than in Goldilocks; while
  99. neither is especially optimized, the extended point format works
  100. slightly better for that purpose.
  101. While optimizing decaf_fast I also found a minor perf problem in
  102. the constant time lookup code, so that's fixed (I hope?) and
  103. everything is faster at least on my test machine.
  104. At some point soon-ish, I'd like to start removing the base
  105. Goldilocks code from this branch. That will require porting more
  106. of the tests. I might make a C++ header for Decaf, which would
  107. definitely simplify testing.
  108. March 1, 2015:
  109. While by no means complete or stable, I've done most of the ground
  110. work to implement the "Decaf" point encoding. This point encoding
  111. essentially divides the cofactor by 4, turning Goldilocks (or
  112. Ridinghood or E-521) into a prime-order group. Furthermore, like
  113. the Goldilocks encoding, this encoding avoids incompleteness in
  114. the twisted Edwards formulas with a=-1 by sticking to the order-2q
  115. subgroup.
  116. Because the group is prime order, and because the "isogeny strategy"
  117. is not needed, the decaf API can be very simple. I'm still working
  118. on exactly what it should be though. The goal is to have a single-
  119. file (or a few files) for a "ref" version, which is designed for
  120. auditability. The ref version won't be quite so simple as TweetNaCl,
  121. but nearly so simple and much better commented. Then there can also
  122. be an optimized version, perhaps per-platform, which is as fast as
  123. the original Goldilocks code but hopefully still simpler.
  124. I'm experimenting with SHAKE as the hash function here. Possibly I
  125. will also add Keyak as an encryption primitive, so that everything
  126. can be based on Keccak-f, but I'm open to suggestions. For example,
  127. if there's a way to make BLAKE2 as simple and useful as SHAKE
  128. (including in oversized curves like E-521), then the extra speed
  129. would certainly be welcome.
  130. October 27, 2014:
  131. Added more support for >512-bit primes. Changed shared secret
  132. to not overflow the buffer in this case. Changed hashing to
  133. SHA512-PRNG; this doesn't change the behavior in the case that
  134. only one block is required.
  135. E-521 appears to be working. Needs more testing, and maybe some
  136. careful analysis since the carry-handling bounds are awfully tight
  137. under the current analysis (the "< 5<<57" that it #if0 asserts is
  138. not actually tight enough under the current analysis; need
  139. it to be < (1+epsilon) << 59).
  140. So you actually do need to reduce often, at least in the x86_64_r12
  141. version.
  142. p521/arch_ref64: simple and relatively slow impl. Like
  143. p448/arch_ref64, this arch reduces after every add or sub.
  144. p521/arch_x86_64_r12: aggressive, fast implementation. This impl
  145. stores 521 bits not in 9 limbs, but 12! Limbs 3,7,11 are 0, and
  146. are there only for vector alignment. (TODO: remove those limbs
  147. from precomputed tables, so that we don't have to look them up!).
  148. The carry handling on this build is very tight, and probably could
  149. stand more analysis. This is why I have the careful balancing of
  150. "hexad" and "nonad" multiplies in its Chung-Hasan mul routine.
  151. (TODO: reconsider whether this is even worthwhile on machines
  152. without AVX2.)
  153. The 'r12 build is a work in progress, and currently only works on
  154. clang (because it rearranges vectors in the timesW function).
  155. Timings for the fast, aggressive arch on Haswell:
  156. mul: 146cy
  157. sqr: 111cy
  158. invert: 62kcy
  159. keygen: 270kcy
  160. ecdh: 803kcy
  161. sign: 283kcy
  162. verif: 907kcy
  163. Same rules as other Goldi benchmarks. Turbo off, HT off,
  164. timing-channel protected (no dataflow from secrets to branches,
  165. memory lookups or known vt instructions), compressed points.
  166. October 23, 2014:
  167. Pushing through changes for curve flexibility. First up is
  168. Ed480-Ridinghood, because it has the same number of words. Next
  169. is E-521.
  170. Experimental support for Ed480-Ridinghood. To use, compile with
  171. make ... FIELD=p480 -XCFLAGS=-DGOLDI_FIELD_BITS=480
  172. I still need to figure out what to do about the fact that the library
  173. is called "goldilocks", but in will soon support curves that are not
  174. ed448-goldilocks, at least experimentally.
  175. Currently the whole system's header "goldilocks.h" doesn't have
  176. a simpler way to override field size, but it does work (as a hack)
  177. with -DGOLDI_FIELD_BITS=...
  178. There is no support yet for coexistence of multiple fields in one
  179. library. The field routines will have unique names, but scalarmul*
  180. won't, and the top-level goldilocks routines have fixed names.
  181. Current timings on Haswell:
  182. Goldilocks: 178kcy keygen, 536kcy ecdh
  183. Ridinghood: 193kcy keygen, 617kcy ecdh
  184. Note that Ridinghood ECDH does worse than 480/448. This is at least
  185. in part because I haven't calculated the overflow handling limits yet
  186. in ec_point.h (this is a disadvantage of dropping the automated
  187. tool for generating that file). So I'm reducing much more often
  188. than I need to. (There's a really loud TODO in ec_point.h for that.)
  189. Also, I haven't tested the limits on these reductions in a while, so
  190. it could be that there are actual (security-critical) bugs in this
  191. area, at least for p448. Now that there's field flexibility, it's
  192. probably a good idea to make a field impl with extra words to check
  193. this.
  194. Furthermore, field_mulw_scc will perform differently on these two
  195. curves based on whether the curve constant is positive or negative.
  196. I should probably go optimize the "hot" routines like montgomery_step
  197. to have separate cases for positive and negative.
  198. September 29, 2014:
  199. Yesterday I put in some more architecture detection, but it should
  200. really be based on the arch directory, because what's in there really
  201. was a terrible hack. So I've changed it to use $arch/arch_config.h
  202. to get WORD_BITS.
  203. I've tweaked the eBAT construction code to rename the architectures
  204. using test/batarch.map. Maybe I should also rename them internally,
  205. but not yet.
  206. I added some new TODO.txt items. Some folks have been asking for a
  207. more factored library, instead of this combined arithmetic, curve code,
  208. encodings and protocol all-in-one jumble. Likewise the hash and RNG
  209. should be flexible.
  210. I've also been meaning to put more work in on SPAKE2EE, which would
  211. also mean finalizing the Elligator code.
  212. September 18, 2014:
  213. Begin work on a "ref" implementation. Currently this is just the
  214. arch_ref64 architecture. The ref implementation always weak_reduces
  215. after arithmetic, and doesn't use vectors or other hackery. Currently
  216. it still must declare field elements as vector aligned, though,
  217. other code outside the arch directory can be vectorized.
  218. Change goldilocks.c to use field_eq instead of calling deep into field
  219. apis.
  220. September 6, 2014:
  221. Pull in minor changes from David Leon Gil and Nicholas Wilson, with
  222. some adjustments. I hope the adjustments don't break their compiles.
  223. `make bat` now makes a bat which passes supercop-fastbuild, though
  224. the benchmarks are rather different from `make bench`. I need to track
  225. down why.
  226. August 4, 2014:
  227. Experiments and bug fixes.
  228. Add really_memset = memset_s (except not because I'm setting -std=c99),
  229. thanks David Leon Gil. I think I put it in the right places.
  230. Try to work around what I think is a compiler bug in GCC -O3 on non-AVX
  231. platforms. I can't seem to work around it as -Os, so I'm just flagging
  232. a warning (-Werror makes it an error) for now. Will take more
  233. investigation. Thanks Samuel Neves.
  234. Added an experimental (not ready yet!) ARM NEON implementation in
  235. arch_neon_experimental. This implementation seems to work, but needs
  236. more testing. It is currently asm-heavy and not GCC clean. I am
  237. planning to have a flag for it to use intrinsics instead of asm;
  238. currently the intrinsics are commented out. On clang this does ECDH
  239. in 1850kcy on my BeagleBone Black, comparable to Curve41417. Once this
  240. is ready, I will probably move it to arch_neon proper, since arch_neon
  241. isn't particularly tuned.
  242. July 11, 2014:
  243. This is mostly a cleanup release.
  244. Added CRANDOM_MIGHT_IS_MUST config flag (default: 1). When set, this
  245. causes crandom to assume that all features in the target arch will
  246. be available, instead of detecting them. This makes sense because
  247. the rest of the Goldilocks code is not (yet?) able to detect features.
  248. Also, I'd like to submit this to SUPERCOP eventually, and SUPERCOP won't
  249. pass -DMUST_HAVE_XXX on the command line the way the Makefile here did.
  250. Flag EXPERIMENT_CRANDOM_BUFFER_CUTOFF_BYTES to disable the crandom
  251. output buffer. This buffer improves performance (very marginally at
  252. Goldilocks sizes), but can cause problems with forking and VM
  253. snapshotting. By default, the buffer is now disabled.
  254. I've slightly tweaked the Elligator implementation (which is still
  255. unused) to make it easier to invert. This makes anything using Elligator
  256. (i.e. nothing) incompatible with previous releases.
  257. I've been factoring "magic" constants such as curve orders, window sizes,
  258. etc into a few headers, to reduce the effort to port the code to other
  259. primes, curves, etc. For example, I could test the Microsoft curves, and
  260. something like:
  261. x^2 + y^2 = 1 +- 5382[45] x^2 y^2 mod 2^480-2^240-1
  262. ("Goldeneye"? "Ridinghood"?) might be a reasonable thing to try for
  263. 64-bit CPUs.
  264. In a similar vein, most of the internal code has been changed to say
  265. "field" instead of p448, so that a future version of magic.h can decide
  266. which field header to include.
  267. You can now `make bat` to create an eBAT in build/ed448-goldilocks. This
  268. is only minimally tested, though, because SUPERCOP doesn't work on my
  269. machine and I'm too lazy to reverse engineer it. It sets a new macro,
  270. SUPERCOP_WONT_LET_ME_OPEN_FILES, which causes goldilocks_init() to fall
  271. back to something horribly insecure if crandom_init_from_file raises
  272. EMFILE.
  273. Slightly improved documentation.
  274. Removed some old commented-out code; restored the /* C-style */ comment
  275. discipline.
  276. The AMD-64 version should now be GCC clean, at least for reasonably
  277. recent GCC (tested on OS X.9.3, Haswell, gcc-4.9).
  278. History no longer says "2104".
  279. May 3, 2014:
  280. Minor changes to internal routines mean that this version is not
  281. compatible with the previous one.
  282. Added ARM NEON code.
  283. Added the ability to precompute multiples of a partner's public key. This
  284. takes slightly longer than a signature verification, but reduces future
  285. verifications with the precomputed key by ~63% and ECDH by ~70%.
  286. goldilocks_precompute_public_key
  287. goldilocks_destroy_precomputed_public_key
  288. goldilocks_verify_precomputed
  289. goldilocks_shared_secret_precomputed
  290. The precomputation feature are is protected by a macro
  291. GOLDI_IMPLEMENT_PRECOMPUTED_KEYS
  292. which can be #defined to 0 to compile these functions out. Unlike most
  293. of Goldilocks' functions, goldilocks_precompute_public_key uses malloc()
  294. (and goldilocks_destroy_precomputed_public_key uses free()).
  295. Changed private keys to be derived from just the symmetric part. This
  296. means that you can compress them to 32 bytes for cold storage, or derive
  297. keypairs from crypto secrets from other systems.
  298. goldilocks_derive_private_key
  299. goldilocks_underive_private_key
  300. goldilocks_private_to_public
  301. Fixed a number of bugs related to vector alignment on Sandy Bridge, which
  302. has AVX but uses SSE2 alignment (because it doesn't have AVX2). Maybe I
  303. should just switch it to use AVX2 alignment?
  304. Beginning to factor out curve-specific magic, so as to build other curves
  305. with the Goldilocks framework. That would enable fair tests against eg
  306. E-521, Ed25519 etc. Still would be a lot of work.
  307. More thorough testing of arithmetic. Now uses GMP for testing framework,
  308. but not in the actual library.
  309. Added some high-level tests for the whole library, including some (bs)
  310. negative testing. Obviously, effective negative testing is a very difficult
  311. proposition in a crypto library.
  312. March 29, 2014:
  313. Added a test directory with various tests. Currently testing SHA512 Monte
  314. Carlo, compatibility of the different scalarmul functions, and some
  315. identities on EC point ops. Began moving these tests out of benchmarker.
  316. Added scan-build support.
  317. Improved some internal interfaces. Made a structure for Barrett primes
  318. instead of passing parameters individually. Moved some field operations
  319. to places that make more sense, eg Barrett serialize and deserialize. The
  320. deserialize operation now checks that its argument is in [0,q).
  321. Added more documentation.
  322. Changed the names of a bunch of functions. Still not entirely consistent,
  323. but getting more so.
  324. Some minor speed improvements. For example, multiply is now a couple cycles
  325. faster.
  326. Added a hackish attempt at thread-safety and initialization sanity checking
  327. in the Goldilocks top-level routines.
  328. Fixed some vector alignment bugs. Compiling with -O0 should now work.
  329. Slightly simplified recode_wnaf.
  330. Add a config.h file for future configuration. EXPERIMENT flags moved here.
  331. I've decided against major changes to SHA512 for the moment. They add speed
  332. but also significantly bloat the code, which is going to hurt L1 cache
  333. performance. Perhaps we should link to OpenSSL if a faster SHA512 is desired.
  334. Reorganize the source tree into src, test; factor arch stuff into src/arch_*.
  335. Make most of the code 32-bit clean. There's now a 32-bit generic and 32-bit
  336. vectorless ARM version. No NEON version yet because I don't have a test
  337. machine (could use my phone in a pinch I guess?). The 32-bit version still
  338. isn't heavily optimized, but on ARM it's using a nicely reworked signed/phi-adic
  339. multiplier. The squaring is also based on this, but could really stand some
  340. improvement.
  341. When passed an even exponent (or extra doubles), the Montgomery ladder should
  342. now be accept points if and only if they lie on the curve. This needs
  343. additional testing, but it passes the zero bit exponent test.
  344. On 32-bit, use 8x4x14 instead of 5x5x18 table organization. Probably there's
  345. a better heuristic.
  346. March 5, 2014:
  347. First revision.
  348. Private keys are now longer. They now store a copy of the public key, and
  349. a secret symmetric key for signing purposes.
  350. Signatures are now supported, though like everything else in this library,
  351. their format is not stable. They use a deterministic Schnorr mode,
  352. similar to EdDSA. Precomputed low-latency signing is not supported (yet?).
  353. The hash function is SHA-512.
  354. The deterministic hashing mode needs to be changed to HMAC (TODO!). It's
  355. currently envelope-MAC.
  356. Probably in the future there will be a distinction between ECDH key and
  357. signing keys (and possibly also MQV keys etc).
  358. Began renaming internal functions. Removing p448_ prefixes from EC point
  359. operations. Trying to put the verb first. For example,
  360. "p448_isogeny_un_to_tw" is now called "twist_and_double".
  361. Began documenting with Doxygen. Use "make doc" to make a very incomplete
  362. documentation directory.
  363. There have been many other internal changes.
  364. Feb 21, 2014:
  365. Initial import and benchmarking scripts.
  366. Keygen and ECDH are implemented, but there's no hash function.