Fortran 90 base64 encoding of Reals

To store large amounts of reals, encoding the IEEE 754 bit pattern in big-endian order as base64 is a good solution.

Base64 stores 6 bits (64 values) in each 8-bit character, using the alphabet "A-Za-z0-9+/", so the encoded text is 1/3 larger than the raw binary data. If done well it is fully portable, and it can be done in Fortran without external libraries.
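
To make the 6-bits-per-character idea concrete, here is a minimal sketch in Python (not the Fortran code of the tool). It assumes standard base64 with "=" padding and plain big-endian IEEE 754 doubles; the names b64_encode and reals_to_base64 are made up for this illustration, and the exact layout used by the tool may differ.

```python
import struct

ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "0123456789+/")

def b64_encode(raw):
    """Encode bytes as base64 text, 6 bits per output character."""
    out = []
    for i in range(0, len(raw), 3):
        chunk = raw[i:i + 3]
        # Left-align the chunk in a 24-bit word, padding missing bytes with zeros
        word = int.from_bytes(chunk + b"\x00" * (3 - len(chunk)), "big")
        # Split the word into four 6-bit groups, most significant first
        chars = [ALPHABET[(word >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # n input bytes give n + 1 significant characters; the rest is "=" padding
        keep = len(chunk) + 1
        out.append("".join(chars[:keep]) + "=" * (4 - keep))
    return "".join(out)

def reals_to_base64(values):
    """Pack doubles as big-endian IEEE 754 bytes, then base64-encode them."""
    raw = struct.pack(">%dd" % len(values), *values)
    return b64_encode(raw)

text = reals_to_base64([1.0, -2.5, 0.0])
print(text, len(text))  # 24 bytes in, 32 characters out
```

Every 3 input bytes become 4 characters, which is where the 1/3 size overhead comes from.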

This code can encode and decode reals to/from base64 even across precisions (i.e. decode single precision data into double and double precision data into single), and it handles infinity and NaN consistently.
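
As an illustration of what decoding across precisions involves, the following Python sketch turns an IEEE 754-style bit pattern with arbitrary mantissa and exponent widths into a native float, treating infinity, NaN and subnormals explicitly. It is only a sketch under the assumption of the usual sign/exponent/mantissa layout; decode_ieee is a hypothetical helper, not a routine of the Fortran tool.

```python
import math
import struct

def decode_ieee(bits, bit_mantissa, bit_exp):
    """Convert an IEEE 754-style bit pattern (given as an int) to a float."""
    mantissa = bits & ((1 << bit_mantissa) - 1)
    exp_mask = (1 << bit_exp) - 1
    exponent = (bits >> bit_mantissa) & exp_mask
    sign = -1.0 if (bits >> (bit_mantissa + bit_exp)) & 1 else 1.0
    bias = (1 << (bit_exp - 1)) - 1
    if exponent == exp_mask:
        # Exponent all ones: infinity if the mantissa is zero, NaN otherwise
        return sign * math.inf if mantissa == 0 else math.nan
    if exponent == 0:
        # Zero or subnormal: there is no implicit leading 1
        return sign * math.ldexp(mantissa, 1 - bias - bit_mantissa)
    # Normal number: restore the implicit leading 1
    full = mantissa | (1 << bit_mantissa)
    return sign * math.ldexp(full, exponent - bias - bit_mantissa)

# Single precision pattern (pi rounded to single) read into a double
as_double = decode_ieee(0x40490FDB, 23, 8)
# Re-encode as a big-endian double precision pattern
double_bytes = struct.pack(">d", as_double)
```

Going from single to double is exact. The other direction needs rounding, and values outside the single precision range have to be mapped to infinity, which this sketch does not cover.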

In base64 an array of reals is represented like this:

<reals bit_mantissa=23 bit_exp=8 dims="10 6 8"> Y4jImSka/AEAAAAAAAAA4AAAAAAAAA9NAAAAAAAAQgD ... </reals>

The number of mantissa bits and exponent bits is stored so that the precision of the model is known and transfer between precisions is possible.
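
A reader first has to pick up this metadata from the opening tag. The Python sketch below shows one way to do that for the tag in the example above. The function parse_reals_header is a hypothetical helper, and the bits_per_real value assumes one sign bit plus the stated exponent and mantissa bits with no padding, which is my assumption and not something the tag itself says.

```python
import re

def parse_reals_header(tag):
    """Extract precision and shape from a <reals ...> opening tag."""
    attrs = {}
    # Attribute values may be quoted (dims="10 6 8") or bare (bit_exp=8)
    for name, quoted, bare in re.findall(r'(\w+)=(?:"([^"]*)"|([^\s>]+))', tag):
        attrs[name] = quoted or bare
    bit_mantissa = int(attrs["bit_mantissa"])
    bit_exp = int(attrs["bit_exp"])
    dims = [int(d) for d in attrs["dims"].split()]
    count = 1
    for d in dims:
        count *= d  # total number of reals in the array
    return {
        "bit_mantissa": bit_mantissa,
        "bit_exp": bit_exp,
        "dims": dims,
        "count": count,
        # Assumption: one sign bit, no padding bits
        "bits_per_real": 1 + bit_exp + bit_mantissa,
    }

header = parse_reals_header('<reals bit_mantissa=23 bit_exp=8 dims="10 6 8">')
print(header["count"], header["bits_per_real"])  # 480 32
```

With 23 mantissa bits and 8 exponent bits the example tag describes single precision data.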

I wrote a small utility to convert to/from base64, so that you can get the data in/out easily. You can find it in cp2k/tools/base64.

It was much harder than I thought to make it work across compilers; see CompilerWoes for a short description of the problems.

In the directory there is a Fortran file for the tool (convertBase64.F90) and also some supporting files to test it, in particular the Python script testBase64.F90. If you have problems on some architectures/compilers, try running it to pinpoint the problem.

I think that the code is close to optimal from the feature point of view.

Problems

Unfortunately, in all the compilers that I came across, formatted write/read is much less optimized than binary, and string handling is not efficient. Base64 is faster than a formatted write, but slower than binary. If you would otherwise use a formatted write, base64 is the clear winner, but if you need top performance writing to a local disk then it might not be for you.

Benchmark

Measured with an older version; the current version is 10-20% faster for base64. Reading performs similarly. Opteron, writing to a local disk:

20x1e6 reals:
intel, binary: 0.17s total, 20x7.8MB
intel, base64: 6.54s total, 38% strcopy, 48% write, 20x10.4MB
intel, formatted write: 37.7s total, 78% in system calls (write whole array at once), 20x34MB

20x1e7 reals:
intel, binary: 1.5s total, all in system calls, 20x78MB
intel, base64: 71s total, 34% strcopy, 45% formatted write (530 characters at once), 20x104MB
intel, formatted write: 364s total, 78% in system calls (write whole array at once), 20x346MB

20x1e8 reals:
intel, binary: 16.4s total (31s elapsed), all in system calls, 20x782MB
intel, base64: 650s total, 47% formatted write, 32% strcopy, 20x1047MB
intel, formatted write: not done
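
As a quick sanity check, the file sizes quoted above can be compared with the expected 4/3 overhead of base64. This small Python sketch only redoes the arithmetic on the numbers listed in the benchmark; the variable names are made up for the illustration.

```python
# Per-file sizes in MB as quoted in the benchmark: (binary, base64)
sizes = {"1e6": (7.8, 10.4), "1e7": (78, 104), "1e8": (782, 1047)}

def overhead(binary_mb, base64_mb):
    """Size of the base64 file relative to the binary file."""
    return base64_mb / binary_mb

ratios = {n: round(overhead(b, e), 2) for n, (b, e) in sizes.items()}
print(ratios)  # {'1e6': 1.33, '1e7': 1.33, '1e8': 1.34}
```

The ratios stay close to 4/3 in all three cases, as expected from 6 payload bits per 8-bit character.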

-- FawziMohamed - 13 Jun 2006

Topic revision: r4 - 13 Jun 2006 - FawziMohamed
 