Memo's Island: c programming

Wednesday, 3 April 2013

Reading Binary Data Files Written in C: Python - Numpy Case Study

One of the major tasks in scientific computation is writing results to a file. Usually a set of numbers, often structured, is written into so-called data files. Writing binary output is common practice because binary files are faster to read and write and more compact than plain ASCII files. However, the endianness of the data must be documented carefully for good portability.

Many popular languages can handle this task. For speed and efficiency, C or Fortran is usually used to write the binary file. As an example, the following C program writes one integer (42) and three doubles (0.01, 1.01, 2.01) into a binary file:

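#include <stdio.h>
#include <stdlib.h>

int main(void) {
  FILE *myF;
  int i, j;
  double *numbers;

  myF     = fopen("my.bindata", "wb");
  numbers = malloc(3 * sizeof(double));
  if (myF == NULL || numbers == NULL) {
    return 1;  /* could not open the file or allocate memory */
  }
  /* Header: a single int */
  i = 42;
  fwrite(&i, sizeof(int), 1, myF);
  /* Payload: three doubles, j + 0.01 for j = 0, 1, 2 */
  for (j = 0; j < 3; j++) {
    numbers[j] = (double)j + 1e-2;
  }
  fwrite(numbers, sizeof(double), 3, myF);
  fclose(myF);
  free(numbers);
  return 0;
}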

This code produces a binary file called my.bindata. Our aim is to read it into Python so that we can post-process the results, e.g. for visualisation or further data analysis. The core idea is to process the output directly in a higher-level language instead of writing more C code, which removes one step from the workflow and avoids the cumbersome compilation of extra C code.

To read binary data from files, the Python standard library provides a module called struct. This module packs and unpacks data to and from binary sources; in this case study the source is a file. However, using it on a custom binary file whose format mixes different types is tedious and error-prone, or at least takes some effort. This is where NumPy helps, in particular two facilities: numpy.dtype and numpy.fromfile. The former gives an easy way to define the file's format, with a syntax similar to Fortran's, by creating a data type object, as its name suggests. The latter reads the binary file in one go and returns a Python object that contains all the information present in the data file.
Here is the NumPy code that reads the binary file created by the above C code.

import numpy as np
dt      = np.dtype("i4, (3)f8")
myArray = np.fromfile('my.bindata', dtype=dt)
myArray[0] 
#(42, [0.01, 1.01, 2.01])
myArray[0][1] 
#array([ 0.01,  1.01,  2.01])
myArray[0][0] 
#42
myArray[0][1][1] 
#1.01
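
For comparison, the same layout can also be read with the standard-library struct module mentioned above. The following is only a minimal sketch under stated assumptions: the file is little-endian with a 4-byte int, the helper name read_record is hypothetical, and the demo file is written from Python as a stand-in for the C program so that the example is self-contained.

```python
import os
import struct
import tempfile

# "<" selects little-endian byte order with no alignment padding, which matches
# a file produced by two back-to-back fwrite calls on a little-endian machine.
RECORD_FORMAT = "<i3d"  # one 4-byte int followed by three 8-byte doubles


def read_record(path, fmt=RECORD_FORMAT):
    """Return (header, [d0, d1, d2]) unpacked from the start of the file."""
    size = struct.calcsize(fmt)
    with open(path, "rb") as handle:
        raw = handle.read(size)
    if len(raw) != size:
        raise ValueError(f"expected {size} bytes, got {len(raw)}")
    first, *doubles = struct.unpack(fmt, raw)
    return first, doubles


# Stand-in for the C program (assumption): write the same 28-byte layout.
demo_path = os.path.join(tempfile.mkdtemp(), "my.bindata")
with open(demo_path, "wb") as handle:
    handle.write(struct.pack(RECORD_FORMAT, 42, 0.01, 1.01, 2.01))

header, values = read_record(demo_path)
print(header, values)
```

The format string has to be kept in sync with the C code by hand, including the byte order and the absence of padding, which is the kind of bookkeeping that makes struct tedious for mixed-type files.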

I tested this case study on a GNU/Linux PC, so the binary file is little-endian, and the writing and reading patterns above rely on that byte order. Ideally, a generic wrapper around this Python code would simplify things.
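
One possible shape for such a wrapper is sketched below. It is an illustration rather than a finished tool: the field names count and values, the function name load_records, and the little-endian layout with a 4-byte int are all assumptions. The byte order is written explicitly in the dtype so that the reader does not depend on the machine it runs on.

```python
import os
import tempfile

import numpy as np

# Explicit "<" markers pin the layout to little-endian regardless of the host.
RECORD_DTYPE = np.dtype([("count", "<i4"), ("values", "<f8", (3,))])


def load_records(path, dtype=RECORD_DTYPE):
    """Read every fixed-size record in the file into a structured array."""
    return np.fromfile(path, dtype=dtype)


# Stand-in for the C output (assumption): one record with the same layout.
demo_path = os.path.join(tempfile.mkdtemp(), "my.bindata")
np.array([(42, [0.01, 1.01, 2.01])], dtype=RECORD_DTYPE).tofile(demo_path)

records = load_records(demo_path)
print(records["count"][0], records["values"][0])
```

Named fields make the access pattern self-documenting: records["values"][0] replaces the positional myArray[0][1].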

Sunday, 18 September 2011

XML and command line parsing in C++/C

Dynamic languages such as Python or Tcl are taking over the I/O operations and option parsing of many C/C++ codes because they are quick and easy to implement. However, it is still possible to parse command-line options (or XML) quite efficiently in C/C++ with libraries and APIs. Here is a list of some of them:

Xerces is Apache's XML parser for C++.
GCC-XML is a GCC extension that outputs an XML description of C++ code.
Commons CLI is Apache's command-line parsing API (a Java library).
Argtable is an ANSI C command-line parser.
getopt is GNU's command-line parser.
TCLAP is a templatized C++ command-line parser.
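
To give a feel for the getopt-style interface in the list above, here is a minimal sketch using Python's standard getopt module, whose API is modelled on the C getopt() function. The options -n/--steps and -o/--output, their defaults, and the function name parse_options are hypothetical and chosen only for illustration.

```python
import getopt


def parse_options(argv):
    """Parse hypothetical options: -n/--steps <int> and -o/--output <file>."""
    settings = {"steps": 10, "output": "out.dat"}  # illustrative defaults
    # A trailing ":" (short) or "=" (long) marks an option that takes a value.
    opts, positional = getopt.getopt(argv, "n:o:", ["steps=", "output="])
    for flag, value in opts:
        if flag in ("-n", "--steps"):
            settings["steps"] = int(value)
        elif flag in ("-o", "--output"):
            settings["output"] = value
    return settings, positional


settings, rest = parse_options(["-n", "5", "--output", "run.dat", "input.xml"])
print(settings, rest)
```

The same option string "n:o:" could be passed to getopt() in C, where the loop over returned options is usually written as a switch statement.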

Monday, 27 September 2010

Reverse 32-bit Hexadecimal Value (with C)

A 32-bit hexadecimal value such as 0xABCD1234 may need to be reversed digit by digit, giving 0x4321DCBA. The following is a naive implementation of this reversal. Writing an n-bit version would be an interesting quiz for an undergraduate CS student.

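#include <stdint.h>

uint32_t reverse_hex(uint32_t num) {
/* naive reversal of the eight hex digits (nibbles);
   unsigned arithmetic keeps every shift well defined */
uint32_t rev;
uint32_t digit;
uint32_t mask1=0x0f000000;
uint32_t mask2=0x00f00000;
uint32_t mask3=0x000f0000;
uint32_t mask4=0x0000f000;
uint32_t mask5=0x00000f00;
uint32_t mask6=0x000000f0;
uint32_t mask7=0x0000000f;

digit=num << 28;            /* digit 0 -> digit 7 */
rev=num << 20;              /* digit 1 -> digit 6 */
rev=rev & mask1;
rev=digit+rev;
digit=(num << 12) & mask2;  /* digit 2 -> digit 5 */
rev=digit+rev;
digit=(num << 4) & mask3;   /* digit 3 -> digit 4 */
rev=digit+rev;
digit=(num >> 4) & mask4;   /* digit 4 -> digit 3 */
rev=digit+rev;
digit=(num >> 12) & mask5;  /* digit 5 -> digit 2 */
rev=digit+rev;
digit=(num >> 20) & mask6;  /* digit 6 -> digit 1 */
rev=digit+rev;
digit=(num >> 28) & mask7;  /* digit 7 -> digit 0 */
rev=digit+rev;
return rev;
}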
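
As one possible answer to the n-bit quiz, here is a minimal sketch in Python rather than C. The function name reverse_hex_digits and the bits parameter are illustrative and not part of the original code. It peels hex digits off the low end of the input and pushes them onto the result, for any width that is a multiple of 4 bits.

```python
def reverse_hex_digits(num, bits=32):
    """Reverse the order of the 4-bit hex digits of an unsigned bits-wide value."""
    if bits <= 0 or bits % 4 != 0:
        raise ValueError("bits must be a positive multiple of 4")
    if not 0 <= num < (1 << bits):
        raise ValueError("num does not fit in the requested width")
    rev = 0
    for _ in range(bits // 4):
        rev = (rev << 4) | (num & 0xF)  # append the lowest digit of num
        num >>= 4                       # and drop it from num
    return rev


print(f"{reverse_hex_digits(0xABCD1234):#010x}")  # 0x4321dcba
```

The loop replaces the seven hand-written masks, so the same code also works for 8-, 16- or 64-bit values by changing bits.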

However, reversing in 4-bit chunks as above is unusual. A more realistic situation is converting between big-endian and little-endian representations, where 0xABCD1234 becomes 0x3412CDAB, so it is the byte order that matters. The following C function does this. An n-byte version of this function is likewise left as an exercise.

#include <stdint.h>

uint32_t reverse_hex_byte(uint32_t num) {
/* naive byte-order reversal */
uint32_t rev;
uint32_t digit;
uint32_t mask1=0x000000ff;
uint32_t mask2=0x0000ff00;
uint32_t mask3=0x00ff0000;
uint32_t mask4=0xff000000;

/* Move 1st byte */
digit=num >> 24;
rev=digit & mask1;
/* Move 2nd byte */
digit=num >> 8;
digit=digit & mask2;
rev=digit+rev;
/* Move 3rd byte */
digit=num << 8;
digit=digit & mask3;
rev=digit+rev;
/* Move 4th byte */
digit=num << 24;
digit=digit & mask4;
rev=digit+rev;
return rev;
}
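
The n-byte exercise can be sketched in the same spirit, here in Python rather than C. Both function names and the nbytes parameter are illustrative. The first version generalises the shift-and-mask idea with a loop; the second gets the same result from the built-in int.to_bytes and int.from_bytes conversions and serves as a cross-check.

```python
def reverse_bytes(num, nbytes=4):
    """Reverse the byte order of an unsigned nbytes-wide value with shifts."""
    if not 0 <= num < (1 << (8 * nbytes)):
        raise ValueError("num does not fit in the requested width")
    rev = 0
    for _ in range(nbytes):
        rev = (rev << 8) | (num & 0xFF)  # append the lowest byte of num
        num >>= 8                        # and drop it from num
    return rev


def reverse_bytes_builtin(num, nbytes=4):
    """Same result: serialise as big-endian, reinterpret as little-endian."""
    return int.from_bytes(num.to_bytes(nbytes, "big"), "little")


print(f"{reverse_bytes(0xABCD1234):#010x}")  # 0x3412cdab
```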

Wednesday, 20 February 2008

CPU Timing in a C code

The following simple example measures the CPU time consumed by a loop, using clock() and CLOCKS_PER_SEC from time.h:



#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <math.h>

int main(void) {
  long int i, a;
  clock_t cputime, cputime1;
  double timing, timing0;

  /* processor time before the loop */
  cputime = clock();
  for (i = 0; i < 10000000; i++) {
    a = pow(4, 2);
    a = pow(4, 2);
    a = pow(4, 2);
    a = pow(4, 2);
    a = pow(4, 2);
  }
  /* processor time after the loop */
  cputime1 = clock();

  /* elapsed clock ticks, then the same interval in seconds */
  timing0 = (double)(cputime1 - cputime);
  timing  = timing0 / CLOCKS_PER_SEC;
  /* CLOCKS_PER_SEC is not guaranteed to be an int, so cast it for printing */
  printf("timing =%12.9f clock per sec=%ld timing0=%12.9f\n",
         timing, (long)CLOCKS_PER_SEC, timing0);
  exit(0);
}
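
For readers who drive such codes from Python, the same before-and-after measurement pattern can be sketched with time.process_time(), which reports the CPU time of the current process and so plays a role roughly similar to clock() here. The helper names cpu_seconds and repeated_pow and the iteration count are illustrative assumptions.

```python
import math
import time


def cpu_seconds(func, *args):
    """Call func(*args) once and return (result, CPU seconds consumed)."""
    start = time.process_time()
    result = func(*args)
    elapsed = time.process_time() - start
    return result, elapsed


def repeated_pow(iterations):
    """Toy workload mirroring the loop in the C program."""
    a = 0
    for _ in range(iterations):
        a = int(math.pow(4, 2))
    return a


value, seconds = cpu_seconds(repeated_pow, 100_000)
print(f"timing = {seconds:12.9f} s, result = {value}")
```

Unlike the C version, no division by a ticks-per-second constant is needed, because process_time() already returns seconds as a float.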
