
C++ on the mainframe

Status
Not open for further replies.

rhein99 (Programmer), Oct 7, 2003, CA
Hello, I just joined.
Looking for a code example that will read a file into memory and then write it to an output file.

Input file is a fixed length file of 25 bytes.

How do I determine the size of the input file, allocate enough memory, read the file into memory, and then access certain fields which I would write to my output file?
I have a version of a C++ program that does not read the input file into memory, and it takes 15 hours to run. The input file has 7 million records.
 
>Input file is a fixed length file of 25 bytes.
>Input file has 7 million records.

Something's not right here.

>How do I figure out the size

Didn't you say it was 25 bytes?

>be able to access certain fields

Clarify please. You're somewhat vague.

/Per

if (typos) cout << "My fingers are faster than my brain. Sorry for the typos.";
 
If it is mandatory to load the 7x10^6 records into memory, then you have to:
-define a structure or a class for the record, whose instances will hold the fields of one record
-use an array of this structure or class in which the records will be stored. Depending on your application, I recommend using a list of such objects.

-for each record read from the file:
-create an instance to store the record by parsing the record into fields
-add the instance to the array or the list
-iterate the array (or list) and filter the records and fields to be sent to the output file.
To read a record of 25 bytes:

FILE *stream;
char buff[25];
if( (stream = fopen( "myfile", "rb" )) != NULL )
{
fread( buff, sizeof( char ), 25, stream ); // buff contains the 25 bytes
fclose( stream );
}

-obislavu-
 
My input file has 25-byte records and contains 7 million of them. A record is made up of 3 fields.

PolkID - 10 bytes
GeoCode - 6 bytes
UnitAmt - 9 bytes

Total of 25 bytes.

Can you show me how to create a class that would define these fields, how to figure out how much memory I would need to hold the contents of this rather large file (25 bytes * 7 million) in some sort of array, and then how to access this large array and write the records to an output file?

Thanks for your quick response.
 
Here is a simple class definition and its use:

//----Record.h ----
#ifndef _RECORD_
#define _RECORD_
#define LEN_ID 10
#define LEN_CODE 6
#define LEN_AMT 9
class Record
{
public:
Record(){};
Record(char* line);
char ID[LEN_ID + 1];
char Code[LEN_CODE + 1];
char Amt[LEN_AMT + 1];


};
#endif

//------------Record.cpp ----------------
#include <stdio.h>
#include <string.h>
#include "record.h"
Record::Record(char* line)
{
strncpy(this->ID,line,LEN_ID);
this->ID[LEN_ID]='\0';
strncpy(this->Code,line+LEN_ID,LEN_CODE);
this->Code[LEN_CODE]='\0';
strncpy(this->Amt,line+LEN_ID+LEN_CODE,LEN_AMT);
this->Amt[LEN_AMT]='\0';
}
int main ()
{
FILE *stream;
char buff[LEN_ID + LEN_CODE + LEN_AMT]; /* 25 bytes per record */
Record* arr[7000];
Record* curr = NULL;
int count=0;
if( (stream = fopen( "myfile.txt", "rb" )) != NULL )
{
while( count < 7000 && fread( buff, sizeof( char ), sizeof buff, stream ) == sizeof buff )
{
curr = new Record(buff);
arr[count++] = curr;
}

fclose( stream );
}
else
printf( "Cannot open myfile.txt\n" );
// Free memory here if allocated........................
return 0;

}
The memory needed to store 7,000,000 records is 7,000,000 * 25 bytes, about 175 MB.
So you should not try to keep all of that in a predefined array like arr[7000000].
You should implement or use an existing list class to store the Record objects, or, once you have a Record object in memory, decide whether it will be sent to the output rather than keeping it in memory.

-obislavu-
 
Some additions:
The key point for effective file I/O is allocating large buffers. Call setvbuf after opening (but before reading) the file, for example:
setvbuf(stream,0,_IOFBF,25*1300); The buffer size must be less than 32768. This really is the key point! Default buffers are too small for huge disk files with small records. You can read a 175 MB file into memory in 1-10 seconds (a few seconds on average for local drives; PIV/1700 PC, Win2000).
If you have a medium-scale PC with 512 MB of core memory, there is no need to complicate matters unnecessarily. Declare the simplest struct: struct Record { char PolkID[10]; char GeoCode[6]; char UnitAmt[9]; };
A structure is a class too (with public members; don't worry about OOP, and rely on the default constructor/destructor).
Allocate new Record[7000000] (it's OK in VC6+) and go on!
Of course, this is only a general sketch. Need more?..
 
Gentlemen (I'm assuming you're both male),

Thanks again for the quick responses.
ArkM, yes, I would like to see more. I have included
my C++ program below, the one that takes 15 hours to run.
A colleague at work wrote a C program using the same
input file, and it ran in 45 minutes.
That's a big difference when doing file I/O in memory.
Question: if you don't know the size of your input file,
how can you get its size and then use that
value as your array size? Here is the dog of a C++
program... try not to laugh too much....

// testMF2.cpp : Defines the entry point for the console application.
//

#include <iostream.h>
#include <stdlib.h>
#include <time.h>
#include <fstream.h>
#include <string.h>

int main()
{

ifstream OpenFile;
OpenFile.open("DD:VEHCNT");
ifstream OpenFil2;
OpenFil2.open("DD:CENSUS");
ofstream OpenFil3;
OpenFil3.open("DD:OVEHCNT",ios::out);

cout << "FILE OPEN STATS" << endl;

if (!OpenFile)
{
cout << "Unable to open INPUT FILE 1\n";
return 1;
}
else
{
cout << "INPUT FILE 1 opened" << endl;
}

if (!OpenFil2)
{
cout << "Unable to open the INPUT FILE 2\n";
return 1;
}
else
{
cout << "INPUT FILE 2 opened" << endl;
}

if (!OpenFil3)
{
cout << "Unable to open the OUT FILE 1\n";
return 1;
}
else
{
cout << "OUTPUT FILE 1 opened" << endl;
}

char arr[27];
char arr2[5];
char polkID[11];
char geoCode[7];
char unitCnt[10];
char censusYr[5];

cout << "this is a marker after array of 27" << endl;
char char_delim;
long x = 0;
long counter = 0;
long counter2 = 0;
int size = 0;
while (!OpenFile.eof())
{
OpenFile.getline(arr,27,char_delim='\n');
counter = counter + 1;
if (OpenFile.eof())
{
memset(arr2,0,sizeof(arr2));
memset(censusYr,0,sizeof(censusYr));
OpenFile.close();
OpenFil2.close();
OpenFil3.close();
cout << "what is the number of records:" << counter << endl;
cout << "what is the number of record2:" << counter2 << endl;
break;
}
if (counter2 < 1)
{
OpenFil2.getline(arr2,6,char_delim='\n');
counter2 = counter2 + 1;
}
strncpy(polkID, arr, 10);
strncpy(censusYr, arr2, 4);
strncpy(geoCode,&arr[10],6);
strncpy(unitCnt,&arr[16],9);
OpenFil3 << polkID << censusYr << geoCode << unitCnt << endl;

memset(arr,0,sizeof(arr));
x++;
if (x < 5)
{
cout << arr << endl; //displays the full record
cout << arr2 << endl; //displays the full record
cout << "what is the size of the array " << sizeof(arr) << endl;
cout << "what is the size of the array2 " << sizeof(arr2) << endl;
cout << "what is the polk id number " << polkID << endl;
cout << "what is the census year " << censusYr << endl;
}
if (x == 50000)
{
cout << "records read " << x << endl;
}
if (x == 100000)
{
cout << "records read " << x << endl;
}
if (x == 1000000)
{
cout << "records read " << x << endl;
}
if (x == 2000000)
{
cout << "records read " << x << endl;
}
if (x == 3000000)
{
cout << "records read " << x << endl;
}
if (x == 4000000)
{
cout << "records read " << x << endl;
}
if (x == 5000000)
{
cout << "records read " << x << endl;
}
if (x == 6000000)
{
cout << "records read " << x << endl;
}
if (x == 7000000)
{
cout << "records read " << x << endl;
}
memset(polkID,0,sizeof(polkID));
memset(geoCode,0,sizeof(geoCode));
memset(unitCnt,0,sizeof(unitCnt));

}

cout << "what is the number of records:" << counter << endl;
cout << "what is the number of record2:" << counter2 << endl;
OpenFile.close();
OpenFil2.close();
OpenFil3.close();

return 0;
}
 
Oops, when I transferred the file from the mainframe to my PC, some of the characters did not convert correctly.
Here is the code that extracts the fields to the output.

}
strncpy(polkID, arr, 10);
strncpy(censusYr, arr2, 4);
strncpy(geoCode,&arr[10],6);
strncpy(unitCnt,&arr[16],9);
OpenFil3 << polkID << censusYr << geoCode << unitCnt << endl;
 
If you think that knowing the size of the file will help you cut the run time, here is a way that should work on the mainframe: use the _fstat() function, which fills in a _stat structure whose st_size member contains the size of the file in bytes.

Example:

....
#include <sys/stat.h>
#include <fcntl.h>
#include <io.h>
#include <stdio.h>
....

int main( void )
{
struct _stat statistic;
int fh, result;

fh = _open( "myfile.txt", _O_RDONLY );

result = _fstat( fh, &statistic );

/* Check if statistics are valid: */
if( result != 0 )
printf( "Bad file handle\n" );
else
{
printf( "File size : %ld\n", statistic.st_size );
}
_close( fh );
return 0;
}


-obislavu-
 
rhein99, I never make fun of serious work. I will study your code later, but for the present I'll confine myself to a few remarks.
With huge I/O I prefer C-library calls via <cstdio>, not C++ streams. I think that in such cases we have more precise control over buffers and other I/O performance aspects in most C++ implementations.
There is no standard library function to get a file's size (it's a platform-specific topic). The common way to get the file size in a standard manner is: FILE* f = fopen(fname,"rb"); fseek(f,0,SEEK_END); long fsz = ftell(f); rewind(f); /* read or close, add error handling... */
What about "file IO in memory"? 10 seconds is a common I/O time to read your 7e6 records if you have a dedicated (PC) disk channel. On a mainframe virtual machine we may wait 1 second, or days and nights...
 
Hi rhein99,
I am back and reading your program. As your program is written, you do not need to load the files into memory. What you do is: read records from two files (one record from the first, one record from the second, etc.) and send some fields to the output file. As ArkM said, you could improve your program (reduce the run time) by replacing the cout with fwrite() or another function in order to write a whole structure in one shot, instead of calling cout for each field. This will be a big improvement.
Next, restructure your code to eliminate IF statements that are executed on every read of the input file.
See if (OpenFile.eof()) {....}. This "if" should be evaluated only once in the program.
if (counter2 < 1){...}. This "if" should also be evaluated only once.
if (x < 5){...}....
The chain of if (x == 50000), if (x == 100000), ... if (x == 7000000) tests can be replaced by a single "if", as follows:

if (x%100000 == 0)
{
cout << &quot;records read &quot; << x << endl;
}

You should see a big cut in the run time.

-obislavu-
 