Maximum file size when using FGETS
(OP)
Large CSV files are a topic that has cropped up before, I know, but I can't find a reference to this particular problem.

I'm writing a tool to import address data from the UK PAF.CSV postcodes/addresses file. This is a CSV file containing (at present) 31,827,747 individual UK postal addresses. I'm opening the file with FOPEN and reading each line with FGETS.

To get around the maximum size limit for a DBF, I'm writing these to a set of ten identical tables in blocks of 3,500,000 records, which is working well. The problem I'm encountering is that when it reaches line 25,214,532, FGETS generates an error that the file is too big. At that point it has already written 714,531 records to table 8, which should still have room for another 2,785,468 records. And remember there's still capacity for 7,000,000 more in tables 9 and 10.

So, this appears to be a limitation of the FOPEN/FGETS functionality in VFP.

Short of using something to split the PAF.CSV file into two files and importing them sequentially, I'm open to suggestions.

Regards
Gary

RE: Maximum file size when using FGETS

The 2GB limit doesn't only apply to DBFs; FGETS can also only read files up to 2GB.

For larger files you could use API functions, such as ReadFile https://learn.microsoft.com/en-US/windows/win32/ap...
Or you can use Scripting.FileSystemObject with its OpenTextFile and ReadLine methods.

Chriss

RE: Maximum file size when using FGETS

(OP)
Ah, that probably explains it, then, Chris. The PAF.CSV file is roughly 2.5GB in size.

Thanks. I guess I'm going to have to split the CSV file and import each file sequentially.

Edit: I found a utility called Huge CSV Splitter that appears to have done the job.

Regards
Gary

RE: Maximum file size when using FGETS

No, you don't need to split the file, just use other code to read it, as said above.

Chriss

RE: Maximum file size when using FGETS

(OP)
That would be neater, I agree, Chris.

I'll take a look at doing it that way,

Regards
Gary

RE: Maximum file size when using FGETS

For using FileSystemObject, see thread184-1811358: Import or Append what else could work for a big comma-delimited file to be transferred into *.dbf ?

Quote (Mike Lewis)

CODE -->

loFS = CREATEOBJECT("Scripting.FileSystemObject")
loFile = loFS.OpenTextFile("c:\MyFile.txt", 1)  && 1 = ForReading
DO WHILE NOT loFile.AtEndOfStream
  lcLine = loFile.ReadLine()
  * code here to add the lines to an array or to
  * write them out to multiple smaller files
ENDDO
loFile.Close()

While you're processing the single lines, you can also import them into your DBFs, just as you did with FGETS.
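A minimal sketch of that approach (the target tables paf1.dbf to paf10.dbf with a single field rawline are hypothetical names; the 3,500,000-records-per-table split follows Gary's scheme):

CODE -->

loFS = CREATEOBJECT("Scripting.FileSystemObject")
loFile = loFS.OpenTextFile("c:\PAF.CSV", 1)  && 1 = ForReading
lnCount = 0
DO WHILE NOT loFile.AtEndOfStream
  lcLine = loFile.ReadLine()
  lnCount = lnCount + 1
  * pick the target table: records 1..3,500,000 go to paf1, and so on
  lnTable = INT((lnCount - 1) / 3500000) + 1
  INSERT INTO ("paf" + TRANSFORM(lnTable)) (rawline) VALUES (lcLine)
ENDDO
loFile.Close()

In practice you'd parse lcLine into its CSV columns rather than store the raw line, but the loop structure stays the same.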

Chriss

RE: Maximum file size when using FGETS

(OP)
Thanks Chris, much appreciated.

Regards
Gary

RE: Maximum file size when using FGETS

If performance is an issue, you can also use the ReadFile API function I mentioned. It should work just as fast as VFP's FREAD/FGETS.

Chriss

RE: Maximum file size when using FGETS

What about FILETOSTR()? Does the 2GB limit apply to that too?

RE: Maximum file size when using FGETS

For a 32-bit process, such as VFP9.exe or any EXE you build with it, the OS only allows 2GB of accessible memory, so you don't even have to try. That your PC has more RAM, or that more memory can be paged out to pagefile.sys, is of no importance: the moment you read something into a string, all that matters is the addressable RAM of the process, which is limited to 2GB.

On 64-bit Windows systems, 32-bit processes can have 4GB of process memory if they are built large-address aware, which VFP9 is not. So even on any current Windows, which is usually 64-bit on PCs/laptops with 8, 16, 32GB RAM or even more, VFP processes are limited to 2GB of addressable RAM overall.

Out of curiosity I just zipped up some ISO files so the ZIP file becomes larger than 2GB, and if you try loading it with FILETOSTR() you encounter what I would have predicted:
Error 43: There is not enough memory to complete this operation.

And if instead you try APPEND MEMO, it should be clear from the 2GB DBF limit that you get
Error 1190: File c:\users\chrissm\appdata\local\temp\0000ewi100iq.tmp is too large.
(for example).

If anything in the VFP language were able to read files larger than 2GB, it would be FREAD or FGETS, as they could read the file in smaller portions, but even that is something VFP does not support. It could, because the ReadFile Windows API function I have now mentioned three times is a 32-bit function that you can declare and call to get (theoretically) up to 2GB blocks of files that are as large as the file system allows. More practically, you read single lines - in other scenarios maybe blocks of a few MB - one by one.
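As a hedged sketch of what calling ReadFile from VFP9 could look like (lnHandle would come from the CreateFile API function; the 8KB buffer size is an arbitrary choice):

CODE -->

* ReadFile's Win32 signature:
* BOOL ReadFile(HANDLE, LPVOID, DWORD, LPDWORD, LPOVERLAPPED)
DECLARE INTEGER ReadFile IN kernel32 ;
   INTEGER hFile, STRING @lpBuffer, ;
   INTEGER nNumberOfBytesToRead, ;
   INTEGER @lpNumberOfBytesRead, ;
   INTEGER lpOverlapped

lcBuffer = REPLICATE(CHR(0), 8192)  && preallocated 8KB read buffer
lnBytesRead = 0
* lnHandle comes from the CreateFile API function
IF ReadFile(lnHandle, @lcBuffer, LEN(lcBuffer), @lnBytesRead, 0) <> 0
   lcChunk = LEFT(lcBuffer, lnBytesRead)  && the bytes actually read
ENDIF

The file position is maintained by the OS per handle and advances with each call, so repeated calls walk through the file regardless of its total size.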

Chriss

RE: Maximum file size when using FGETS

Quote (Chris Miller)


If anything in the VFP language were able to read files larger than 2GB, it would be FREAD or FGETS, as they could read the file in smaller portions, but even that is something VFP does not support. It could, because the ReadFile Windows API function I have now mentioned three times is a 32-bit function that you can declare and call to get (theoretically) up to 2GB blocks of files that are as large as the file system allows. More practically, you read single lines - in other scenarios maybe blocks of a few MB - one by one.

A few notes:
1) VFP internally uses the "signed int" data type for the low-level file functions - this is a base limitation.

2) VFPA 10.1 and VFPA 10.2 do not have this limitation.

3) The WINAPI ReadFile() function can only read data of up to 0xFFFFFFFF length in one step (which is possible in VFPA 10.2), but it's very slow, because passing a big string buffer from VFP to the API is very slow (VFP allocates a new buffer).
[the effective buffer size for frequent API calls from VFP is 0xFFFF bytes]

4) The WINAPI ReadFile() function can read data from files greater than 2GiB, because the internal WINAPI structure for the file position uses two DWORDs.



mJindrova

RE: Maximum file size when using FGETS

Gary et al

Please do note that the PAF (Postcode Address File) which you are working with is the property of the Royal Mail here in the UK, and requires a licence to hold a copy and to work with it!

Generally, one purchases a commercial package whose authors hold such licensing and pass the relevant parts of the licence on to their users.

Please do not leave yourself open to a legal challenge.

From the Royal Mail

https://www.poweredbypaf.com/

Colin

RE: Maximum file size when using FGETS

1) That's not the reason VFP can't read from a >2GB file. As you say in 4), the position within a file is maintained outside of VFP: VFP only has a file handle, and the file system manages information about the file (which file, file position and likely more) internally; you just refer to the handle. So it would have been possible to make use of that in the implementation of VFP's FREAD/FGETS functions, too. That would just require basing VFP's functions on the Windows API functions that enable large file access. It has nothing to do with VFP only supporting signed integers.

2) Yes, I'm only talking about VFP9. I don't think it's a solution to replace VFP9 with VFPA 10.1/10.2 to have that limit removed, if the intent is to write a tool that can be used in VFP9 applications. Solving it with VFPA 10 would limit the solution to VFPA 10 developers, who don't need it, as they can simply use APPEND or IMPORT with their extended abilities anyway.

3) I said you could get up to 2GB blocks; I didn't say that's the way you should operate. Getting blocks of 256 bytes or something of that magnitude isn't much of an overhead, even with shared memory needing to be copied within memory twice; the bottleneck is still the disk access. All you need is FGETS-like access, and all you have is ReadFile() loading blocks, but you can easily turn that into line-by-line processing with ALINES on read-in file blocks of sizes between 256B and perhaps 8KB, depending on record size.

I'm not seeing a big impact from being a little slower than VFP's FREAD processing, also when using FileSystemObject, because the main pain is the 2GB file-reading limit. Solving that with file splits, as the "Huge CSV Splitter" tool does, means reading the full file in, writing it out as smaller files and then reading those in, so the base time factor here is 3 - even worse, as writing a file is usually slower than reading it. You can easily beat that, and avoid the need to split the file at all, even if ReadFile ran at only half or a third of FREAD's speed.

Chriss

RE: Maximum file size when using FGETS

I just compared FSO FileStream with ReadFile on a 3GB ZIP file, both reading in blocks of 8KB, and ReadFile takes 20% of the time FSO needs.

To be able to compare that to the native VFP function, I then created a ~1GB file and read it with VFP's FREAD, FSO FileStream.Read() and the Windows API ReadFile. VFP's FREAD processing takes about the same time as WinAPI ReadFile(), and FSO takes double the time. No idea why the FSO timings are so inconsistent.

I would redo the tests with the same starting conditions for all measurements, restarting the computer between them, as there's a great likelihood that a test running after any other profits from hardware and filesystem/OS caching when reading the same file multiple times. But I'm already sure about FSO being the slowest candidate, and about ReadFile and VFP's FREAD being on an equal level, because I read the same files multiple times, so each test profits from the caching by the same amount.

The only disadvantage of the WinAPI is that it doesn't give you a method to read a file in text mode line by line; you can only read blocks. But there are easy ways to split blocks with ALINES, even allowing for whether the CSV splits lines with CRLF, CR only or LF only. You just need to think about this line splitting a bit, so as not to process incomplete lines.
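The incomplete-line handling could be sketched like this (lcChunk is assumed to hold a block read with ReadFile; lcCarry, laLines and the loop framing are hypothetical names for illustration):

CODE -->

* Prepend the leftover tail of the previous block, then split into lines.
lcChunk = lcCarry + lcChunk
lnLines = ALINES(laLines, lcChunk)  && by default splits on CRLF, CR or LF
IF RIGHT(lcChunk, 1) $ CHR(13) + CHR(10)
   lnComplete = lnLines  && block ended exactly on a line break
   lcCarry = ""
ELSE
   lnComplete = lnLines - 1
   lcCarry = laLines[lnLines]  && incomplete last line, carry it forward
ENDIF
FOR lnI = 1 TO lnComplete
   * process laLines[lnI] as one complete CSV record
ENDFOR

After the final block, anything still left in lcCarry is the last line of the file.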



Chriss

RE: Maximum file size when using FGETS

Colin is perfectly right about the PAF being subject to licensing. But given that Gary already has access to it, I think we can assume it is legitimate. As far as I know, it is not available to be freely downloaded from the internet. If it is, it would almost certainly be a pirate copy.

That said, you do occasionally find copies of the outward postcodes available for download. That file is of course very much smaller. There used to be one on GitHub, for example, but it appears to have been taken down - possibly because of the licensing issue.

Interestingly, when Fox Software launched FoxPro 2.0 (the first version that used Rushmore), they used the PAF to demonstrate the speed of index searches. They said that the PAF was the largest file they could find that would fit on a hard drive. That was in 1992.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads

RE: Maximum file size when using FGETS

(OP)
Yes, the customer for whom I'm developing the software has purchased a license for the PAF file. There is also a set of sample files (much smaller) that are freely available to download.

Anyway, thanks for all the replies and the interesting background. The software is finished and on-site.

Regards
Gary

RE: Maximum file size when using FGETS

Hi again

The company I work for has commercial applications for address management (Simply Postcode),

for which we get a great set of tools, and an API / web service that I can use with VFP or C#, depending on which section of the business needs I'm trying to cater for.

And most importantly - the PAF is not sent out on a regular basis, whereas our API is updated daily and will have new addresses available within hours of their being created,
and the searches are awesome - by part of address, part of postcode, and loads of others too.

Anyway - I am NOT here to sell someone's products for them; there are loads of alternatives.

What I am trying to say - and not very well - is that getting a copy of the PAF and then reading from it / searching for what you're after / extracting addresses etc. is not necessarily the right way to go about things. In this case, it seems as if you are trying to reinvent the wheel!

Find out what the customer's needs are, then propose a solution! In this case it already exists, in quite a few different products, all with different licensing and pricing models... and this methodology does not just apply to the PAF.

Colin

RE: Maximum file size when using FGETS

Colin, that's very good advice.

I would add that, if your software or licensing scheme allows, consider a vertical subset of the data file. For example, I once worked on an application that only needed the postcode and the corresponding grid refs from the PAF - not the postal address or any of the other geographic data. (This was for a process that calculated distances between places.)

I don't remember the details, but I think that, by only taking those two items, the data came in well under the 2 GB limit.

This applies to any external data of course, not just the PAF.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads

RE: Maximum file size when using FGETS

Mike,

Thanks... Love your website and collection of articles - I often take a look if I need a brain break from being productive!

All the commercial packages that I am aware of encrypt and compress the data and perform their searches on that, and so get around data size limits. That is the course of action that I as a professional would take, but approaches like that require proper funding, research and development, and rather a lot of time, and that is why, where appropriate, I would use a ready-made solution. I am sure that we have all done so many times... a simple example being the use of ActiveX controls.

Gary - please do not take any of my comments as any type of criticism. I am just pointing out that, as in the old adage, there is more than one way to skin a cat. Also, I am not suggesting that skinning cats is a good idea - other small animals are available!

Colin

RE: Maximum file size when using FGETS

Quote (Gary Sutherland )

The software is finished and on-site.

Good. I assume it was just a one-time job to read in that CSV, and since you found the Huge CSV Splitter tool before diving into alternatives for reading large files, you solved it that way. No need to solve something twice that's already done, of course. I recognise the legal-aspect discussion - thanks for that advice, Colin. Gary has addressed that already, no need to follow up.

Technically, I can now recommend using ReadFile. What's necessary to use it is first getting the hFile file handle from another API function: CreateFile.
https://learn.microsoft.com/en-us/windows/win32/ap...

It sounds implausible, but it would be better named CreateFileHandle. Among the parameters you can specify that you only need read access and want to open existing files only - see the details in the documentation:

Quote (CreateFile documentation)

[in] dwDesiredAccess

The requested access to the file or device, which can be summarized as read, write, both or 0 to indicate neither.

The most commonly used values are GENERIC_READ, GENERIC_WRITE, or both (GENERIC_READ | GENERIC_WRITE).

Quote (CreateFile documentation)

[in] dwCreationDisposition

An action to take on a file or device that exists or does not exist.
...

OPEN_EXISTING (3) - Opens a file or device, only if it exists. If the specified file or device does not exist, the function fails and the last-error code is set to ERROR_FILE_NOT_FOUND (2).

Finally you also need CloseHandle
https://learn.microsoft.com/en-us/windows/win32/ap...

And that makes it the equivalent of the FOPEN, FREAD and FCLOSE triple, just with API functions instead of native VFP functions. They are available in Windows 2000 or later, so practically everywhere, even in XP scenarios.
To get something similar to FGETS, reading a file line by line, you do that with ALINES on the blocks you read. Since you need to have some constants defined, I'd implement it as a (Custom-based) class for large files, round it out with WriteFile and further API functions of the handleapi family, and then define methods as the interface for VFP usage.
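Put together, the declarations could look like the sketch below (the constant values are the standard Win32 ones; CreateFile is aliased to the hypothetical name ApiCreateFile, and error handling is omitted):

CODE -->

#DEFINE GENERIC_READ         0x80000000
#DEFINE FILE_SHARE_READ      1
#DEFINE OPEN_EXISTING        3
#DEFINE INVALID_HANDLE_VALUE -1

DECLARE INTEGER CreateFile IN kernel32 AS ApiCreateFile ;
   STRING lpFileName, INTEGER dwDesiredAccess, ;
   INTEGER dwShareMode, INTEGER lpSecurityAttributes, ;
   INTEGER dwCreationDisposition, ;
   INTEGER dwFlagsAndAttributes, INTEGER hTemplateFile
DECLARE INTEGER CloseHandle IN kernel32 INTEGER hObject

* open an existing file for shared, read-only access
lnHandle = ApiCreateFile("c:\PAF.CSV", GENERIC_READ, FILE_SHARE_READ, ;
   0, OPEN_EXISTING, 0, 0)
IF lnHandle <> INVALID_HANDLE_VALUE
   * ... ReadFile loop goes here, reading block by block ...
   CloseHandle(lnHandle)
ENDIF

VFP resolves the name to the ANSI export CreateFileA automatically; the alias just avoids any name clash in your own code.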

Chriss

RE: Maximum file size when using FGETS

(OP)
No problem, Colin. I'm constrained in part by the application I have to integrate this with.

Regards
Gary
