
Metering Data Volume

Metering Data Volume

(OP)
Hi,

Any ideas on the best approach for collecting data throughput and/or metering it as a user uses the system? The system is a document management system where the data and images are stored in SQL remotely. Our new billing model will be based on a bucket system, where the user purchases a bucket of X size to be consumed within a defined period. Therefore, I need to journal outgoing query sizes and their returned result sizes, as well as the sizes of the images that are actually returned.

I am also wondering if there is some sort of control/software that I can place in the app through which all data flowing in and out of the app would pass, and which the app could, of course, communicate with.
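
One possible shape of such a choke point is a wrapper around the remote query call. The following is only a minimal sketch with hypothetical names (MeteredQuery, MeterUsage, lnHandle), not an existing control:

CODE -->

* Hypothetical sketch: route every remote query through one wrapper that
* journals the request size and the (fixed-width) result size.
FUNCTION MeteredQuery(lnHandle, lcSql, lcCursor)
    LOCAL lnRows, lnResultBytes
    lnRows = SQLEXEC(lnHandle, lcSql, lcCursor)
    lnResultBytes = 0
    IF lnRows > 0
        * Fixed-width fields only; memo/blob content would need LEN() per record.
        lnResultBytes = RECCOUNT(lcCursor) * RECSIZE(lcCursor)
    ENDIF
    = MeterUsage(LEN(lcSql), lnResultBytes)   && placeholder: write to your usage journal
    RETURN lnRows
ENDFUNC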

This topic is more about discussing the pros and cons of how to do this, so please, if you see an issue, let's talk about it...

Thanks,
Stanley

RE: Metering Data Volume

(OP)
Also, how do I get the name of the underlying table (like _4JW0IPKO6.TMP) that is created by a command like this...
SELECT * FROM users INTO CURSOR 'curUsers', so that I can then get the size of the .tmp file.

Or, is there an easy way to get the size of the in-memory cursor?

Thanks,
Stanley

RE: Metering Data Volume

I'd say if you use a cloud provider, you get traffic information in a log file, so all you'd need to find out is where that log is. Drive/storage usage should also be no problem.

In regard to cursor size: RECSIZE()*RECCOUNT() gives you the size of all constant-width fields; if you have memo/general/blob fields, you need to iterate them.
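
As a minimal sketch of that calculation (not from the original post; "crsResult" is a placeholder alias, and general fields are left out):

CODE

* Estimate the byte size of an open cursor.
lnFixed = RECCOUNT("crsResult") * RECSIZE("crsResult")   && all constant-width fields

* Variable-length data (memo/blob) has to be iterated per record.
lnVariable = 0
lnFieldCount = AFIELDS(laFields, "crsResult")
SELECT crsResult
SCAN
    FOR lnI = 1 TO lnFieldCount
        IF INLIST(laFields[lnI, 2], "M", "W")   && memo and blob field types
            lnVariable = lnVariable + LEN(EVALUATE(laFields[lnI, 1]))
        ENDIF
    ENDFOR
ENDSCAN
? lnFixed + lnVariable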

The file name of a cursor is simply DBF(alias), so in this case DBF('curUsers'). You'll notice a file with a very similar name, one off in the last letter. That file is related, but it is not the cursor itself.

Look into %TEMP% with the order set to descending date/time before you start VFP. Notice that when you start VFP you get two TMP files; that's a DBC, and it stores all DBC features of cursors, like default values or long field names. Cursors are tables of this DBC, but the cursors themselves stay in RAM. The next difficulty: even if those files did reflect cursor size, VFP holds them open exclusively internally, so you get no second handle to act on them.

If you configure a DSN you can specify a log file for long-running queries, and also a log file for performance monitoring data, which includes columns for bytes sent and read. This log file is a text file and can, of course, be manipulated by your customer, but it would have the information you need.

Bye, Olaf.




Olaf Doschke Software Engineering
https://www.doschke.name

RE: Metering Data Volume

(OP)
Hi mgagnon,

Yes, that is what I thought and tried, but it is not working for me... It is showing the name of the source table and NOT the cursor...

SELECT * FROM users INTO CURSOR curTest
?DBF('curTest') reports "d:\folder1\data\users.dbf"

Thanks,
Stanley

VFP9sp2


RE: Metering Data Volume

That happens when the cursor is just a filter on the original DBF.

Instead do:

CODE

SELECT * FROM users INTO CURSOR crsTest NOFILTER
? DBF('crsTest') 

Besides, FSIZE() is field size under the default settings of VFP; you need SET COMPATIBLE ON (an unfortunate setting you should otherwise keep off) to make it return a file size. Even then it won't find the TMP file, as it doesn't get written to disk.

This filter behavior will never happen with a cursor you fetch from a remote server via SQLEXEC(), a remote view, or a CursorAdapter.

Bye, Olaf.

Olaf Doschke Software Engineering
https://www.doschke.name

RE: Metering Data Volume

Stanlyn,

Here is what I get with

CODE --> vfp

SELECT * FROM naw INTO CURSOR tmp
? DBF("tmp") 

C:\users\koenp\appdata\local\00000QLFT001N.TMP


Apart from that, why do you need the internal name? You can always address a cursor by its given alias, in this case "tmp" (don't get confused: in INTO CURSOR tmp, "tmp" is the alias).

So maybe this example

CODE --> vfp

SELECT * FROM naw INTO CURSOR myCursor
? DBF("myCursor") 
is less confusing, since here the alias is myCursor and .TMP is just the extension of the underlying temporary file.

Regards,

koen

RE: Metering Data Volume

Koen, the idea is to get the file size, but try FILE(DBF("myCursor")); this will be .F., so Stanlyn doesn't have that easy way to determine the cursor's byte size. And RECSIZE()*RECCOUNT() will not reflect the size of memo fields (and other types stored in the secondary FPT file, which you also won't find on the hard disk).

You could always do

CODE

SELECT alias
COPY TO mydata TYPE DBF WITH CDX 

And then determine the file size, but then you're doing exactly what VFP avoids just to get the file size: you write the cursor data to the hard disk.

Bye, Olaf.

Olaf Doschke Software Engineering
https://www.doschke.name

RE: Metering Data Volume

(OP)
Hi,

Olaf, adding NOFILTER points to a non-existent file. The TMP file it points to is NOT on the disk (see screenshot). Explorer is sorted by name. You can also see the command window commands I've been running along with the results that ? prints. Note that FILE() also reports that the .tmp file is not found.

Stanley



A better image can be found here, as engineering.com is reducing the resolution too much...
http://stanlyn.com/public/screenshot1.png

RE: Metering Data Volume

The file reported by DBF() is not necessarily a "real" file.

RE: Metering Data Volume

(OP)
Then, if that cannot be trusted, copying it to a DBF looks like one way. Is there any other way, akin to what FLUSH does with updated records not yet written to disk?

RE: Metering Data Volume

(OP)
Strange behavior: copying curTest using "copy to mydata foxplus" destroys its source cursor "curTest". Why is that, since it is only copying and not moving?

If it has to close it before it can be copied, then isn't the handle gone, therefore stopping the copy? It appears to just rename it without anything going to disk, as I see no references to either curTest, the ***.tmp from curTest as reported by DBF(), or the "copy to" table or cursor.

I was hoping to avoid copying the structure and populating it by iterating, which may be the only way.

RE: Metering Data Volume

I might have missed something here (I haven't been following this thread in detail). But instead of SELECTing into a cursor, why don't you SELECT into a physical table? In other words, INTO TABLE xxx rather than INTO CURSOR xxx. There would be no effect on performance, and you can be sure that the physical table will remain in existence until you explicitly delete it.
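
As a minimal sketch of that approach (not from the original post; the users table, the temp location and the variable names are placeholders):

CODE

* Query into a real table, read the resulting file sizes from disk, then clean up.
lcStem = ADDBS(GETENV("TEMP")) + SYS(2015)
SELECT * FROM users INTO TABLE (lcStem)
USE IN SELECT(JUSTSTEM(lcStem))          && close it so the files are complete on disk

lnBytes = 0
lnFiles = ADIR(laFiles, lcStem + ".*")   && picks up the DBF and, if present, the FPT
FOR lnI = 1 TO lnFiles
    lnBytes = lnBytes + laFiles[lnI, 2]
    ERASE (ADDBS(GETENV("TEMP")) + laFiles[lnI, 1])
ENDFOR
? lnBytes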

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads

RE: Metering Data Volume

(OP)
Thanks Mike, that should work, will try tomorrow...

RE: Metering Data Volume

I don't know why you did COPY TO ... FOXPLUS when I suggested TYPE DBF.

Let me test. OK, I see TYPE DBF doesn't work; there is no such type, and the reason is that it's the default: when you do COPY TO mydata you get a mydata.DBF and an FPT (when necessary). The same works with TYPE FOXPLUS. Of the resulting DBF/DBT files, the DBT was larger than VFP's FPT in a case I tested, but creating the files doesn't change any work area. In both cases, the cursor remains open. I don't know what you're doing, stanlyn; are you closing the form and data session? Do you have a USE afterward?

And I only second Mike in the aspect of getting a real DBF right away, but again, a cursor has a performance advantage, as it is in memory, and since you say you get data from a remote database, there is no option for SQLEXEC() to generate a DBF file instead of a cursor.

If you go with what SQLEXEC(), a CursorAdapter, or a remote view gives you as a cursor, one other very general way would be determining SYS(1016) before and after the query.
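
A rough sketch of that before/after measurement (lnHandle is an assumed open connection handle and the query text is only an example; how closely SYS(1016) tracks cursor memory would need testing):

CODE

* SYS(1016) returns user object memory use as a character string, so the
* difference before and after the fetch is only an approximate measure.
lnBefore = VAL(SYS(1016))
lnRows = SQLEXEC(lnHandle, "SELECT * FROM documents", "crsResult")
lnAfter = VAL(SYS(1016))
IF lnRows > 0
    ? "Approximate bytes fetched:", lnAfter - lnBefore
ENDIF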

Bye, Olaf.

Olaf Doschke Software Engineering
https://www.doschke.name

RE: Metering Data Volume

(OP)

CODE -->

SELECT alias
COPY TO mydata TYPE DBF WITH CDX 

Tried that; it does not work, there is no TYPE DBF. However, FOXPLUS did create the files, but when copying it using the copy to command, I get a zero size set of files. You have to close the table before it gets written to disk. Still, FSIZE() would not work either, so I did what I've done in the past, and it works, but with maybe a lot more overhead than I wanted. Here it is...

CODE -->

Clear
Set Safety Off

Set Procedure To d:\stanlyn\Server\programs\masterprocs.prg
Set Libr To d:\stanlyn\Server\misc\Phdbase.v50

If !Used('documents')
	Use 'documents' In 0
Endif

Select * From documents Into Table 'temp' WHERE RECNO() < 100
Use In Select('temp')

For x=1 To 2
	If x=1
		ext='dbf'
	Else
		ext='fpt'
	Endif

	lcFile = 'temp.' + ext

	aa = Adir(MyArray, lcFile)

	If aa > 0
		If x = 1
			lnDbfSize = MyArray(1,2)
			? lnDbfSize
		Else
			lnFptSize = MyArray(1,2)
			? lnFptSize 
		Endif
	Endif

	Erase (lcFile)  && parentheses so the variable's value is used as the file name
NEXT

?lnDbfSize + lnFptSize 

Set Safety On 


Does anyone know a way to do this completely in memory? SSDs do help, but an in-memory solution would be much faster, unless we use the NVMe U.2 drives, but that's a different story...

Thanks, Stanley

RE: Metering Data Volume

Back in the "old days" we used what we called a RAM Disk, a virtual drive in memory. Google RAMDISK and you will find many solutions.

RE: Metering Data Volume

Quote (stanlyn)

using the copy to command, I get a zero size set of files

I didn't; I got the files with size, but I made the cursor with the NOFILTER clause, which is important also to get the data separated and not just a filter.

Please simply try this; it queries all of browser.dbf, a table in HOME(), and then copies it to the hard disk:

CODE

LOCAL lcPathAndFileStem
lcPathAndFileStem = ADDBS(GETENV("TEMP"))+sys(2015)
Select * from browser into Cursor crsTemp Nofilter
* SELECT crsTemp && not necessary, as the newly generated cursor automatically is selected
Copy To (lcPathAndFileStem)
= Adir(laDBFSize, lcPathAndFileStem + ".*")  && the leading = is needed to call a function as a standalone statement
? laDBFSize[1,1] , laDBFSize[1,2]
? laDBFSize[2,1] , laDBFSize[2,2]
? laDBFSize[1,2] + laDBFSize[2,2] 

A RAM disk would be the least overhead you can have; keep the time short and keep things in memory. It's unlikely your cursors get to the 2GB limit, but you need to reserve enough memory for the RAM disk to be able to copy out any cursor for measuring its size.

Another idea would be using INTO ARRAY to measure data size:

CODE

Select * from browser into Cursor crsTemp Nofilter
Select * from crsTemp into array laSize
 
Create Cursor crsArrayStore (mArray M)
Append Blank 
Save To Memo mArray ALL LIKE laSize
? Len(mArray) 

The data SAVE creates is obviously much larger than the original cursor itself, because the array name is repeated for every array element, so the name "LASIZE" appears RECCOUNT()*FCOUNT() times, followed by a letter for the element type, then the value itself, with a lot of spaces used as separators for all this. It would still be a measure of the size of the cursor, and you could account for the size of the metadata.

Measuring the file sizes is also somewhat unfair, as it adds header bytes that are generated locally and don't contribute to the data traffic.

If I were you I wouldn't use any of this, as it just means making copies of what you already have merely to measure its size; you're degrading the performance of your application, and it still doesn't act on buffered data. You should go for the traffic measurements you can get from your web hoster/cloud provider and split your overall traffic into percentages calculated from rough estimates like the number of queries or RECCOUNT() * average record size.

As this seems to be about document management, the essential part of the traffic also seems to me to be in one memo/blob field. You could simply use the LEN() of that. Besides, if you used an MSSQL backend with FILESTREAM and file tables, you could even simply measure the size of the directories MSSQL creates and manages.

Bye, Olaf.

Olaf Doschke Software Engineering
https://www.doschke.name

RE: Metering Data Volume

(OP)
Olaf,

Looks like the best I can hope for is getting the image size coming from the MSSQL blob (easy) and getting the size of the result cursor from their query. I will be pulling down the images on demand, not when the query results are returned. There is a parent/child field that links the main table to the image table.

I see no way of going the ISP route and separating each customer's traffic in near real time, as I'm accumulating this usage data into a separate, non-metered SQL table. What will be metered is query return size, image size if viewed, and query strings sent and results received. I will need this level of granularity to support a customer's billing inquiry.

Currently the bottleneck is the ISP link (50gb). The query above for 100 records takes about 2 seconds to run and returns 78,002K for the DBF and 141,888K for the FPT. It also takes 2 seconds for 1,000 records, with 771,902K for the DBF and 1,441,728K for the FPT. 10,000 records takes 2.5 seconds and returns 7,710,902K for the DBF and 16,685,046K for the FPT. Finally, 100,000 takes about 13 seconds and returns 77,100,902K for the DBF and 156,894,528K for the FPT. My workstation is an i7 quad gen2 with SSDs. The MSSQL server is on the same gigabit LAN and is an i7 six-core with NVMe U.2 drives for SQL and a standard SSD for boot. Note there are no images being returned here; however, the OCR'ed text is in the mix. The record count returned will be limited to maybe 100 per fetch.

One gotcha: my customers will have a full mix of machines, processors and CPU configs, which could have a dramatic effect on these numbers.

Tore,
I'll be testing the RAM disk setup next...

RE: Metering Data Volume

(OP)
Update...
Using ImDisk with a 2GB RAM disk, the speed was about 20-25% better. 1,000 records took 1.2 seconds and 100,000 took 9-10 seconds...

RE: Metering Data Volume

>I will be pulling down the images on demand, not when the query results are returned

I don't know what you mean by this; at some point you will fetch an image, won't you? Then you meter that.

I understand it's hard to separate the traffic of each individual customer from the overall traffic. A number like the record count would only allow a rough estimate of the share, and not in real time, rather as an end-of-month estimate: all traffic / total of all record counts = traffic per record, and traffic per customer = this base number times that customer's record count.

Anyway, I see you already made use of the RAM disk idea. That sounds like awful performance, or are you talking about the data retrieval from remote into the cursor? I can't believe your figures, because you fetch into a cursor, which has a 2GB limit, and you talk of 7,710,902K, which would be about 7GB, wouldn't it? That speed actually sounds very nice, if you get that volume of data in 2.5 seconds from a remote MSSQL DB.

Any action on a RAM disk below the 2GB maximum size of a cursor should only take split seconds, considering how fast RAM is today, capable of processing volumes like 12 GB/s. And 7 GB in 2.5 seconds sounds like almost RAM speed.

If you write out cursors with blobs to a RAM disk to determine their size, I'd say that in case you have a field which contains >90% of the size you're interested in, you simply determine LEN(field) as an estimate; that means no copy needs to be made, and it is fastest.

Bye, Olaf.

Olaf Doschke Software Engineering
https://www.doschke.name

RE: Metering Data Volume

(OP)
Olaf,

Quote:

I don't know what you're meaning with this, at some point you will fetch an image, won't you? Then you meter that.

Remember when I said the images are in a separate table and there is a parent/child relationship? The data that is returned into a cursor/table will not contain the image; however, it will contain a key into the image table, and if the customer wants to view the image, we fetch that image using the key. This keeps the returned cursor small, and we view the images on demand. Each image fetched will be metered and saved locally so a re-fetch is not needed, in case they want to print it, or ???.
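
A minimal sketch of that fetch-and-meter step (the images table, the imagekey/imgblob fields, lnHandle, lcKey and lcCachePath are all hypothetical names, not from this thread):

CODE -->

* Hypothetical sketch of the on-demand image fetch described above.
lcSql = "SELECT imgblob FROM images WHERE imagekey = ?lcKey"
IF SQLEXEC(lnHandle, lcSql, "crsImage") > 0 AND RECCOUNT("crsImage") = 1
    lnBytes = LEN(crsImage.imgblob)            && metered size of this image
    * ...record lnBytes against the customer's bucket here...
    lcLocalFile = ADDBS(lcCachePath) + lcKey + ".img"
    STRTOFILE(crsImage.imgblob, lcLocalFile)   && cache locally so a re-fetch is not needed
ENDIF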

In my numbers I should have said B (bytes) instead of K, which dramatically changes the actual sizes, sorry... After you said what you said, I too was wondering why it did not complain about the 2GB limit. Here is a look at their real sizes; also note this is not using a RAM disk or SSD, but an actual platter disk. Also, again, the SQL machine is on the same local gigabit LAN and nothing is considered remote (non-LAN).



Quote:

determine a rough estimate of the share, but not realtime, rather as end of month estimate, all traffic / total of all reccount

We need to do this as close to real time as possible, otherwise a customer could easily rack up a lot of usage (by sharing their credentials with co-workers, etc.). We also need to put up notifications (based on a user-defined threshold) when the bucket falls below the threshold, and offer to upgrade.
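
A sketch of the kind of near-real-time bucket check described here (the buckets and usage_log tables, their columns, lnHandle, lnCustomerId and lnThresholdPct are all hypothetical):

CODE -->

* Hypothetical sketch: total the metered usage for a customer and warn when the
* remaining share of the bucket falls below a user-defined threshold.
lcSql = "SELECT b.bucket_bytes, SUM(u.bytes_used) AS used " + ;
        "FROM buckets b JOIN usage_log u ON u.customer_id = b.customer_id " + ;
        "WHERE b.customer_id = ?lnCustomerId GROUP BY b.bucket_bytes"
IF SQLEXEC(lnHandle, lcSql, "crsUsage") > 0
    lnRemainingPct = 100 * (crsUsage.bucket_bytes - crsUsage.used) / crsUsage.bucket_bytes
    IF lnRemainingPct < lnThresholdPct
        MESSAGEBOX("Your data bucket is below the warning threshold.", 48, "Usage warning")
    ENDIF
ENDIF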

Quote:

you have a field which contains >90% of the size you're interested
Question... Let's say a specific blob field in a table contains 2MB of real data (not padding) for one record, and the same field for all the other records is empty. What would the resulting cursor size look like? Would each record be the same size as the largest? Or would the size actually reflect its true usage? How close would the table size be to what was transmitted over the wire? And I know that compression can play a big role here.

Thanks,
Stanley

RE: Metering Data Volume

>and if the customer wants to view the image, we fetch that image using the key.

Yes, so it's as I said:

Quote (myself)

at some point you will fetch an image, won't you? Then you meter that.

If images contribute almost 100% of the traffic, you could concentrate on metering them only, if the rest of the transfer is negligible. You're not working by the 80/20 rule if you meter everything precisely, when it would suffice to meter the major part of the traffic and estimate the total as a factor of that size. In your case the data in documents seems to split up with about 1/3 in the DBF and 2/3 in the FPT. If all FPT data is in one blob field, the measurement of all FPT data would simply be SUM(LEN(blobfield)), and the DBF size can be determined with RECCOUNT()*RECSIZE(); if you take that route you don't need to write out the cursor to measure these sizes.

And stanlyn, don't confuse remote with another LAN. If you make a connection to a database, that is remote access; any SQL pass-through is remote, no matter whether the database is local on the LAN or on the internet. A cursor coming in through SQLEXEC() never being a filter is one of the consequences of that, so you don't have that problem at all.

There were too many things here that are just unimportant side tracks if you concentrate on the core problem. To meter the images, all you need is the LEN() of the blob field you fetch. You can go without storing the cursor to the hard disk to measure it.

Bye, Olaf.

Olaf Doschke Software Engineering
https://www.doschke.name
