
Using PHP to process a big file


OsakaWebbie (Programmer), Feb 11, 2003

I keep running into roadblocks on what seems like it ought to be simple. Perhaps someone else has some insight.

Bottom-line goal: take a big (8 MB) file of SQL insert commands, convert the Shift-JIS text in it to Unicode, and insert the records into a waiting database on a web hosting server. I have Apache/PHP/MySQL running on my local Windows machine as well (in fact, that's where the big file came from - mysqldump). Obviously the "right" way would be to run a Perl script from the command line to convert the text and pipe the file into the mysql command. But I don't have command-line access on the hosting server, and when I tried running Perl as CGI I couldn't get it to work. Customer support at this particular hosting service seems to be nonexistent. So I am forced to make PHP do all the work.

I'm running phpMyAdmin on the hosting server, so I started out thinking I should just prepare an SQL file to upload using its utilities. But the Japanese text needs to be converted first. The hosting server has a PHP module with functions for doing the conversion, but opening files for writing doesn't work (no doubt the Apache user doesn't have write permission on the directory). So I wrote a script that opens a file of SQL, parses it into complete SQL commands, converts any Japanese text, and runs each command against the database. It runs great on small files, but the one table with 120,000 records causes the PHP processor to time out after 30 seconds. Figuring I only had to do this once, yesterday I spent the afternoon pulling 5000-line chunks out of the file with a text editor, ftp-ing each chunk, and running the script on it. But I reused the same working file over and over, so I no longer have the data in chunks, and now, due to my own ignorance (my fields weren't long enough, since Unicode uses more bytes than Shift-JIS), I need to do it all again.
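
(For reference, the script is essentially the following - a sketch with hypothetical credentials and filenames, assuming the host's conversion module is mbstring and that each statement in the dump ends with a semicolon at the end of a line:)

[code]
<?php
// Sketch of the convert-and-insert script. Connection details,
// database name, and file name are placeholders, not the real ones.
$link = mysql_connect('localhost', 'dbuser', 'dbpass');
mysql_select_db('mydb', $link);

$fp = fopen('dump.sql', 'r');
$statement = '';
while (!feof($fp)) {
    $line = fgets($fp, 4096);
    $statement .= $line;
    // A statement is complete when a line ends with a semicolon.
    if (substr(rtrim($line), -1) == ';') {
        // Convert the Shift-JIS text to UTF-8 before inserting.
        $sql = mb_convert_encoding($statement, 'UTF-8', 'SJIS');
        mysql_query(rtrim($sql, "; \r\n"), $link); // mysql_query() dislikes a trailing ";"
        $statement = '';
    }
}
fclose($fp);
?>
[/code]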

I thought I had a great idea - I would write a quick little PHP script to run locally (where I have write permission) just to break the file into pieces. And it tests fine on small files. But when I try it with the big one, my browser gives me a standard error claiming that the file (the PHP one!) is not found or has a DNS problem (yeah, right - I'm running on localhost!). This happens whether I specify the filename in the URL (mycode.php?f=sql.txt) or hardcode it. I assume it's some sort of timeout problem again (although it gives up after about 5-10 seconds, not 30), just with a less informative error message and less to show for it (the script on the host that runs the SQL commands would at least do as many as it could in 30 seconds!).
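
(The splitter itself is trivial - something like this, where the 5000-line chunk size comes from what I was doing by hand and the chunk_N.sql names are placeholders:)

[code]
<?php
// Hypothetical splitter: copies the input into 5000-line chunks
// named chunk_0.sql, chunk_1.sql, ...
$per_chunk = 5000;
$in = fopen('sql.txt', 'r');
$count = 0;
$chunk = 0;
$out = fopen('chunk_0.sql', 'w');
while (!feof($in)) {
    fwrite($out, fgets($in, 4096));
    if (++$count % $per_chunk == 0) {
        fclose($out);
        $chunk++;
        $out = fopen("chunk_$chunk.sql", 'w');
    }
}
fclose($out);
fclose($in);
?>
[/code]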

Do I need to give up and spend another afternoon breaking the file up by hand? Any suggestions on how I could automate this?
 
You are obviously hitting a timeout in the web server, so the SQL will have to be loaded through some other means.

If this is not a regular DB copy, is it possible to use a freeware MySQL client like MySQL-CC or MySQL-Front to manually upload your large DB file?

I've used MySQL-Front to upload 20+ MB SQL files.

- - picklefish - -

Why is everyone in this forum responding to me as picklefish?
 
One more note...

For your localhost timeout problems, check Apache's httpd.conf, PHP's php.ini, and MySQL's my.ini for any timeout settings (most likely in the Apache config).

Be sure to use http://localhost/ and not your machine's network hostname, in case you have a proxy server/firewall on your LAN that has its own short timeout length.
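
(The Apache directives to look for are something like this - the values shown are just the usual defaults:)

[code]
# httpd.conf - illustrative values
Timeout 300
KeepAliveTimeout 15
[/code]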

- - picklefish - -
 
Jimoblak (apparently better known as picklefish [wink]), thanks for responding.

I am already using a frontend for MySQL - phpMyAdmin - but the need to convert the multibyte text first, without being able to write the converted output to a file (no write permission), has kept me from using that interface for this transfer. And I have been stating the URL as http://localhost/ all along.

I searched for "timeout" in both httpd.conf and php.ini (my file-splitting script doesn't use the DB, so my.ini is unrelated). In httpd.conf, Timeout is 300 seconds, and KeepAliveTimeout is 15. Even though KeepAlive shouldn't relate to this at all, I tried lengthening it anyway, but no change. In php.ini the only hit on the word "timeout" was pfpro.defaulttimeout=30, but (a) that appears to be specifically for Verisign Payflow Pro, and (b) my script is giving up faster than that anyway (about 7 sec. when I timed it; I think it varies a little bit though). Are there other variables in httpd.conf that I should be looking for besides ones with the word "timeout"?
 
The reason I suggested the alternate MySQL frontends is that phpMyAdmin is limited by Apache's and PHP's configured timeouts. The frontends I mentioned connect directly to the MySQL server, so they are more efficient.

Regardless of this, the trouble still seems to be with the timeout length used in PHP when you convert the text file.

- - picklefish - -
 
Well, we didn't all know that, but there are a lot of things I don't know. One of them was in your previous post, about max_execution_time - I noticed the word "runtime" and decided to research whether I could change that setting in the PHP file I was originally trying to run on the web hosting server (the script that does it all - opens the file, parses the SQL commands, converts the text, and runs the commands against the DB). A quick web search turned up an explanation of the PHP function set_time_limit(), and simply adding set_time_limit(0) to my script seems to be doing the trick. I tested it on a file big enough to take about 2 minutes, and now I am running it on the real thing. My rough estimate was that it would take 15-20 minutes, but the fact that it is still running after 45 minutes doesn't worry me - it's probably just slower when smoke is rising from the server's hard drive, right? We're talking about 120,000 insert statements with multibyte text strings in two of the three fields (in case you're wondering, it's the Japanese post office's database of all postal codes in the nation).
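
(For anyone who finds this thread later, the whole fix is one line at the top of the script - though as I understand it, set_time_limit() has no effect if the host runs PHP in safe mode:)

[code]
<?php
// Lift the 30-second execution cap for this request only.
// (Ignored when PHP's safe_mode is on.)
set_time_limit(0);

// ...then open the file, convert the text, and run the inserts as before.
?>
[/code]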

I think I'll stop timing this just to satisfy my curiosity about how long it will take, and go to bed and look at it in the morning (it's past 1 a.m.). Thanks, sleipnir214, for the tip - if it succeeds by morning, you'll have your star!
 
Here's your star, sleipnir214 - it eventually worked, and your tip about runtime directives should be useful to anyone who is trying to get things done without being in control of the server (e.g. using a hosting service).

Here's what happened, for those who are curious. At about the one-hour mark the browser (or the HTTP server) gave me the standard Page Not Found error. I went to phpMyAdmin to see if perhaps it had finished and just choked on giving me the "I'm done" text, but was disappointed to see only about 39,000 records. But when I looked again, it was over 40,000, then 42,000. The PHP script was still running back there somewhere! So I went to bed.

This morning I saw that it had quit at about 62,000 (about half my file), so I made a new file with the second half of the statements and started it up again. This time I got the Page Not Found message much sooner (somewhere between 1 and 10 minutes - I wasn't watching), but again the number of records in the DB kept increasing, and after about an hour it finished.

A mystery that will remain unsolved is why it was so slow. Previously I had fed the script smaller files of 5000 statements at a time, and it almost always finished within the 30-second time limit (usually well within it). By that math, 120,000 records should take no more than 12 minutes. I know servers are busier at some times than others, but this took at least two hours, and the two halves ran at totally different times of day. Oh well, I'm glad it wasn't my own time spent breaking the file apart by hand and spoon-feeding my script again! [smile]
 