INTELLIGENT WORK FORUMS
FOR COMPUTER PROFESSIONALS

Log In

Come Join Us!

Are you a
Computer / IT professional?
Join Tek-Tips Forums!
  • Talk With Other Members
  • Be Notified Of Responses
    To Your Posts
  • Keyword Search
  • One-Click Access To Your
    Favorite Forums
  • Automated Signatures
    On Your Posts
  • Best Of All, It's Free!

*Tek-Tips's functionality depends on members receiving e-mail. By joining you are opting in to receive e-mail.

Posting Guidelines

Promoting, selling, recruiting, coursework and thesis posting is forbidden.

Jobs

Memory and working with large files

Memory and working with large files

(OP)
Hello.

I am working on updating a Java application that was originally meant to load XLS files into a database table - MySQL, if you're curious. It utilizes the Apache POI libraries, in part, to make this happen. It's not fast, it uses a lot of memory, but it works.

Now that our organization has upgraded to Office 2010, I am having to change automation processes to work with XLSX (ugh). This leads us to my current issue. The file I am currently working with contains, uncompressed, approximately 350 MiB of data. If you are familiar with the new file formats, you'll know the XLSX files is simply a container for several other files; some of which hold the actual data in XML. That being said, I'm running into a java.lang.OutOfMemoryError when simply trying to instantiate the workbook object. I haven't even been able to READ it, at this point.

Now, before you go suggesting that I increase the heap space with -Xmx150T, I've done that. The max the JVM will allow me to go on my machine is 1250 MiB. Every time I run the application, I watch Java in the Task Manager suck up more, and more memory until it hits this limit, then it pukes with the aforementioned error.

This may be a good time to tell you about my environment:
JDK 7/JRE 1.7.0_05 (Oracle)
NetBeans 7.1.2
Windows7 SP1 x86 with 3.5 GB (usable) RAM
Need more info? Please ask.

Does anyone have any other suggestions for reading XLSX files with Java? Are there some better ways I can do this? I'd really rather not, but will if I have to, resort to saving each worksheet as a CSV first. Maybe I can step into the XLSX container, grab the XML file for the worksheet I wish to load, and use an XML parser? <grumble>

TIA!
I haven't been on here in a while, are we still saying that?


GTG

RE: Memory and working with large files

How big is that file? Did you consider loading the files directly to the database with MySQL tools?

Cheers,
Dian

RE: Memory and working with large files

(OP)

Quote (Diancecht)

How big is that file? Did you consider loading the files directly to the database with MySQL tools?

Dian,

Thank you for your response. The data, unzipped, is nearly 400MiB - my previous estimate was a little low. Why simply opening this file with POI is causing the heap is bloating to over a gigabyte is beyond me.

The main reason I'm using Java (aside from legacy support) is because most of these processes are automated; little to no human interaction. Using LOAD DATA INFILE... doesn't even support XLS, let alone XLSX. Did you have any other MySQL tools in mind? Truth be told, I don't have much experience loading data any other way; never had the need.

Thank you again for your input.


GTG

RE: Memory and working with large files

You must be using the POI DOM-like model for reading the XLS(x) file, as that tries to load the entire file into memory before returning any data.
You should in fact be using the streaming model (SXSSF, forward-read-only, SAX based) as described on this page: http://poi.apache.org/spreadsheet/index.html and that obviously requires you to update to the latest POI release, 3.8
That should solve any memory issues you have.

RE: Memory and working with large files

(OP)
TonHu,

I will give it a try. Thank you!


GTG

RE: Memory and working with large files

(OP)
TonHu,

According to the page you linked to, SXSSF does NOT support reading files. However, it seems that instead of using the usermodel of XSSF, using the eventmodel may be what I'm looking for. It also allows for streaming of the file.

Thank you again for your input!


GTG

RE: Memory and working with large files

GregTheGeek,

Sorry I mixed up the data from the table on the SXSSF vs the XSSD usermodel, but I'm glad you found the correct info!

RE: Memory and working with large files

(OP)
Thanks again for the input, TonHu. I finally have the ability to load large XLSX files using the XSSF eventmodel.

I started with this code, and did what we all do: tweaked it! thumbsup2


GTG

Red Flag This Post

Please let us know here why this post is inappropriate. Reasons such as off-topic, duplicates, flames, illegal, vulgar, or students posting their homework.

Red Flag Submitted

Thank you for helping keep Tek-Tips Forums free from inappropriate posts.
The Tek-Tips staff will check this out and take appropriate action.

Reply To This Thread

Posting in the Tek-Tips forums is a member-only feature.

Click Here to join Tek-Tips and talk with other members!

Resources

Close Box

Join Tek-Tips® Today!

Join your peers on the Internet's largest technical computer professional community.
It's easy to join and it's free.

Here's Why Members Love Tek-Tips Forums:

Register now while it's still free!

Already a member? Close this window and log in.

Join Us             Close