How can I read the content of a PDF file using C#?

Status
Not open for further replies.

JCruz063

Programmer
Feb 21, 2003
716
US
Hi All,
I posted this message in the Adobe Acrobat forum and got no replies. Maybe you can shed some light...

I'm working on a project in which I might have to programmatically read the data in PDF files. I need to parse the PDFs (as if they were text files) and understand what each element of data is.

Let's say, for instance, that the content of one of the PDFs looks like this:
[tt]
001 - Microsoft Word
002 - Microsoft Excel
003 - Microsoft Access
004 - Microsoft Power Point[/tt]

I will need to parse this information, extract the numbers, and attach them to the corresponding strings. That is, 1 is "Microsoft Word", 2 is "Microsoft Excel", and so on. Hopefully all this makes sense. The bottom line is that I need to be able to determine what the PDF file contains from code. I won't be literally looking at the file; it will all be done programmatically. Each number and each string has a special meaning in the project, so I need to extract each one accurately for the project to succeed.
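
To make the parsing step concrete, here is a minimal C# sketch of the number-to-string mapping described above. It assumes the text has already been pulled out of the PDF as plain lines by some PDF library, which is not shown here and is the harder part of the problem. The class name NumberedLineParser and the method Parse are hypothetical names chosen for illustration, and the "digits, hyphen, label" line format is taken from the sample content above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any PDF library.
// Assumes the PDF text has already been extracted into plain lines.
public static class NumberedLineParser
{
    // Matches lines such as "001 - Microsoft Word":
    // leading digits, a hyphen with optional spaces around it, then the label.
    private static readonly Regex LinePattern =
        new Regex(@"^\s*(\d+)\s*-\s*(.+?)\s*$");

    // Maps each number to its label. Lines that do not fit the pattern
    // (or whose number does not fit in an int) are skipped.
    public static Dictionary<int, string> Parse(IEnumerable<string> lines)
    {
        var result = new Dictionary<int, string>();
        foreach (string line in lines)
        {
            if (line == null) continue;

            Match m = LinePattern.Match(line);
            if (!m.Success) continue;

            // "001" parses to 1, so leading zeros are dropped.
            if (!int.TryParse(m.Groups[1].Value, out int number)) continue;

            // If a number repeats, the last label seen wins.
            result[number] = m.Groups[2].Value;
        }
        return result;
    }
}
```

Calling NumberedLineParser.Parse on the four sample lines would give a dictionary in which key 1 maps to "Microsoft Word" and key 4 maps to "Microsoft Power Point". Whether real PDF text comes out as one clean line per entry depends on how the file was produced, so the pattern may need adjusting.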

The programming language to be used is either Visual Basic 6.0 or C#.NET.

Is there a specific way to do this?

Thanks!

JC

_________________________________________________
To get the best response to a question, read faq222-2244.
 