GPT-4 Turbo with Vision for OCR

(OP)
Hi,

OPENAI’s “GPT-4 Turbo with Vision” should be able to perform OCR.

With help from VFP forum members here and elsewhere I’ve been able to create text proposals and use speech recognition in our applications.

For OCR we’re now using Tesseract, but I think GPT-4 may be a good alternative.

Has anyone tried this already? Any hints or tips?

Regards, Gerrit

RE: GPT-4 Turbo with Vision for OCR

Not only does it perform OCR, it does so more effectively than any product or process I've ever encountered. It is also extremely capable of then translating that text. I ran a letter from 1798 through it (handwritten, extremely difficult to read), and it OCR'd it around 95% accurately. If the source is very clear, it does even better.

Yes, there are some major applicable uses for this, and if it can be linked into VFP, it will be a mind-bogglingly useful capability.
I'll start looking at this as well.

Best Regards,
Scott
MSc ISM, MIET, MASHRAE, CDCAP, CDCP, CDCS, CDCE, CTDC, CTIA, ATS, ATD

"I try to be nice, but sometimes my mouth doesn't cooperate."

RE: GPT-4 Turbo with Vision for OCR

Quote:

I ran a letter from 1798 through it (handwritten, extremely difficult to read), and it OCR'd it around 95% accurately.

That is very impressive.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads

RE: GPT-4 Turbo with Vision for OCR

There was a great article in the April issue of Scientific American about one of the best uses for AI that I've heard of. (I'm highly skeptical of LLMs and other things in that vein, particularly for ethical reasons.)

Here's a link, though I suspect it'll only be accessible to subscribers: https://www.scientificamerican.com/article/inside-....

The basic idea is that they used AI in a couple of ways to read ancient scrolls that were burned so badly, they couldn't be unrolled.

Tamar

RE: GPT-4 Turbo with Vision for OCR

So I sent in this business card that I picked up (one of the craziest card layouts I have as an example), and this was the response that ChatGPT came back with:


Company Logo: IX Terenine
Name: Aaron Shelley
Title: Senior Solutions Architect
Email: aaron.shelley@terenine.com
Office Phone: 866-379-3581
Direct Phone: 423-702-7038
Mobile Phone: 423-664-7383
Fax: 423-643-8031

The prompt I gave it was: identify the text in this image, as if it were a business card:


It's 100% spot on.
Here's another example with a difficult card where I've also asked it to translate the card into English (without giving it a base language):


The response was:
Logo: K V H
Name: 榊原 明秀 (Sakihara Akihide)
Position: データセンターサービス部 シニアスペシャリスト (Data Center Services Department, Senior Specialist)
Company: KVH株式会社 (KVH Co., Ltd.)
Address: 東京都港区南青山1-1-1 新青山ビル西館 T107-0062 (1-1-1 Minami Aoyama, Minato-ku, Tokyo, New Aoyama Building West Wing, T107-0062)
Telephone: TEL 03-4560-8842
Fax: FAX 03-5772-5940
Email: akihide.saito@kvh.co.jp
Website: www.kvh.co.jp

(And these are very old cards; these contacts no longer exist, so I'm not "spamming" anyone.)

There is certainly merit in exploring connecting ChatGPT to interface with VFP.
I haven't had time to explore this in depth yet, due to other commitments, but I'll certainly be circling back on it soon.

Best Regards,
Scott
MSc ISM, MIET, MASHRAE, CDCAP, CDCP, CDCS, CDCE, CTDC, CTIA, ATS, ATD

"I try to be nice, but sometimes my mouth doesn't cooperate."

RE: GPT-4 Turbo with Vision for OCR

Quote (Tamar)

Here's a link, though I suspect it'll only be accessible to subscribers:

It's OK. I managed to read it even though I am not a subscriber.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads

RE: GPT-4 Turbo with Vision for OCR

(OP)
Hi,

Thanks everyone for your input. I haven't started this project yet, as I was in the US last week to visit New York.
Hopefully later this week I will start exploring my options to get this type of OCR working.

Does anyone know if the GPT-3.5 models will still be working and accessible when I upgrade from GPT-3.5 to GPT-4? I guess I will need to create a new API key if so?

Regards, Gerrit

RE: GPT-4 Turbo with Vision for OCR

Hi Gerrit,
Yes, you can use either 3.5 or 4.0 if you have the 4.0 upgrade.
I use both.

Best Regards,
Scott
MSc ISM, MIET, MASHRAE, CDCAP, CDCP, CDCS, CDCE, CTDC, CTIA, ATS, ATD

"I try to be nice, but sometimes my mouth doesn't cooperate."

RE: GPT-4 Turbo with Vision for OCR

(OP)
Hi Scott,

I’m still using my 3.5 prepaid subscription, and yesterday I made my first successful trial with GPT-4 Vision.
So there doesn’t seem to be a reason to upgrade to the more expensive GPT-4 platform.

Regards, Gerrit

RE: GPT-4 Turbo with Vision for OCR

Gerrit,
The reason will depend on how many prompts you want to send in a day/hour.
You will hit a limit really fast (I think it's 25 prompts in 3 hours) on the trial.

Best Regards,
Scott
MSc ISM, MIET, MASHRAE, CDCAP, CDCP, CDCS, CDCE, CTDC, CTIA, ATS, ATD

"I try to be nice, but sometimes my mouth doesn't cooperate."

RE: GPT-4 Turbo with Vision for OCR

(OP)
Hi Scott,

Good to know that. I was advised by the community @ openai not to upgrade to GPT-4. As I may have said before I have a prepaid subscription for GPT-3.5 (not the “free” trial).

I may upgrade once I run into the limit you describe.

Regards, Gerrit

RE: GPT-4 Turbo with Vision for OCR

I'm having a very hard time understanding what is being accomplished in this thread - I thought it was supposed to be about GPT-4 Turbo with Vision, not ChatGPT.

First of all, it appears that GPT-4 Turbo with Vision does not support OCR. See here: https://community.openai.com/t/gpt4-ocr-image-reco...

For more information on GPT-4 Turbo with Vision:

For API documentation: https://platform.openai.com/docs/guides/vision
For the community forum: https://community.openai.com/tags/c/api/7/gpt-4-vi...

RE: GPT-4 Turbo with Vision for OCR

Vernpace,
I see this as an overall discussion, understanding the issue before we "dive into it". I appreciate what is presented and discussed here, though it might seem a little unorthodox compared to the way threads in this forum usually develop.
This is new to many of us. This is an evolution.

Best Regards,
Scott
MSc ISM, MIET, MASHRAE, CDCAP, CDCP, CDCS, CDCE, CTDC, CTIA, ATS, ATD

"I try to be nice, but sometimes my mouth doesn't cooperate."

RE: GPT-4 Turbo with Vision for OCR

(OP)
Hi Vernpace,

I'm still working on this, but I'm already getting good results using model "gpt-4-turbo-2024-04-09" for OCR. I still have to compare this to the results I get with my VFP application's integrated OCR, but AI looks very promising. I'm sure newer and even better models will follow.

If you want you can join the party and get this working!

Regards, Gerrit

RE: GPT-4 Turbo with Vision for OCR

Gerrit,

Since you are getting good results for OCR, could you please provide the VFP code for the JSON request? Since it's OCR, I'm assuming you are using a local base64 image and not a URL, right?

RE: GPT-4 Turbo with Vision for OCR

(OP)
Hi Vernpace,

This works for an online image. Local images need base64 encoding.

{
  "model": "gpt-4-turbo",
  "max_tokens": 2000,
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What text is in this image?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://www.something.org/imageforOCR.jpg"
          }
        }
      ]
    }
  ]
}

NOTE: the URL above is a placeholder and will not work; substitute your own URL for testing.
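This payload can then be POSTed from VFP. A minimal sketch, assuming the JSON above has been merged into a variable lcPayload (e.g. with TEXT ... ENDTEXT) and YOUR_API_KEY is a placeholder for your own key:

CODE --> foxpro

* Minimal sketch: POST the JSON payload to the chat completions endpoint.
LOCAL loHttp, lcResponse
loHttp = CREATEOBJECT("MSXML2.ServerXMLHTTP.6.0")
loHttp.Open("POST", "https://api.openai.com/v1/chat/completions", .F.)
loHttp.setRequestHeader("Content-Type", "application/json")
loHttp.setRequestHeader("Authorization", "Bearer YOUR_API_KEY")
loHttp.Send(lcPayload)  && lcPayload holds the JSON shown above
lcResponse = loHttp.responseText

The recognized text then sits in choices[0].message.content of the JSON response.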

I know this is just the beginning but we have to keep learning, don’t we?

Regards, Gerrit

RE: GPT-4 Turbo with Vision for OCR

(OP)
This is how I added support for a local image (only PNG, GIF, JPG and WEBP are allowed).

CODE --> foxpro

LOCAL lcImage, lcBase64Image, lcPayload
lcImage = "c:\demo\ImageToTextOCR\demo.png"
lcBase64Image = STRCONV(FILETOSTR(lcImage), 13)  && 13 = base64 encoding


* Create the request payload for GPT-4 Vision with a local image.
* Note: no curly braces around <<lcBase64Image>>, and the MIME type
* should match the image format (here image/png for a .png file).
TEXT TO lcPayload TEXTMERGE NOSHOW
{
  "model": "gpt-4-turbo",
  "max_tokens" : 2000,
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What text is in this image?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/png;base64,<<lcBase64Image>>"
          }
        }
      ]
    }
  ]
}
ENDTEXT 

The results are OK for plain text recognition. However, a page with multiple columns (like an Excel screenshot) doesn't work: in that case I don't get the entire text, but part of the text plus a page description.
For example, I get this at the start of the "content" value in the JSON response:

"The text in the image is a list showing the record of backorders or outstanding purchase order details. Here are the headers and some data entries:"

I'm trying to find a way to get the entire text for these image files too, but haven't succeeded yet. Adequate documentation is hard to find.
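Without a full JSON parser, one crude way to pull that "content" value out of the response is STREXTRACT. This is a sketch only, assuming the raw response text is in lcResponse; it breaks if the content itself contains escaped quotes, so a third-party JSON parser is the safer choice:

CODE --> foxpro

* Crude extraction of the "content" value from the API response.
LOCAL lcContent
lcContent = STREXTRACT(lcResponse, '"content": "', '"')
* Undo the most common JSON escapes
lcContent = STRTRAN(lcContent, '\n', CHR(13) + CHR(10))
lcContent = STRTRAN(lcContent, '\"', '"')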

Regards, Gerrit

RE: GPT-4 Turbo with Vision for OCR

Here is what I came up with - similar to yours; make sure there are no curly brackets around <<lcBase64Image>> in the data URL:

CODE -->

TEXT TO lcRequest TEXTMERGE NOSHOW
{"model": "gpt-4-turbo",
 "messages": [{"role": "system","content": "<<lcDirective>>"},
              {"role": "user", "content": 
             [{"type": "text", "text": "<<lcQuestion>>"},
              {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,<<lcBase64Image>>", "detail": "<<lcDetail>>"}}
]}]}
ENDTEXT 

I also included a "system" prompt to give additional directives if needed. Also included is "detail", which is either "high" or "low" depending on the size of the image (the default, "auto", lets the API choose).

I think an Excel screen shot may be problematic for you. See below:

RE: GPT-4 Turbo with Vision for OCR

Almost forgot,

Last night, I ran a "face recognition" test sending 10 base64 images - the images were from my brother's wedding where there were hundreds of people:

lcDirective = "Please provide a response without using any asterisks (*) or pound signs (#). For headings, prepend only with large roman numerals. For subheadings, prepend only with small letters in parenthesis."

lcQuestion = "Is the woman in the first image shown in any of the other images based on her face only? If so, describe (in detail) the other images where the woman is shown."

CODE -->

TEXT TO lcRequest TEXTMERGE NOSHOW
{"model": "gpt-4-turbo",
 "messages": [{"role": "system","content": "<<lcDirective>>"},
              {"role": "user", "content": 
             [{"type": "text", "text": "<<lcQuestion>>"},
              {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,<<lcImage1>>", "detail": "<<lcDetail1>>"}},
              {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,<<lcImage2>>", "detail": "<<lcDetail2>>"}},
              {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,<<lcImage3>>", "detail": "<<lcDetail3>>"}},
              {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,<<lcImage4>>", "detail": "<<lcDetail4>>"}},
              {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,<<lcImage5>>", "detail": "<<lcDetail5>>"}},
              {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,<<lcImage6>>", "detail": "<<lcDetail6>>"}},
              {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,<<lcImage7>>", "detail": "<<lcDetail7>>"}},
              {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,<<lcImage8>>", "detail": "<<lcDetail8>>"}},
              {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,<<lcImage9>>", "detail": "<<lcDetail9>>"}},
              {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,<<lcImage10>>", "detail": "<<lcDetail10>>"}}
]}]}
ENDTEXT 

The response was perfect:

The woman from the first image appears in several other images based on her facial features and attire.

1. In the third image, the woman is seen in the foreground, her face partially visible, observing a group of bridesmaids dressed in blue. This is during what seems to be a wedding ceremony set against a lush, mountainous backdrop.

2. The sixth image shows the woman engaging in a dance or friendly interaction at a social event. She is dressed in the same outfit, with a detailed beige dress and a similar hairstyle, confirming it's the same event and person. The setting appears to be an outdoor patio area with heaters and umbrellas, suggesting a lively evening reception or party.

These images illustrate her participation in the festivities of what seems to be a wedding event, depicted in various settings of the celebration.

RE: GPT-4 Turbo with Vision for OCR

(OP)
Hi Vernpace,

Very nice indeed.

I see a lot of people online trying to tweak this model to get all the text from an image (true OCR).
I got a much better result for an image with lots of rows and columns using this prompt:

CODE -->

"type": "text",
"text": "Get all plain text from this image, allow repetitions, use all columns and all rows." 

I still think it's strange having to tweak like this. Getting all the plain text is not the problem for OpenAI; they just have too strict a policy on this feature, for whatever reason.

Another annoying thing is that most OCR applications prefer TIFF files, whereas GPT-4 Vision uses other formats (PNG, GIF, JPG and WEBP). Of course we can convert all these files (I use ImageMagick with VFP), but it's still extra work.
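For the TIFF conversion itself, a minimal sketch shelling out to ImageMagick from VFP (this assumes the magick command-line tool is installed and on the PATH; WScript.Shell is used so the call waits for the conversion to finish):

CODE --> foxpro

* Convert a TIFF page to PNG before sending it to GPT-4 Vision.
LOCAL loShell
loShell = CREATEOBJECT("WScript.Shell")
* 0 = hidden window, .T. = wait until the conversion is done
loShell.Run([magick "c:\demo\scan.tif" "c:\demo\scan.png"], 0, .T.)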

Regards, Gerrit

RE: GPT-4 Turbo with Vision for OCR

Gerrit,

Yes, the API's interpretation of the text prompt is tricky - you must be precise with Vision.

Here is some good news: Yesterday, OpenAI released a new model: GPT-4o (“o” for “omni”). You can use it now - just replace "gpt-4-turbo" with "gpt-4o"

See here: https://openai.com/index/hello-gpt-4o
See the YouTube video presentation here: https://youtu.be/DQacCB9tDaw

This is truly amazing stuff. While the presentation is focused on ChatGPT, "gpt-4o" is already working for the APIs that use "gpt-4-turbo". I tested it yesterday with language translations and it was two to three times faster.

RE: GPT-4 Turbo with Vision for OCR

(OP)
Hi Vernpace,

I get similar results with this new GPT-4o model, only faster.

GPT-4 (and GPT-4o) are not made for OCR, but to a certain degree it still seems possible. When asked, GPT responds that the AI portion should be used to process the text from other OCR packages (like Tesseract and Adobe). I already use Tesseract in my VFP applications and I intend to add some level of AI support.

Regards, Gerrit

RE: GPT-4 Turbo with Vision for OCR

So with the new ChatGPT-4o I thought I'd ask ChatGPT how it would achieve this, and this was the response:

Integrating ChatGPT-4 into Visual FoxPro (VFP) 9 to pass images for information identification involves several steps. Here's a high-level overview of the process:

1. API Access
First, you need access to the OpenAI API, which provides ChatGPT-4 capabilities.

2. Image Processing API
ChatGPT-4 primarily handles text. For image processing, you might need to use additional services such as OpenAI's DALL-E for image generation or other image recognition APIs like Google Vision API, Microsoft Azure Cognitive Services, or Amazon Rekognition.

3. Setting Up API Calls in VFP
Visual FoxPro can make HTTP requests to external APIs. You will need to use these capabilities to interact with the necessary APIs.

4. Handling JSON
APIs typically return data in JSON format. VFP needs to parse this JSON to handle the responses.

Steps to Implement
Step 1: Obtain API Keys
Get API keys from OpenAI and your chosen image processing service.

Step 2: Make HTTP Requests from VFP
Use VFP's XMLHTTP or WinHttp.WinHttpRequest to make HTTP requests. Here’s a basic example of making a GET request:

CODE

LOCAL loHttp, lcUrl, lcResponse
loHttp = CREATEOBJECT("MSXML2.ServerXMLHTTP.6.0")
lcUrl = "https://api.example.com/endpoint"
loHttp.Open("GET", lcUrl, .F.)
loHttp.setRequestHeader("Content-Type", "application/json")
loHttp.setRequestHeader("Authorization", "Bearer YOUR_API_KEY")
loHttp.Send()
lcResponse = loHttp.responseText
? lcResponse 

Step 3: Uploading Images
For uploading images, you might use a POST request. Here's a basic example of a POST request:

CODE

LOCAL loHttp, lcUrl, lcImagePath, lcResponse, lcImageBase64
lcImagePath = "C:\path\to\your\image.jpg"
lcImageBase64 = STRCONV(FILETOSTR(lcImagePath), 13) && 13 = base64 encoding, as the API requires

loHttp = CREATEOBJECT("MSXML2.ServerXMLHTTP.6.0")
lcUrl = "https://api.example.com/upload"
loHttp.Open("POST", lcUrl, .F.)
loHttp.setRequestHeader("Content-Type", "application/json")
loHttp.setRequestHeader("Authorization", "Bearer YOUR_API_KEY")

LOCAL lcRequestBody
lcRequestBody = '{"image": "' + lcImageBase64 + '"}'
loHttp.Send(lcRequestBody)
lcResponse = loHttp.responseText
? lcResponse 

Step 4: Parsing JSON Response
VFP doesn’t have native JSON parsing capabilities, but you can use a third-party library like VFP-JSON or write your own parser.

CODE

* Assuming you have a JSON parser
LOCAL loJson, loResponse
loJson = CREATEOBJECT("YourJsonParser")
loResponse = loJson.Parse(lcResponse) 

Step 5: Integrating with ChatGPT
You can send text obtained from the image processing API to ChatGPT for further analysis.

CODE

LOCAL loHttp, lcUrl, lcResponse
loHttp = CREATEOBJECT("MSXML2.ServerXMLHTTP.6.0")
lcUrl = "https://api.openai.com/v1/chat/completions" && the engines/davinci-codex endpoint is deprecated
loHttp.Open("POST", lcUrl, .F.)
loHttp.setRequestHeader("Content-Type", "application/json")
loHttp.setRequestHeader("Authorization", "Bearer YOUR_OPENAI_API_KEY")

LOCAL lcRequestBody
lcRequestBody = '{"model": "gpt-4-turbo", "messages": [{"role": "user", "content": "Your text here"}], "max_tokens": 100}'
loHttp.Send(lcRequestBody)
lcResponse = loHttp.responseText
? lcResponse 

Example Workflow
1. Upload Image: Use VFP to send an image to the image processing API.
2. Get Text Information: Parse the response to extract text or other information from the image.
3. Send to ChatGPT: Use the extracted text to formulate a query to ChatGPT.
4. Process Response: Parse and handle the response from ChatGPT in your VFP application.

Considerations
- Error Handling: Implement robust error handling for API calls.
- Security: Securely store and handle API keys.
- Performance: Consider the latency of API calls and handle them asynchronously if needed.

This approach should give you a foundational framework to integrate ChatGPT-4 and image processing capabilities into your VFP 9 application.

I haven't tried it yet, but thought this might be an interesting point to start from.

Best Regards,
Scott
MSc ISM, MIET, MASHRAE, CDCAP, CDCP, CDCS, CDCE, CTDC, CTIA, ATS, ATD

"I try to be nice, but sometimes my mouth doesn't cooperate."

RE: GPT-4 Turbo with Vision for OCR

(OP)
Hi Scott,

GPT-4o’s OCR using the API is definitely working! In my document archiving software I now have two options for OCR: (1) desktop OCR with Tesseract and (2) using GPT-4o.

Most of my source material is PDF, so first I determine the number of pages. If there are multiple pages I convert all pages to either TIFF (for Tesseract OCR) or PNG (for GPT's OCR). The conversion itself is done in the background with ImageMagick. In a form I can see all images and select the pages I want for OCR. I run OCR for a single page at a time, so for multiple pages I use a loop. All tasks and results are recorded in a central MySQL database. Once the OCR is ready I show a form to add all text files to the archive.
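The per-page loop described above might be sketched like this; ConvertPageToPng() and OcrPageWithGpt() are hypothetical placeholder helpers, not the actual code, and the ocrresults table is likewise illustrative:

CODE --> foxpro

* Illustrative sketch of the per-page OCR loop.
LOCAL lcPdf, liPages, liPage, lcPng, lcText
lcPdf   = "c:\archive\incoming\document.pdf"
liPages = 3   && in practice, read the page count from the PDF

FOR liPage = 1 TO liPages
   lcPng  = ConvertPageToPng(lcPdf, liPage)   && hypothetical: convert one page via ImageMagick
   lcText = OcrPageWithGpt(lcPng)             && hypothetical: base64-encode and send to the API
   INSERT INTO ocrresults (pdffile, pageno, pagetext) ;
      VALUES (lcPdf, liPage, lcText)
ENDFOR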

My JSON message to GPT is quite different from your code. I use MSXML2.ServerXMLHTTP as well. However, the actual communication with MSXML2.ServerXMLHTTP is done in a separate application that runs several AI services. This application is always in my systray. Right now I have 3 services (text generation, speech recognition and OCR); the next one (translation) will follow soon.

The call for OCR is quite difficult for GPT. It may only give you part of the text; it's more an image descriptor, actually. But if you add to the task that it should allow repetitions (and perhaps some other things), you should be able to get quite good OCR results, even for handwriting and uncommon languages.

If you need anything in particular, just let me know.

Good luck!

Regards, Gerrit

RE: GPT-4 Turbo with Vision for OCR

Hi Gerrit,

When you are ready to tackle language translation, you should start another thread.

As of this writing, we have successfully implemented six OpenAI services for a client: (1) DALL-E-3 Image generation; (2) Text generation; (3) Speech-To-Text; (4) Text-To-Speech; (5) Vision; and (6) Language Translation. Language translation was by far the most difficult - for reasons that are not obvious. We can now use the OpenAI API to translate 139 languages:



We recently entered into an NDA for these services, so we can no longer share source code. However, we can offer tips, which will be helpful.

RE: GPT-4 Turbo with Vision for OCR

Gerrit,

With respect to Vision, we learned in the OpenAI forums that it is best to resize a base64 image (or images) before it is sent. See this:



We can provide the code for this - you may want to modify it if you don't need the tcDetail, tcDimension, or tiBytes info:

CODE -->

*!*    Resizes an image. Assumes "System.App" is already loaded.
*!*    tcDetail, tcDimension and tiBytes are output parameters - pass them by reference (@).
FUNCTION ResizeImageGPT4Vision(tcFileSource AS String, tcFileDestination AS String, tcDetail AS String, tcDimension AS String, tiBytes AS Integer) AS VOID
   LOCAL lcFileExt, lcDetail, loImage, liMaxSize, liWidth, liHeight, liWidthNew, liHeightNew, liDetailThreshold
   LOCAL loSrcImage AS xfcBitmap 
   LOCAL loResized AS xfcBitmap 
   LOCAL loGfx AS xfcGraphics 

   liMaxSize = 1024
   liDetailThreshold = 700
   lcFileExt = UPPER(JUSTEXT(tcFileDestination))

   WITH _Screen.System.Drawing 
        loImage  = .Bitmap.FromFile(tcFileSource, .T.)
        liWidth  =  loImage.Size.Width
        liHeight =  loImage.Size.Height

        loImage.Dispose()

        IF liWidth > liMaxSize OR liHeight > liMaxSize
           IF liWidth > liHeight
              liWidthNew  = liMaxSize
              liHeightNew = INT(liHeight * (liMaxSize / liWidth))
           ELSE
              liHeightNew = liMaxSize
              liWidthNew  = INT(liWidth * (liMaxSize / liHeight))
           ENDIF
        ELSE
           liWidthNew  = liWidth
           liHeightNew = liHeight
        ENDIF

        IF liWidthNew > liDetailThreshold OR liHeightNew > liDetailThreshold
           lcDetail = "high"
        ELSE
           lcDetail = "low"
        ENDIF

*!*     Load the original Image 
        loSrcImage = .Bitmap.New(tcFileSource) 
*!*     Create a New Image with the desired size 
        loResized = .Bitmap.New(liWidthNew, liHeightNew, .Imaging.PixelFormat.Format32bppARGB)
*!*     Set the image resolution to be the same as the original 
        loResized.SetResolution(loSrcImage.HorizontalResolution, loSrcImage.VerticalResolution) 
*!*     Obtain a Graphics object to get the rights to draw on it 
        loGfx = .Graphics.FromImage(loResized) 
*!*     Set some properties to ensure to have a better quality of image 
        loGfx.CompositingQuality = .Drawing2D.CompositingQuality.HighQuality
        loGfx.InterpolationMode = .Drawing2D.InterpolationMode.HighQualityBicubic 
        loGfx.SmoothingMode = .Drawing2D.SmoothingMode.AntiAlias
*!*     Draw the source image on the new image at the desired dimensions 
        loGfx.DrawImage(loSrcImage, 0, 0, liWidthNew, liHeightNew) 
*!*     Save the resized image 
        DO CASE
           CASE lcFileExt == "GIF"
                loResized.Save(tcFileDestination, .Imaging.ImageFormat.Gif)

           CASE INLIST(lcFileExt, "JPG", "JPEG")
                loResized.Save(tcFileDestination, .Imaging.ImageFormat.Jpeg)

           CASE lcFileExt == "PNG"
                loResized.Save(tcFileDestination, .Imaging.ImageFormat.Png)
        
        ENDCASE

        loSrcImage.Dispose()
        loResized.Dispose()
        loGfx.Dispose()

   ENDWITH

   tcDetail = lcDetail
   tcDimension = TRANSFORM(liWidthNew) + " x " + TRANSFORM(liHeightNew)
   tiBytes = GetFileSizeX(tcFileDestination)

ENDFUNC 

This is very useful when sending multiple images. Each of the six images below was originally 1792 x 1024 and between 3.5 - 4.5 MB.
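A usage sketch for the function above - note the output parameters must be passed by reference with @ (GetFileSizeX is the poster's own helper, assumed to be in scope):

CODE --> foxpro

* Illustrative call; lcDetail, lcDimension and liBytes come back by reference.
LOCAL lcDetail, lcDimension, liBytes
STORE "" TO lcDetail, lcDimension
liBytes = 0
ResizeImageGPT4Vision("c:\images\original.jpg", "c:\images\resized.jpg", ;
   @lcDetail, @lcDimension, @liBytes)
? lcDetail, lcDimension, liBytes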



RE: GPT-4 Turbo with Vision for OCR

(OP)
Hi Vernpace,

I use Base64 coding for OCR.
I will create a new thread for translations via the API.

Regards, Gerrit
