Transcription/Editing Trick using ChatGPT4o and WordTalk for MS Word

Status
Not open for further replies.

davejonescue

Puritan Board Junior
Hello all. Just wanted to share a transcription/editing formula I found useful using the new ChatGPT4o Model. It goes as follows:

1. Select and cut single pages from facsimiles, I prefer to use the "Windows Button+Shift+S" method.
2. Paste into ChatGPT 4o, and ask it to "Can you transcribe this page for me." While pasting the cut facsimile page into the same request.
3. Copy the outputted transcription.
4. Paste it, and how ever many pages you want to do into Microsoft Word.
5. Download WordTalk (or a similar program,) install, so that it is within your Word as an add-in.
6. Adjust the speed of the Text-to-Speech, so that you can easily follow along while reading a PDF of the Facsimile.
7. Make real-time adjustments to the transcribed text from Chat, by noticing the discrepancies listening to the outputted transcriptions, compared to reading the PDF facsimiles.

I find this method much more easier than trying to manually type out large portions of text. Been working on Ambrose's "Prima Media et Ultima," and found out that in later editions there were a few sections in Media that Ambrose added, that were not in the Project Puritas editions. So I spent about 2 days only getting to around 10 pages typed. Then I remembered the discussion I had with a brother on here a while back, and decided to try this method.

--- Take-Aways.

While this model of ChatGPT (4o) is probably the best OCR to date for archaic facsimiles, it still makes mistakes. So, you cannot yet trust it to simply transcribe and keep it moving. You do have to go over the text and make adjustments. But by listening to the audio of the Chat transcriptions, while simultaneously reading the Facsimile PDF's, it is easy to notice the errors, and edit accordingly. This saves time of trying to tediously do visual comparisons.

The one downside to Chat transcriptions page by page, is that it tends to format each page as a singular page; so, if you have sequences, like 1.,2.,3., etc. you will have to adjust those according to the original facsimiles. But all in all this does save a lot of time and effort of transcribing, and not bad at all for the present $20 a month subscription price. Considering hiring someone to type out stuff can be very expensive, and typing stuff out by hand can take a alot of time; this method has the potential to save both time and money. Lastly, I am sure it is like all OCR's in that the quality of the output majorly depends on the quality of the input; but, this model of Chat has the capability, which OCR's didnt have previously, to OCR regardless if a text is intermingled with various fonts like most Puritan texts are.

The exciting thing about this break-through in OCR tech, is that it will make it that much more easier to transcribe and edit Puritan and Reformed texts that have yet to be done.

Hope this helps someone, God bless.
 
Last edited:
My brother (a software engineer) told me recently that Amazon Web Services has a really good OCR solution that is better than anything he's seen in many years. He has to convert a lot of loan docs for banks.
 
My brother (a software engineer) told me recently that Amazon Web Services has a really good OCR solution that is better than anything he's seen in many years. He has to convert a lot of loan docs for banks.
I may have to check that out. This is only what I have found so far. It seems either way, when using AI based text conversion, one is going to have to go over the text word for word. For me, at least, Amazons AWS is kind of funky, meaning, it doesnt look that user friendly. At this point, it is easier for me just to pay the standard $20 for ChatGPT4o, and use it when needed.
 
Can AI OCR do handwriting (and specifically learn old secretary style from the 17th century; now, that would be useful to me at the moment!)
 
I may have to check that out. This is only what I have found so far. It seems either way, when using AI based text conversion, one is going to have to go over the text word for word. For me, at least, Amazons AWS is kind of funky, meaning, it doesnt look that user friendly. At this point, it is easier for me just to pay the standard $20 for ChatGPT4o, and use it when needed.
For what it's worth, AWS charges you only when you use it. $1.50/1000 pages is pretty decent. I'm merely suggesting it might yield better results and is worth checking out. The workflow he described is that he's using AWS to OCR to documents but is then able to use ChatGPT to do some really interesting queries against very long documents that are difficult to code.

 
For what it's worth, AWS charges you only when you use it. $1.50/1000 pages is pretty decent. I'm merely suggesting it might yield better results and is worth checking out. The workflow he described is that he's using AWS to OCR to documents but is then able to use ChatGPT to do some really interesting queries against very long documents that are difficult to code.

Ooh. That does sound good. I am going to check it out. Thank you for the info.
 
Thanks. Message me an email; file too large and also subject to copyright to post to the forum. I'll send in the morning.
There is really no need, as ChatGPT cannot do large files. All I would need to test it for you is for you to simply do a "cut and paste" of a single page or even a paragraph to test for you. As I mentioned above, you can only really process a single page at a time. You will be able to tell from a small sample if it is worth your time.

For what it's worth, AWS charges you only when you use it. $1.50/1000 pages is pretty decent. I'm merely suggesting it might yield better results and is worth checking out. The workflow he described is that he's using AWS to OCR to documents but is then able to use ChatGPT to do some really interesting queries against very long documents that are difficult to code.

I tested this out, and unfortunately, it doesnt look like it is a straight OCR program but is geared towards forms and invoices, etc. The way it OCR's is weird, and it exports in Excel. The ChatGPT is far easier to work with.

textract.jpg
 
Thank you all for the suggestions. The ChatGPT4o method is working good for me. It is a simple process, the editing is simple, and there is only a few steps.
 
There is really no need, as ChatGPT cannot do large files. All I would need to test it for you is for you to simply do a "cut and paste" of a single page or even a paragraph to test for you. As I mentioned above, you can only really process a single page at a time. You will be able to tell from a small sample if it is worth your time.


I tested this out, and unfortunately, it doesnt look like it is a straight OCR program but is geared towards forms and invoices, etc. The way it OCR's is weird, and it exports in Excel. The ChatGPT is far easier to work with.

View attachment 11131
The jpegs of the single pages are too large to share on PB and as I said are copyright by the MS owner (and this is a public thread). Those are the reasons I would need to email. If the file for one page is too large I can step down the quality.
 
The jpegs of the single pages are too large to share on PB and as I said are copyright by the MS owner (and this is a public thread). Those are the reasons I would need to email. If the file for one page is too large I can step down the quality.
Gotcha. I will PM you my email, and test it out for you at your convenience. But I will only need a single page.
 
The workflow he described is that he's using AWS to OCR to documents but is then able to use ChatGPT to do some really interesting queries against very long documents that are difficult to code.
I'd check the terms and conditions before I ran anything confidential through it. I know that Microsoft offers a premium AI product where they don't use your data for training the AI. I don't know anything about the AWS version.
 
Do the various AIs store any uploaded images? That would be a problem for scanning my photos; the require permission to publish; no permission needed to try to decipher the text.
 
Do the various AIs store any uploaded images? That would be a problem for scanning my photos; the require permission to publish; no permission needed to try to decipher the text.
I dont believe Chat stores images in such a way that they would seek to publish anything. I think the engine looks at each request as a task, that it completes, and then moves on. Now does it store it for training? I do not know. But even that I doubt, because Google, I think the power-house behind Chat, already has endless samples of PD text to train from. But that is just a guess. I really dont know.
 
I'd check the terms and conditions before I ran anything confidential through it. I know that Microsoft offers a premium AI product where they don't use your data for training the AI. I don't know anything about the AWS version.
Sure. That's always a consideration. Not everything in AI/ML is a transformer model. The two I identified use ML to translate handwriting/text from documents, but they are not (in themselves) transformer models. Also, Microsoft and AWS have to provide identifiable guardrails to customers in terms of how their models interact with your data.
 
Do the various AIs store any uploaded images? That would be a problem for scanning my photos; the require permission to publish; no permission needed to try to decipher the text.
As in my previous post, some do. Most companies do not trust using ChatGPT with their proprietary data as the generic commercial product doesn't have guard rails to protect that data from becoming part of their training data in all cases.
 
As in my previous post, some do. Most companies do not trust using ChatGPT with their proprietary data as the generic commercial product doesn't have guard rails to protect that data from becoming part of their training data in all cases.
Thanks for this. The test Dave ran on one page was surprisingly good in some ways but pretty far off in others. I wondered if it still would save typing time even with extensive corrections; but a no go if there is potential they abscond with the uploads.
 
Thanks for this. The test Dave ran on one page was surprisingly good in some ways but pretty far off in others. I wondered if it still would save typing time even with extensive corrections; but a no go if there is potential they abscond with the uploads.
It would be interesting to see how the AWS service compares. Transformer models (e.g., GPT) are very interesting in how they work. A transcription service uses a special purpose model focused on detecting text rather than (within a transformer model) predicting text.

Even if you only save 20%-50% of your time finding obvious transcription errors, then that saves you time from re-typing the obvious things that transcription picks up.

I have an AWS account if you would like me to try out a scanned document for you.
 
It would be interesting to see how the AWS service compares. Transformer models (e.g., GPT) are very interesting in how they work. A transcription service uses a special purpose model focused on detecting text rather than (within a transformer model) predicting text.

Even if you only save 20%-50% of your time finding obvious transcription errors, then that saves you time from re-typing the obvious things that transcription picks up.

I have an AWS account if you would like me to try out a scanned document for you.
Thanks Rich; I'll send you the same page Dave tested.
 
@davejonescue - I missed your post above about the AWS Texract service. Weird that they limit its use to financial docs but it explains why my brother found it useful. I haven't had a chance to try Azure but Google's OCR service didn't do a great job extracting text from a pdf Chris sent me. ChatGPT does some pretty cool things in ways that are not fully understood as to what the model is drawing upon to provide the output.
 
@davejonescue - I missed your post above about the AWS Texract service. Weird that they limit its use to financial docs but it explains why my brother found it useful. I haven't had a chance to try Azure but Google's OCR service didn't do a great job extracting text from a pdf Chris sent me. ChatGPT does some pretty cool things in ways that are not fully understood as to what the model is drawing upon to provide the output.
I dont know either, but whatever they are doing, it is working. I tried both Google OCR and AbbyFine, and neither of these came close to the output of Chat's 4o model. Chat's model isnt perfect, there is still mistakes, but it saved me so much time having to type stuff out as I could simply listen to the transcriptions and match it up with the facsimile; then quickly fix the mistakes. To me, that is worth paying the $20 a month for.
 
I dont know either, but whatever they are doing, it is working. I tried both Google OCR and AbbyFine, and neither of these came close to the output of Chat's 4o model. Chat's model isnt perfect, there is still mistakes, but it saved me so much time having to type stuff out as I could simply listen to the transcriptions and match it up with the facsimile; then quickly fix the mistakes. To me, that is worth paying the $20 a month for.
Yeah, that's the thing. Even programmers aren't sure how the "guts" are using code that GPT models sucked in to solve problems you pose to them. My brother mentioned Textract as a way to do really good OCR but he was dealing with loan docs. It makes sense because his company works with banks and their loan documents. Textract seems great if you're in that field. What he told me, though, was that they were then using GPT 3.5 turbo to query the documents and it was providing amazing results. It would be really difficult for them to code the queries where GPT allows querying with natural language.
 
Status
Not open for further replies.
Back
Top