From PDF to OCR

Post by MarcoBoschi » Fri May 04, 2012 3:25 pm

Hi,
I have a PDF file.
I need to convert it to txt (using OCR).

I have to extract some information from a PDF produced by another program and import the extracted data into my software.
Any ideas?
Bye
marco
MarcoBoschi
 
Posts: 1027
Joined: Thu Nov 17, 2005 11:08 am
Location: Padova - Italy

Re: From PDF to OCR

Post by ukoenig » Sat May 05, 2012 12:21 pm

Marco,

maybe this works for you?

http://www.pdfocr.net/download.html

Best Regards
Uwe
Since 1995 ( the first release of FW 1.9 )
i work with FW.
If you have any questions about special functions, maybe i can help.
ukoenig
 
Posts: 4043
Joined: Wed Dec 19, 2007 6:40 pm
Location: Germany

Re: From PDF to OCR

Post by MarcoBoschi » Sun May 06, 2012 7:34 am

Uwe,
this program would be perfect if I could run it from the command line.
The user of MyApp doesn't need to know that pdfocr exists on his computer.
When he clicks "Import data from...", MyApp populates the database in two steps:
- running pdfocr to convert the PDF to txt
- reading the txt file and populating my database
many thanks
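
The two steps described above could be sketched roughly as follows. This is only an illustration in Python, not the FiveWin code of MyApp: the converter name `pdfocr.exe`, its argument order (input PDF, then output text file), and the helper names `pdf_to_text`, `read_records`, and `import_pdf` are all assumptions, since the thread does not show a command-line interface for that program.

```python
import subprocess
from pathlib import Path


def pdf_to_text(pdf_path, txt_path, converter="pdfocr.exe", run=subprocess.run):
    """Step 1: call an external converter and return the text file path.

    Assumption: the converter accepts "<input.pdf> <output.txt>".
    """
    cmd = [converter, str(pdf_path), str(txt_path)]
    run(cmd, check=True)  # raises CalledProcessError on a non-zero exit code
    return Path(txt_path)


def read_records(txt_path):
    """Step 2a: read the converted text and keep the non-empty lines."""
    text = Path(txt_path).read_text(encoding="utf-8", errors="replace")
    return [line.strip() for line in text.splitlines() if line.strip()]


def import_pdf(pdf_path, txt_path, store, **kwargs):
    """Step 2b: hand every extracted line to `store` (e.g. a database insert)."""
    records = read_records(pdf_to_text(pdf_path, txt_path, **kwargs))
    for record in records:
        store(record)
    return len(records)
```

Passing `run` as a parameter keeps the external call replaceable, so the parsing step can be tried out without the converter installed.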

Re: From PDF to OCR

Post by ukoenig » Sun May 06, 2012 9:27 am

Marco,

I couldn't find a freeware tool.
Here is another product that works from the command line:

http://www.verypdf.com

A Test :

Description:
Convert text based PDF files to plain text files.
Convert scanned PDF files to plain text files by OCR technology.
Usage: pdf2txtocr.exe [options] <PDF-file> <Text-file>
  -firstpage <int>   : first PDF page to convert
  -lastpage <int>    : last PDF page to convert
  -res <int>         : set resolution, the unit is DPI (default is 300 dpi)
  -ownerpwd <string> : set owner password for encrypted PDF file
  -userpwd <string>  : set user password for encrypted PDF file
  -layout            : maintain original physical layout
  -noc               : don't insert page breaks 0x0C between pages in text file
  -bitcount <int>    : set color depth when render PDF page to image data,
                       it can be set 1, 8, 24, default is 8bit
  -ocr               : enable OCR function for scanned PDF file
  -lang <string>     : choose the language for OCR engine
  -text <string>     : add additional text at end of each text page,
                       this parameter supports the following variables:
                       %PageNumber%: current page number
                       %PageCount% : total page count of PDF file
  -$ <string>        : input your License Key


Examples:
pdf2txtocr.exe C:\in.pdf C:\out.txt
pdf2txtocr.exe -firstpage 1 -lastpage 1 C:\in.pdf C:\out.txt
pdf2txtocr.exe -ocr -res 300 C:\in.pdf C:\out.txt
pdf2txtocr.exe -ownerpwd 123 -userpwd 456 C:\in.pdf C:\out.txt
pdf2txtocr.exe -layout C:\in.pdf C:\out.txt
pdf2txtocr.exe -noc C:\in.pdf C:\out.txt
pdf2txtocr.exe C:\in.tif C:\out.txt
pdf2txtocr.exe C:\in.jpg C:\out.txt
pdf2txtocr.exe C:\in.bmp C:\out.txt
pdf2txtocr.exe C:\in.png C:\out.txt
pdf2txtocr.exe -ocr -lang eng C:\in.pdf C:\out.txt
pdf2txtocr.exe -ocr -bitcount 1 C:\in.pdf C:\out.txt
pdf2txtocr.exe -ocr -bitcount 8 C:\in.pdf C:\out.txt
pdf2txtocr.exe -ocr -bitcount 24 C:\in.pdf C:\out.txt
pdf2txtocr.exe -ocr -lang deu C:\in.pdf C:\out.txt
pdf2txtocr.exe -lang deu C:\in.tif C:\out.txt
pdf2txtocr.exe -text "PageText %PageNumber% of %PageCount%" C:\in.pdf C:\out.txt


Following command line will OCR all PDF files in D:\temp\ folder to text files:
for %F in (D:\temp\*.pdf) do pdf2txtocr.exe -ocr -lang deu "%F" "%~dpnF.txt"

Following command line will OCR all PDF files in D:\temp\ folder and subdirectories to text files:
for /r D:\temp %F in (*.pdf) do pdf2txtocr.exe -ocr "%F" "%~dpnF.txt"

Following command line will OCR all PDF files from D:\temp\ folder and output text files to C:\test folder:
for %F in (D:\temp\*.pdf) do pdf2txtocr.exe -ocr "%F" "C:\test\%~nF.txt"
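
When the conversion is started from a program instead of the command prompt, the same loops can be expressed as a list of command lines. The following is a minimal sketch in Python that uses only the `-ocr` and `-lang` options from the usage text above. The helper names `build_command` and `batch_commands` are made up for this example and are not part of the product.

```python
from pathlib import Path


def build_command(pdf, txt, ocr=True, lang=None, exe="pdf2txtocr.exe"):
    """Build one argument list: exe [-ocr] [-lang <lang>] <PDF-file> <Text-file>."""
    cmd = [exe]
    if ocr:
        cmd.append("-ocr")
    if lang:
        cmd += ["-lang", lang]
    cmd += [str(pdf), str(txt)]
    return cmd


def batch_commands(src_dir, out_dir=None, recursive=False, **options):
    """Return one command per PDF found in src_dir.

    recursive=True also walks subdirectories (like "for /r").
    out_dir=None writes each text file next to its PDF (like "%~dpnF.txt");
    otherwise all text files go to out_dir (like "C:\\test\\%~nF.txt").
    """
    src = Path(src_dir)
    pdfs = sorted(src.rglob("*.pdf") if recursive else src.glob("*.pdf"))
    commands = []
    for pdf in pdfs:
        target_dir = Path(out_dir) if out_dir else pdf.parent
        commands.append(build_command(pdf, target_dir / (pdf.stem + ".txt"), **options))
    return commands
```

Each returned list can then be passed to `subprocess.run(cmd, check=True)`. Building the commands as lists avoids the quoting problems that paths with spaces cause on the command line.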


Best Regards
Uwe

Re: From PDF to OCR

Post by MarcoBoschi » Tue May 08, 2012 7:05 am

Uwe,
many thanks!
The second (command line) one is more useful even if more expensive.
Bye
marco

Re: From PDF to OCR

Post by MarcoBoschi » Mon May 14, 2012 8:49 am

Uwe,
unfortunately there are problems using pdf2txtocr.exe.
Their support replied that I have to wait for a new release.

Do you know of any other software that works in command-line mode?
Otherwise I will buy this one: http://www.pdfocr.net/register.html
It converts my file very well and it's cheaper.

Best regards
marco

Re: From PDF to OCR

Post by ukoenig » Mon May 14, 2012 9:49 am

Marco,

I found another one (59 USD) with a trial download for testing:

http://www.minipdf.com/pdf-to-text-ocr.htm

Best Regards
Uwe

