Converting multiple file types with IBM Datacap Taskmaster Capture
Create a rule to convert them all to TIFF.
Taskmaster Capture has several actions that convert file types to TIFF format. However, some of these will abort if applied to the wrong file type. If the batch contains multiple file types, use this solution to convert all of them.
This ruleset takes all files with a DCO status of 49 and converts them to TIFF format, setting the DCO status to 75. It uses the Convert action library.
- Convert
  - Convert to TIFF
    - Function Images
      - ChkDCOStatus("49")
      - ImageFileTypesToConvert("jpg,jpeg,gif,bmp")
      - ImageMonoType(1)
      - ImageMonoThreshold(50)
      - ImageToTIFF()
      - SetDCOStatus("75")
    - Function TIFF
      - ChkDCOStatus("49")
      - SplitMultipageTiff()
      - SetDCOStatus("75")
    - Function PDF
      - ChkDCOStatus("49")
      - PDFDocumentToImage()
      - SetDCOStatus("75")
    - Function Word
      - ChkDCOStatus("49")
      - WordDocumentToImage()
      - SetDCOStatus("75")
    - Function Outlook
      - ChkDCOStatus("49")
      - OutlookMessageToImageAndAttachment()
      - SetDCOStatus("75")
    - Function Excel
      - ChkDCOStatus("49")
      - ExcelWorkbookToImage()
      - SetDCOStatus("75")
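The flow of the rule above can be modeled outside Datacap to make the logic easier to follow. The Python sketch below is not Datacap code and calls no Datacap API. It only mirrors the idea that a page with status 49 is handed to the function matching its file type and then marked 75. The extension lists for TIFF, Word, Outlook, and Excel are assumptions (only the Images list appears in the rule), and the names `pick_function` and `run_rule` are hypothetical.

```python
from pathlib import Path

# Hypothetical model of the ruleset above; none of this is Datacap API.
# Keys are the function names from the rule. Only the "Images" extensions
# come from the rule itself; the other extension sets are assumptions.
FUNCTION_EXTENSIONS = {
    "Images": {"jpg", "jpeg", "gif", "bmp"},
    "TIFF": {"tif", "tiff"},
    "PDF": {"pdf"},
    "Word": {"doc", "docx"},
    "Outlook": {"msg"},
    "Excel": {"xls", "xlsx"},
}

PENDING, CONVERTED = 49, 75  # DCO status values used by the rule


def pick_function(image_file):
    """Return the name of the function that would handle this file, or None."""
    ext = Path(image_file).suffix.lower().lstrip(".")
    for name, extensions in FUNCTION_EXTENSIONS.items():
        if ext in extensions:
            return name
    return None


def run_rule(page):
    """Model one pass of the rule over a page dict with IMAGEFILE and STATUS."""
    if page["STATUS"] != PENDING:
        return None  # mirrors ChkDCOStatus("49") failing
    name = pick_function(page["IMAGEFILE"])
    if name is not None:
        page["STATUS"] = CONVERTED  # mirrors SetDCOStatus("75")
    return name


page = {"IMAGEFILE": "tm000001.pdf", "STATUS": 49}
print(run_rule(page), page["STATUS"])
```

A page whose extension matches no function keeps status 49 in this model, which is one way to spot file types the rule does not cover.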
I'm trying to convert an XPS file to TIFF. I registered it correctly and placed the .dll and .rrx files where they are supposed to be, but when I process a batch, it won't load the needed object. The error generated in the log file is as follows: Couldn't load needed object: Datacap.Libraries.ConvertXPS.Actions into Datacap.Libraries.ConvertXPS.Actions instance" loc="CCom::Call" API=""
Any help will be appreciated.
Thanks
So far I don't know anything about .XPS file types, but I will try to find out. If you can share what you know about .XPS, it will be appreciated.
Hi
I am new to Datacap Studio. Kindly tell me which ruleset or action library to use to convert document formats.
XPS is a file type just like .jpg, .doc, etc. There are times when we might need to convert an .xps document to TIFF, which is why this action needs to be in place.
Any idea how I can go about this?
I'm trying to scan a PDF file and a TIFF file from one folder, but the PDF file still can't be scanned. Should I convert it to TIFF first? And following your tips above, where in the task profiles should I put the convert function? I tried putting it in the PageID task profile, but it still doesn't work.
Any idea about this? Thanks anyway.
Hi Dian,
For scanning PDFs, you have to use the following action in the VScan ruleset:
SetImageType(".tiff,.pdf")
It is available in the VScan action library.
To convert PDF to TIFF, create a ruleset, add a rule, and add the following actions from Datacap.Libraries.Convert.Pdf:
PDFBitDepth("1")
PDFCompression("4")
PDFConversionMethod("2")
PDFQuality("100")
PDFDocumentToImage()
Apply this rule on "Other" page open.
In the task profile, add this ruleset as the first ruleset in the PageID task.
Thank you for your answer, it works. I can convert the PDF to a TIFF file, but the problem is that I still can't get the content from PDF files the way I get the content from TIFF files.
In Export.xml, there's no DATAFILE for the .PDF file:
TYPE : Other
STATUS : 49
IMAGEFILE : tm000001.pdf
ScanSrcPath : c:\datacap\irs_test\images\10403z05.pdf
The .TIFF file, by contrast, looks like this:
TYPE : 2007 Return
STATUS : 0
IMAGEFILE : tm000002.tif
ScanSrcPath : c:\datacap\irs_test\images\1040ez01.tif
Confidence : -1,#IND
Image_Offset : 0,24
TemplateID : 556
Fingerprint Created : No
DATAFILE : tm000002.xml
FILEUPLOADED : c:\datacap\irs_test\batches\20120107.002\tm000002.tif
Doc_ID : {45CAC0CF-10D8-4BD6-A130-59A4F9887D81}
What should I do to get the content from the .PDF file?
FYI, the image in the .PDF file is the same as in the .TIFF file, and so are the zones.
Thanks.
Hi Dian,
This is because the original PDF file was not removed from the document hierarchy. To overcome this, create another rule in the same ruleset and add the actions RemoveSourceImages() and CopyOriginToChildren() from the Documents action library, which is available in the APT application.
Apply this rule at batch close.
You can get this action library by copying Documents.rrx from the APT application into your application's Rules folder.
Hi Trinath,
I'm still trying to capture the contents from PDF files. The PDF file can now be converted into TIFF files, but when it is shown in the Verify task profile as TIFF, the zones have moved from their original position, and Datacap cannot capture the content.
Is there any solution for my problem?
Thanks.
Hi Dian,
I have worked with this type of problem.
To overcome it, I used the pattern matching technique. For this, assign the following actions in the PageID rule:
PatternMatch_Identify()
pat_RegisterZones()
These actions adjust the zones automatically based on the anchor fields.
Hi... I need to convert an Excel file to a TIFF image, but these instructions don't work for me. I need another example, please. :)
Thanks.
Hi,
Here is the function I used for one of my applications.
I am getting good results with the following actions:
ExcelAutoFitColumns(True)
ExcelAutoFitRows(True)
ExcelOrientationToLandscape()
ExcelOrientationToPortrait()
ExcelPrintQuality(300)
ExcelScalingFactor(100)
ExcelTiffCompression("CCITT4")
SetDeleteOrigional(True)
ExcelWorkbookToImage()
Apply this on the Other page before ImageFix.
Good luck.
It's working fine, but the TIFF image name is not set properly, like TM000001. Can anyone tell me how to solve this problem?
I am also trying to convert PDF invoices to TIFF, and although the conversion works, the original PDF document is still in the batch.
The reply on April 16, 2012 indicates there are 2 actions which will remove the original PDF from the batch, but I cannot find these actions in my version of Datacap APT. I am running 8.0.1 service pack 3.
My batch looks like:
B 20120185.009
TYPE : APT
LAST_RR_TPROFILE : Batch Profiler:3
Recog - First Level : 2
P TM000001
TYPE : Other
STATUS : 49
IMAGEFILE : tm000001.pdf
ScanSrcPath : c:\datacap\primo\images\input\global foods.pdf
Bitcount : 1
RecogStatus : 0
s_srCreateCCO : 1
Confidence : 1
Image_Offset : 0,0
TemplateID : 1050
Fingerprint Created : No
P 01010000
TYPE : Main_Page
STATUS : 49
IMAGEFILE : 01010000.tif
ParentImage : tm000001.pdf
Bitcount : 1
RecogStatus : 0
s_srCreateCCO : 1
Confidence : 1
Image_Offset : 0,0
TemplateID : 1051
Fingerprint Created : No
DATAFILE : 01010000.xml
Is there another service pack or where can I get these actions?
Thanks
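A batch listing like the one above can be checked mechanically for source files that are still present after conversion. The Python sketch below is an illustration only and uses no Datacap API. It assumes the listing is plain text in exactly the layout shown (a `B` or `P` header line followed by `NAME : value` lines), and the function names `parse_batch_dump` and `leftover_sources` are hypothetical.

```python
# Illustrative parser for a text listing in the layout shown above.
# It assumes "B <id>" / "P <id>" header lines followed by "NAME : value" lines.

def parse_batch_dump(text):
    """Return a list of nodes: {'kind': 'B' or 'P', 'id': str, 'vars': dict}."""
    nodes = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if " : " in line:
            # Variable line: attach it to the most recent batch or page node.
            key, value = line.split(" : ", 1)
            if nodes:
                nodes[-1]["vars"][key.strip()] = value.strip()
        else:
            parts = line.split(None, 1)
            if len(parts) == 2 and parts[0] in ("B", "P"):
                nodes.append({"kind": parts[0], "id": parts[1], "vars": {}})
    return nodes


def leftover_sources(nodes):
    """Return IDs of pages whose image is still named as another page's ParentImage."""
    parents = {
        n["vars"]["ParentImage"].lower()
        for n in nodes
        if "ParentImage" in n["vars"]
    }
    return [
        n["id"]
        for n in nodes
        if n["kind"] == "P" and n["vars"].get("IMAGEFILE", "").lower() in parents
    ]


SAMPLE = """B 20120185.009
TYPE : APT
P TM000001
TYPE : Other
STATUS : 49
IMAGEFILE : tm000001.pdf
P 01010000
TYPE : Main_Page
STATUS : 49
IMAGEFILE : 01010000.tif
ParentImage : tm000001.pdf
"""

print(leftover_sources(parse_batch_dump(SAMPLE)))  # prints ['TM000001']
```

In this sample, page TM000001 is reported because the converted page 01010000 still points back to tm000001.pdf, which matches the situation described in the comment.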
Thanks!
Hello. I am new to Datacap and I need to:
Identify an Excel file and convert it to TIFF format.
Apply OCR to the pages and export the data to an XML file.
But I don't know how to identify the Excel file, convert it to TIFF,
and apply VScan.
Thanks
P.S. Sorry, my English is so bad... :(
Hi,
First, add the action SetImageType(), which is located in the VScan action library, to the VScan ruleset before the scan() action. It lets you take multiple file types as input.
For example:
SetImageType("bmp,jpg,jpeg,msg,tif,tiff,pdf,zip,doc,docx,xls,xlsx,eml,gif")
For Excel to TIFF conversion, create another ruleset called ImageConversion and add a rule and function to it.
In this rule's function, add the following actions for conversion.
Here is the function I used:
ExcelAutoFitColumns(True)
ExcelAutoFitRows(True)
ExcelOrientationToLandscape()
ExcelOrientationToPortrait()
ExcelPrintQuality(300)
ExcelScalingFactor(100)
ExcelTiffCompression("CCITT4")
SetDeleteOrigional(True)
ExcelWorkbookToImage()
All of these actions are located in the Convert action library.
Apply this on the "Other" page before the ImageFix ruleset.
Hi,
First, thanks for visiting this blog.
As I said earlier, there is an action library called "Documents", which you can download from the following URL: http://auexport.com/cts8swzrcd0s
After downloading this action library, copy it to the RRS folder in the Datacap folder, then refresh your application library. Now you can find the Documents action library.
Create a new ruleset and add the following actions from the Documents action library:
RemoveSourceImages()
CopyOriginToChildren()
Apply this rule at batch close level, after the conversion rule and before the PageID rule in the PageID task.
Hope it works.
I really appreciate your help. It works!
Thanks again!
Regards...
Thanks Thrinath, I appreciate your prompt response.
I will do more testing today.
I am not able to find the action code file for RemoveSourceImages() and CopyOriginToChildren().
Hi, I did this, but in PageID the status is aborted.
I need to get the Excel file info and export it to an XML file.
Is it necessary to convert the Excel file to TIFF?
In PageID, RecognizePageOCR_S doesn't generate the .cco file. Is this the problem?
Thanks!
Hi,
I am getting good results using the above actions.
Try again using RecognizePageOCR_A() in PageID to get the .cco file.
Hi...
I tried again with RecognizePageOCR_A(), but in PageID the status is still aborted.
I need to get the Excel file info and export it to an XML file.
Can you please send me an email so that I can show you an image of my project?
My email: su.gar.8892@gmail.com
I really appreciate your help.
Hi ,
Can you please send screenshots of your DCO hierarchy and ruleset actions?
To do this, select and expand the PageID ruleset, click the "sync DCO view with Ruleset view" button, and take a screenshot.
Send the screenshots to the following email address:
trinath645433@gmail.com
I am having the same problem, where my zones are not always in the same position.
I am reading a multi-line document and have set up a Details and Line Item zone and then zoned each individual field. I have PatternMatch_Identify and pat_RegisterZones set up on the PageID rule, assigned to the actual page that I want this to run on (not on Page - Other, which is setting the PageID).
When PageID runs and I run RuleRunner and bring up Verify, the zones are inconsistent. Also, it sometimes reads the same line twice, as if it is splitting it in two. Do you have any ideas why this may be occurring?
Thanks.
Theresa
Hi Thrinath,
I tried to download this action library, but the files are not available. Could you give us another way to get this action library?
Thanks
Carlo
Hi Carlo Moura, check your mail (carlo.moura@alliare.com.br) for the source link.
Hi Thrinath
I could not find the action library at the link you specified. Can you send the updated link to my email address, aparna.pl@gmail.com?
Thanks in Advance
Aparna.
Hi Thrinath
I am trying to convert a Word document to TIFF. As you mentioned, I have added a ruleset with the following functions:
SetImageType(doc)
WordPrintQuality(200)
WordTiffCompression(CCITT4)
WordDocumentToImage()
I have invoked this ruleset for page type Other before ImageFix.
My problem is that VScan goes into the Pending state with no log files in the batch, and I am unable to proceed further.
Please give your suggestion on this.
Have you added SetImageType(.doc,.docx) to the VScan rule?
Only then does Datacap take .doc files as input and process them.
This action is located in the VScan action library.
Yeah, that's right. I have resolved the issue. Thanks.
Hi Thrinath
I tried your steps for PDF input files. I added a ruleset and called it before ImageFix on Other:Open.
I also added SetImageType("pdf") in my VScan and removed the action "SetMultiPageTiff".
When I run the VScan task now, I get the error "couldn't find ruleset information with id g1 in c:\datacap\rrs\collection.xml".
Please let me know if you have any tips on this error, or if I am doing anything wrong.
Thanks
Hi Thrinath, never mind, the problem I mentioned earlier got resolved.
Hi Prasad,
How did you fix this error?
Hi Thrinath
Could you please post some details on how to arrange the files and separator pages in the input images folder for mixed input file types? For example, I have multi-page TIFFs and multi-page PDFs as inputs. I have separator pages between them, but my batch profiler is erroring out on the PageIDbyBCSep action.
Hi Thrinath,
I'm trying to separate documents in the DCO. I have already succeeded in separating each document, but I still can't get the zones in every document.
What should I do to solve this problem?
Thanks
Hi Thrinath, are you still there? Can you mail me the Documents.rrx file at siddharth.gupta@dcgteam.com or siddharthgupta190@gmail.com?
Hi Thrinath, are you still there? Can you mail me the Documents.rrx file at kumar.abhishektiwari748@gmail.com?
Hi Thrinath. I would like to use the Documents action library but the link is dead. Would you be able to provide this to me? My email address is prodagio.matt@gmail.com or you could use the email on my profile.
Kind regards and Happy New Year.
Hi Thrinath, I am converting one PDF to a multipage TIFF, but in the batch folder the TIFF files are named 0101..., 0102..., 0103..., 0104... instead of TM00..1, TM00..2, and so on. In the runtime batch hierarchy, the files also appear as TM00..1 (.docx), then 0101..., 0102..., and so on, and the same entries are made in pageid.xml. If I rename all the TIFF files to TM00..1, TM00..2 and edit pageid.xml to match, my application works fine. If I do not make these changes, the .cco file doesn't get created. Please give some idea of the reason for this and a possible solution. Thanks a lot in advance, Raghb
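The manual workaround described in this comment (renaming the converted pages to the TM-style sequence) can at least be planned before touching any files. The Python sketch below only computes a rename mapping. The `tm` prefix followed by six digits is inferred from names such as tm000001.tif seen elsewhere in this thread, and the function name `rename_plan` is hypothetical. It does not edit pageid.xml, it is not a Datacap feature, and it does not explain why the .cco file is missing.

```python
from pathlib import Path

# Hypothetical helper: map converted page files to TM-style names.
# The "tm" + six-digit pattern is an assumption based on names in this thread.

def rename_plan(child_files, start=1):
    """Return {old_name: new_name} for the given files, in sorted order."""
    plan = {}
    for offset, name in enumerate(sorted(child_files)):
        suffix = Path(name).suffix.lower()
        plan[name] = f"tm{start + offset:06d}{suffix}"
    return plan


for old, new in rename_plan(["01020000.tif", "01010000.tif"]).items():
    print(old, "->", new)
```

Reviewing the mapping first makes it easier to keep the file names and the matching entries in pageid.xml consistent when the change is made by hand.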
Hi
I am new to Datacap Studio. Kindly help me with how to convert document formats such as PNG, DOC, and PDF to TIFF.
Please help
Hi
I need help reading multiple images under a single page ID, if this is possible.
Scenario:
I have a form that contains 3 pages. When I export the values from the form, a different tag name is created in the XML file for each page, but I need them all under a single tag name. Is it possible to get the values from all three files under the same tag name?