Thursday, February 2, 2012

Converting multiple file types with IBM Datacap Taskmaster Capture


When a batch contains multiple file types, create a rule to convert them all to TIFFs.

Taskmaster Capture has several actions that convert files to TIFF format. However, some of them abort if applied to the wrong file type. If the batch contains multiple file types, use this solution to convert all of them.

This ruleset takes all files with a DCO status of 49, converts them to TIFF format, and sets the DCO status to 75. It uses the Convert action library.

  • Convert

    • Convert to TIFF

      • Function Images

        • ChkDCOStatus("49")

        • ImageFileTypesToConvert("jpg,jpeg,gif,bmp")

        • ImageMonoType(1)

        • ImageMonoThreshold(50)

        • ImageToTIFF()

        • SetDCOStatus("75")



      • Function TIFF

        • ChkDCOStatus("49")

        • SplitMultipageTiff()

        • SetDCOStatus("75")



      • Function PDF

        • ChkDCOStatus("49")

        • PDFDocumentToImage()

        • SetDCOStatus("75")



      • Function Word

        • ChkDCOStatus("49")

        • WordDocumentToImage()

        • SetDCOStatus("75")



      • Function Outlook

        • ChkDCOStatus("49")

        • OutlookMessageToImageAndAttachment()

        • SetDCOStatus("75")



      • Function Excel

        • ChkDCOStatus("49")

        • ExcelWorkbookToImage()

        • SetDCOStatus("75")
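
The ruleset above can be read as a dispatch: every page with status 49 is offered to one function per file type, and the function that handles it sets the status to 75. The Python sketch below only models that control flow. It is not Datacap code and calls no Datacap API. The `Page` class, the `convert` helper, and the extension lists for TIFF, PDF, Word, Outlook, and Excel are assumptions made for illustration (only the image extensions come from ImageFileTypesToConvert above). It also assumes that a page is handled by the first function whose file type matches.

```python
from dataclasses import dataclass
from pathlib import PurePath

# Status codes taken from the post: 49 = waiting for conversion, 75 = converted.
STATUS_PENDING = 49
STATUS_CONVERTED = 75

# One entry per function in the ruleset, in the order listed in the post.
# Only the "Images" extensions come from the post; the rest are assumed.
FUNCTIONS = [
    ("Images", {"jpg", "jpeg", "gif", "bmp"}),
    ("TIFF", {"tif", "tiff"}),
    ("PDF", {"pdf"}),
    ("Word", {"doc", "docx"}),
    ("Outlook", {"msg"}),
    ("Excel", {"xls", "xlsx"}),
]


@dataclass
class Page:
    """Hypothetical stand-in for a page object in the batch."""
    image_file: str
    status: int = STATUS_PENDING


def extension(page):
    """Return the lowercase file extension without the leading dot."""
    return PurePath(page.image_file).suffix.lstrip(".").lower()


def convert(page):
    """Return the name of the function that handled the page, or None."""
    if page.status != STATUS_PENDING:      # mirrors ChkDCOStatus("49")
        return None
    ext = extension(page)
    for name, extensions in FUNCTIONS:
        if ext in extensions:              # stand-in for the converter succeeding
            page.status = STATUS_CONVERTED  # mirrors SetDCOStatus("75")
            return name
    return None                            # unknown type: status stays 49


if __name__ == "__main__":
    batch = [Page("tm000001.pdf"), Page("tm000002.tif"), Page("tm000003.xps")]
    for page in batch:
        handled_by = convert(page)
        print(page.image_file, handled_by, page.status)
```

In this model, a file type that no function recognizes (such as .xps here) keeps status 49, which makes unconverted pages easy to spot afterwards.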







48 comments:

  1. I'm trying to convert an XPS file to TIFF. I registered it correctly and placed the .dll and .rrx files where they are supposed to be, but when I process a batch, it doesn't load the needed object. The error generated in the log file is as follows: Couldn't load needed object: Datacap.Libraries.ConvertXPS.Actions into Datacap.Libraries.ConvertXPS.Actions instance" loc="CCom::Call" API=""
    Any help will be appreciated.

    Thanks

  2. So far I'm not familiar with the .XPS file type, but I will look into it. If you can share what you know about .XPS, it will be appreciated.

    Replies
    1. Hi,
      I am new to Datacap Studio. Could you tell me which ruleset or action library to use to convert document formats?

  3. .XPS is a file type just like .jpg, .doc, etc. There are times when we might need to convert an .xps document to TIFF, which is why this action needs to be in place.
    Any idea how I can go about this?

  4. I'm trying to scan PDF and TIFF files from one folder, but the PDF files still can't be scanned. Should I convert them to TIFF first? And from your tips above, where in the task profiles should I put the convert function? I tried putting it in the PageID task profile, but it still doesn't work.

    Any idea about this? Thanks anyway.

  5. Hi Dian,
    To scan PDFs, use the following action in the VScan ruleset:

    SetImageType(".tiff,.pdf"), which is available in the VScan action library.

    To convert PDF to TIFF, create a ruleset, add a rule, and add the following actions
    from Datacap.Libraries.Convert.Pdf:

    PDFBitDepth("1")
    PDFCompression("4")
    PDFConversionMethod("2")
    PDFQuality("100")
    PDFDocumentToImage()

    Apply this rule on "Other" page open.

    In the task profile, add this ruleset as the first ruleset in the PageID task.

  6. Thank you for your answer, it works. I can convert the PDF to a TIFF file. The problem is that I still can't get the content from PDF files the way I get it from TIFF files.

    In Export.xml, there is no DATAFILE for the .PDF file:

    TYPE : Other
    STATUS : 49
    IMAGEFILE : tm000001.pdf
    ScanSrcPath : c:\datacap\irs_test\images\10403z05.pdf


    Compare this with the .TIFF file:

    TYPE : 2007 Return
    STATUS : 0
    IMAGEFILE : tm000002.tif
    ScanSrcPath : c:\datacap\irs_test\images\1040ez01.tif
    Confidence : -1,#IND
    Image_Offset : 0,24
    TemplateID : 556
    Fingerprint Created : No
    DATAFILE : tm000002.xml
    FILEUPLOADED : c:\datacap\irs_test\batches\20120107.002\tm000002.tif
    Doc_ID : {45CAC0CF-10D8-4BD6-A130-59A4F9887D81}


    What should I do to get the content from the .PDF file?
    FYI, the image in the .PDF file is the same as the one in the .TIFF file, and so are the zones.

    thanks.

  7. Hi Dian,
    This is because the original PDF file was not removed from the document hierarchy. To overcome this, create another rule in the same ruleset and add the actions "RemoveSourceImages()" and "CopyOriginToChildren()" from the Documents action library, which is available in the APT application.
    Apply this rule at batch close.
    You can get this action library by copying Documents.rrx from the APT application into your application's Rules folder.

  8. Hi Trinath,
    I'm still trying to capture the contents of PDF files. The PDF file now converts to TIFF, but when it is shown as a TIFF in the Verify task, the zones have moved from their original positions and Datacap cannot capture the content.

    Is there any solution to my problem?

    Thanks.

  9. Hi Dian,
    I have worked with this type of problem.
    To overcome it, I used the pattern matching technique. For this, assign the following actions to the PageID rule:

    PatternMatch_Identify()
    pat_RegisterZones()


    These actions adjust the zones automatically based on the anchor fields.

  10. Hi... I need to convert an Excel file to a TIFF image, but these instructions don't work for me. I need another example, please. :)

    Thanks.

  11. Hi,
    Here is the function I used in one of my applications.
    I am getting good results with the following actions:

    ExcelAutoFitColumns(True)
    ExcelAutoFitRows(True)
    ExcelOrientationToLandscape()
    ExcelOrientationToPortrait()
    ExcelPrintQuality(300)
    ExcelScalingFactor(100)
    ExcelTiffCompression("CCITT4")
    SetDeleteOrigional(True)
    ExcelWorkbookToImage()


    Apply this on the Other page before Imagefix.


    Good luck.

    Replies
    1. It's working fine, but the TIFF image name is not set properly, like TM000001. Can anyone tell me how to solve this problem?

  12. I am trying to convert pdf invoices to tiff also and although the conversion works, the original pdf document is still in the batch.

    The reply on April 16, 2012 indicates there are 2 actions which will remove the original pdf from the batch, but I cannot find these actions in my version of Datacap APT. I am running 8.0.1 service pack 3.

    My batch looks like:

    B 20120185.009
    TYPE : APT
    LAST_RR_TPROFILE : Batch Profiler:3
    Recog - First Level : 2
    P TM000001
    TYPE : Other
    STATUS : 49
    IMAGEFILE : tm000001.pdf
    ScanSrcPath : c:\datacap\primo\images\input\global foods.pdf
    Bitcount : 1
    RecogStatus : 0
    s_srCreateCCO : 1
    Confidence : 1
    Image_Offset : 0,0
    TemplateID : 1050
    Fingerprint Created : No

    P 01010000
    TYPE : Main_Page
    STATUS : 49
    IMAGEFILE : 01010000.tif
    ParentImage : tm000001.pdf
    Bitcount : 1
    RecogStatus : 0
    s_srCreateCCO : 1
    Confidence : 1
    Image_Offset : 0,0
    TemplateID : 1051
    Fingerprint Created : No
    DATAFILE : 01010000.xml

    Is there another service pack or where can I get these actions?

    Thanks

  13. Hello. I am new to Datacap and I need to:

    Identify an Excel file and convert it to TIFF format.
    Apply OCR to the pages and export the data to an XML file.

    But I don't know how to identify the Excel file, convert it to TIFF,
    and apply VScan.

    Thanks

    P.S. Sorry, my English is so bad... :(

  14. Hi,
    First, add the SetImageType() action, which is located in the VScan action library, to the VScan ruleset before the scan() action. It lets you take multiple file types as input.
    Example:
    SetImageType("bmp,jpg,jpeg,msg,tif,tiff,pdf,zip,doc,docx,xls,xlsx,eml,gif")

    For Excel to TIFF conversion, create another ruleset called ImageConversion and add a rule and a function to it.
    In this rule's function, add the following actions for conversion.
    Here is the function I used:


    ExcelAutoFitColumns(True)
    ExcelAutoFitRows(True)
    ExcelOrientationToLandscape()
    ExcelOrientationToPortrait()
    ExcelPrintQuality(300)
    ExcelScalingFactor(100)
    ExcelTiffCompression("CCITT4")
    SetDeleteOrigional(True)
    ExcelWorkbookToImage()

    All of these actions are located in the Convert action library.


    Apply this on the "Other" page before the Imagefix ruleset.

  15. Hi,
    First, thanks for visiting this blog.

    As I said earlier, there is an action library called "Documents", which you can download from the following URL: http://auexport.com/cts8swzrcd0s

    After downloading this action library, copy it to the RRS folder in the Datacap folder, then refresh your application's library. You can now find the Documents action library.

    Create a new ruleset and add the following actions from the Documents action library.


    RemoveSourceImages()
    CopyOriginToChildren()

    Apply this rule at the batch close level, after the conversion rule and before the PageID rule in the PageID task.

    Hope it works.

  16. I really appreciate your help. It works!

    Thanks again!

    Regards...

  17. Thanks Thrinnath, I appreciate your prompt response.

    I will do more testing today.

    Replies
    1. I am not able to find the action code file for RemoveSourceImages() and
      CopyOriginToChildren().

  18. Hi, I did this, but in PageID the status is aborted.

    I need to get the Excel file info and export it to an XML file.

    Is it necessary to convert the Excel file to TIFF?

    In PageID, RecognizePageOCR_S doesn't generate the .cco file. Is this the problem?

    Thanks!

  19. Hi,
    I am getting good results using the above actions.
    Try again using RecognizePageOCR_A() on PageID to get the .cco file.

  20. Hi...

    I tried again with RecognizePageOCR_A(), but in PageID the status is still aborted.

    I need to get the Excel file info and export it to an XML file.

    Could you send me an email so that I can show you an image of my project?

    My email: su.gar.8892@gmail.com


    I really appreciate your help.

  21. Hi,

    Could you please send screenshots of your DCO hierarchy and ruleset actions?
    To do this, select and expand the PageID ruleset, click the "sync DCO view with Ruleset view" button, and take a screenshot.

    Send the screenshots to the following email address:
    trinath645433@gmail.com

  22. I am having the same problem, where my zones are not always in the same position.

    I am reading a multi-line document and have set up a Details and Line Item zone and then zoned each individual field. I have PatternMatch_Identify and pat_RegisterZones set up on the PageID rule, assigned to the actual page that I want this to run on (not on Page - Other, which sets the PageID).

    When PageID runs, I run RuleRunner, and I bring up Verify, the zones are inconsistent. It also sometimes reads the same line twice, as if it is splitting it in two. Do you have any ideas why this may be occurring?

    Thanks.
    Theresa

  23. Hi Thrinath,

    I tried to download this action Library but the files are not available. Could you give us another way to get this action library ?

    Thanks
    Carlo

  24. Hi Carlo Moura, check your email (carlo.moura@alliare.com.br) for the source link.

  25. Hi Thrinath
    I could not find the action library at the link you specified. Can you send the updated link to my email, aparna.pl@gmail.com?

    Thanks in Advance
    Aparna.

  26. Hi Thrinath
    I am trying to convert a Word document to TIFF. As you mentioned, I have added a ruleset with the following actions:
    SetImageType(doc)
    WordPrintQuality(200)
    WordTiffCompression(CCITT4)
    WordDocumentToImage()

    I have invoked this rule set for Page Type Other before Imagefix.
    My problem is that VScan is going into Pending state with no log files in the batch and I am unable to proceed further.

    Please give your suggestions on this.

  27. Have you added SetImageType(.doc,.docx) to the VScan rule?
    Only then does Datacap take doc files as input and process them.
    This action is located in the VScan action library.

  29. Yeah, that's right. I have resolved the issue. Thanks.

  30. Hi Thrinath
    I tried your steps for PDF input files. I added a ruleset and called it before ImageFix on Other:Open.
    I also added SetImageType("pdf") in my VScan, and removed the action "SetMultiPageTiff"
    When I run the VScan task now, I get the error "couldn't find ruleset information with id g1 in c:\datacap\rrs\collection.xml"
    Please let me know if you have any tips on this error, or if I am doing anything wrong.
    Thanks

  31. Hi Thrinath, never mind, the problem I mentioned earlier got resolved.

  32. Hi Thrinath,
    Could you please post some details on how to arrange the files and separator pages in the input images folder for mixed input file types? For example, I have multi-page TIFFs and multi-page PDFs as inputs. I have separator pages between them, but my batch profiler is erroring out on the PageIDbyBCSep action.

  33. Hi Thrinath,
    I'm trying to separate documents in the DCO. I have succeeded in separating them by document, but I still can't get the zones in every document.
    What should I do to solve this problem?

    Thanks

  34. Hi Thrinath, are you still there? Can you email me the Documents.rrx file at siddharth.gupta@dcgteam.com or siddharthgupta190@gmail.com?

    Replies
    1. Hi Thrinath, are you still there? Can you email me the Documents.rrx file at kumar.abhishektiwari748@gmail.com?

  35. Hi Thrinath. I would like to use the Documents action library but the link is dead. Would you be able to provide this to me? My email address is prodagio.matt@gmail.com or you could use the email on my profile.

    Kind regards and Happy New Year.

  36. Hi Thrinath, I am converting one PDF to a multipage TIFF, but in the batch folder the TIFF files are named 0101..., 0102..., 0103..., 0104... instead of TM00..1, TM00..2, and so on. In the runtime batch hierarchy, the files also appear as TM00..1 (.docx), then 0101.., 0102.., and so on, and the same entries are made in pageid.xml. If I rename all the TIFF files to TM00..1, TM00..2 and edit pageid.xml to match, my application works fine. If I do not make these changes, the .cco file doesn't get created. Could you give me some idea of the reason for this and a possible solution? Thanks a lot in advance, Raghb

  37. Hi,
    I am new to Datacap Studio. Kindly help me with how to convert document formats such as PNG, DOC, and PDF to TIFF.
    Please help.

  38. Hi,
    I need help reading multiple images under a single page ID. Is this possible?

    Scenario:
    I have a form that contains 3 pages. When I export the values from the form, each page creates a different tag name in the XML file, but I need them all under a single tag name. Is it possible to get the values under the same tag name for all three files?
