Step 1: Make box files for the images that we want to train on.
Syntax: tesseract [langname].[fontname].[expN].[file-extension] [langname].[fontname].[expN] batch.nochop makebox
Eg: tesseract test.font.exp0.tif test.font.exp0 batch.nochop makebox
{*Note: After making the box files, we have to correct any wrongly identified characters in them.}
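After editing a box file by hand, it can help to check that it is still well formed before moving on. The sketch below assumes a POSIX shell with awk and the Tesseract 3.x box format, where each line holds one character followed by left, bottom, right, top and page numbers. The function name check_box is made up for illustration.

```shell
# check_box FILE: print every malformed line of a box file as FILE:LINE: text.
# Assumed line format (Tesseract 3.x): <char> <left> <bottom> <right> <top> <page>
check_box() {
    awk '
        {
            # A valid line has exactly 6 fields; fields 2-6 are non-negative integers.
            bad = (NF != 6)
            for (i = 2; i <= NF && !bad; i++)
                if ($i !~ /^[0-9]+$/) bad = 1
            if (bad) { print FILENAME ":" NR ": " $0; errors++ }
        }
        # Exit status 1 if any line was reported, 0 otherwise.
        END { exit (errors > 0) }
    ' "$1"
}

# Example: check_box test.font.exp0.box
```

The check only looks at the shape of each line. It cannot tell whether the character at the start of the line is the right one, so the manual review is still needed.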
Step 2: Create the .tr file (combining the image file and the box file).
Syntax: tesseract [langname].[fontname].[expN].[file-extension] [langname].[fontname].[expN] box.train
Eg: tesseract test.font.exp0.tif test.font.exp0 box.train
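In both commands so far, the second argument is the image name without its extension. A small helper can derive it and reject names that do not follow the [langname].[fontname].[expN].[file-extension] pattern. This is only a sketch for a POSIX shell; training_base is a hypothetical name, not part of Tesseract.

```shell
# training_base IMAGE: print IMAGE without its extension, or fail if the name
# does not look like [langname].[fontname].exp[N].[file-extension].
training_base() {
    case "$1" in
        *.*.exp[0-9]*.*) printf '%s\n' "${1%.*}" ;;
        *) echo "unexpected file name: $1" >&2; return 1 ;;
    esac
}

# Example:
#   img=test.font.exp0.tif
#   base=$(training_base "$img") && tesseract "$img" "$base" box.train
```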
Step 3: Extract the charset from the box files (the output of this command is the unicharset file).
Syntax: unicharset_extractor [langname].[fontname].[expN].box
Eg: unicharset_extractor test.font.exp0.box
Step 4: Create a font_properties file based on our needs.
Syntax: echo "[fontname] [italic (0 or 1)] [bold (0 or 1)] [monospace (0 or 1)] [serif (0 or 1)] [fraktur (0 or 1)]" > font_properties
Eg: echo "font 0 0 1 0 0" > font_properties
{*Note: [fontname] must match the font name used in the training file names, e.g. "font" for test.font.exp0.tif.}
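A typo in one of the five flags is easy to miss, so the font_properties line can be built by a helper that checks them. The following is a minimal sketch for a POSIX shell; font_line is a hypothetical name, and the font name passed to it should be the same one used in the training file names.

```shell
# font_line NAME ITALIC BOLD MONOSPACE SERIF FRAKTUR
# Prints one font_properties line, rejecting any flag that is not 0 or 1.
font_line() {
    [ "$#" -eq 6 ] || {
        echo "usage: font_line name italic bold monospace serif fraktur" >&2
        return 1
    }
    name=$1
    shift
    for flag in "$@"; do
        case "$flag" in
            0|1) ;;
            *) echo "flag must be 0 or 1: $flag" >&2; return 1 ;;
        esac
    done
    printf '%s %s %s %s %s %s\n' "$name" "$@"
}

# Example: font_line font 0 0 1 0 0 > font_properties
```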
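Step 4.5: Generate the shape file (shapetable).
Syntax: shapeclustering -F font_properties -U unicharset -O [langname].unicharset [langname].[fontname].[expN].tr
Eg: shapeclustering -F font_properties -U unicharset -O test.unicharset test.font.exp0.tr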
Step 5: Train the data.
Syntax: mftraining -F font_properties -U unicharset -O [langname].unicharset [langname].[fontname].[expN].tr
Eg: mftraining -F font_properties -U unicharset -O test.unicharset test.font.exp0.tr
Step 6: Generate the normproto file with cntraining.
Syntax: cntraining [langname].[fontname].[expN].tr
Eg: cntraining test.font.exp0.tr
{*Note: After Step 5 and Step 6, four files will have been created (shapetable, inttemp, pffmtable, normproto).}
Step 7: Rename the four files (shapetable, inttemp, pffmtable, normproto) to ([langname].shapetable, [langname].inttemp, [langname].pffmtable, [langname].normproto).
Syntax: rename filename1 filename2
Eg:
rename shapetable test.shapetable
rename inttemp test.inttemp
rename pffmtable test.pffmtable
rename normproto test.normproto
{*Note: This is the Windows rename command; on Linux or macOS use mv instead. The prefix must be the same [langname] used in the other steps.}
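The four renames can also be done in one loop. The sketch below assumes a POSIX shell and uses mv rather than the Windows rename command; prefix_outputs is a hypothetical helper name.

```shell
# prefix_outputs LANG: rename the four training outputs in the current
# directory to LANG.<name>, stopping at the first file that is missing.
prefix_outputs() {
    for f in shapetable inttemp pffmtable normproto; do
        [ -f "$f" ] || { echo "missing: $f" >&2; return 1; }
        mv -- "$f" "$1.$f"
    done
}

# Example: prefix_outputs test
```

Stopping on a missing file makes it obvious when one of the earlier training steps did not produce its output.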
Step 8: Create the .traineddata file.
Syntax: combine_tessdata [langname].
Eg: combine_tessdata test.
{*Note: The trailing dot is part of the argument.}
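Once the box file has been corrected, Steps 2 to 8 need no further manual work, so their commands can be generated from the three name parts. The following is a rough sketch for a POSIX shell, not an official script: training_commands is a hypothetical name, it uses mv in place of rename, and the font flags default to "0 0 0 0 0" purely as a placeholder. It only prints the commands and executes nothing.

```shell
# training_commands LANG FONT N EXT [FLAGS]
# Prints the commands for Steps 2-8, one per line, without running them.
# FLAGS is the "italic bold monospace serif fraktur" string (placeholder default).
training_commands() {
    lang=$1
    base="$1.$2.exp$3"
    img="$base.$4"
    flags=${5:-"0 0 0 0 0"}
    cat <<EOF
tesseract $img $base box.train
unicharset_extractor $base.box
echo "$2 $flags" > font_properties
shapeclustering -F font_properties -U unicharset -O $lang.unicharset $base.tr
mftraining -F font_properties -U unicharset -O $lang.unicharset $base.tr
cntraining $base.tr
mv shapetable $lang.shapetable
mv inttemp $lang.inttemp
mv pffmtable $lang.pffmtable
mv normproto $lang.normproto
combine_tessdata $lang.
EOF
}

# Example: training_commands test font 0 tif "0 0 1 0 0"
```

Printing first makes it possible to review the list before anything touches the training files. If it looks right, the output can be piped to sh.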