- Step 1: Generate box files for the images we want to train on
- Syntax: tesseract [langname].[fontname].[expN].[file-extension] [langname].[fontname].[expN] batch.nochop makebox
- Eg: tesseract test.font.exp0.tif test.font.exp0 batch.nochop makebox
- {*Note: After generating the box files, open them and correct any wrongly identified characters.}
- Step 2: Create the .tr file (combines the image file and the box file)
- Syntax: tesseract [langname].[fontname].[expN].[file-extension] [langname].[fontname].[expN] box.train
- Eg: tesseract test.font.exp0.tif test.font.exp0 box.train
- Step 3: Extract the character set from the box files (this command outputs the unicharset file)
- Syntax: unicharset_extractor [langname].[fontname].[expN].box
- Eg: unicharset_extractor test.font.exp0.box
- Step 4: Create a font_properties file describing the font.
- Syntax: echo "[fontname] [italic (0 or 1)] [bold (0 or 1)] [monospace (0 or 1)] [serif (0 or 1)] [fraktur (0 or 1)]" > font_properties
- Eg: echo "arial 0 0 1 0 0" > font_properties
- Step 4.5: Generate the shape file (shapetable)
- Eg: shapeclustering -F font_properties -U unicharset -O test.unicharset test.font.exp0.tr
- Step 5: Train on the data.
- Syntax: mftraining -F font_properties -U unicharset -O [langname].unicharset [langname].[fontname].[expN].tr
- Eg: mftraining -F font_properties -U unicharset -O test.unicharset test.font.exp0.tr
- Step 6: Run character normalization training (produces normproto)
- Syntax: cntraining [langname].[fontname].[expN].tr
- Eg: cntraining test.font.exp0.tr
- {*Note: After steps 5 and 6, four files have been created (shapetable, inttemp, pffmtable, normproto).}
- Step 7: Rename the four files (shapetable, inttemp, pffmtable, normproto) to ([langname].shapetable, [langname].inttemp, [langname].pffmtable, [langname].normproto). The prefix must match the [langname] used in Step 8.
- Syntax: rename filename1 filename2 (Windows; on Linux or macOS use mv filename1 filename2)
- Eg:
- rename shapetable test.shapetable
- rename inttemp test.inttemp
- rename pffmtable test.pffmtable
- rename normproto test.normproto
- Step 8: Create .traineddata file
- Syntax: combine_tessdata [langname].
- Eg: combine_tessdata test.
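Steps 2 through 8 can be chained in a small shell script once the box file has been corrected by hand and font_properties exists. The following is only a sketch under those assumptions: the names `run`, `train_pipeline`, and `DRY_RUN` are hypothetical helpers introduced for illustration, the rename step uses `mv` (POSIX shell) instead of `rename`, and the Tesseract commands are copied from the steps above rather than checked against any particular Tesseract version.

```shell
#!/bin/sh
# Sketch: print (or run) the training commands from Steps 2-8 for one image.
# Assumptions: the .box file is already corrected and font_properties exists.
# DRY_RUN=1 (the default here) only prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it
# and stop at the first failure.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@" || exit 1
    fi
}

# Hypothetical helper: train_pipeline LANG FONT EXPN [EXTENSION]
train_pipeline() {
    lang="$1"; font="$2"; exp="$3"; ext="${4:-tif}"
    base="$lang.$font.exp$exp"

    # Step 2: create the .tr file from the image and its box file
    run tesseract "$base.$ext" "$base" box.train
    # Step 3: extract the character set (writes unicharset)
    run unicharset_extractor "$base.box"
    # Step 4.5: generate the shape file
    run shapeclustering -F font_properties -U unicharset -O "$lang.unicharset" "$base.tr"
    # Step 5 and Step 6: training
    run mftraining -F font_properties -U unicharset -O "$lang.unicharset" "$base.tr"
    run cntraining "$base.tr"
    # Step 7: prefix the four output files with the language name
    for f in shapetable inttemp pffmtable normproto; do
        run mv "$f" "$lang.$f"
    done
    # Step 8: build the .traineddata file (the trailing dot is part of the prefix)
    run combine_tessdata "$lang."
}
```

Calling `train_pipeline test font 0` in dry-run mode prints the same commands as the examples above, which makes it easy to review them before setting `DRY_RUN=0` to execute them.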