How to build tesseract 4 beta on macOS
brew info tesseract
The recognition result for Simplified Chinese
is a little bit terrifying.
I noticed that versions from 4.0.0 on add a new neural network engine based on LSTMs.
But it needs to be built from source on macOS.
Thankfully, the manual in their README.md is quite specific.
Install dependencies
brew install automake autoconf autoconf-archive libtool
Compile
git clone https://github.com/tesseract-ocr/tesseract/
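Only the clone step is shown above, so here is a rough sketch of how the rest of the build usually goes. The steps after the clone are my assumption of the standard autotools flow (`autogen.sh`, `configure`, `make`, `make install`), not a copy of the README, and the `run` and `build_tesseract` helpers are hypothetical names. The configure step typically also expects Leptonica and pkg-config to be installed. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of a from-source build. Everything after the clone is an assumed
# autotools flow; check the project's README before running it for real.
# DRY_RUN=1 (the default) only prints each command; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: the build steps, chained so a failure stops the rest.
build_tesseract() {
    run git clone https://github.com/tesseract-ocr/tesseract/ &&
    run cd tesseract &&
    run ./autogen.sh &&
    run ./configure &&
    run make &&
    run sudo make install
}

build_tesseract
```

Run it once as is to review the steps, then again with `DRY_RUN=0` if they look right for your machine.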
From their best trained models, download the language file chi_sim.traineddata
and put it under tesseract/4.0.0.1/tessdata/
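The download can be scripted too. This is a minimal sketch, assuming the best models live in the `tessdata_best` repository on GitHub and follow the URL pattern below (the branch name in particular is an assumption). `TESSDATA_DIR` defaults to the path as written above, so prefix it with wherever your install actually lives. `model_url` and `fetch_model` are hypothetical helper names.

```shell
#!/bin/sh
# Assumed values: adjust both to your installation.
# The default directory is the path as written in the post; it is relative,
# so point it at your real install location.
TESSDATA_DIR="${TESSDATA_DIR:-tesseract/4.0.0.1/tessdata}"
LANG_CODE="${LANG_CODE:-chi_sim}"

# URL pattern assumed from the tessdata_best repository layout on GitHub.
model_url() {
    echo "https://github.com/tesseract-ocr/tessdata_best/raw/main/$1.traineddata"
}

# Download one language model into TESSDATA_DIR (follows redirects with -L).
fetch_model() {
    mkdir -p "$TESSDATA_DIR" &&
    curl -L -o "$TESSDATA_DIR/$1.traineddata" "$(model_url "$1")"
}

# Only print what would be downloaded; call fetch_model "$LANG_CODE" to fetch it.
echo "$(model_url "$LANG_CODE") -> $TESSDATA_DIR/$LANG_CODE.traineddata"
```

Nothing is downloaded until you call `fetch_model chi_sim` yourself.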
Usage
tesseract image.png image -l chi_sim
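The second argument is the output base name, so the command above writes its text to image.txt. Below is a small wrapper sketch around the same call. The `--oem 1` (LSTM engine only) and `--psm 6` (treat the image as a single uniform block of text) options are my additions, not part of the original command, and `output_base` and `ocr` are hypothetical helper names.

```shell
#!/bin/sh
# Derive the output base name by dropping the last extension:
# image.png -> image, so tesseract writes image.txt.
output_base() {
    echo "${1%.*}"
}

# Run OCR on one image. The language defaults to chi_sim.
# --oem 1 and --psm 6 are assumptions added for illustration.
ocr() {
    tesseract "$1" "$(output_base "$1")" -l "${2:-chi_sim}" --oem 1 --psm 6
}

# Run only when both the binary and the sample image are present.
if command -v tesseract >/dev/null 2>&1 && [ -f image.png ]; then
    ocr image.png && cat image.txt
fi
```

If the language is not found, `tesseract --list-langs` shows which traineddata files the binary can actually see.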
OK, it is still terrible with the Song typeface.
I would need to train a new model myself.
In the end, setting tesseract aside, I found that dragging the image into OneNote and choosing Ctrl + Click -> Copy Text from Picture gives better accuracy. 😓