Question

当前，我正在使用下面的代码从图像中获取文本，它可以正常工作，但不适用于这两个图像，似乎tesseract无法扫描这些类型的图像。请告诉我如何解决

https://i.ibb.co/zNkbhKG/Untitled1.jpg

https://i.ibb.co/XVbjc3s/Untitled3.jpg

def read_screen():
        spinner = Halo(text='Reading screen', spinner='bouncingBar')
        spinner.start()
        screenshot_file="Screens/to_ocr.png"
        screen_grab(screenshot_file)

        #prepare argparse
        ap = argparse.ArgumentParser(description='HQ_Bot')
        ap.add_argument("-i", "--image", required=False,default=screenshot_file,help="path to input image to be OCR'd")
        ap.add_argument("-p", "--preprocess", type=str, default="thresh", help="type of preprocessing to be done")
        args = vars(ap.parse_args())

        # load the image 
        image = cv2.imread(args["image"])
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

        if args["preprocess"] == "thresh":
                gray = cv2.threshold(gray, 177, 177,
                        cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
        elif args["preprocess"] == "blur":
                gray = cv2.medianBlur(gray, 3)

        # store grayscale image as a temp file to apply OCR
        filename = "Screens/{}.png".format(os.getpid())
        cv2.imwrite(filename, gray)

        # load the image as a PIL/Pillow image, apply OCR, and then delete the temporary file
        pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'
        #ENG
        #text = pytesseract.image_to_string(Image.open(filename))

        #VIET
        text = pytesseract.image_to_string(Image.open(filename), lang='vie')

        os.remove(filename)
        os.remove(screenshot_file)

        # show the output images

        '''cv2.imshow("Image", image)
        cv2.imshow("Output", gray)
        os.remove(screenshot_file)
        if cv2.waitKey(0):
                cv2.destroyAllWindows()
        print(text)
        '''
        spinner.succeed()
        spinner.stop()
        return text

Answer 1

您应该尝试不同的psm模式，而不是像这样的默认模式：

@Suspendable
override fun call(): SignedTransaction {
// Obtain a reference to the notary we want to use.
val notary = serviceHub.networkMapCache.notaryIdentities[0]

// Stage 1.
progressTracker.currentStep = GENERATING_TRANSACTION
// Generate an unsigned transaction.
val iouState = IOUState(iouValue, serviceHub.myInfo.legalIdentities.first(), otherParty)
val txCommand = Command(IOUContract.Commands.Create(), listOf(ourIdentity.owningKey))
val txBuilder = TransactionBuilder(notary)
        .addOutputState(iouState, IOU_CONTRACT_ID)
        .addCommand(txCommand)

// Stage 2.
progressTracker.currentStep = VERIFYING_TRANSACTION
// Verify that the transaction is valid.
txBuilder.verify(serviceHub)

// Stage 3.
progressTracker.currentStep = SIGNING_TRANSACTION
// Sign the transaction.
val partSignedTx = serviceHub.signInitialTransaction(txBuilder)

// Stage 5.
progressTracker.currentStep = FINALISING_TRANSACTION
// Notarise and record the transaction in both parties' vaults.
return subFlow(FinalityFlow(partSignedTx,emptyList()))
}

来自文档的专家

target = pytesseract.image_to_string(im,config='--psm 4',lang='vie')

例如，对于Page segmentation modes: 0 Orientation and script detection (OSD) only. 1 Automatic page segmentation with OSD. 2 Automatic page segmentation, but no OSD, or OCR. 3 Fully automatic page segmentation, but no OSD. (Default) 4 Assume a single column of text of variable sizes. 5 Assume a single uniform block of vertically aligned text. 6 Assume a single uniform block of text. 7 Treat the image as a single text line. 8 Treat the image as a single word. 9 Treat the image as a single word in a circle. 10 Treat the image as a single character. 11 Sparse text. Find as much text as possible in no particular order. 12 Sparse text with OSD. 13 Raw line. Treat the image as a single text line, bypassing hacks that are Tesseract-specific.，您可以尝试/Untitled3.jpg，但如果失败，则可以同时尝试--psm 4。

根据您的tesseract版本，您还可以尝试不同的oem模式：

对于LSTM，请使用--oem 1，对于Legacy Tesseract，请使用--oem 0。请注意，传统的Tesseract模型仅包含在tessdata存储库的经过训练的数据文件中。

编辑

在图像中也可以看到两种语言，因此，如果您希望使用--psm 11参数，则需要将图像手动分成两部分，以免混淆tesseract引擎，并为其使用不同的lang值

编辑2

下面是有关Unitiled3的完整工作示例。我注意到的是您对阈值的使用不当。您应该将lang设置为大于阈值的值。就像在我的示例中一样，我将maxval设置为177，但将thresh设置为255，因此177以上的所有内容均为黑色。我什至不需要做任何二值化。

maxval

输出：

import cv2
import pytesseract
from cv2.cv2 import imread, cvtColor, COLOR_BGR2GRAY, threshold, THRESH_BINARY

image = imread("./Untitled3.jpg")
image = cvtColor(image,COLOR_BGR2GRAY)
_,image = threshold(image,177,255,THRESH_BINARY)
cv2.namedWindow("TEST")
cv2.imshow("TEST",image)
cv2.waitKey()
text = pytesseract.image_to_string(image, lang='eng')
print(text)

如何使用Tesseract从此图像中获取文本？

1 个答案: