Read Book is an iOS application that recognises text in an image and lets you play that text aloud or copy it for further use. I used Vision for text recognition and AVFoundation for speech.
When you select an image and tap Convert, you get the recognised text like this, and you can play that text back as shown below:
To recognise text in an image, first import Vision and convert the image to a CGImage. Then create a handler of type VNImageRequestHandler and a request of type VNRecognizeTextRequest. In the request's completion handler, read the observations from the request's results and extract the text from each observation. Finally, ask the handler to perform the request. You can also set request properties that control how text is extracted, such as recognitionLanguages and recognitionLevel.
import Vision
func requestText() {
    // Vision works on a CGImage, so stop early if the selected image has none.
    guard let cgImage = self.recievedImage?.cgImage else { return }
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    let request = VNRecognizeTextRequest(completionHandler: { (request, error) in
        // Report a failed recognition instead of crashing the app.
        guard let observations = request.results as? [VNRecognizedTextObservation] else {
            print("Invalid observation: \(String(describing: error))")
            return
        }
        var text = ""
        for observation in observations {
            // Take the most confident candidate for each piece of detected text.
            guard let topCandidate = observation.topCandidates(1).first else {
                print("No candidate")
                continue
            }
            text += "\n\(topCandidate.string)"
        }
        // UI updates must happen on the main queue.
        DispatchQueue.main.async {
            self.imageTextView.text = text
        }
    })
    request.customWords = ["custOm"]
    request.minimumTextHeight = 0.03125
    request.recognitionLevel = .accurate
    request.recognitionLanguages = ["en-US"]
    request.usesLanguageCorrection = true
    // Recognition can be slow, so perform it off the main thread.
    DispatchQueue.global(qos: .userInitiated).async {
        do {
            try handler.perform([request])
        } catch {
            print("Text recognition failed: \(error)")
        }
    }
}
For speech, we use AVSpeechSynthesizer from AVFoundation. Here too you can set speech properties as you like, for example voice and rate.
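import AVFoundation
// Keep the synthesizer alive for as long as speech should play.
let synthesizer = AVSpeechSynthesizer()
func requestSound(text: String) {
    let utterance = AVSpeechUtterance(string: text)
    utterance.voice = AVSpeechSynthesisVoice(language: "en-GB")
    // Valid rates lie between AVSpeechUtteranceMinimumSpeechRate and AVSpeechUtteranceMaximumSpeechRate.
    utterance.rate = 0.5
    synthesizer.speak(utterance)
}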
We can also recognise text using CoreML or GoogleMLKit/TextRecognition.
Text recognition is a part of OCR. OCR stands for optical character recognition (or optical character reader). OCR scans a document or image file and converts the text in it into machine-readable form.
Let me break the process down and explain it step by step.
Image Acquisition
In this step, an image or document is scanned and each pixel in the image is replaced with either a black or a white pixel. Example image:
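The black-and-white conversion described above can be sketched as a simple global threshold. This is only an illustration, not what Vision does internally: the function name `binarize`, the grid-of-bytes image representation, and the default threshold of 128 are all assumptions made for this example.

```swift
// Hypothetical helper: turns a grayscale image (0 = black, 255 = white)
// into a pure black-and-white image using one global threshold.
func binarize(_ gray: [[UInt8]], threshold: UInt8 = 128) -> [[Bool]] {
    // true means "ink" (black pixel), false means "background" (white pixel).
    return gray.map { row in row.map { $0 < threshold } }
}

// A tiny 2 x 3 image with a dark vertical stroke in the middle column.
let sample: [[UInt8]] = [
    [250, 30, 240],
    [245, 20, 235],
]
let ink = binarize(sample)
```

Real scans usually have uneven lighting, so a single fixed threshold is the simplest possible choice rather than a recommended one.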
Pre-processing
Areas outside the text are removed. Example image:
After pre-processing the black-and-white image, we get something like the image above.
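Removing the areas outside the text can be sketched as cropping to the bounding box of the ink pixels. This is a minimal sketch under assumptions of my own: the helper name `cropToInk` is hypothetical, the image is a rectangular grid where `true` means a black pixel, and all rows have the same length.

```swift
// Hypothetical helper: crops a black-and-white image (true = ink) to the
// smallest rectangle that still contains every ink pixel.
// Assumes every row has the same length.
func cropToInk(_ image: [[Bool]]) -> [[Bool]] {
    // Rows that contain at least one ink pixel.
    let inkRows = image.indices.filter { image[$0].contains(true) }
    guard let top = inkRows.first, let bottom = inkRows.last else { return [] }
    // Columns that contain at least one ink pixel.
    let width = image.first?.count ?? 0
    let inkCols = (0..<width).filter { col in image.contains { $0[col] } }
    guard let left = inkCols.first, let right = inkCols.last else { return [] }
    return image[top...bottom].map { Array($0[left...right]) }
}

// A 4 x 4 page with a small blob of ink surrounded by empty margin.
let page: [[Bool]] = [
    [false, false, false, false],
    [false, true,  true,  false],
    [false, false, true,  false],
    [false, false, false, false],
]
let cropped = cropToInk(page)
```

A blank page has no ink at all, so the sketch returns an empty image in that case.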
Segmentation
Look at the 22: the two digits are joined to each other. In this step, OCR separates characters that are joined like this.
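One common way to separate characters is to count the ink pixels in each column and cut where the count drops. The sketch below assumes that technique; it is not necessarily how any particular OCR engine segments text. The name `segmentColumns` and the `maxBridge` parameter are hypothetical.

```swift
// Hypothetical helper: returns the column ranges of individual characters in
// a black-and-white text line (true = ink). A column whose ink count is at
// most `maxBridge` is treated as a gap, so maxBridge = 1 also cuts through a
// one-pixel bridge between two touching characters.
func segmentColumns(_ line: [[Bool]], maxBridge: Int = 0) -> [Range<Int>] {
    let width = line.first?.count ?? 0
    var segments: [Range<Int>] = []
    var start: Int? = nil
    for col in 0..<width {
        let inkCount = line.filter { $0[col] }.count
        if inkCount > maxBridge {
            if start == nil { start = col }   // a character begins here
        } else if let s = start {
            segments.append(s..<col)          // the character ended before this column
            start = nil
        }
    }
    if let s = start { segments.append(s..<width) }
    return segments
}

// Two 2-pixel-wide blobs joined by a one-pixel bridge in column 2.
let joined: [[Bool]] = [
    [true, true, false, true, true],
    [true, true, true,  true, true],
    [true, true, false, true, true],
]
let asOne = segmentColumns(joined)                // the bridge keeps them together
let asTwo = segmentColumns(joined, maxBridge: 1)  // the bridge is cut
```

With the default setting the joined blobs come back as one segment; allowing a one-pixel bridge splits them into two.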
Feature Extraction
- In this step, each character is recognised and converted into machine-readable text.
- The OCR engine knows many fonts; it compares each character against them and converts it.
- There are many approaches. Here are two of them.
Approach #1
The image is scanned one character at a time, and each character is compared using matching functions.
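Character-by-character comparison can be sketched as template matching: each segmented glyph is compared with a stored bitmap per character, and the closest one wins. The helper `classify`, the flattened 3 x 3 bitmaps, and the pixel-agreement score are assumptions made for this example only.

```swift
// Hypothetical helper: picks the template that shares the most pixels with
// the glyph. Glyphs and templates are same-sized black-and-white bitmaps,
// flattened to one array each (true = ink).
func classify(_ glyph: [Bool], templates: [Character: [Bool]]) -> Character? {
    var best: (label: Character, score: Int)? = nil
    for (label, template) in templates where template.count == glyph.count {
        // Score = number of positions where glyph and template agree.
        let score = zip(glyph, template).filter { $0.0 == $0.1 }.count
        if let current = best, current.score >= score { continue }
        best = (label: label, score: score)
    }
    return best?.label
}

// 3 x 3 templates, written row by row.
let templates: [Character: [Bool]] = [
    "I": [false, true, false,  false, true, false,  false, true, false],
    "L": [true, false, false,  true, false, false,  true, true, true],
]
// An "I" with one noisy extra pixel in the bottom-right corner.
let noisyGlyph = [false, true, false,  false, true, false,  false, true, true]
let guess = classify(noisyGlyph, templates: templates)
```

The noisy glyph agrees with the "I" template in 8 of 9 pixels and with the "L" template in 4, so the sketch picks "I".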
Approach #2
This approach reads the text line by line (the way human eyes read) and converts it.
There are many approaches like these. Which one to use depends on the technology you need.
Post-Processing
Computers make mistakes too (OCR makes some spelling mistakes during recognition), so this step tries to correct them.
On iOS, we use Vision for the OCR process.
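One simple way to correct such mistakes is to replace a recognised word with the closest word from a known vocabulary. The sketch below assumes that idea and uses Levenshtein edit distance. The helper names, the tiny vocabulary, and the `maxDistance` cut-off are hypothetical, and this is not a description of how Vision's usesLanguageCorrection works.

```swift
// Classic dynamic-programming edit distance (Levenshtein) between two words.
func editDistance(_ a: String, _ b: String) -> Int {
    let s = Array(a), t = Array(b)
    if s.isEmpty { return t.count }
    if t.isEmpty { return s.count }
    var previous = Array(0...t.count)
    for i in 1...s.count {
        var current = [i] + Array(repeating: 0, count: t.count)
        for j in 1...t.count {
            let substitution = previous[j - 1] + (s[i - 1] == t[j - 1] ? 0 : 1)
            current[j] = min(previous[j] + 1, current[j - 1] + 1, substitution)
        }
        previous = current
    }
    return previous[t.count]
}

// Hypothetical helper: replaces a word with the closest vocabulary entry,
// but only if it is within `maxDistance` edits; otherwise the word is kept.
func correct(_ word: String, dictionary: [String], maxDistance: Int = 1) -> String {
    let scored = dictionary.map { (candidate: $0, distance: editDistance(word, $0)) }
    guard let best = scored.min(by: { $0.distance < $1.distance }),
          best.distance <= maxDistance else { return word }
    return best.candidate
}

let vocabulary = ["hello", "world", "vision"]
let fixed = correct("he1lo", dictionary: vocabulary)      // "1" misread for "l"
let untouched = correct("swift", dictionary: vocabulary)  // nothing close enough
```

Keeping `maxDistance` small avoids "correcting" words that are simply missing from the vocabulary.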
Thank You, Happy Learning!