Convert Image to CVPixelBuffer for Machine Learning Swift


Solution 1

You don't need to do a bunch of image mangling yourself to use a Core ML model with an image — the new Vision framework can do that for you.

import Vision
import CoreML

let model = try VNCoreMLModel(for: MyCoreMLGeneratedModelClass().model)
let request = VNCoreMLRequest(model: model, completionHandler: myResultsMethod)
let handler = VNImageRequestHandler(url: myImageURL)
try handler.perform([request])

func myResultsMethod(request: VNRequest, error: Error?) {
    guard let results = request.results as? [VNClassificationObservation]
        else { fatalError("huh") }
    for classification in results {
        print(classification.identifier, // the scene label
              classification.confidence)
    }

}

The WWDC17 session on Vision should have a bit more info — it's tomorrow afternoon.
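For the model from the original question, a minimal end-to-end sketch could look like the following (GoogLeNetPlaces comes from the question; the file URL, background queue, and error handling are assumptions, not part of the original answer):

import Vision
import CoreML

// Hypothetical wiring of the snippet above to the GoogLeNetPlaces model
// from the question; imageURL is a placeholder file URL.
func classifyScene(at imageURL: URL) {
    let request: VNCoreMLRequest
    do {
        let visionModel = try VNCoreMLModel(for: GoogLeNetPlaces().model)
        request = VNCoreMLRequest(model: visionModel) { request, error in
            guard let results = request.results as? [VNClassificationObservation] else {
                print("Unexpected result type: \(String(describing: error))")
                return
            }
            // Classification observations are ordered by confidence, highest first.
            if let best = results.first {
                print(best.identifier, best.confidence)
            }
        }
    } catch {
        print("Could not create the Vision model: \(error)")
        return
    }

    let handler = VNImageRequestHandler(url: imageURL)
    DispatchQueue.global(qos: .userInitiated).async {
        do {
            try handler.perform([request])
        } catch {
            print("Vision request failed: \(error)")
        }
    }
}

Because Vision scales and crops the input for the model, no manual CVPixelBuffer conversion is needed with this approach.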

Solution 2

You can use pure Core ML, but you should resize the image to (224, 224) first:

    DispatchQueue.global(qos: .userInitiated).async {
        // Resnet50 expects an image 224 x 224, so we should resize and crop the source image
        let inputImageSize: CGFloat = 224.0
        let minLen = min(image.size.width, image.size.height)
        let resizedImage = image.resize(to: CGSize(width: inputImageSize * image.size.width / minLen, height: inputImageSize * image.size.height / minLen))
        let croppedToSquareImage = resizedImage.cropToSquare()

        guard let pixelBuffer = croppedToSquareImage?.pixelBuffer() else {
            fatalError()
        }
        guard let classifierOutput = try? self.classifier.prediction(image: pixelBuffer) else {
            fatalError()
        }

        DispatchQueue.main.async {
            self.title = classifierOutput.classLabel
        }
    }

// ...

extension UIImage {

    func resize(to newSize: CGSize) -> UIImage {
        UIGraphicsBeginImageContextWithOptions(CGSize(width: newSize.width, height: newSize.height), true, 1.0)
        self.draw(in: CGRect(x: 0, y: 0, width: newSize.width, height: newSize.height))
        let resizedImage = UIGraphicsGetImageFromCurrentImageContext()!
        UIGraphicsEndImageContext()

        return resizedImage
    }

    func cropToSquare() -> UIImage? {
        guard let cgImage = self.cgImage else {
            return nil
        }
        var imageHeight = self.size.height
        var imageWidth = self.size.width

        if imageHeight > imageWidth {
            imageHeight = imageWidth
        }
        else {
            imageWidth = imageHeight
        }

        let size = CGSize(width: imageWidth, height: imageHeight)

        let x = ((CGFloat(cgImage.width) - size.width) / 2).rounded()
        let y = ((CGFloat(cgImage.height) - size.height) / 2).rounded()

        let cropRect = CGRect(x: x, y: y, width: size.width, height: size.height)
        if let croppedCgImage = cgImage.cropping(to: cropRect) {
            return UIImage(cgImage: croppedCgImage, scale: self.scale, orientation: self.imageOrientation)
        }

        return nil
    }

    func pixelBuffer() -> CVPixelBuffer? {
        let width = self.size.width
        let height = self.size.height
        let attrs = [kCVPixelBufferCGImageCompatibilityKey: kCFBooleanTrue,
                     kCVPixelBufferCGBitmapContextCompatibilityKey: kCFBooleanTrue] as CFDictionary
        var pixelBuffer: CVPixelBuffer?
        let status = CVPixelBufferCreate(kCFAllocatorDefault,
                                         Int(width),
                                         Int(height),
                                         kCVPixelFormatType_32ARGB,
                                         attrs,
                                         &pixelBuffer)

        guard let resultPixelBuffer = pixelBuffer, status == kCVReturnSuccess else {
            return nil
        }

        CVPixelBufferLockBaseAddress(resultPixelBuffer, CVPixelBufferLockFlags(rawValue: 0))
        let pixelData = CVPixelBufferGetBaseAddress(resultPixelBuffer)

        let rgbColorSpace = CGColorSpaceCreateDeviceRGB()
        guard let context = CGContext(data: pixelData,
                                      width: Int(width),
                                      height: Int(height),
                                      bitsPerComponent: 8,
                                      bytesPerRow: CVPixelBufferGetBytesPerRow(resultPixelBuffer),
                                      space: rgbColorSpace,
                                      bitmapInfo: CGImageAlphaInfo.noneSkipFirst.rawValue) else {
                                        return nil
        }

        context.translateBy(x: 0, y: height)
        context.scaleBy(x: 1.0, y: -1.0)

        UIGraphicsPushContext(context)
        self.draw(in: CGRect(x: 0, y: 0, width: width, height: height))
        UIGraphicsPopContext()
        CVPixelBufferUnlockBaseAddress(resultPixelBuffer, CVPixelBufferLockFlags(rawValue: 0))

        return resultPixelBuffer
    }
}

You can find the expected input image size in the mlmodel file; Xcode's model viewer shows it under the model's inputs.
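If you prefer to read the expected size in code instead of from Xcode's model viewer, the model description exposes it. A minimal sketch, assuming the generated class is Resnet50 as in the code above (imageConstraint requires iOS 12 or later):

import CoreML

// Print the expected pixel dimensions for each image input of the model.
// Resnet50 is the generated class assumed by the snippet above.
let mlModel = Resnet50().model
for (name, feature) in mlModel.modelDescription.inputDescriptionsByName {
    if let constraint = feature.imageConstraint {
        // For Resnet50 this should report 224 x 224.
        print(name, constraint.pixelsWide, constraint.pixelsHigh)
    }
}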

You can find a demo project that uses both the pure Core ML and the Vision variants here: https://github.com/handsomecode/iOS11-Demos/tree/coreml_vision/CoreML/CoreMLDemo

Solution 3

If the input is a UIImage rather than a URL, and you want to use VNImageRequestHandler, you can use a CIImage.

func updateClassifications(for image: UIImage) {

    let orientation = CGImagePropertyOrientation(image.imageOrientation)

    guard let ciImage = CIImage(image: image) else { return }

    let handler = VNImageRequestHandler(ciImage: ciImage, orientation: orientation)

}

From Classifying Images with Vision and Core ML
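Note that the CGImagePropertyOrientation(_:) initializer used above is not part of UIKit; Apple's sample defines it in a small extension. Here is a sketch of that extension together with actually performing the request; the request parameter stands in for a VNCoreMLRequest built as in Solution 1:

import UIKit
import ImageIO
import Vision

// Maps UIImage.Orientation cases to their CGImagePropertyOrientation
// counterparts, as in Apple's sample code.
extension CGImagePropertyOrientation {
    init(_ uiOrientation: UIImage.Orientation) {
        switch uiOrientation {
        case .up: self = .up
        case .upMirrored: self = .upMirrored
        case .down: self = .down
        case .downMirrored: self = .downMirrored
        case .left: self = .left
        case .leftMirrored: self = .leftMirrored
        case .right: self = .right
        case .rightMirrored: self = .rightMirrored
        @unknown default: self = .up
        }
    }
}

func updateClassifications(for image: UIImage, with request: VNCoreMLRequest) {
    let orientation = CGImagePropertyOrientation(image.imageOrientation)
    guard let ciImage = CIImage(image: image) else { return }

    let handler = VNImageRequestHandler(ciImage: ciImage, orientation: orientation)
    DispatchQueue.global(qos: .userInitiated).async {
        do {
            try handler.perform([request])
        } catch {
            print("Failed to perform classification: \(error)")
        }
    }
}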


Author: Alex Wulff

I'm a student app developer and Arduino hobbyist. Check out www.ConiferApps.com and www.AlexWulff.com to see some of the things that I've done.

Updated on September 15, 2022

Comments

  • Alex Wulff, almost 2 years ago

    I am trying to get Apple's sample Core ML models that were demoed at WWDC 2017 to function correctly. I am using GoogLeNet to try to classify images (see the Apple Machine Learning page). The model takes a CVPixelBuffer as an input. I have an image called imageSample.jpg that I'm using for this demo. My code is below:

            var sample = UIImage(named: "imageSample")?.cgImage
            let bufferThree = getCVPixelBuffer(sample!)
    
            let model = GoogLeNetPlaces()
            guard let output = try? model.prediction(input: GoogLeNetPlacesInput.init(sceneImage: bufferThree!)) else {
                fatalError("Unexpected runtime error.")
            }
    
            print(output.sceneLabel)
    

    I am always getting the unexpected runtime error in the output rather than an image classification. My code to convert the image is below:

    func getCVPixelBuffer(_ image: CGImage) -> CVPixelBuffer? {
            let imageWidth = Int(image.width)
            let imageHeight = Int(image.height)
    
            let attributes : [NSObject:AnyObject] = [
                kCVPixelBufferCGImageCompatibilityKey : true as AnyObject,
                kCVPixelBufferCGBitmapContextCompatibilityKey : true as AnyObject
            ]
    
            var pxbuffer: CVPixelBuffer? = nil
            CVPixelBufferCreate(kCFAllocatorDefault,
                                imageWidth,
                                imageHeight,
                                kCVPixelFormatType_32ARGB,
                                attributes as CFDictionary?,
                                &pxbuffer)
    
            if let _pxbuffer = pxbuffer {
                let flags = CVPixelBufferLockFlags(rawValue: 0)
                CVPixelBufferLockBaseAddress(_pxbuffer, flags)
                let pxdata = CVPixelBufferGetBaseAddress(_pxbuffer)
    
                let rgbColorSpace = CGColorSpaceCreateDeviceRGB();
                let context = CGContext(data: pxdata,
                                        width: imageWidth,
                                        height: imageHeight,
                                        bitsPerComponent: 8,
                                        bytesPerRow: CVPixelBufferGetBytesPerRow(_pxbuffer),
                                        space: rgbColorSpace,
                                        bitmapInfo: CGImageAlphaInfo.premultipliedFirst.rawValue)
    
                if let _context = context {
                    _context.draw(image, in: CGRect.init(x: 0, y: 0, width: imageWidth, height: imageHeight))
                }
                else {
                    CVPixelBufferUnlockBaseAddress(_pxbuffer, flags);
                    return nil
                }
    
                CVPixelBufferUnlockBaseAddress(_pxbuffer, flags);
                return _pxbuffer;
            }
    
            return nil
        }
    

    I got this code from a previous StackOverflow post (last answer here). I recognize that the code may not be correct, but I have no idea how to do this myself. I believe that this is the section that contains the error. The model calls for the following type of input: Image<RGB,224,224>. (See the debugging sketch after the comments below.)

  • Alex Wulff, about 7 years ago
    Works like a charm (with some modifications), thanks. I didn't realize that Vision had a specific type of request for models that output information from an image input. I guess I should've paid closer attention to the documentation...
  • chengsam, almost 7 years ago
    For the original question, VNImageRequestHandler(cgImage: CGImage) is more appropriate.
  • rickster, almost 7 years ago
    @chengsam Not really: the original question starts from a resource on disk. Reading that in as a UIImage, converting to CGImage, and passing that to Vision loses metadata along the way, but passing the resource URL keeps that metadata available to Vision.
  • mskw, over 6 years ago
    If the MLModel requires a grayscale image, does the VNImageRequestHandler convert it to grayscale?
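Regarding the error in the original question above: try? discards the underlying Core ML error, so there is nothing to inspect when the prediction fails. Below is a minimal debugging sketch that reuses GoogLeNetPlaces, GoogLeNetPlacesInput, and getCVPixelBuffer from the question; the 224 x 224 note reflects the Image<RGB,224,224> input the model declares:

guard let cgImage = UIImage(named: "imageSample")?.cgImage,
      let buffer = getCVPixelBuffer(cgImage) else {
    fatalError("Could not load the image or build a pixel buffer")
}

// GoogLeNetPlaces expects Image<RGB,224,224>, so the buffer must already be
// 224 x 224; the conversion code in the question does not resize the image,
// which is a likely cause of the failure.
print(CVPixelBufferGetWidth(buffer), CVPixelBufferGetHeight(buffer))

do {
    let model = GoogLeNetPlaces()
    let output = try model.prediction(input: GoogLeNetPlacesInput(sceneImage: buffer))
    print(output.sceneLabel)
} catch {
    // The caught error explains why the prediction failed
    // (typically an input size or pixel-format mismatch).
    print("Prediction failed: \(error)")
}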