Image processing in 2017

About myself

Pillow core team member.

Maker of the Pillow-SIMD library.

What I do

On-the-fly image processing service at Uploadcare.

Libraries

Pillow

python-pillow.org

Pillow-SIMD

github.com/uploadcare/pillow-simd

OpenCV

opencv.org

VIPS

jcupitt.github.io/libvips/

ImageMagick & GraphicsMagick

imagemagick.org, graphicsmagick.org

Performance

Always check your output

			from PIL import Image, ImageFilter
			im.filter(ImageFilter.BoxBlur(3))
			...
		
			import cv2
			cv2.blur(im, ksize=(3, 3))
			...
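
The two calls above are not equivalent: Pillow's `BoxBlur` takes a radius (radius 1 already covers a 3×3 window), while `cv2.blur` takes the full kernel size. A minimal sketch of how one might check output rather than trust parameter names, using only Pillow so it stays self-contained. The helper `max_abs_diff` and the synthetic test image are assumptions made for illustration.

```python
from PIL import Image, ImageChops, ImageFilter


def max_abs_diff(a, b):
    """Largest per-pixel difference between two same-size grayscale images."""
    return ImageChops.difference(a, b).getextrema()[1]


# Synthetic test image: a white square on a black background.
im = Image.new("L", (32, 32), 0)
im.paste(255, (8, 8, 24, 24))

blur_radius_1 = im.filter(ImageFilter.BoxBlur(1))  # 3x3 window
blur_radius_3 = im.filter(ImageFilter.BoxBlur(3))  # 7x7 window

# A non-zero value means the two results are visibly different images.
diff = max_abs_diff(blur_radius_1, blur_radius_3)
```

The same comparison can be applied to the output of two different libraries once both results are converted to the same mode and size.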
		

The problem

			cv2.GaussianBlur(im, (window, window), radius)
		
radius = 3: 58 ms
radius = 30: 880 ms

The problem

			im.filter(ImageFilter.GaussianBlur(radius))
		
radius = 3: 60 ms
radius = 30: 61 ms
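
Timings like these are easy to reproduce on your own images. The following is a rough sketch of a timing helper for Pillow's `GaussianBlur`; the function name `time_gaussian_blur`, the repeat count, and the synthetic image are assumptions, and the numbers it produces depend on the machine and image size.

```python
import time

from PIL import Image, ImageFilter


def time_gaussian_blur(im, radius, repeats=3):
    """Return the best wall time, in milliseconds, over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        im.filter(ImageFilter.GaussianBlur(radius))
        best = min(best, time.perf_counter() - start)
    return best * 1000.0


# Synthetic image; replace with a real photo for meaningful numbers.
im = Image.new("RGB", (640, 480), (120, 60, 200))
timings = {radius: time_gaussian_blur(im, radius) for radius in (3, 30)}
```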

Resampling speed in Pillow, Mpx/s

Pillow-SIMD speeds up

Some sequence of operations, Mpx/s

Load, rotate by 90°, reduce 2.5 times, apply blur, save to JPEG.
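
The sequence above could look roughly like this in Pillow. This is a sketch, not the benchmark's actual code: the function name `process`, the bicubic filter, the blur radius, and the JPEG quality are all assumptions chosen for illustration.

```python
import io

from PIL import Image, ImageFilter


def process(source, scale=2.5, blur_radius=2):
    """Load, rotate by 90 degrees, reduce `scale` times, blur, encode as JPEG."""
    im = Image.open(source)
    im = im.transpose(Image.ROTATE_90)  # lossless 90-degree rotation
    new_size = (round(im.width / scale), round(im.height / scale))
    im = im.resize(new_size, Image.BICUBIC)
    im = im.filter(ImageFilter.GaussianBlur(blur_radius))
    out = io.BytesIO()
    im.save(out, format="JPEG", quality=85)
    return out.getvalue()
```

`source` can be a file name or any file-like object, so the same function works for files on disk and for uploads held in memory.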

Some sequence of operations, Mpx/s

Results when you invest some time.

Benchmarking framework

Results page
https://python-pillow.org/pillow-perf/

Benchmark sources
https://github.com/python-pillow/pillow-perf

Concurrent working

Performance metrics

Concurrent working levels

  1. Application level

Actual execution time doesn't change.
Throughput grows in proportion to the number of cores.

Concurrent working levels

  2. Graphical operation level

Actual execution time decreases.
Throughput grows, but not in proportion to the number of cores.

Concurrent working levels

  3. Data and CPU instructions level (SIMD)

Actual execution time decreases.
Throughput grows.
Win-win.

Combining methods

SIMD + operation level + application level

Multithreading

Release the GIL
Pillow, OpenCV, pyvips, Wand

Don't release the GIL
pgmagick

The N + 1 rule

Create no more than N + 1 workers,
where N is the number of CPU cores or hardware threads.

Worker: a process or thread that does the processing.
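
A minimal sketch of the rule with a thread pool, which is reasonable when the imaging library releases the GIL. The names `worker_count` and `process_all` are hypothetical, and `os.cpu_count()` is assumed to be a good enough estimate of N.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def worker_count(cores=None):
    """N + 1, where N is the number of CPU cores (or hardware threads)."""
    n = cores if cores is not None else (os.cpu_count() or 1)
    return n + 1


def process_all(items, func):
    """Apply `func` to every item using at most N + 1 worker threads."""
    with ThreadPoolExecutor(max_workers=worker_count()) as pool:
        return list(pool.map(func, items))
```

For a library that does not release the GIL, the same cap would apply to worker processes instead of threads.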

Asynchronous work

Executing imaging operations blocks the event loop,
even if the library releases the GIL.

			@gen.coroutine
			def get(self, *args, **kwargs):
			    im = process_image(...)  # blocks the event loop until it returns
			    ...
		

Asynchronous work

			# run_on_executor looks up the pool on self.executor by default
			executor = ThreadPoolExecutor(1)

			@run_on_executor
			def process_image(self, ...):
			    ...

			@gen.coroutine
			def get(self, *args, **kwargs):
			    im = yield self.process_image(...)
			    ...
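
The same idea can be sketched with the standard library's `asyncio` instead of Tornado: the blocking call is moved to a thread pool and the coroutine awaits the result, so the event loop stays free. Here `process_image` is a stand-in for real imaging work and `handle` is a hypothetical request handler.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# A single worker thread, mirroring ThreadPoolExecutor(1) in the slide.
executor = ThreadPoolExecutor(max_workers=1)


def process_image(data):
    """Placeholder for a blocking imaging operation."""
    return data[::-1]


async def handle(data):
    """Run the blocking work in the pool without blocking the event loop."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, process_image, data)
```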
		
		

File input/output

Lazy loading

			>>> from PIL import Image
			>>> %time im = Image.open('cover.jpg')
			Wall time: 1.2 ms
			>>> im.mode, im.size
			('RGB', (2152, 1345))
			>>> %time im.load()
			Wall time: 73.6 ms
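
One practical use of lazy loading is rejecting oversized images before paying for the decode. A sketch, assuming a hypothetical helper `open_checked` and an arbitrary pixel limit:

```python
import io

from PIL import Image

MAX_PIXELS = 50_000_000  # arbitrary limit chosen for this example


def open_checked(source, max_pixels=MAX_PIXELS):
    """Open an image, but decode it only if its dimensions are acceptable."""
    im = Image.open(source)  # reads the header only
    width, height = im.size
    if width * height > max_pixels:
        raise ValueError("image is too large: %dx%d" % (width, height))
    im.load()  # the actual decoding happens here
    return im
```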
		

Broken images mode

			from PIL import Image
			Image.open('truncated.jpg').save('truncated.out.jpg')
			IOError: image file is truncated (143 bytes not processed)
		

Broken images mode

			from PIL import Image, ImageFile
			ImageFile.LOAD_TRUNCATED_IMAGES = True
			Image.open('truncated.jpg').save('truncated.out.jpg')
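
`LOAD_TRUNCATED_IMAGES` is a module-level flag, so setting it affects every image loaded afterwards. The sketch below enables it only for one decode and then restores the previous value. The helper `load_tolerant` and the synthetic truncated JPEG are assumptions made for illustration.

```python
import io

from PIL import Image, ImageFile


def load_tolerant(source):
    """Decode an image, accepting truncated data, then restore the flag."""
    previous = ImageFile.LOAD_TRUNCATED_IMAGES
    ImageFile.LOAD_TRUNCATED_IMAGES = True
    try:
        im = Image.open(source)
        im.load()
        return im
    finally:
        ImageFile.LOAD_TRUNCATED_IMAGES = previous


# Build a JPEG in memory and cut it in half to simulate a broken upload.
pixels = bytes((x * 7 + y * 13) % 256 for y in range(128) for x in range(128))
buf = io.BytesIO()
Image.frombytes("L", (128, 128), pixels).save(buf, format="JPEG")
truncated = buf.getvalue()[: len(buf.getvalue()) // 2]

im = load_tolerant(io.BytesIO(truncated))
```

Because the flag is global, this helper is not safe to call from several threads at once; in a multithreaded service it is simpler to set the flag once at startup.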
		

                              Pillow  VIPS  OpenCV  IM
Number of codecs              17      12+   8       66
Broken images
Lazy loading
Reading EXIF and ICC
Auto rotation based on EXIF

OpenCV quirks

			cv2.imread(filename)
		

OpenCV quirks

			cv2.imread(filename, flags=cv2.IMREAD_UNCHANGED)
		

OpenCV, why?

OpenCV is not designed to work with untrusted sources.

Solution

OpenCV images are numpy arrays.

			import numpy
			from PIL import Image
			...
			pillow_image = Image.open(filename)
			cv_image = numpy.array(pillow_image)
		

Solution

			import numpy
			from PIL import Image
			...
			pillow_image = Image.fromarray(cv_image, "RGB")
			pillow_image.save(filename)
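
One detail to watch when converting: Pillow produces RGB arrays, while OpenCV functions conventionally expect BGR channel order. A small sketch of both directions using only numpy slicing; the helper names `pillow_to_cv` and `cv_to_pillow` are hypothetical.

```python
import numpy
from PIL import Image


def pillow_to_cv(pillow_image):
    """Pillow image -> BGR numpy array in the layout OpenCV expects."""
    rgb = numpy.array(pillow_image.convert("RGB"))
    return numpy.ascontiguousarray(rgb[:, :, ::-1])  # reverse channel order


def cv_to_pillow(cv_image):
    """BGR numpy array -> Pillow RGB image."""
    rgb = numpy.ascontiguousarray(cv_image[:, :, ::-1])
    return Image.fromarray(rgb)
```

Images with an alpha channel or a single channel would need separate handling; this sketch covers only three-channel 8-bit data.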
		

Questions

Slides: homm.github.io/image-libs-2017/

Email: ak@uploadcare.com