Revision as of 13:34, 10 August 2015

This wiki describes hardware systems and software algorithms for rapidly detecting misprints in high-speed digital presses. (G. Leseur, N. Meunier, P. B. Catrysse, and B. A. Wandell with support from an HP Labs Innovation Research Award)

Modern high-speed digital presses print a relatively small number of copies of a document, such as books or brochures, and they typically do this much faster than traditional rotary presses. Digital presses also permit the printing of short-runs, rather than the very large volumes that are needed to justify a traditional (rotary) press. When only a small number of copies are printed, the proportional cost of each error is much higher. If an error is caught after 50 copies in a short-run of 100 copies, the cost is increased by 50%. If the error is caught after 50 copies of a typical 1,000,000 copy run then the increase in cost is much lower. Hence, detecting misprints is particularly important on short-runs.

In this project, we study a high-speed digital sensing solution for real-time misprint detection in printing presses. Click here for an overview of the project and methods, and a simulation of the misprint detector.


Background and Prior Art

There seems to be very little academic literature on misprint detection. This is not surprising, as it is an engineering problem more likely to be tackled inside firms, and each method has to be specific to a printer. Nevertheless, we will see that, assuming we know the properties of the printer, it is possible to develop a fairly general method to detect misprints, which can then be specialized to a particular printer using those properties. This paper1 deals with a quite similar problem in pad printing: the authors developed a real-time method, but it relies on a template, which is exactly what we want to avoid here because of the small-volume constraint.

On the other hand, there exist many patents dealing with misprint detection. Most of this work pertains to rotary presses and relies on the fact that you can first take a picture of a known-correct print. These methods then compare each new image with the image known to be correct. This is a perfectly acceptable solution when you print many copies, but it becomes useless when you want to print only a few dozen. Examples include patents like this one2, or this one3.

Other patents deal with self-testing printers: the printer prints a specific known pattern and the system then decides whether the printer is working correctly here4. But this is not a general approach: the printer may be working correctly and there could still be a problem with the paper after some pages.

The only fully automatic method we found is in an HP patent, but it does not seem to take into account the real-time constraint on the sensing, or the computational cost of the detection algorithm (see this page5). It also assumes that the sensed image is accurate enough to be compared with the original digital file after applying some transformation, and therefore it does not directly simulate what the image should be.

Here, in contrast, we are developing a fully-automated, real-time method for misprint detection.

System Implementation

We used ISET to model a high-speed document sensing system. We modeled the optical imaging system and one CCD and one CMOS sensor that measure the printed page.

We also developed algorithms to convert the original image into a reflectance map of the printed page. Our main objective while developing these models was to be able to run them in real time. Combining this transformation with the sensor model gives us an entire pipeline to simulate the signal we should receive on the sensor for a given original image. From this pipeline, we created a large conversion model (a look-up table) to avoid re-running ISET for every image. This simulation provides two types of information: first, the expected image on the sensor; second, a typical standard deviation that describes, given the noise sources we modeled, how much the sensor values may vary for a well-printed image.
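The look-up table idea can be illustrated with a minimal sketch. Everything here is hypothetical: `simulate_sensor` stands in for a full ISET simulation, and the gain and noise constants are invented for illustration; only the structure (precompute mean and standard deviation per input level, then replace simulation by table lookups) reflects the approach described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_sensor(level, n_samples=200):
    # Hypothetical stand-in for a full ISET run: a gain plus
    # signal-dependent (shot-like) noise.
    return 0.8 * level + rng.normal(0.0, 0.02 + 0.05 * level, n_samples)

def build_lut(n_entries=256):
    """Precompute the mean sensor response and its standard deviation
    for each quantized input level, so the slow simulation runs only
    once per level instead of once per image."""
    levels = np.linspace(0.0, 1.0, n_entries)
    means = np.array([simulate_sensor(lv).mean() for lv in levels])
    stds = np.array([simulate_sensor(lv).std() for lv in levels])
    return levels, means, stds

def lut_lookup(image, means, stds):
    """Map every pixel of the original image to (expected sensor value,
    expected standard deviation) with a table lookup."""
    idx = np.clip(np.round(image * (len(means) - 1)).astype(int),
                  0, len(means) - 1)
    return means[idx], stds[idx]
```

Once the table is built, converting an image costs only an indexing operation per pixel, which is what makes the roughly hundred-fold speed-up mentioned later plausible.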

Using this simulation model, we developed tools to detect misprints. First, we created a toolbox to simulate different types of errors (color removal, color shifting, local misprints, ...). Then, we use information from the original image to guide the error-search algorithm: the difference between the simulated image (I_sim) and the sensed image (I_sensed) is compared with what the noise level should be in that particular area. Finally, we developed statistical tools to decide whether or not an image contains an error, combining the information computed from the original image, the comparison of the final sensor image with the expected one, and the standard deviation that a well-printed image can exhibit. Using a learning method, we computed conditional probability distributions that we can then use to detect errors in printed images.
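The error-injection toolbox can be sketched with a few array operations on a CMYK image. These three functions are hypothetical simplifications of the kinds of effects listed above (the real toolbox implements about a dozen); the function names and parameters are ours, not the project's.

```python
import numpy as np

def remove_plane(cmyk, plane):
    """Simulate a missing ink plane (e.g. plane=0 removes cyan)."""
    out = cmyk.copy()
    out[..., plane] = 0.0
    return out

def shift_plane(cmyk, plane, dx):
    """Simulate a registration error: one ink plane shifted by dx pixels."""
    out = cmyk.copy()
    out[..., plane] = np.roll(out[..., plane], dx, axis=1)
    return out

def local_blot(cmyk, row, col, size, value=1.0):
    """Simulate a localized misprint (ink blot or smear) as a square patch."""
    out = cmyk.copy()
    out[row:row + size, col:col + size, :] = value
    return out
```

Injecting such known errors into otherwise correct pages is what makes the later learning step possible: each simulated misprint provides a labeled example for the conditional probability distributions.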

Our system, as we implemented it, works according to the diagram below:

Diagram of a misprint detection system

The system takes one image as input, I_in, and it generates two images:

  1. The right column represents the actual image (I_sensed) of the printed page: the digital file is printed by the printer; inside the printer we add a light (whose properties we know perfectly) and a sensor. In our case, this is a line sensor. The sensor is also monochromatic: we do not use any filtering to help discriminate the colors. Though this limitation may seem to make the problem much more difficult, it has one main advantage: it limits the computation needed to simulate the images, as well as to process them.
  2. The left column represents the same process, but fully simulated: as we know the file that should be printed and the printer properties (particularly the reflectance of the ink), we can calculate a reflectance for each pixel of the printed page.
    • This gives us a reflectance map.
    • We know the light that we added in the printer; combining its properties with the reflectance map gives us a radiance map.
    • We also know the properties of the optics and the sensor, so from the radiance map we can compute an output (I_sim).
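The simulated branch above reduces to a chain of simple per-pixel operations. The sketch below is a deliberately crude stand-in: the gains and dark level are invented constants, whereas the project computes each stage with calibrated ISET models of the light, optics, and sensor.

```python
import numpy as np

def simulate_sensor_image(reflectance, illuminant=1.0, optics_gain=0.9,
                          sensor_gain=0.8, dark_level=0.02):
    """Sketch of the simulated branch: reflectance map -> radiance map
    -> noiseless expected output of the monochromatic sensor.
    All constants are hypothetical placeholders for ISET models."""
    radiance = reflectance * illuminant          # known light x reflectance
    at_sensor = optics_gain * radiance           # optics attenuate the signal
    return sensor_gain * at_sensor + dark_level  # expected sensor response
```

Because every stage is a known, deterministic transform, the only uncertainty left in I_sim is the sensor noise, which is exactly why the pipeline also reports a per-pixel standard deviation.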

This simulated output represents one possible output of the sensor: by definition, we cannot simulate exactly what the noise will be, but we can simulate one instance of it. However, for the misprint detection algorithm that takes I_sim and I_sensed as input, it is better to use the simulated average image, which is our best guess since, obviously, we cannot predict what the noise will be.

You will find more detailed information on the sensing system on this page.

From these two images, a misprint detection algorithm computes the probability of having a misprint according to what we have learned and measured.

How to detect the misprints and reduce the computation

Based on intuition, it seems that the representation of the data, or of the impact of misprints on a page, should be very sparse, and a lot of useless computation could then be avoided by exploiting this sparsity. For example, if we want to detect a missing color plane, say a missing cyan plane, it is useless to check areas where cyan is not present in the original image.

From this intuition, we first developed a simple approach to find areas in which one color is present much more strongly than the others. This simple operation allows a misprint check using only a small area of the original image, potentially leading to huge reductions in computation.
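One way to realize this region selection is a per-pixel dominance test on the CMYK planes. This is a hypothetical sketch: the function name and the margin threshold are our own choices, not taken from the project's code.

```python
import numpy as np

def dominant_color_mask(cmyk, plane, margin=0.2):
    """Boolean mask of pixels where one ink plane exceeds the maximum of
    the other planes by at least `margin`. A missing-plane check can then
    be restricted to this (usually small) region. The 0.2 margin is an
    illustrative, hypothetical threshold."""
    others = np.delete(cmyk, plane, axis=-1).max(axis=-1)
    return cmyk[..., plane] >= others + margin
```

For a missing-cyan test, only the pixels flagged by `dominant_color_mask(img, 0)` need to be compared against the sensed image, which is the source of the computational savings described above.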

Another solution, which we did not study in detail, is to obtain a sparse representation of the data directly from the hardware. The goal is to measure only a limited number of characteristics, instead of the entire (line) array of information. Ideally, the hardware itself would measure those "projections", and we would only process the resulting feature vectors. There have been some interesting papers around this application, but none of them can perform general transforms that we could reprogram at will. Readers interested in these applications can look at the references given at the end of the wiki concerning compressive sensing.

The drawback of this approach is that we would need to develop sparsity models for each type of misprint we want to detect. So, while it may at first seem an interesting way to considerably lower the total detection complexity, there are two problems with this method:

  1. It supposes that we know every possible type of misprint, which would lead to a huge number of cases. With each case requiring a specific method and specific calculations, it would demand a lot of development time and strongly reduce the gain in total computation (possibly it would even be worse).
  2. The spatial distribution of certain types of misprints cannot be predicted: for instance, you cannot predict where the paper could have a scratch given the image that should be printed. Therefore we have to check the whole page.

Given these two remarks, it appears better to develop a general algorithm that can catch different types of misprints and checks the whole page only once. This is the kind of method we developed.

The idea is to compare the simulated sensor output with the actual output, and evaluate how likely such an outcome is given what it should be and the noise level for that value. From this, we compute the probability of a correct print. You can find the details of this method on this page; it uses a learning process to find the different parameters of the problem, and then infers the final probability from these and the measurements.
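A simple stand-in for this comparison is to normalize the per-pixel deviation between I_sensed and I_sim by the expected noise standard deviation and count outliers. This sketch is not the project's learned conditional-probability model; the function names and both thresholds are hypothetical choices made only to illustrate the noise-relative comparison.

```python
import numpy as np

def misprint_score(i_sensed, i_sim, noise_std, z_thresh=3.0):
    """Fraction of pixels whose deviation from the simulated image
    exceeds z_thresh noise standard deviations. The threshold is a
    hypothetical choice; the project learns its decision rule instead."""
    z = np.abs(i_sensed - i_sim) / noise_std
    return float(np.mean(z > z_thresh))

def is_misprint(i_sensed, i_sim, noise_std, frac_thresh=0.01):
    """Flag a page when the outlier fraction exceeds what well-printed
    pages exhibit under the modeled noise (hypothetical threshold)."""
    return misprint_score(i_sensed, i_sim, noise_std) > frac_thresh
```

A well-printed page deviates from I_sim only by the modeled noise, so very few pixels exceed three standard deviations; a real defect produces a cluster of large deviations that pushes the outlier fraction above the threshold.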

At first we developed a method for a monochromatic sensor, and then we generalized these results to a multi-sensor algorithm.


We have carried out a series of computational misprint experiments using the simulator. These tell us how well the method performs on different misprints, and how the algorithm behaves with respect to noise, or changes in parameters.

We also ran comparisons between the different flavours of the algorithm and evaluated their efficiency with respect to their computational cost.

To be able to run these experiments, we implemented a dozen possible misprints, reflecting a wide range of possible effects. The results show that the method is very effective at detecting these misprints.


In this project, we developed an entire model for simulating and evaluating misprints in real time for fast printing presses. We created a simulation pipeline involving a model of a hardware CCD sensor, an optical system, conversion algorithms from the CMYK representation to reflectance maps, different strategies to efficiently detect misprints, and a toolbox to simulate a wide class of misprint errors.

After a first complete version of the model, we developed a new approach to speed up the entire simulation by a large factor. The idea was to use ISET to build a look-up table allowing a fast conversion from the original page to estimated sensor measurements. Using this strategy, we reduced the computational cost of the simulation by a factor of approximately 100.

For detecting misprints from the simulated measurements, we used a learning approach: we simulated hundreds of misprinted images and learned from them the conditional probability distributions observed after comparison with the original image. With those learned distributions, we could then test new simulated images for misprints, and we obtained a very high probability of misprint detection. This shows that it is actually possible to achieve very good results on this particular problem with only a monochromatic sensor.

This also shows that a general misprint detection algorithm can run fast (all these computations seem feasible in real time) and still be effective, even with a monochromatic sensor. This is very encouraging for a more general, and more accurate, system. We tested this system under several different conditions and it performs very well.

We also implemented a generalised method to apply a similar algorithm to a multi-sensor system.

Future directions

The simulation can make use of multiple chromatic sensors; this may provide a more efficient detection algorithm and detect a wider class of misprints. Multiple sensors yield multiple measurements of the signal, and are likely to give much more accurate and precise results, as they would probably allow the algorithm to work at a finer scale. We have not fully tested this mode so far, and it will be interesting to see how the misprint detector behaves with several sensors.

One possible extension of this project is to use different tables for different types of misprint (see this section for more details). Some types of misprint indeed have the same effect on the features that we extract. The idea would be to group them according to this criterion for more efficiency; probably the widespread misprints on one side and the localized misprints on the other.

Studying the trade-off between efficiency and computation is also something that would be interesting, as adding more sensors, and computing with more and more tables may give better results, but it also comes with a cost.

Finally it is also possible to improve the results with a more complex model, or adding more features.

Other issues addressed

During this project we addressed other peripheral issues, like managing look-up tables, or 3D visualization; you can find some comments about these on this page.

Software Overview

The Misprint software is stored in an SVN repository. The software begins with various types of RGB input images, converts the images to CMYK, and simulates the printing process and the sensor. The Misprint software page describes the Matlab, VTK and ImageMagick functions.

In this part, you can find the different steps used in the whole pipeline, and some code to run the files.


  1. Printing Quality Control Using Template Independent NeuroFuzzy Defect Classification, 7th European Conference on Intelligent Techniques and Soft Computing (EUFIT), Aachen (Germany), 1999
  2. European patent EP0554811 (Misprint detection device in a rotary printing machine)
  3. USPTO Application 20080260395 (Image Forming Apparatus and Misprint Detection Method)
  4. US Patent 6,003,980 (Continuous ink jet printing Apparatus and method including self-testing for printing errors)
  5. US Patent 7,519,222 (Print defect detection)

Development notes

You can find some notes written during the development of this project on this page
