
 2022-08-14 03:08



Fundamentals of Digital Image Processing: A Practical Approach with Examples in Matlab


Chris Solomon, School of Physical Sciences, University of Kent, Canterbury, UK

Toby Breckon, School of Engineering, Cranfield University, Bedfordshire, UK


RGB (or true color) images are 3-D arrays that we may consider conceptually as three distinct 2-D planes, one corresponding to each of the three red (R), green (G) and blue (B) color channels. RGB is the most common color space used for digital image representation, as it conveniently corresponds to the three primary colors which are mixed for display on a monitor or similar device.

We can easily separate and view the red, green and blue components of a true-color image, as shown in Figure 1.6. It is important to note that the colors typically present in a real image are nearly always a blend of color components from all three channels. A common misconception is that, for example, items that are perceived as blue will only appear in the blue channel and so forth. Whilst items perceived as blue will certainly appear brightest in the blue channel (i.e. they will contain more blue light than the other colors) they will also have milder components of red and green.
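As a minimal sketch of this separation in plain Python (the 2x2 image, its pixel values and the helper name `split_channels` are made up for illustration), the three planes can be pulled out of a list of (R, G, B) tuples as follows. The first pixel is a "blue" item: it is brightest in the blue plane but still has nonzero red and green components.

```python
# A tiny 2x2 RGB image stored as rows of (R, G, B) tuples, 8 bits per channel.
# The pixel values are invented for illustration only.
image = [
    [(30, 60, 200), (255, 255, 255)],
    [(200, 40, 35), (0, 0, 0)],
]

def split_channels(img):
    """Return the red, green and blue planes as three 2-D lists."""
    red = [[px[0] for px in row] for row in img]
    green = [[px[1] for px in row] for row in img]
    blue = [[px[2] for px in row] for row in img]
    return red, green, blue

red, green, blue = split_channels(image)

# The bluish pixel at (0, 0) appears in all three planes, strongest in blue.
print(red[0][0], green[0][0], blue[0][0])
```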

If we consider all the colors that can be represented within the RGB representation, then we appreciate that the RGB color space is essentially a 3-D color space (cube) with axes R, G and B (Figure 1.7). Each axis has the same range 0 → 1 (this is scaled to 0–255 for the common 1 byte per color channel, 24-bit image representation). The color black occupies the origin of the cube (position (0, 0, 0)), corresponding to the absence of all three colors; white occupies the opposite corner (position (1, 1, 1)), indicating the maximum amount of all three colors. All other colors in the spectrum lie within this cube.
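The scaling between the unit cube and the 8-bit representation can be sketched as below. The function names `to_unit` and `to_8bit` are hypothetical helpers, and rounding to the nearest integer is an assumption rather than something the text specifies.

```python
def to_unit(rgb8):
    """Map an 8-bit (0-255) RGB triple into the unit cube (0.0-1.0)."""
    return tuple(c / 255 for c in rgb8)

def to_8bit(rgb):
    """Map a unit-cube RGB triple back to 8-bit integers (nearest value)."""
    return tuple(round(c * 255) for c in rgb)

black = to_unit((0, 0, 0))        # origin of the cube
white = to_unit((255, 255, 255))  # opposite corner of the cube
print(black, white)
```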

The RGB color space is based upon the portion of the electromagnetic spectrum visible to humans (i.e. the continuous range of wavelengths in the approximate range 400–700 nm). The human eye has three different types of color receptor, over which it has limited (and nonuniform) absorbency for each of the red, green and blue wavelengths. This is why, as we will see later, the color to grey-scale transform uses an unequally weighted combination of the RGB channels.

In digital image processing we use a simplified RGB color model (based on the CIE color standard of 1931) that is optimized and standardized towards graphical displays. However, the primary problem with RGB is that it is perceptually nonlinear. By this we mean that moving in a given direction in the RGB color cube (Figure 1.7) does not necessarily produce a color that is perceptually consistent with the change in each of the channels. For example, starting at white and subtracting the blue component produces yellow; similarly, starting at red and adding the blue component produces pink. For this reason, RGB space is inherently difficult for humans to work with and reason about because it is not related to the natural way we perceive colors. As an alternative we may use perceptual color representations such as HSV.
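To make the contrast with a perceptual representation concrete, the sketch below uses Python's standard `colorsys` module to convert the colors from the example to HSV. The choice of fully saturated corner colors is an assumption made for illustration. The sketch takes "adding the blue component" to mean setting blue to its maximum, which gives magenta.

```python
import colorsys

# Unit-cube RGB triples for the colors mentioned in the text.
white = (1.0, 1.0, 1.0)
yellow = (1.0, 1.0, 0.0)        # white with the blue component removed
red = (1.0, 0.0, 0.0)
red_plus_blue = (1.0, 0.0, 1.0)  # red with the blue component at maximum

for name, rgb in [("white", white), ("yellow", yellow),
                  ("red", red), ("red+blue", red_plus_blue)]:
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    print(f"{name}: hue={h * 360:.0f} deg, saturation={s:.2f}, value={v:.2f}")
```

In HSV, the step from white to yellow shows up as a change in saturation (and a hue of 60 degrees), while the step from red to red plus blue shows up as a change in hue alone. Changing a single RGB channel can therefore affect either perceptual attribute.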

We can convert from an RGB color space to a grey-scale image using a simple transform. Grey-scale conversion is the initial step in many image analysis algorithms, as it essentially simplifies (i.e. reduces) the amount of information in the image. Although a grey-scale image contains less information than a color image, the majority of important, feature-related information is maintained, such as edges, regions, blobs, junctions and so on. Feature detection and processing algorithms then typically operate on the converted grey-scale version of the image. As we can see from Figure 1.8, it is still possible to distinguish between the red and green apples in grey-scale. An RGB color image, Color, is converted to grey scale, Igrey-scale, using the following transformation:

$$I_{\text{grey-scale}}(n,m) = \alpha\,\text{Color}(n,m,r) + \beta\,\text{Color}(n,m,g) + \gamma\,\text{Color}(n,m,b) \qquad (1.1)$$

where (n, m) indexes an individual pixel within the grey-scale image and (n, m, c) the individual channel at pixel location (n, m) in the color image for channel c in the red r, green g and blue b image channels. As is apparent from Equation (1.1), the grey-scale image is essentially a weighted sum of the red, green and blue color channels. The weighting coefficients (α, β and γ) are set in proportion to the perceptual response of the human eye to each of the red, green and blue color channels, and a standardized weighting ensures uniformity (NTSC television standard, α = 0.2989, β = 0.5870 and γ = 0.1140). The human eye is naturally more sensitive to red and green light; hence, these colors are given higher weightings to ensure that the relative intensity balance in the resulting grey-scale image is similar to that of the RGB color image. An example of performing a grey-scale conversion in Matlab is given in Example 1.6.
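The weighted sum of Equation (1.1) can also be sketched in a few lines of plain Python. This is not the book's Example 1.6, which is in Matlab. The function name `rgb_to_grey` and the test image are assumptions made for illustration, and only the three NTSC weights come from the text.

```python
# NTSC weights quoted in the text for the red, green and blue channels.
ALPHA, BETA, GAMMA = 0.2989, 0.5870, 0.1140

def rgb_to_grey(img):
    """Convert rows of (R, G, B) tuples to a 2-D list of grey levels."""
    return [
        [ALPHA * r + BETA * g + GAMMA * b for (r, g, b) in row]
        for row in img
    ]

# Pure red, green, blue and white pixels (illustrative values only).
color = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
grey = rgb_to_grey(color)
print(grey)
```

With these weights, pure green maps to the brightest grey level of the three primaries and pure blue to the darkest, mirroring the relative sensitivities described above.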

RGB to grey-scale conversion is a noninvertible image transform: the true color information that is lost in the conversion cannot be readily recovered.
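A short sketch of why the transform cannot be inverted: two visibly different colors can land on the same grey level. The helper `to_grey8` and the two sample colors are chosen for illustration, and rounding to an 8-bit grey level is an assumption about how the result is stored.

```python
ALPHA, BETA, GAMMA = 0.2989, 0.5870, 0.1140  # NTSC weights from the text

def to_grey8(rgb):
    """Weighted sum of one (R, G, B) pixel, rounded to an 8-bit grey level."""
    r, g, b = rgb
    return round(ALPHA * r + BETA * g + GAMMA * b)

pure_red = (255, 0, 0)
mid_green = (0, 130, 0)

# Both colors collapse to the same grey value, so the original
# color cannot be recovered from the grey level alone.
print(to_grey8(pure_red), to_grey8(mid_green))
```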


The main goal of image enhancement is to process an image in some way so as to render it more visually acceptable or pleasing. The removal of noise, the sharpening of image edges and the 'soft focus' (blurring) effect so often favored in romantic photographs are all examples of popular enhancement techniques. These and other enhancement operations can be achieved through the process of spatial domain filtering. The term spatial domain is arguably somewhat spurious, but is used to distinguish this procedure from frequency domain procedures (discussed in Chapter 5). Thus, spatial domain filtering simply indicates that the filtering process takes place directly on the actual pixels of the image itself.

Therefore, we shall refer simply to filtering in this chapt








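The mean filter is perhaps the simplest linear filter and operates by giving equal weight wK to all pixels in the neighborhood. A weight of wK = 1/(NM) is used for an N × M neighborhood and has the effect of smoothing the image, replacing every pixel in the output image with the mean value of its N × M neighborhood. This weighting scheme guarantees that the weights in the kernel sum to one for any given neighborhood size. Mean filters can be used as a method to suppress noise in an image (although the median filter, which we will discuss shortly, usually does a better job). Another common use is as a preliminary processing step to smooth the image so that some subsequent processing operation will be more effective.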
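A minimal pure-Python sketch of an N × M mean filter follows. The function name `mean_filter`, the restriction to odd N and M, and the policy of leaving border pixels unchanged are all assumptions made for this illustration.

```python
def mean_filter(img, n=3, m=3):
    """Apply an n-by-m mean filter (n, m odd) to a 2-D list of grey levels.

    Border pixels whose neighborhood would extend outside the image are
    copied unchanged; this border policy is an assumption of the sketch.
    """
    rows, cols = len(img), len(img[0])
    rn, rm = n // 2, m // 2
    weight = 1.0 / (n * m)  # w_K = 1/(NM), so the kernel weights sum to one
    out = [list(row) for row in img]
    for i in range(rn, rows - rn):
        for j in range(rm, cols - rm):
            total = 0.0
            for di in range(-rn, rn + 1):
                for dj in range(-rm, rm + 1):
                    total += weight * img[i + di][j + dj]
            out[i][j] = total
    return out

# A flat region with a single bright "noise" pixel (illustrative values).
noisy = [
    [10, 10, 10, 10],
    [10, 100, 10, 10],
    [10, 10, 10, 10],
]
smoothed = mean_filter(noisy)
print(smoothed[1])
```

The isolated bright pixel is spread over its neighborhood and reduced in amplitude, which is the smoothing effect described above. It is not removed outright, which is one reason a median filter often handles such noise better.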















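Authors: Ravina Mithe, Supriya Indalkar, Nilam Divekar

  1. Introduction to OCR Technology

OCR is the acronym for optical character recognition. This technology allows characters to be recognized automatically through an optical mechanism. In the case of human beings, our eyes are the optical mechanism. The image seen by the eyes is input to the brain. The ability to understand these inputs varies from person to person according to many factors [2]. OCR is a technology that functions like the human ability to read, although OCR cannot compete with human reading capabilities.

Most character recognition programs perform recognition on an input image using a scanner or digital camera together with computer software. The physical size of the computer and scanner is a problem, and a hardware problem arises if no scanner or digital camera is available. To overcome the limitation of a computer occupying a large space, a character recognition system based on an Android phone is proposed [4]. OCR is a technology that enables you to convert different types of documents, such as scanned paper documents, PDF files or images captured by a digital camera, into editable and searchable data. Images captured by a digital camera differ from scanned documents or images. They often have defects such as distortion at the edges and dim lighting, which make it difficult for most OCR applications to recognize the text correctly. We chose Tesseract because of its widespread approbation, its extensibility and flexibility, its community of active developers, and the fact that it works out of the box. To perform character recognition, our application has to go through three important steps. The first is segmentation, i.e. given a binary input image, identifying the individual glyphs (the basic units representing one or more characters, usually contiguous). The second step is feature extraction, i.e. computing from each glyph a vector of numbers that will serve as input features for an ANN [3]. This step is the most difficult in the sense that there is no obvious way to obtain these features. The final task is classification.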
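The segmentation step can be illustrated with a connected-component sketch over a binary image. This is a generic technique chosen for illustration, not the authors' implementation or Tesseract's internal method. The function name `segment_glyphs`, the use of 4-connectivity and the toy image are all assumptions.

```python
from collections import deque

def segment_glyphs(binary):
    """Return bounding boxes (top, left, bottom, right) of the 4-connected
    foreground regions of a binary image (1 = ink, 0 = background)."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for i in range(rows):
        for j in range(cols):
            if binary[i][j] != 1 or seen[i][j]:
                continue
            # Breadth-first flood fill from this unvisited ink pixel.
            queue = deque([(i, j)])
            seen[i][j] = True
            top, left, bottom, right = i, j, i, j
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

# Toy binary image containing two contiguous glyph-like shapes.
page = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 0, 0, 1],
]
boxes = segment_glyphs(page)
print(boxes)
```

Each bounding box could then be cropped and passed to the feature extraction step. Glyphs made of several disconnected parts (such as a dotted "i") would need extra merging logic that this sketch omits.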




