发点编程的东西，非局部均值法（NLM）降噪

%7B%22isLastPage%22%3Atrue%2C%22notes%22%3A%5B%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22t26246%22%2C%22notes%22%3A%5B%5D%7D%2C%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22225468%22%2C%22notes%22%3A%5B%5D%7D%2C%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22225470%22%2C%22notes%22%3A%5B%5D%7D%2C%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22225472%22%2C%22notes%22%3A%5B%5D%7D%2C%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22225473%22%2C%22notes%22%3A%5B%5D%7D%2C%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22225474%22%2C%22notes%22%3A%5B%5D%7D%2C%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22225475%22%2C%22notes%22%3A%5B%5D%7D%2C%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22225478%22%2C%22notes%22%3A%5B%5D%7D%2C%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22225482%22%2C%22notes%22%3A%5B%5D%7D%2C%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22225485%22%2C%22notes%22%3A%5B%5D%7D%2C%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22225496%22%2C%22notes%22%3A%5B%5D%7D%2C%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22225500%22%2C%22notes%22%3A%5B%5D%7D%2C%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22225501%22%2C%22notes%22%3A%5B%5D%7D%2C%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22225506%22%2C%22notes%22%3A%5B%5D%7D%2C%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22225507%22%2C%22notes%22%3A%5B%5D%7D%2C%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22225523%22%2C%22notes%22%3A%5B%5D%7D%2C%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22225558%22%2C%22notes%22%3A%5B%5D%7D%2C%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22225583%22%2C%22notes%22%3A%5B%5D%7D%2C%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22225594%22%2C%22notes%22%3A%5B%5D%7D%2C%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22225601%22%2C%22notes%22%3A%5B%5D%7D%2C%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22225605%22%2C%22notes%22%3A%5B%5D%7D%2C%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22225609%22%2C%22notes%22%3A%5B%5D%7D%2C%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22225638%22%2C%22notes%22%3A%5B%5D%7D%2C%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22225647%22%2C%22notes%22%3A%5B%5D%7D%2C%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22225649%22%2C%22notes%22%3A%5B%5D%7D%2C%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22225698%22%2C%22notes%22%3A%5B%5D%7D%2C%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22225706%22%2C%22notes%22%3A%5B%5D%7D%2C%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22225711%22%2C%22notes%22%3A%5B%5D%7D%2C%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22225723%22%2C%22notes%22%3A%5B%5D%7D%2C%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22226729%22%2C%22notes%22%3A%5B%5D%7D%2C%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22226752%22%2C%22notes%22%3A%5B%5D%7D%2C%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22249157%22%2C%22notes%22%3A%5B%5D%7D%2C%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22256435%22%2C%22notes%22%3A%5B%5D%7D%2C%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22263140%22%2C%22notes%22%3A%5B%5D%7D%2C%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22265389%22%2C%22notes%22%3A%5B%5D%7D%2C%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22271845%22%2C%22notes%22%3A%5B%5D%7D%2C%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22272943%22%2C%22notes%22%3A%5B%5D%7D%2C%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22530321%22%2C%22notes%22%3A%5B%5D%7D%2C%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22530491%22%2C%22notes%22%3A%5B%5D%7D%2C%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22530510%22%2C%22notes%22%3A%5B%5D%7D%2C%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22534970%22%2C%22notes%22%3A%5B%5D%7D%2C%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22848275%22%2C%22notes%22%3A%5B%5D%7D%2C%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22897782%22%2C%22notes%22%3A%5B%5D%7D%2C%7B%22type%22%3A%22post%22%2C%22targetId%22%3A%22897799%22%2C%22notes%22%3A%5B%5D%7D%5D%2C%22pid%22%3A%22t26246%22%2C%22tid%22%3A%2226246%22%2C%22mainForumsId%22%3A%5B%22134%22%5D%2C%22categoriesId%22%3A%5B%220%22%5D%2C%22tcId%22%3A%5B%5D%7D

加载中...

论坛>计算机科学>软件综合

文号t26246

52514

发点编程的东西，非局部均值法（NLM）降噪

小俊2010/06/18软件综合 IP:荷兰

图像降噪专用，不解释。

算法描述：

超慢的方法，处理每个像素的算法复杂度与处理半径的关系为O(N^4)。

偷懒一下加快算法：

代码（写成了线程函数模式，以便多线程处理，NOISE为降噪强度，LERPC为lerp系数）：

#include "stdafx.h"

#include <cmath>

#define NLM_WINDOW_RADIUS    3
#define NLM_BLOCK_RADIUS    3

#define NLM_WINDOW_AREA        ((2 * NLM_WINDOW_RADIUS + 1) * (2 * NLM_WINDOW_RADIUS + 1))
#define INV_NLM_WINDOW_AREA    (1.0f / (float)NLM_WINDOW_AREA)

#define NLM_WEIGHT_THRESHOLD    0.10f
#define NLM_LERP_THRESHOLD        0.10f

#define BLOCKDIM_X        8
#define BLOCKDIM_Y        8

__inline unsigned long vecLen(unsigned long p1, unsigned long p2)
{
    unsigned char *a = (unsigned char *)&p1;
    unsigned char *b = (unsigned char *)&p2;

    return ((b[0] - a[0]) * (b[0] - a[0])
          + (b[1] - a[1]) * (b[1] - a[1])
          + (b[2] - a[2]) * (b[2] - a[2]));
}

/*
__inline float lerpf(float a, float b, float c)
{
    return a + (b - a) * c;
}
*/

__inline unsigned long lerpf(unsigned long a, unsigned long b, long c)
{
    return a + (b - a) * c / 65536;
}

__inline unsigned long tex2D(unsigned long *image, int p, int h, int x, int y)
{
    if(x < 0)
        x = 0;
    else if(x >= p)
        x = p - 1;
    if(y < 0)
        y = 0;
    else if (y >= h)
        y = h - 1;

    return image[y * p + x];
}

#define NOISE        0.25f
#define LERPC        0.15f

UINT AFX_CDECL CPUFastNLMThread(LPVOID param)
{
    unsigned long *p = (unsigned long *)param;

    unsigned long *image = (unsigned long *)p[0];
    unsigned long pitch = p[1];
    unsigned long pitch4 = p[1] / 4;
    unsigned long width = p[2];
    unsigned long height = p[3];
    unsigned long y_amount = p[4];
    unsigned long y_start = p[5];
    unsigned long *image_out = (unsigned long *)p[6];

    float Noise = 1.0f / (NOISE * NOISE);
    float lerpC = LERPC;

    long bx, by, tx, ty;

    for(by = y_start; by < y_start + y_amount; by += BLOCKDIM_Y)
    {
        for(bx = 0; bx < width; bx += BLOCKDIM_Y)
        {
            // float fWeights[BLOCKDIM_X * BLOCKDIM_Y];
            long fWeights[BLOCKDIM_X * BLOCKDIM_Y];

            for(ty = 0; ty < BLOCKDIM_Y; ty ++)
                for(tx = 0; tx < BLOCKDIM_X; tx ++)
                {
                    long x = bx + tx;
                    long y = by + ty;
                    long cx = bx + NLM_WINDOW_RADIUS;
                    long cy = by + NLM_WINDOW_RADIUS;

                    if((x < width) && (y < height))
                    {
                        //Find color distance from current texel to the center of NLM window
                        // float weight = 0;
                        long weight = 0;
                        for(int n = -NLM_BLOCK_RADIUS; n <= NLM_BLOCK_RADIUS; n ++)
                            for(int m = -NLM_BLOCK_RADIUS; m <= NLM_BLOCK_RADIUS; m ++)
                                weight += vecLen(tex2D(image, pitch4, height, cx + m, cy + n), tex2D(image, pitch4, height, x + m, y + n));

                        //Geometric distance from current texel to the center of NLM window
                        float dist = (float)((tx - NLM_WINDOW_RADIUS) * (tx - NLM_WINDOW_RADIUS) + (ty - NLM_WINDOW_RADIUS) * (ty - NLM_WINDOW_RADIUS));

                        //Derive final weight from color and geometric distance
                        // weight = exp(-(weight / 65536.0f * Noise + dist * INV_NLM_WINDOW_AREA));

                        //Write the result to memory
                        // fWeights[ty * BLOCKDIM_X + tx] = weight;
                        fWeights[ty * BLOCKDIM_X + tx] = exp(-(weight / 65536.0f * Noise + dist * INV_NLM_WINDOW_AREA)) * 65536;
                    }
                }

            for(ty = 0; ty < BLOCKDIM_Y; ty ++)
                for(tx = 0; tx < BLOCKDIM_X; tx ++)
                {
                    long x = bx + tx;
                    long y = by + ty;
                    if((x < width) && (y < height))
                    {
                        //Normalized counter for the NLM weight threshold
                        // float fCount = 0;
                        long fCount = 0;
                        //Total sum of pixel weights
                        // float sumWeights = 0;
                        long sumWeights = 0;
                        //Result accumulator
                        // float clr[3] = {0, 0, 0};
                        long clr[3] = {0, 0, 0};

                        int idx = 0;

                        //Cycle through NLM window, surrounding (x, y) texel
                        for(long i = -NLM_WINDOW_RADIUS; i <= NLM_WINDOW_RADIUS + 1; i ++)
                            for(long j = -NLM_WINDOW_RADIUS; j <= NLM_WINDOW_RADIUS + 1; j ++)
                            {
                                //Load precomputed weight
                                // float weightIJ = fWeights[idx ++];
                                long weightIJ = fWeights[idx ++];

                                //Accumulate (x + j, y + i) texel color with computed weight
                                long clrIJ = tex2D(image, pitch4, height, x + j, y + i);
                                clr[0] += (clrIJ & 0xff) * weightIJ;
                                clr[1] += ((clrIJ >> 8) & 0xff) * weightIJ;
                                clr[2] += ((clrIJ >> 16) & 0xff) * weightIJ;

                                //Sum of weights for color normalization to [0..1] range
                                sumWeights  += weightIJ;

                                //Update weight counter, if NLM weight for current window texel
                                //exceeds the weight threshold
                                // fCount += (weightIJ > NLM_WEIGHT_THRESHOLD) ? INV_NLM_WINDOW_AREA : 0;
                                fCount += (weightIJ > 6553) ? 1337 : 0;
                            }

                        //Normalize result color by sum of weights
                        /*
                        sumWeights = 1.0f / sumWeights;
                        clr[0] *= sumWeights;
                        clr[1] *= sumWeights;
                        clr[2] *= sumWeights;
                        */

                        clr[0] /= sumWeights;
                        clr[1] /= sumWeights;
                        clr[2] /= sumWeights;

                        //Choose LERP quotent basing on how many texels
                        //within the NLM window exceeded the weight threshold
                        // float lerpQ = (fCount > NLM_LERP_THRESHOLD * 65536) ? lerpC : 1.0f - lerpC;
                        long lerpQ = (fCount > 6553) ? 13107 : 52429;

                        //Write final result to memory
                        unsigned long clr00 = image[y * pitch4 + x];
                        clr[0] = lerpf(clr[0], clr00 & 0xff, lerpQ);
                        clr[1] = lerpf(clr[1], (clr00 >> 8) & 0xff, lerpQ);
                        clr[2] = lerpf(clr[2], (clr00 >> 16) & 0xff, lerpQ);
                        image_out[y * pitch4 + x] = ((unsigned char)clr[0]) | ((unsigned char)clr[1] << 8) | ((unsigned char)clr[2] << 16);
                    }
                }
        }
    }

    return 0;
}

+1000 科创币

phpskycn 2010/06/18 加少了。。。

+500 科创币

phpskycn 2010/06/18 不解释

来自：计算机科学 / 软件综合

全部只看作者

~~空空如也

小俊

作者

15年1个月前 IP:未同步

225468

1楼

为了提高速度，所有运算放大65536倍后基于整型进行，避免过多浮点运算。

运算速度在破双核E7200上大概为每秒处理160万像素，悲剧了，1000多万像素的照片得处理10秒。。。

引用

加载评论中，请稍候...

200字以内，仅用于支线交流，主线讨论请采用回复功能。

折叠评论

thallium2012 千古风流

15年1个月前 IP:未同步

225470

2楼

你很厉害哈呵呵难以企及的高度飘过

引用

加载评论中，请稍候...

200字以内，仅用于支线交流，主线讨论请采用回复功能。

折叠评论

小俊

作者

15年1个月前 IP:未同步

225472

3楼

降噪效果测试：

500D拍的室内照片暗部，ISO1600，0.125s，原图裁剪：

处理结果（参数为noise=0.25, lerpc=0.15）

引用

加载评论中，请稍候...

200字以内，仅用于支线交流，主线讨论请采用回复功能。

折叠评论

小俊

作者

15年1个月前 IP:未同步

225473

4楼

原算法来自CUDA SDK。

这个程序的CUDA版本代码在GTX285上的速度大概为双核E7200的80-90倍。

引用

加载评论中，请稍候...

200字以内，仅用于支线交流，主线讨论请采用回复功能。

折叠评论

y2k

15年1个月前 IP:未同步

225474

5楼

感覺細節失了不少...

引用

加载评论中，请稍候...

200字以内，仅用于支线交流，主线讨论请采用回复功能。

折叠评论

小俊

作者

15年1个月前 IP:未同步

225475

6楼

引用第5楼y2k于2010-06-18 00:26发表的 :
感覺細節失了不少...

哪里的细节丢失了？请圈出

局部颜色过渡不自然倒是有点

引用

加载评论中，请稍候...

200字以内，仅用于支线交流，主线讨论请采用回复功能。

折叠评论

y2k

15年1个月前 IP:未同步

225478

7楼

引用

加载评论中，请稍候...

200字以内，仅用于支线交流，主线讨论请采用回复功能。

折叠评论

小俊

作者

15年1个月前 IP:未同步

225482

8楼

引用第7楼y2k于2010-06-18 00:33发表的 :

所有的花纹都还在吧，只是对比度损失了一些，可以加一级处理纠回来

引用

加载评论中，请稍候...

200字以内，仅用于支线交流，主线讨论请采用回复功能。

折叠评论

delete

15年1个月前 IP:未同步

225485

9楼

处理后不锐了不处理的图还可以接受

引用

加载评论中，请稍候...

200字以内，仅用于支线交流，主线讨论请采用回复功能。

折叠评论

小俊

作者

15年1个月前 IP:未同步

225496

10楼

引用第9楼cat13于2010-06-18 00:46发表的 :
处理后不锐了不处理的图还可以接受

世界上有一样算法叫USM锐化：

不处理那是噪点满天飞了。

引用

加载评论中，请稍候...

200字以内，仅用于支线交流，主线讨论请采用回复功能。

折叠评论

花落一天

15年1个月前 IP:未同步

225500

11楼

眼神不好,没看到燥点少了多少.........倒是看到锐度降低了很多,整个图糊了........

引用

加载评论中，请稍候...

200字以内，仅用于支线交流，主线讨论请采用回复功能。

折叠评论

warmonkey

15年1个月前 IP:未同步

225501

12楼

对焦不太好，另外请上100%截图

引用

加载评论中，请稍候...

200字以内，仅用于支线交流，主线讨论请采用回复功能。

折叠评论

小俊

作者

15年1个月前 IP:未同步

225506

13楼

引用第12楼warmonkey于2010-06-18 01:22发表的 :
对焦不太好，另外请上100%截图

已经是原图大小的截图，可能屏幕分辨率不够大的话论坛给缩小了

给个小尺寸截图：

引用

加载评论中，请稍候...

200字以内，仅用于支线交流，主线讨论请采用回复功能。

折叠评论

小俊

作者

15年1个月前 IP:未同步

225507

14楼

引用第11楼花落一天于2010-06-18 01:19发表的 :
眼神不好,没看到燥点少了多少.........倒是看到锐度降低了很多,整个图糊了........

可能你的屏幕分辨率太低了，论坛把图缩小了

引用

加载评论中，请稍候...

200字以内，仅用于支线交流，主线讨论请采用回复功能。

折叠评论

87685302

15年1个月前 IP:未同步

225523

15楼

那窗帘可以制作个裙子那黑色的桌子让人肃穆起敬

引用

加载评论中，请稍候...

200字以内，仅用于支线交流，主线讨论请采用回复功能。

折叠评论

ltl

15年0个月前 IP:未同步

225558

16楼

CUDA及其有爱……可惜缺钱缺硬件……

引用

加载评论中，请稍候...

200字以内，仅用于支线交流，主线讨论请采用回复功能。

折叠评论

novakon

15年0个月前 IP:未同步

225583

17楼

这个方法去除了高频噪声,而且效果相当可以,只是低频部分仍然一塌糊涂。

引用

加载评论中，请稍候...

200字以内，仅用于支线交流，主线讨论请采用回复功能。

折叠评论

hmy0011

15年0个月前 IP:未同步

225594

18楼

用易语言的飘过。。。不懂高深的英文编程。。

引用

加载评论中，请稍候...

200字以内，仅用于支线交流，主线讨论请采用回复功能。

折叠评论

小俊

作者

15年0个月前 IP:未同步

225601

19楼

引用第18楼hmy0011于2010-06-18 13:31发表的 :
用易语言的飘过。。。不懂高深的英文编程。。

高不高深不在于语言本身，而在于算法。

引用

加载评论中，请稍候...

200字以内，仅用于支线交流，主线讨论请采用回复功能。

折叠评论

phpskycn

15年0个月前 IP:未同步

225605

20楼

看图
GTX285不是人人都有的，不过在GT240之类主流产品上的速度应该还可以接受。
上面那个E7200的速度是单线程的？另外感觉躁点仍然存在，少了很多

引用

加载评论中，请稍候...

200字以内，仅用于支线交流，主线讨论请采用回复功能。

折叠评论

小俊

作者

15年0个月前 IP:未同步

225609

21楼

引用第20楼phpskycn于2010-06-18 15:02发表的 :
看图
GTX285不是人人都有的，不过在GT240之类主流产品上的速度应该还可以接受。
上面那个E7200的速度是单线程的？另外感觉躁点仍然存在，少了很多

是多线程的，线程数量等于CPU核的数量。我发的代码也是线程函数。

并行计算时代已经到来，过去靠彪CPU频率来拼速度的时代一去不复返。
即使不用CUDA，也应该在编程时考虑并行化和多线程。

另外GT240之流已经不能算是主流产品了。Fermi系列的最低端GF108已经达到GT240的级别。

引用

加载评论中，请稍候...

200字以内，仅用于支线交流，主线讨论请采用回复功能。

折叠评论

phpskycn

15年0个月前 IP:未同步

225638

22楼

GF108还有若干个月。。。
额那两个线程如何同步？把图像裁两半分开处理？

引用

加载评论中，请稍候...

200字以内，仅用于支线交流，主线讨论请采用回复功能。

折叠评论

小俊

作者

15年0个月前 IP:未同步

225647

23楼

引用第22楼phpskycn于2010-06-18 17:10发表的 :
GF108还有若干个月。。。
额那两个线程如何同步？把图像裁两半分开处理？

正解。。。
GF108，1个月之内必出，96个SP。

引用

加载评论中，请稍候...

200字以内，仅用于支线交流，主线讨论请采用回复功能。

折叠评论

startwood

15年0个月前 IP:未同步

225649

24楼

额，那么中间分开那地方便还得再处理遍。。。
TSMC产能有点小小的问题，还有我发现NV似乎还没完全清仓，GT200系列还要存在一段时间，不知道啥时候GF10X才能替代原来那些产品，价格降到正常点的价位。
GF108要是便宜的话买一张……i7配7300LE太那个了。。。

引用

加载评论中，请稍候...

200字以内，仅用于支线交流，主线讨论请采用回复功能。

折叠评论

小俊

作者

15年0个月前 IP:未同步

225698

25楼

引用第24楼startwood于2010-06-18 17:35发表的 :
额，那么中间分开那地方便还得再处理遍。。。
TSMC产能有点小小的问题，还有我发现NV似乎还没完全清仓，GT200系列还要存在一段时间，不知道啥时候GF10X才能替代原来那些产品，价格降到正常点的价位。
GF108要是便宜的话买一张……i7配7300LE太那个了。。。

不需要处理分界部分。这个是根据输出像素来划分线程的。

引用