How to optimize this code?

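The profiler says that 50% of the total time is spent inside this function. How would you optimize it? It converts a 24-bit BMP image to YUV. Thanks!

Update: the platform is ARMv6 (I am writing for the iPhone).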

#define Y_FROM_RGB(_r_,_g_,_b_) ( (  66 * _b_ + 129 * _g_ +  25 * _r_ + 128) >> 8) + 16
#define V_FROM_RGB(_r_,_g_,_b_) ( ( 112 * _b_ -  94 * _g_ -  18 * _r_ + 128) >> 10) + 128
#define U_FROM_RGB(_r_,_g_,_b_) ( ( -38 * _b_ -  74 * _g_ + 112 * _r_ + 128) >> 10) + 128

  /*!
 * \brief
 * Converts 24 bit image to YCrCb image channels
 * 
 * \param source
 * Source 24bit image pointer
 * 
 * \param source_width
 * Source image width
 * 
 * \param dest_Y
 * destination image Y component pointer
 * 
 * \param dest_scan_size_Y
 * destination image Y component line size
 * 
 * \param dest_U
 * destination image U component pointer
 * 
 * \param dest_scan_size_U
 * destination image U component line size
 * 
 * \param dest_V
 * destination image V component pointer
 * 
 * \param dest_scan_size_V
 * destination image V component line size
 * 
 * \param dest_width
 * Destination image width = source_width
 * 
 * \param dest_height
 * Destination image height = source image height
 *
 * Convert 24 bit image (source) with width (source_width)
 * to YCrCb image channels (dest_Y, dest_U, dest_V) with size (dest_width)x(dest_height), and line size
 * (dest_scan_size_Y, dest_scan_size_U, dest_scan_size_V) (in bytes)
 * 
 */
void ImageConvert_24_YUV420P(unsigned char * source, int source_width,
                            unsigned char * dest_Y, int dest_scan_size_Y,
                            unsigned char * dest_U, int dest_scan_size_U,
                            unsigned char * dest_V, int dest_scan_size_V,
                            int dest_width, int dest_height)
{
  int source_scan_size = source_width*3;

  int half_width = dest_width/2;

  //Y loop
  for (int y = 0; y < dest_height/2; y ++)
  {
    //Start of line
    unsigned char * source_scan = source;
    unsigned char * source_scan_next = source+source_scan_size;
    unsigned char * dest_scan_Y = dest_Y;
    unsigned char * dest_scan_U = dest_U;
    unsigned char * dest_scan_V = dest_V;

    //Do all pixels
    for (int x = 0; x < half_width; x++)
    {
      int R = source_scan[0];
      int G = source_scan[1];
      int B = source_scan[2];

      //Y
      int Y = Y_FROM_RGB(B, G, R);

      *dest_scan_Y = Y;
      source_scan += 3;
      dest_scan_Y += 1;

      int R1 = source_scan[0];
      int G1 = source_scan[1];
      int B1 = source_scan[2];

      //Y
      Y = Y_FROM_RGB(B1, G1, R1);

      R += (R1 + source_scan_next[0] + source_scan_next[3]);
      G += (G1 + source_scan_next[1] + source_scan_next[4]);
      B += (B1 + source_scan_next[2] + source_scan_next[5]);


      //YCrCb
      *dest_scan_Y = Y;
      *dest_scan_V = V_FROM_RGB(B, G, R);
      *dest_scan_U = U_FROM_RGB(B, G, R);

      source_scan += 3;
      dest_scan_Y += 1;
      dest_scan_U += 1;
      dest_scan_V += 1;
      source_scan_next += 6;
    };

    //scroll to next line
    source += source_scan_size;
    dest_Y += dest_scan_size_Y;
    dest_U += dest_scan_size_U;
    dest_V += dest_scan_size_V;

    //Start of line
    source_scan = source;
    dest_scan_Y = dest_Y;

    //Do all pixels
    for (int x = 0; x < half_width; x ++)
    {
      int R = source_scan[0];
      int G = source_scan[1];
      int B = source_scan[2];

      //Y
      int Y = Y_FROM_RGB(B, G, R);

      *dest_scan_Y = Y;
      source_scan += 3;
      dest_scan_Y += 1;

      R = source_scan[0];
      G = source_scan[1];
      B = source_scan[2];

      //Y
      Y = Y_FROM_RGB(B, G, R);
      *dest_scan_Y = Y;
      source_scan += 3;
      dest_scan_Y += 1;
    };

    source += source_scan_size;
    dest_Y += dest_scan_size_Y;
  };
};

Source: https://stackoverflow.com/questions/3439594
Last updated: 2021-06-07 08:06

Best answer

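Unless I am missing something, the following code appears in both inner loops, so why not go through this loop only once? That may require some changes to your algorithm, but it should improve performance.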

for (int x = 0; x < half_width; x ++) 
{ 
  int R = source_scan[0]; 
  int G = source_scan[1]; 
  int B = source_scan[2]; 

  //Y 
  int Y = Y_FROM_RGB(B, G, R); 

  *dest_scan_Y = Y; 
  source_scan += 3; 
  dest_scan_Y += 1; 

  R = source_scan[0]; 
  G = source_scan[1]; 
  B = source_scan[2]; 
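
One possible reading of "go through this loop once" is to walk the top and bottom row of each 2x2 block together, so every source byte is loaded a single time and the luma code exists in one place. The block below is a minimal sketch under that assumption, not the answerer's code: `convert_row_pair` and the three helper functions are hypothetical names, the coefficients are copied from the question's macros, and byte 0 of each pixel is called `r` only to follow the question's naming. Whether this is actually faster on ARMv6 has to be measured.

```c
/* Fixed-point coefficients taken from the question's macros,
   with the arguments in plain r, g, b order. */
static inline int y_from_rgb(int r, int g, int b)
{
    return ((66 * r + 129 * g + 25 * b + 128) >> 8) + 16;
}

/* Chroma is computed from the SUM of four pixels, hence the shift by 10.
   Note: >> on a negative int is implementation-defined in C; the
   question's code relies on an arithmetic shift in the same way. */
static inline int u_from_rgb_sum4(int r, int g, int b)
{
    return ((-38 * r - 74 * g + 112 * b + 128) >> 10) + 128;
}

static inline int v_from_rgb_sum4(int r, int g, int b)
{
    return ((112 * r - 94 * g - 18 * b + 128) >> 10) + 128;
}

/* Converts one pair of source rows in a single pass: two luma rows
   plus one U and one V sample per 2x2 block. */
static void convert_row_pair(const unsigned char *top, const unsigned char *bottom,
                             unsigned char *y_top, unsigned char *y_bottom,
                             unsigned char *u, unsigned char *v, int half_width)
{
    for (int x = 0; x < half_width; x++) {
        int r_sum = 0, g_sum = 0, b_sum = 0;

        for (int i = 0; i < 2; i++) {
            /* Pixel i of the top row. */
            int r = top[3 * i], g = top[3 * i + 1], b = top[3 * i + 2];
            y_top[i] = (unsigned char)y_from_rgb(r, g, b);
            r_sum += r; g_sum += g; b_sum += b;

            /* Pixel i of the bottom row. */
            r = bottom[3 * i]; g = bottom[3 * i + 1]; b = bottom[3 * i + 2];
            y_bottom[i] = (unsigned char)y_from_rgb(r, g, b);
            r_sum += r; g_sum += g; b_sum += b;
        }

        *u++ = (unsigned char)u_from_rgb_sum4(r_sum, g_sum, b_sum);
        *v++ = (unsigned char)v_from_rgb_sum4(r_sum, g_sum, b_sum);

        top += 6; bottom += 6;
        y_top += 2; y_bottom += 2;
    }
}
```

The outer loop would then call `convert_row_pair` once per pair of rows and advance the source and Y pointers by two line sizes.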

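But before doing anything, move the two inner loops into separate functions, then run your profiler and see whether you spend more time in one function than in the other.

You have three loops in this function, and you don't know which part is actually where the time goes. Determine that before doing any optimization; otherwise you may find that you are fixing the wrong part.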
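
The split suggested above could look roughly like the sketch below. It is only an illustration: `row_luma_chroma`, `row_luma_only`, `convert_24_to_yuv420p_split` and the `Y_OF`/`U_OF4`/`V_OF4` macros are hypothetical names, the arithmetic is copied from the question, and `__attribute__((noinline))` is a GCC/Clang extension assumed here so that each routine stays a separate entry in the profile.

```c
/* Coefficients copied from the question's macros; arguments in r, g, b order. */
#define Y_OF(r, g, b)  ((( 66 * (r) + 129 * (g) +  25 * (b) + 128) >> 8) + 16)
#define U_OF4(r, g, b) (((-38 * (r) -  74 * (g) + 112 * (b) + 128) >> 10) + 128)
#define V_OF4(r, g, b) (((112 * (r) -  94 * (g) -  18 * (b) + 128) >> 10) + 128)

/* GCC/Clang-specific (assumption): stops the compiler from merging the
   row routines back into the caller, so the profiler reports them separately. */
#define NOINLINE __attribute__((noinline))

/* Even rows: luma for every pixel plus one U/V sample per 2x2 block. */
NOINLINE static void row_luma_chroma(const unsigned char *src,
                                     const unsigned char *src_next,
                                     unsigned char *y, unsigned char *u,
                                     unsigned char *v, int half_width)
{
    for (int x = 0; x < half_width; x++) {
        int r0 = src[0], g0 = src[1], b0 = src[2];
        int r1 = src[3], g1 = src[4], b1 = src[5];

        y[0] = (unsigned char)Y_OF(r0, g0, b0);
        y[1] = (unsigned char)Y_OF(r1, g1, b1);

        /* Sum of the four pixels in the 2x2 block. */
        int r = r0 + r1 + src_next[0] + src_next[3];
        int g = g0 + g1 + src_next[1] + src_next[4];
        int b = b0 + b1 + src_next[2] + src_next[5];

        *u++ = (unsigned char)U_OF4(r, g, b);
        *v++ = (unsigned char)V_OF4(r, g, b);

        src += 6; src_next += 6; y += 2;
    }
}

/* Odd rows: luma only. */
NOINLINE static void row_luma_only(const unsigned char *src,
                                   unsigned char *y, int half_width)
{
    for (int x = 0; x < 2 * half_width; x++) {
        y[x] = (unsigned char)Y_OF(src[0], src[1], src[2]);
        src += 3;
    }
}

void convert_24_to_yuv420p_split(const unsigned char *source, int source_width,
                                 unsigned char *dest_y, int stride_y,
                                 unsigned char *dest_u, int stride_u,
                                 unsigned char *dest_v, int stride_v,
                                 int dest_width, int dest_height)
{
    int src_stride = source_width * 3;
    int half_width = dest_width / 2;

    for (int row = 0; row < dest_height / 2; row++) {
        row_luma_chroma(source, source + src_stride, dest_y, dest_u, dest_v, half_width);
        row_luma_only(source + src_stride, dest_y + stride_y, half_width);

        source += 2 * src_stride;
        dest_y += 2 * stride_y;
        dest_u += stride_u;
        dest_v += stride_v;
    }
}
```

With this layout the profiler output should show how the time divides between the luma-plus-chroma rows and the luma-only rows, which tells you where an optimization is worth attempting.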
