Quick way to add loess curve to large data set graph

I am trying to plot a vector y, which has 604800 points, against a sequence: x=seq(from=1, to=604800). Plotting itself is not a problem, but I also need to add a loess curve to the plots.

I have tried this using ggplot2, but it takes forever, and ggplot2 is notoriously slow at plotting large datasets. See R code:

vf <- ggplot(single.prop, aes(x,y)) + geom_line(linetype=1, size=1)
vf <- vf + stat_smooth(method="loess",fullrange=TRUE,aes(outfit=fit1<<-..y..))
vf

I have now tried to use the base package, but this is also taking forever:

lw <- loess(y ~ x,data=single.prop)
plot(y ~ x, data=single.prop,pch=19,cex=0.1)
k <- order(single.prop$x)
lines(single.prop$x[k],lw$fitted[k],col="red",lwd=3)

Does anyone have any suggestions about what I can do to make this run quicker? I have to do this multiple times, and so far I have been waiting about 15 minutes for one plot, which still has not completed.


Source: https://stackoverflow.com/questions/32885709
Updated: 2020-05-22 06:20

Best answer

With this many data points, the plot can indeed take a long time to render. It depends on the data, of course, but a plot with this many points often does not give a very interpretable picture. For both speed and interpretability, it can be useful to calculate summary statistics first and then plot those. In your situation, I can imagine that binning on x and calculating one or more statistics of y for every bin would be useful. I did a small example with the mean, but you can use whatever statistic you like. Hope this helps.

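library(ggplot2)

# Simulated data: a linear trend plus noise, one million points
x <- 1:10^6
y <- x/10^5 + rnorm(10^6)
plot_dat <- data.frame(x, y)
# Scatter plot of every raw point (slow to render at this size)
p <- ggplot(plot_dat, aes(x,y)) + geom_point()


# Average y over consecutive bins of bin_size rows
# (assumes nrow(plot_dat) is a multiple of bin_size)
bin_plot_dat <- function(bin_size){
  nr_bins <- nrow(plot_dat) / bin_size
  # Label each row with the upper x value of its bin
  x2 <- rep(1:nr_bins * bin_size, each = bin_size)
  y2 <- tapply(plot_dat$y, x2, mean)
  data.frame(x = unique(x2), y= y2)
}

plot_dat2 <- bin_plot_dat(50)
p2 <- ggplot(plot_dat2, aes(x,y)) +
  geom_point()

# Add a smoothed curve to the binned data
p2 + geom_smooth()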
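
Since the question asks for a loess curve in base graphics specifically, the same binning idea can be sketched there too: fit loess() on the bin means instead of on all 604800 points. This is only a sketch under stated assumptions. The helper bin_means(), the bin size of 50, and the simulated stand-in for single.prop are illustrative and not from the question, and a loess fit on bin means approximates, but is not identical to, a loess fit on the raw data.

```r
# Sketch: fit loess on bin means rather than on every raw point.
# bin_means() and bin_size = 50 are illustrative choices (assumptions).
set.seed(1)
n <- 604800
# Stand-in for the asker's data frame: linear trend plus noise (assumption)
single.prop <- data.frame(x = seq_len(n), y = seq_len(n) / 1e5 + rnorm(n))

bin_means <- function(dat, bin_size) {
  # Assign consecutive rows to bins; the last bin may be shorter
  bin <- ceiling(seq_len(nrow(dat)) / bin_size)
  data.frame(x = as.numeric(tapply(dat$x, bin, mean)),
             y = as.numeric(tapply(dat$y, bin, mean)))
}

binned <- bin_means(single.prop, 50)   # 12096 rows instead of 604800
lw <- loess(y ~ x, data = binned)      # loess defaults: span = 0.75, degree = 2

plot(y ~ x, data = binned, pch = 19, cex = 0.1)
# binned$x is already increasing, so no reordering is needed before lines()
lines(binned$x, lw$fitted, col = "red", lwd = 3)
```

A larger bin size makes the fit faster at the cost of detail. If the raw points must remain visible, they can still be drawn with plot() first and only the fitted curve taken from the binned fit.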
2015-10-01
