Figure 1: Synthetic data and fitted curves. |
To demonstrate this, in this short example, after generating a synthetic data, we will fit quasibinomial regression model to different observations.
ggplot [2], an implementation of grammar of graphics [3], provides capability to apply regression or customised smoothing onto a raw data during plotting.
Generating Synthetic Data
Let generate set of $n$ observation over time $t$, denoted, $X_{1}, X_{2}, ..., X_{n}$ for $k$ observation $X=(x_{1}, x_{2}, ..., x_{k})$. We will use cumulative function for logistic distribution [4],
$$F(x;\mu,s) = \frac{1}{2} + \frac{1}{2} \tanh((x-\mu)/2s)$$, adding some random noise to make
it realistic.
Let's say there are $k=6$ observations with the following parameter sets, $\mu = \{9,2,3,5,7,5\}$ and $s=\{2,2,4,3,4,2\}$, we will utilise `mapply` [5] in generating a syntetic data frame.
generate_logit_cdf <- function(mu, s, sigma_y=0.1, x=seq(-5,20,0.1)) { x_ms <- (x-mu)/s y <- 0.5 + 0.5 * tanh(x_ms) y <- abs(y + rnorm(length(x), 0, sigma_y)) ix <- which(y>=1.0) if(length(ix)>=1) { y[ix] <- 1.0 } return(y) } set.seed(424242) x <- seq(-5,20,0.025) # 1001 observation mu_vec <- c(1,2,3,5,7,8) # 6 variables s_vec <- c(2,2,4,3,4,2) # Syntetic variables observations_df<- mapply(generate_logit_cdf, mu_vec, s_vec, MoreArgs = list(x=x)) # Give them names colnames(observations_df) <- c("Var1", "Var2", "Var3", "Var4", "Var5", "Var6") head(observations_df)
Smoothing of observations
Using the syntetic data we have generated, `observations_df`,
we can noq use `ggplot` and `quasibinomial` `glm` to visualise
and smooth the variables.
library(ggplot2) library(reshape2) df_all <- reshape2:::melt(observations_df) colnames(df_all) <- c("x", "observation", "y") df_all$observation <- as.factor(df_all$observation) p1<-ggplot(df_all, aes(x=x, y=y, colour=observation)) + geom_point() + scale_color_brewer(palette = "Reds") + theme( panel.background = element_blank(), axis.text.x = element_text(face="bold", color="#000000", size=11), axis.text.y = element_text(face="bold", color="#000000", size=11), axis.title.x = element_text(face="bold", color="#000000", size=11), axis.title.y = element_text(face="bold", color="#000000", size=11) # legend.position = "none" ) l1<-ggplot(df_all, aes(x=x, y=y, colour=observation)) + geom_point(size=3) + scale_color_brewer(palette = "Reds") + scale_color_brewer(palette = "Reds") + #geom_smooth(method="loess", se = FALSE, size=1.5) + geom_smooth(aes(group=observation),method="glm", family=quasibinomial(), formula="y~x", se = FALSE, size=1.5) + xlab("x") + ylab("y") + #scale_y_continuous(breaks=seq(0.0,1,0.1)) + #scale_x_continuous(breaks=seq(0.0,230,20)) + #ggtitle("") + theme( panel.background = element_blank(), axis.text.x = element_text(face="bold", color="#000000", size=11), axis.text.y = element_text(face="bold", color="#000000", size=11), axis.title.x = element_text(face="bold", color="#000000", size=11), axis.title.y = element_text(face="bold", color="#000000", size=11) ) library(gridExtra) gridExtra:::grid.arrange(p1,l1)
References
[1] https://en.wikipedia.org/wiki/Logistic_distribution#Applications
[2] http://www.ggplot.org
[3] The Grammar of Graphics, L. Wilkinson, http://www.amzn.com/038724544
[4] http://en.wikipedia.org/wiki/Logistic_distribution#Cumulative_distribution_function.
[5] https://stat.ethz.ch/R-manual/R-devel/library/base/html/mapply.html
Notes:
* Quasibinomial distribution is as a term used in R's GLM implementation context, see here.