楼主: oliyiyi
895 2

Does brain mass predict how much mammals sleep in a day? [推广有奖]

版主

泰斗

0%

还不是VIP/贵宾

-

TA的文库  其他...

计量文库

威望
7
论坛币
271951 个
通用积分
31269.3519
学术水平
1435 点
热心指数
1554 点
信用等级
1345 点
经验
383775 点
帖子
9598
精华
66
在线时间
5468 小时
注册时间
2007-5-21
最后登录
2024-4-18

初级学术勋章 初级热心勋章 初级信用勋章 中级信用勋章 中级学术勋章 中级热心勋章 高级热心勋章 高级学术勋章 高级信用勋章 特级热心勋章 特级学术勋章 特级信用勋章

+2 论坛币
k人 参与回答

经管之家送您一份

应届毕业生专属福利!

求职就业群
赵安豆老师微信:zhaoandou666

经管之家联合CDA

送您一个全额奖学金名额~ !

感谢您参与论坛问题回答

经管之家送您两个论坛币!

+2 论坛币
Data: Does brain mass predict how much mammals sleep in a day?

Let’s use the mammal sleep dataset from ggplot2. This dataset
contains the number of hours spent sleeping per day for 83 different species of
mammals along with each species’ brain mass (kg) and body mass (kg), among other
measures. Here’s a first look at the data.

  1. library(dplyr, warn.conflicts = FALSE)
  2. library(ggplot2)

  3. # Preview sorted by brain/body ratio. I chose this sorting so that humans would
  4. # show up in the preview.
  5. msleep %>%  
  6.   select(name, sleep_total, brainwt, bodywt, everything()) %>%  
  7.   arrange(desc(brainwt / bodywt))
  8. #> # A tibble: 83 × 11
  9. #>                              name sleep_total brainwt bodywt        genus
  10. #>                             <chr>       <dbl>   <dbl>  <dbl>        <chr>
  11. #> 1  Thirteen-lined ground squirrel        13.8 0.00400  0.101 Spermophilus
  12. #> 2                      Owl monkey        17.0 0.01550  0.480        Aotus
  13. #> 3       Lesser short-tailed shrew         9.1 0.00014  0.005    Cryptotis
  14. #> 4                 Squirrel monkey         9.6 0.02000  0.743      Saimiri
  15. #> 5                         Macaque        10.1 0.17900  6.800       Macaca
  16. #> 6                Little brown bat        19.9 0.00025  0.010       Myotis
  17. #> 7                          Galago         9.8 0.00500  0.200       Galago
  18. #> 8                        Mole rat        10.6 0.00300  0.122       Spalax
  19. #> 9                      Tree shrew         8.9 0.00250  0.104       Tupaia
  20. #> 10                          Human         8.0 1.32000 62.000         Homo
  21. #> # ... with 73 more rows, and 6 more variables: vore <chr>, order <chr>,
  22. #> #   conservation <chr>, sleep_rem <dbl>, sleep_cycle <dbl>, awake <dbl>

  23. ggplot(msleep) +  
  24.   aes(x = brainwt, y = sleep_total) +  
  25.   geom_point()
  26. #> Warning: Removed 27 rows containing missing values (geom_point).
复制代码


Hmmm, not very helpful! We should put our measures on a log-10 scale. Also, 27
of the species don’t have brain mass data, so we’ll exclude those rows for the
rest of this tutorial.

  1. msleep <- msleep %>%  
  2.   filter(!is.na(brainwt)) %>%  
  3.   mutate(log_brainwt = log10(brainwt),  
  4.          log_bodywt = log10(bodywt),  
  5.          log_sleep_total = log10(sleep_total))
复制代码


Now, plot the log-transformed data. But let’s also get a little fancy and label
the points for some example critters :cat: so that we can get some intuition
about the data in this scaling. (Plus, I wanted to try out the annotation
tips
from the R4DS book.)

  1. # Create a separate data-frame of species to highlight
  2. ex_mammals <- c("Domestic cat", "Human", "Dog", "Cow", "Rabbit",
  3.                 "Big brown bat", "House mouse", "Horse", "Golden hamster")

  4. # We will give some familiar species shorter names
  5. renaming_rules <- c(
  6.   "Domestic cat" = "Cat",  
  7.   "Golden hamster" = "Hamster",  
  8.   "House mouse" = "Mouse")

  9. ex_points <- msleep %>%  
  10.   filter(name %in% ex_mammals) %>%  
  11.   mutate(name = stringr::str_replace_all(name, renaming_rules))

  12. # Define these labels only once for all the plots
  13. lab_lines <- list(
  14.   brain_log = "Brain mass (kg., log-scaled)",  
  15.   sleep_raw = "Sleep per day (hours)",
  16.   sleep_log = "Sleep per day (log-hours)"
  17. )

  18. ggplot(msleep) +  
  19.   aes(x = brainwt, y = sleep_total) +  
  20.   geom_point(color = "grey40") +
  21.   # Circles around highlighted points + labels
  22.   geom_point(size = 3, shape = 1, color = "grey40", data = ex_points) +
  23.   ggrepel::geom_text_repel(aes(label = name), data = ex_points) +  
  24.   # Use log scaling on x-axis
  25.   scale_x_log10(breaks = c(.001, .01, .1, 1)) +  
  26.   labs(x = lab_lines$brain_log, y = lab_lines$sleep_raw)
复制代码


As a child growing up on a dairy farm :cow:, it was remarkable to me how little
I saw cows sleeping, compared to dogs or cats. Were they okay? Are they
constantly tired and groggy? Maybe they are asleep when I’m asleep? Here, it
looks like they just don’t need very much sleep.

Next, let’s fit a classical regression model. We will use a log-scaled sleep
measure so that the regression line doesn’t imply negative sleep (even though
brains never get that large).

  1. m1_classical <- lm(log_sleep_total ~ log_brainwt, data = msleep)  
  2. arm::display(m1_classical)
  3. #> lm(formula = log_sleep_total ~ log_brainwt, data = msleep)
  4. #>             coef.est coef.se
  5. #> (Intercept)  0.74     0.04   
  6. #> log_brainwt -0.13     0.02   
  7. #> ---
  8. #> n = 56, k = 2
  9. #> residual sd = 0.17, R-Squared = 0.40
复制代码


We can interpret the model in the usual way: A mammal with 1 kg (0 log-kg)
of brain mass sleeps 100.74 = 5.5 hours per
day. A mammal with a tenth of that brain mass (-1 log-kg) sleeps
100.74 + 0.13 = 7.4 hours.

We illustrate the regression results to show the predicted mean of y and
its 95% confidence interval. This task is readily accomplished in ggplot2 using
stat_smooth(). This function fits a model and plots the mean and CI for each
aesthetic grouping of data1 in a plot.

  1. ggplot(msleep) +  
  2.   aes(x = log_brainwt, y = log_sleep_total) +  
  3.   geom_point() +
  4.   stat_smooth(method = "lm", level = .95) +  
  5.   scale_x_continuous(labels = function(x) 10 ^ x) +
  6.   labs(x = lab_lines$brain_log, y = lab_lines$sleep_log)
复制代码


This interval conveys some uncertainty in the estimate of the mean, but this
interval has a frequentist interpretation which can be
unintuitive for this sort of data.

Now, for the point of this post: What’s the Bayesian version of this kind of
visualization
? Specifically, we want to illustrate:

  • Predictions from a regression model
  • Some uncertainty about those predictions
  • Raw data used to train the model
Option 1: The pile-of-lines plot

The regression line in the classical plot is just one particular line. It’s the
line of best fit that satisfies a least-squares or maximum-likelihood objective.
Our Bayesian model estimates an entire distribution of plausible
regression lines. The first way to visualize our uncertainty is to plot our
own line of best fit along with a sample of other lines from the posterior
distribution of the model.

First, we fit a model RStanARM using weakly informative priors.

  1. library("rstanarm")

  2. m1 <- stan_glm(
  3.   log_sleep_total ~ log_brainwt,  
  4.   family = gaussian(),  
  5.   data = msleep,  
  6.   prior = normal(0, 3),
  7.   prior_intercept = normal(0, 3))
复制代码


We now have 4,000 credible regressions lines for our data.

  1. summary(m1)
  2. #> stan_glm(formula = log_sleep_total ~ log_brainwt, family = gaussian(),  
  3. #>     data = msleep, prior = normal(0, 3), prior_intercept = normal(0,  
  4. #>         3))
  5. #>  
  6. #> Family: gaussian (identity)
  7. #> Algorithm: sampling
  8. #> Posterior sample size: 4000
  9. #> Observations: 56
  10. #>  
  11. #> Estimates:
  12. #>                 mean   sd   2.5%   25%   50%   75%   97.5%
  13. #> (Intercept)    0.7    0.0  0.6    0.7   0.7   0.8   0.8   
  14. #> log_brainwt   -0.1    0.0 -0.2   -0.1  -0.1  -0.1  -0.1   
  15. #> sigma          0.2    0.0  0.1    0.2   0.2   0.2   0.2   
  16. #> mean_PPD       1.0    0.0  0.9    0.9   1.0   1.0   1.0   
  17. #> log-posterior 12.0    1.2  9.0   11.5  12.3  12.9  13.4   
  18. #>  
  19. #> Diagnostics:
  20. #>               mcse Rhat n_eff
  21. #> (Intercept)   0.0  1.0  3040  
  22. #> log_brainwt   0.0  1.0  3046  
  23. #> sigma         0.0  1.0  2862  
  24. #> mean_PPD      0.0  1.0  3671  
  25. #> log-posterior 0.0  1.0  2159  
  26. #>  
  27. #> For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
复制代码


For models fit by RStanARM, the generic coefficient function coef() returns
the median parameter values.

  1. coef(m1)
  2. #> (Intercept) log_brainwt  
  3. #>   0.7354829  -0.1263922
  4. coef(m1_classical)
  5. #> (Intercept) log_brainwt  
  6. #>   0.7363492  -0.1264049
复制代码


We can see that the intercept and slope of the median line is pretty close to
the classical model’s intercept and slope. The median line serves as the “point
estimate” for our model: If we had to summarize the modeled relationship using
just a single number for each parameter, we can use the medians.

One way to visualize our model therefore is to plot our point-estimate line
plus a sample of the other credible lines from our model
. First, we create a
data-frame with all 4,000 regression lines.

  1. # Coercing a model to a data-frame returns data-frame of posterior samples.  
  2. # One row per sample.
  3. fits <- m1 %>%  
  4.   as_data_frame %>%  
  5.   rename(intercept = `(Intercept)`) %>%  
  6.   select(-sigma)
  7. fits
  8. #> # A tibble: 4,000 × 2
  9. #>    intercept log_brainwt
  10. #>        <dbl>       <dbl>
  11. #> 1  0.7529824  -0.1369554
  12. #> 2  0.7243708  -0.1266290
  13. #> 3  0.7575502  -0.1171410
  14. #> 4  0.7855554  -0.1031353
  15. #> 5  0.6327073  -0.1795992
  16. #> 6  0.6474521  -0.1714347
  17. #> 7  0.7512467  -0.1155559
  18. #> 8  0.7363273  -0.1162038
  19. #> 9  0.7490401  -0.1276618
  20. #> 10 0.7238091  -0.1305896
  21. #> # ... with 3,990 more rows
复制代码


We now plot the 500 randomly sampled lines from our model with light,
semi-transparent lines.

  1. # aesthetic controllers
  2. n_draws <- 500
  3. alpha_level <- .15
  4. col_draw <- "grey60"
  5. col_median <-  "#3366FF"

  6. ggplot(msleep) +  
  7.   aes(x = log_brainwt, y = log_sleep_total) +  
  8.   # Plot a random sample of rows as gray semi-transparent lines
  9.   geom_abline(aes(intercept = intercept, slope = log_brainwt),  
  10.               data = sample_n(fits, n_draws), color = col_draw,  
  11.               alpha = alpha_level) +  
  12.   # Plot the median values in blue
  13.   geom_abline(intercept = median(fits$intercept),  
  14.               slope = median(fits$log_brainwt),  
  15.               size = 1, color = col_median) +
  16.   geom_point() +  
  17.   scale_x_continuous(labels = function(x) 10 ^ x) +
  18.   labs(x = lab_lines$brain_log, y = lab_lines$sleep_log)
复制代码


Each of these light lines represents a credible prediction of the mean across
the values of x. As these line pile up on top of each other, they create an
uncertainty band around our line of best fit. More plausible lines are more
likely to be sampled, so these lines overlap and create a uniform color around
the median line. As we move left or right, getting farther away from the mean of
x, the lines start to fan out and we see very faint individual lines for some
of the more extreme (yet still plausible) lines.

The advantage of this plot is that it is a direct visualization of posterior
samples—one line per sample. It provides an estimate for the central tendency
in the data but it also converys uncertainty around that estimate.

This approach has limitations, however. Lines for subgroups require a little
more effort to undo interactions. Also, the regression lines span the whole x
axis which is not appropriate when subgroups only use a portion of the x-axis.
(This limitation is solvable though.) Finally, I haven’t found good defaults
for the aesthetic options: The number of samples, the colors to use, and the
transparency level. One can lose of lots and lots and lots time fiddling with
those knobs!



二维码

扫码加我 拉你入群

请注明:姓名-公司-职位

以便审核进群资格,未注明则拒绝

关键词:predict Brain sleep Does mass different library number humans sleep

缺少币币的网友请访问有奖回帖集合
https://bbs.pinggu.org/thread-3990750-1-1.html
沙发
h2h2 发表于 2016-11-24 10:16:16 |只看作者 |坛友微信交流群
谢谢分享

使用道具

藤椅
minixi 发表于 2016-11-24 11:39:22 |只看作者 |坛友微信交流群
谢谢分享

使用道具

您需要登录后才可以回帖 登录 | 我要注册

本版微信群
加好友,备注jltj
拉您入交流群

京ICP备16021002-2号 京B2-20170662号 京公网安备 11010802022788号 论坛法律顾问:王进律师 知识产权保护声明   免责及隐私声明

GMT+8, 2024-4-25 05:00