WinBUGS

Karine Poitrineau · Messagepar **Karine Poitrineau** » 11 Juil 2006, 12:22

Bonjour à tous,

WinBUGS peut être utilisé dans R (soit grâce à une librairie, soit grâce à une fonction: http://www.mrc-bsu.cam.ac.uk/bugs/winbu ... te14.shtml).

Est-ce que parmi vous certains ont testé l'utilisation de WinBUGS, et est-ce que vous sauriez comment faire pour ajuster un modèle logistique hiérarchique?

K.
PS: j'espère ne pas être trop lourde avec mes questions
PS2: j'espère pouvoir à mon tour aider les participants à ce site, mais mes connaissances de R, et en statistiques, restent lacunaires (un peu comme de la dentelle...)

Renaud Lancelot · Messagepar **Renaud Lancelot** » 11 Juil 2006, 12:48

J'ai fait ça une fois, et il y a longtemps: j'ai oublié...

Par paresse, je continue à utiliser MLwiN, qui dispose d'ailleurs d'interface vers WinBugs.

La fonction mcmcsamp du package lme4 (Matrix en fait) est aussi censée pouvoir faire ce genre de chose, mais ne marche pas actuellement avec les glmm.

Il y a aussi le package BRugs qui fait l'inverse de ce que tu décris:

BRugs is a collection of R functions that allow users to analyse graphical models using MCMC techniques. Most of the R functions in BRugs provide a interface to the BRugs dynamic link library (shared object file). The BRugs dynamic link library is able to make use of many of the WinBUGS components, in particular those components concerned with graphical models and MCMC simulation. BRugs lacks the GUI interface of WinBUGS but is able to use R to create graphical displays of the MCMC simulation. BRugs uses the same model specification language as WinBUGS and the same format for data and initial values. However BRugs always uses plain text files for input inplace of WinBUGS compound documents. The BRugs functions can be split into two groups: those associated with setting up and simulating the graphical model and those associated with making statistical inference. In general the R functions in BRugs correspond to the command buttons and and text entry fields in the menus of WinBUGS. Each WinBUGS text entry field splits into two R functions, one to set the quantity and the other to get the value of the quantity.

Il y a des exemples traités dans l'aide...

Bon courage,

Renaud

Renaud Lancelot · Messagepar **Renaud Lancelot** » 21 Juil 2006, 05:54

La toute dernière version du package Matrix (0.995-12) / lme4 permet d'ajuster des modèles glmm par mcmc. Il y a un petit bug d'affichage faisant qu'une ligne vide est envoyée à la console à chaque échantillonnage mais les résultats semblent cohérents. Exemple avec ce jeu de données (mettre dans un fichier et sourcer ce fichier avec source()):

"Data" <-
structure(list(herd = structure(as.integer(c(1, 1, 1, 1, 2, 2,
2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7,
8, 9, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 12, 12,
13, 13, 13, 13, 14, 14, 14, 14)), .Label = c("ASKA", "BAFU",
"BEDE", "BRTA", "EJAK", "FAGU", "GARE", "GEGU", "IFBE", "MABO",
"MUKE", "SASA", "WAAY", "WABK"), class = "factor"), incidence = as.integer(c(2,
3, 4, 0, 3, 1, 1, 8, 2, 0, 2, 2, 0, 2, 0, 5, 0, 0, 1, 3, 0, 0,
1, 8, 1, 3, 0, 12, 2, 0, 0, 0, 1, 1, 0, 2, 0, 5, 3, 1, 2, 1,
0, 0, 1, 2, 0, 0, 11, 0, 0, 0)), size = as.integer(c(14, 12,
9, 5, 22, 18, 21, 22, 16, 16, 20, 10, 10, 9, 6, 18, 25, 24, 4,
17, 17, 18, 20, 16, 10, 9, 5, 34, 9, 6, 8, 6, 22, 22, 18, 22,
25, 27, 22, 22, 10, 8, 6, 5, 21, 24, 19, 23, 19, 2, 3, 2)), period = structure(as.integer(c(1,
2, 3, 4, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3,
4, 1, 2, 3, 4, 1, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3,
4, 1, 2, 3, 4, 1, 2, 3, 4)), .Label = c("1", "2", "3", "4"), class = "factor")), .Names = c("herd",
"incidence", "size", "period"), row.names = c("1", "2", "3",
"4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15",
"16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26",
"27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37",
"38", "39", "40", "41", "42", "43", "44", "45", "46", "47", "48",
"49", "50", "51", "52"), class = "data.frame")

Exemple:

Code : Tout sélectionner

> library(lme4)
> fm1 <- lmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
+             family = binomial, data = Data, method = "Laplace")
> samp1 <- mcmcsamp(fm1, n = 10000)

[en vrai, 10000 lignes "blanches" après la commande précédente !]

Code : Tout sélectionner

> library(coda)
> fm1
Generalized linear mixed model fit using Laplace 
Formula: cbind(incidence, size - incidence) ~ period + (1 | herd) 
Data: Data 
 Family: binomial(logit link)
      AIC      BIC    logLik deviance
 1126.129 1135.885 -558.0644 1116.129
Random effects:
 Groups Name        Variance Std.Dev.
 herd   (Intercept) 0.67803  0.82343 
number of obs: 52, groups: herd, 14

Estimated scale (compare to 1)  4.173388 

Fixed effects:
             Estimate Std. Error  z value  Pr(>|z|)    
(Intercept) -1.444974   0.224479  -6.4370 1.219e-10 ***
period2     -0.635619   0.075893  -8.3752 < 2.2e-16 ***
period3     -1.122275   0.092261 -12.1641 < 2.2e-16 ***
period4     -1.111207   0.104065 -10.6781 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Correlation of Fixed Effects:
        (Intr) perid2 perid3
period2 -0.100              
period3 -0.082  0.290       
period4 -0.070  0.251  0.200
> summary(samp1)

Iterations = 1:10000
Thinning interval = 1 
Number of chains = 1 
Sample size per chain = 10000 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

                    Mean      SD  Naive SE Time-series SE
(Intercept)      -1.4464 0.25275 0.0025275      0.0029053
period2          -0.6355 0.07463 0.0007463      0.0009293
period3          -1.1234 0.09171 0.0009171      0.0010549
period4          -1.1157 0.10504 0.0010504      0.0011141
log(herd.(In))   -0.2379 0.41636 0.0041636      0.0045673
deviance       1077.2544 7.82955 0.0782955      0.0948178

2. Quantiles for each variable:

                    2.5%       25%       50%        75%     97.5%
(Intercept)      -1.9470   -1.6085   -1.4455   -1.28420   -0.9449
period2          -0.7808   -0.6855   -0.6359   -0.58527   -0.4886
period3          -1.3044   -1.1865   -1.1233   -1.06043   -0.9456
period4          -1.3259   -1.1875   -1.1144   -1.04289   -0.9127
log(herd.(In))   -0.9844   -0.5223   -0.2663    0.02647    0.6459
deviance       1063.9922 1071.6625 1076.6523 1082.11553 1094.3923

> autocorr.diag(samp1)
        (Intercept)      period2      period3      period4 log(herd.(In))      deviance
Lag 0   1.000000000  1.000000000  1.000000000 1.0000000000   1.0000000000  1.000000e+00
Lag 1   0.129161303  0.153393971  0.127459466 0.1550942328   0.1020060804  8.238593e-02
Lag 5   0.010755982 -0.017558211  0.020079728 0.0143746704  -0.0030298516 -1.672269e-05
Lag 10 -0.007951045  0.003324626 -0.003093759 0.0049683503  -0.0003803623 -1.684084e-02
Lag 50 -0.013348585  0.023293578 -0.008232808 0.0007363334  -0.0221778923 -1.729875e-02

Je suis d'ailleurs "scotché" par la similitude des estimateurs. La variance mcmc de l'effet aléatoire est quand même plus grande que celle obtenue avec Laplace:

> exp(-0.2379)
[1] 0.7882815

(contre 0.67803 avec Laplace).

Happy MCMC...

Renaud

Emilien Luquet · Messagepar **Emilien Luquet** » 19 Avr 2007, 08:00

Je me permet de reprendre ce poste.
J'essaie actuellement de comprendre la fonction mcmcsamp afin d'analyser les effets des facteurs d'un modèle mixte (avec lmer).
Un lien utile que j'ai découvert lors de mes recherches : http://wiki.r-project.org/rwiki/doku.php?id=guides:lmer-tests
En utilisant la fonction proposée par D. Bates :

Code : Tout sélectionner

mcmcpvalue <- function(samp)
{
   ## elementary version that creates an empirical p-value for the
   ## hypothesis that the columns of samp have mean zero versus a
   ## general multivariate distribution with elliptical contours.

   ## differences from the mean standardized by the observed
   ## variance-covariance factor
   std <- backsolve(chol(var(samp)),
                    cbind(0, t(samp)) - colMeans(samp),
                    transpose = TRUE)
   sqdist <- colSums(std * std)
   sum(sqdist[-1] > sqdist[1])/nrow(samp)
}

il est possible de calculer la significativité du facteur à partir des simulations.
Voilà ce que j'obtiens pour le facteur periode:

Code : Tout sélectionner

> HPDinterval(samp1)
                   lower      upper
(Intercept)    -1.842052 -0.8581093
period2        -1.670072 -0.4581206
period3        -1.934440 -0.6137108
period4        -2.467452 -0.7817338
log(herd.(In)) -2.221310  0.4081371
attr(,"Probability")
[1] 0.95
> mcmcpvalue(as.matrix(samp1)[,2:4])
[1] 0

Le facteur période a donc un effet significatif.
Cependant, j'aimerai être sûr de comprendre ce que je fais.
La fonction mcmcsamp simule 10000 fois la valeur des coefficients du modèle (comment ? à partir de quoi ? j'ai du mal à saisir).
Ensuite pour chaque coef, on calcule les intervalles de confiance à 95%, puis on teste la probabilité que la moyenne des coef (obtenue par simulation) d'un facteur est différente de 0.
Est ce que c'est ça ? Je veux tout savoir !
Merci

Stéphane Laurent · Messagepar **Stéphane Laurent** » 26 Avr 2007, 20:10

Bonjour,

Pour répondre à Karine, peut-être que le package MCMCpack : http://mcmcpack.wustl.edu/wiki/index.php/Main_Page contient des fonctions pour cela (mais c'est peut-être de ça dont parle Renaud)

Groupe des utilisateurs du logiciel R

WinBUGS

WinBUGS

Qui est en ligne