The Bayesian network models provided by Clementine are all built in a supervised setting, i.e. with a target Y; no other types of Bayesian networks are available. BayesiaLab is a much more complete Bayesian network analysis package. Its models are as follows:
1. Association Discovering: Unsupervised learning that discovers all the probabilistic relations in the data. This type of model includes five learning algorithms.
1.1 Maximum spanning tree
This learning algorithm is by far the quickest unsupervised learning algorithm: it relies on only two passes over the data. The first pass computes the a priori weight of all the binary relations between all the variables; the second constructs the maximum weight spanning tree from those relations. Even if the resulting network is not optimal, it can be used for a first imputation of the missing values, as the initial network before running Taboo or EQ, and for variable clustering when there are many variables.
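BayesiaLab's actual edge-weight definition is not specified here; the sketch below assumes pairwise mutual information as the weight (as in the classic Chow-Liu approach) and builds the maximum weight spanning tree with Kruskal's algorithm. The function names and the toy data are illustrative, not BayesiaLab's implementation.

```python
from collections import Counter
from itertools import combinations
from math import log

def mutual_information(data, i, j):
    """Empirical mutual information between discrete columns i and j."""
    n = len(data)
    pi = Counter(row[i] for row in data)
    pj = Counter(row[j] for row in data)
    pij = Counter((row[i], row[j]) for row in data)
    mi = 0.0
    for (a, b), c in pij.items():
        mi += (c / n) * log((c * n) / (pi[a] * pj[b]))
    return mi

def maximum_spanning_tree(data, n_vars):
    """Pass 1: weigh all binary relations; pass 2: Kruskal on those weights."""
    edges = sorted(
        ((mutual_information(data, i, j), i, j)
         for i, j in combinations(range(n_vars), 2)),
        reverse=True)
    parent = list(range(n_vars))          # union-find forest
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    tree = []
    for _, i, j in edges:                 # take heaviest edges first
        ri, rj = find(i), find(j)
        if ri != rj:                      # keep edge only if it joins two components
            parent[ri] = rj
            tree.append((i, j))
    return tree
```

On a small data set where X0 always equals X1 and X2 is independent of both, the edge X0–X1 carries the highest weight and is always retained in the tree.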
1.2 Taboo
Structural learning that implements Taboo search in the space of Bayesian networks. This method is particularly useful for refining a network built by human experts or for updating a network learned on a different data set. Beyond taking into account the a priori knowledge represented by a network and an equivalent number of cases, the starting point of Taboo is the current network (and not the fully unconnected network with no arcs, as is the case for SopLEQ and Taboo Order). Furthermore, arcs that are fixed (the blue ones) remain unchanged, and both the forbidden arcs and the temporal indices are taken into account.
It is possible to define the size of the taboo list as well as the maximum number of parents and children allowed. If these options are not checked, they are not taken into account.
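The core mechanism of the method, a taboo list that forbids recently made moves so the search can walk out of local optima, can be illustrated independently of Bayesian networks. The sketch below runs a generic taboo search over binary arc-inclusion vectors against an arbitrary score function; BayesiaLab's neighbourhood and scoring are of course richer than this toy version.

```python
from collections import deque

def tabu_search(score, n_arcs, start, tabu_size=3, iters=50):
    """Generic taboo search: neighbours toggle one arc; recently
    toggled arcs sit on a fixed-size taboo list and cannot be undone."""
    current = list(start)
    best, best_score = list(current), score(current)
    tabu = deque(maxlen=tabu_size)        # the taboo list
    for _ in range(iters):
        candidates = []
        for a in range(n_arcs):
            if a in tabu:                 # this move is currently forbidden
                continue
            nb = list(current)
            nb[a] = 1 - nb[a]             # toggle one arc
            candidates.append((score(nb), a, nb))
        if not candidates:
            break
        s, a, nb = max(candidates)
        current = nb                      # accept best neighbour, even if worse
        tabu.append(a)                    # forbid undoing this move for a while
        if s > best_score:
            best, best_score = nb, s
    return best, best_score
```

Accepting the best non-taboo neighbour even when it worsens the score is what distinguishes taboo search from plain hill climbing.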
1.3 EQ
Search method that looks for equivalence classes of Bayesian networks. It is very efficient because it avoids many local minima and strongly reduces the size of the search space. Like the Taboo algorithm, EQ can start from the current network. Furthermore, the fixed arcs are treated as normal arcs, while the forbidden arcs and the temporal indices are taken into account.
1.4 SopLEQ
Search method based on a global characterization of the data and on the exploitation of the equivalence properties of Bayesian networks. Arcs that are fixed (the blue ones) are treated as normal arcs, but the forbidden arcs and the temporal indices are taken into account.
1.5 Taboo Order
Learning method using Taboo search in the space of node orders of the Bayesian network. Indeed, finding the best Bayesian network for a fixed node order is an easy task: it only consists in choosing the parents of each node among the nodes that appear before it in the considered order. This is the most complete search method, but also the most time-consuming. Arcs that are fixed (the blue ones) are treated as normal arcs, but the forbidden arcs and the temporal indices are taken into account.
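The "easy task" inside the loop is worth making concrete: once the order is fixed, the best DAG decomposes into an independent parent-set choice per node, taken from that node's predecessors. The sketch below enumerates parent subsets up to a size bound against an arbitrary local score; the score function and node names are illustrative placeholders, not BayesiaLab's scoring criterion.

```python
from itertools import combinations

def best_network_for_order(order, local_score, max_parents=2):
    """Given a fixed node order, each node independently picks its best
    parent set among the nodes that precede it in the order."""
    parents = {}
    for pos, node in enumerate(order):
        preds = order[:pos]               # only predecessors may be parents
        best = max(
            (subset
             for k in range(min(max_parents, len(preds)) + 1)
             for subset in combinations(preds, k)),
            key=lambda s: local_score(node, s))
        parents[node] = best
    return parents
```

Because each choice is local, no acyclicity check is needed: any graph whose arcs all point from earlier to later nodes in the order is a DAG by construction.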
2. Target Node Characterization: Learning Bayesian networks whose structure is entirely dedicated to the characterization of a target variable.
2.1 Naive Bayes
Bayesian network with a predefined architecture in which the target node is the parent of all the other nodes. This structure thus states that the target node is the cause of all the other nodes and that the knowledge of its value makes each node independent of the others. In spite of these strong assumptions, which are false in the majority of the cases, the low number of probabilities to estimate makes this structure very robust, with a very short learning time as only the probabilities have to be estimated.
2.2 Augmented Naive Bayes
Partially predefined structure that relaxes the strong conditional-independence constraint mentioned above. This architecture consists of the naive architecture enriched with relations between the child nodes given the value of the target node (their common parent). The prediction accuracy of this algorithm is better than that of the naive architecture, but the unsupervised search for the child relationships can be time-consuming.
2.3 Tree Augmented Naive Bayes
Partially predefined structure that relaxes the strong conditional-independence constraint mentioned above. This architecture consists of the naive architecture on which a maximum spanning tree is learned. The prediction accuracy of this algorithm is better than that of the naive architecture, but not as good as that of Augmented Naive Bayes; however, this algorithm is much quicker.
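The tree over the children must account for the fact that they already share the target as a parent, so in the standard tree-augmented formulation (Friedman et al.) the spanning-tree weights are conditional mutual information given the class, I(X_i ; X_j | Y), rather than plain mutual information. Whether BayesiaLab uses exactly this weight is an assumption here; the sketch computes it from discrete rows.

```python
from collections import Counter
from math import log

def conditional_mutual_information(data, i, j, y):
    """I(X_i ; X_j | Y): the usual edge weight when learning the tree
    over the children of a tree-augmented naive structure."""
    n = len(data)
    c_y = Counter(r[y] for r in data)
    c_iy = Counter((r[i], r[y]) for r in data)
    c_jy = Counter((r[j], r[y]) for r in data)
    c_ijy = Counter((r[i], r[j], r[y]) for r in data)
    cmi = 0.0
    for (a, b, c), k in c_ijy.items():
        cmi += (k / n) * log((k * c_y[c]) / (c_iy[(a, c)] * c_jy[(b, c)]))
    return cmi
```

Feeding these weights to the spanning-tree routine shown under Maximum Spanning Tree yields the augmenting tree of the TAN structure.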
2.4 Sons & Spouses
Structure in which the target node is the parent of a subset of nodes that potentially have other parents (spouses). This structure is to some extent an augmented naive architecture in which the set of children is not fixed a priori, but searched according to the marginal dependence of the nodes on the target. This algorithm thus has the advantage of highlighting the nodes that are not correlated with the target. The learning time is comparable to that of the augmented naive architecture.
2.5 Markov Blanket Learning
Algorithm that searches for the nodes belonging to the Markov Blanket of the target node, i.e. its fathers, sons and spouses. Knowing the values of each node in this subset makes the target node independent of all the other nodes. The search for this structure, which is entirely focused on the target node, makes it possible to obtain the subset of truly useful nodes much more quickly than the two previous algorithms. Furthermore, this method is a very powerful variable-selection algorithm and the ideal tool for the analysis of a variable: a restricted number of connected nodes, with different kinds of probabilistic relations:
fathers: nodes that bring more information jointly than alone;
sons: nodes having a direct probabilistic dependence on the target;
spouses: nodes that are marginally independent of the target but which become informative once the value of the son is known.
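Given a DAG, the three relation kinds above can be read off mechanically: the blanket is the union of the target's parents, its children, and its children's other parents. A minimal sketch with the DAG represented as a parents map (node names are illustrative):

```python
def markov_blanket(parents, target):
    """Markov Blanket of `target` in a DAG given as {node: set of parents}:
    its fathers, its sons, and its sons' other parents (spouses)."""
    children = {n for n, ps in parents.items() if target in ps}
    spouses = {p for c in children for p in parents[c]} - {target}
    return parents[target] | children | spouses
```

Conditioning on every node in this set renders the target independent of everything else in the network, which is exactly why the blanket is sufficient for prediction.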
2.6 Augmented Markov Blanket Learning
Algorithm initialized with the Markov Blanket structure, which then uses an unsupervised search to find the probabilistic relations that hold between the variables belonging to this Markov Blanket. This unsupervised search implies an additional time cost but yields better prediction results than the first version.
2.7 Minimal Augmented Markov Blanket Learning
The variable selection realized by the Markov Blanket learning algorithm is based on a heuristic search. The set of selected nodes can therefore be non-minimal, especially when there are several influence paths between the nodes and the target. In that case, the target analysis takes too many nodes into account. By applying an unsupervised learning algorithm to the set of selected nodes, Minimal Augmented Markov Blanket learning reduces this set of nodes, resulting in a more accurate target analysis.
However, if the task is a pure prediction task (for example a scoring function), the Augmented Markov Blanket algorithm is usually more accurate than its Minimal version since it uses more pieces of evidence.
2.8 Semi-Supervised Learning
Unsupervised learning algorithm that searches for the relationships between the nodes lying within a predefined distance of the target. This distance is computed using the Markov Blanket learning algorithm. Semi-supervised learning thus learns a network fragment centered on the target variable. This algorithm is very useful for tasks involving many nodes, as for example in micro-array analysis (thousands of genes), and for prediction tasks where the Markov Blanket nodes have missing values, as these nodes can then no longer separate the target node from the other nodes.