Relationship and difference among HMM, MEMM, CRF and MRF
Published: 2019-06-28


 

Since my working language is English, I will use English to summarize my understanding of the relationships and differences among several models in machine learning: the Hidden Markov Model, the Maximum-Entropy Markov Model, the Conditional Random Field, and the Markov Random Field.

 

MaxEnt: Maximum-Entropy model
HMM: Hidden Markov Model
MEMM: Maximum-Entropy Markov Model
CRF: Conditional Random Field
MRF: Markov Random Field

Keywords that will come up in this article:
conditional independence, generative model, discriminative model, undirected graph model, directed graph model, factor graph
1. Background on graph models and generative/discriminative models
1.1) Directed vs. Undirected:
If the model describes a probabilistic dependency between the observations and the states/outputs, it is a directed graph model.
However, if the relation between observations and states is described by arbitrary functions (potential functions, energy functions), it is an undirected graph model, because the model cannot say which variable is the cause and which is the effect.
Some people say that an undirected graph expresses soft constraints among observations and outputs: you cannot tell which one, observation or output, determines the other.
Converting a directed graph to an undirected graph: factorize it by introducing a factor node between each observation node and output node.
NOTE: a factor graph is always undirected! The observations and outputs are linked (one might say regularized/factorized) by the factor nodes.
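To make the factoring idea concrete, here is a minimal Python sketch (my own illustration, not from the original post): the directed model p(x)·p(y|x) is rewritten as a bipartite factor graph whose factor nodes are simply non-negative functions over the variables they touch. The variable names and numbers are made-up assumptions.

```python
# Minimal sketch: converting the directed model p(x) * p(y | x)
# into an undirected factor graph with two factor nodes.
# All numbers are illustrative assumptions.

# Directed view: explicit conditional probabilities (binary x, y).
p_x = {0: 0.6, 1: 0.4}                       # p(x)
p_y_given_x = {0: {0: 0.9, 1: 0.1},           # p(y | x)
               1: {0: 0.2, 1: 0.8}}

# Undirected / factor-graph view: the same distribution written as a
# product of non-negative factors, with no notion of "cause" vs. "effect".
factors = {
    "f1": (("x",),     lambda x: p_x[x]),                 # factor linked to x
    "f2": (("x", "y"), lambda x, y: p_y_given_x[x][y]),   # factor linking x and y
}

def unnormalized_prob(assignment):
    """Product of all factors evaluated at a full assignment."""
    p = 1.0
    for scope, f in factors.values():
        p *= f(*(assignment[v] for v in scope))
    return p

# The joint recovered from the factor graph matches the directed model.
for x in (0, 1):
    for y in (0, 1):
        print(x, y, unnormalized_prob({"x": x, "y": y}))
```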
1.2) Generative vs. Discriminative:
If the model tries to model the joint probability distribution over the observations and the states/outputs, it is a generative model;
however, if the model tries to model only the conditional probability distribution of the states/outputs given the observations, it is a discriminative model.
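As a tiny numerical illustration (my own sketch, not from the original post): a generative model stores the full joint p(x, y) and must apply Bayes' rule to answer p(y | x), while a discriminative model parameterizes p(y | x) directly. The rainy/sunny example and its numbers are invented for this sketch.

```python
# Generative view: store the joint p(x, y) and derive p(y | x) by Bayes' rule.
# The numbers below are made up purely for illustration.
joint = {("rainy", "umbrella"): 0.3, ("rainy", "no_umbrella"): 0.1,
         ("sunny", "umbrella"): 0.05, ("sunny", "no_umbrella"): 0.55}

def p_y_given_x_generative(x, y):
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)   # marginal p(x)
    return joint[(x, y)] / p_x

# Discriminative view: model p(y | x) directly; p(x) is never needed.
conditional = {"rainy": {"umbrella": 0.75, "no_umbrella": 0.25},
               "sunny": {"umbrella": 0.083, "no_umbrella": 0.917}}

def p_y_given_x_discriminative(x, y):
    return conditional[x][y]

print(p_y_given_x_generative("rainy", "umbrella"))      # 0.75
print(p_y_given_x_discriminative("rainy", "umbrella"))  # 0.75
```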
2. How to classify HMM, MEMM, CRF and MRF
2.1)
The differences and relations among these four models can be shown in the following table:

                         |  generative model  |  discriminative model
-------------------------+--------------------+------------------------
directed graph model     |  HMM               |  MEMM
-------------------------+--------------------+------------------------
undirected graph model   |  ??                |  MaxEnt, CRF, MRF
2.2) HMM vs. MEMM
Both HMM and MEMM are directed graph models: HMM models "state decides observation", which is why it is called "generative"; MEMM models "observation decides state", the opposite direction compared with HMM.
HMM is a generative model because it models the joint probability p(x, y), where x is the observation and y is the state/output.
But given an x and p(x, y), to predict p(y|x) we also need p(x).
MEMM is a discriminative model because it models the conditional probability p(y|x) directly.
So MEMM does not need p(x); moreover, it can use arbitrary features of the observation, and even of the neighboring observations, which is impossible in HMM.
In other words, HMM makes very strict independence assumptions on the observations: HMM does NOT use features of the observations, neither of a single observation nor of its neighbors.
It seems that MEMM is more powerful and simpler than HMM and CRF, but it has the "label bias problem". Note that HMM does NOT have the label bias problem, because HMM is a generative model.
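To make the contrast concrete, here is a minimal Python sketch (my own illustration; the probability tables and feature scores are made-up assumptions): the HMM scores a pair (x, y) with the joint ∏ p(y_i | y_{i-1}) p(x_i | y_i), while the MEMM scores y given x with ∏ p(y_i | y_{i-1}, x_i), each factor being locally normalized over y_i.

```python
# Minimal sketch contrasting the HMM joint factorization with the MEMM
# conditional factorization. All probability tables are invented for
# illustration; states and observations are toy symbols.

states = ("A", "B")

# HMM parameters: transition p(y_i | y_{i-1}) and emission p(x_i | y_i).
trans = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
emit  = {"A": {"u": 0.9, "v": 0.1}, "B": {"u": 0.2, "v": 0.8}}
start = {"A": 0.5, "B": 0.5}

def hmm_joint(xs, ys):
    """HMM models the joint p(x, y) = p(y1) p(x1|y1) * prod p(yi|yi-1) p(xi|yi)."""
    p = start[ys[0]] * emit[ys[0]][xs[0]]
    for i in range(1, len(xs)):
        p *= trans[ys[i - 1]][ys[i]] * emit[ys[i]][xs[i]]
    return p

# MEMM parameter: a locally normalized distribution p(y_i | y_{i-1}, x_i).
def memm_local(prev_y, x):
    # Toy feature-based scores, then per-state (local) normalization.
    score = {y: (2.0 if y == prev_y else 1.0)
                * (3.0 if (y, x) in {("A", "u"), ("B", "v")} else 1.0)
             for y in states}
    z = sum(score.values())
    return {y: s / z for y, s in score.items()}

def memm_conditional(xs, ys, init_y="A"):
    """MEMM models p(y | x) directly as a product of local conditionals."""
    p, prev = 1.0, init_y
    for x, y in zip(xs, ys):
        p *= memm_local(prev, x)[y]
        prev = y
    return p

xs, ys = ("u", "v", "u"), ("A", "B", "A")
print("HMM joint p(x, y):      ", hmm_joint(xs, ys))
print("MEMM conditional p(y|x):", memm_conditional(xs, ys))
```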
Now the question is: what is the "label bias problem" in MEMM?
Let's start from the form of MEMM: p(y1, y2, ..., yn | x1, x2, ..., xn) = p(y1 | x1) p(y2 | y1, x2) p(y3 | y2, x3) ... p(yn | yn-1, xn). Each local conditional p(yi | yi-1, xi) is affected by the number of outgoing transitions of state yi-1.
So in general, states with a small number of outgoing transitions take advantage over states with a large number.
(Why? Because of per-state normalization: each local conditional must sum to 1 over yi, so a state with few successors passes almost all of its probability mass forward no matter what the observation xi says.)
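Here is a tiny numeric sketch of the effect (my own construction, with made-up scores): a state with a single successor ignores the observation entirely, while a state with many successors has to split its mass, so low-branching paths are systematically preferred.

```python
# Tiny illustration of the label bias problem (made-up scores).
# Per-state normalization: each row of unnormalized scores is divided
# by its own sum, so the *number* of competitors matters more than
# the evidence coming from the observation.

def local_normalize(scores):
    z = sum(scores.values())
    return {y: s / z for y, s in scores.items()}

# State "s1" has only one successor: whatever the observation says,
# the normalized transition probability is always 1.0.
from_s1 = local_normalize({"s3": 0.01})   # evidence is weak (0.01) ...
print(from_s1)                            # ... but p("s3") = 1.0 anyway

# State "s2" has five successors: even a strongly supported transition
# has to share probability mass with its competitors.
from_s2 = local_normalize({"s3": 5.0, "s4": 1.0, "s5": 1.0, "s6": 1.0, "s7": 1.0})
print(from_s2["s3"])                      # 5 / 9 ~= 0.56

# So paths through low-branching states are systematically preferred,
# independently of how well they explain the observations.
```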
2.3) CRF vs. MRF
MRF is also called a Markov Network. From the name we can tell that its nodes are not separated into observation nodes and output nodes. CRF, however, does make this separation: the outputs are conditioned on an input sequence X (which gives it a discriminative flavour, right? Yes).
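As a minimal sketch of that conditioning (my own illustration, not from the original post): a linear-chain CRF scores the whole output sequence with potentials that may look at X, and normalizes once over all output sequences, which is exactly what removes the per-state normalization of the MEMM. The potentials and the fixed start state below are assumptions of the sketch.

```python
# Minimal linear-chain CRF sketch (toy potentials, binary states 0/1).
# The whole sequence y is scored against x, and a single global
# normalizer Z(x) is computed over all candidate output sequences.
from itertools import product

def potential(prev_y, y, x):
    """Unnormalized, strictly positive potential psi(y_{i-1}, y_i, x_i)."""
    score = 1.0
    if y == prev_y:
        score *= 2.0            # prefer staying in the same state
    if (y, x) in {(0, "u"), (1, "v")}:
        score *= 3.0            # observation-dependent feature
    return score

def seq_score(xs, ys):
    s, prev = 1.0, 0            # fixed start state, an assumption of this sketch
    for x, y in zip(xs, ys):
        s *= potential(prev, y, x)
        prev = y
    return s

def crf_prob(xs, ys):
    """p(y | x) = score(x, y) / Z(x), with Z(x) summed over all sequences y'."""
    z = sum(seq_score(xs, cand) for cand in product((0, 1), repeat=len(xs)))
    return seq_score(xs, ys) / z

xs = ("u", "v", "u")
print(crf_prob(xs, (0, 1, 0)))
```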
   
2.4) Maximum Entropy
MaxEnt is a discriminative model; its generative counterpart is Naive Bayes.
Naive Bayes models a single observation and its output through the joint probability, under the assumption that the observation features are conditionally independent given the output.
Maximum Entropy, in contrast, models the conditional dependency of the output on the observation. It uses the maximum-entropy principle, rather than raw empirical conditional probabilities, to measure the dependence of the output on the observation.
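A small sketch of this pairing (my own illustration; the tiny dataset and the use of scikit-learn are assumptions, not part of the original post): Naive Bayes fits p(y) and p(x | y) from counts, while logistic regression, the standard realization of MaxEnt for classification, fits p(y | x) directly.

```python
# Toy comparison of the generative / discriminative pairing described above.
# scikit-learn and this tiny made-up dataset are assumptions for illustration.
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression

# Bag-of-words-style counts for six "documents" with binary labels.
X = np.array([[2, 0, 1],
              [3, 1, 0],
              [2, 0, 0],
              [0, 2, 3],
              [1, 3, 2],
              [0, 1, 2]])
y = np.array([0, 0, 0, 1, 1, 1])

# Generative: Naive Bayes estimates p(y) and p(x_j | y), then applies Bayes' rule.
nb = MultinomialNB().fit(X, y)

# Discriminative (MaxEnt): logistic regression parameterizes p(y | x) directly.
maxent = LogisticRegression().fit(X, y)

x_new = np.array([[1, 0, 2]])
print("Naive Bayes p(y | x):", nb.predict_proba(x_new))
print("MaxEnt      p(y | x):", maxent.predict_proba(x_new))
```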
3. Generative and Undirected Graph Models
Is there any model that is both an undirected graph model and a generative model?
Yes: the Restricted Boltzmann Machine and related neural networks, for example.
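As a closing sketch (my own illustration, with arbitrary random parameters): the RBM defines a joint distribution over visible and hidden units through an energy function, p(v, h) ∝ exp(-E(v, h)), which is what makes it both undirected and generative.

```python
# Minimal RBM sketch: joint distribution defined by an energy function,
# p(v, h) = exp(-E(v, h)) / Z. Sizes and random parameters are arbitrary.
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
n_visible, n_hidden = 3, 2
W = rng.normal(size=(n_visible, n_hidden))   # pairwise (undirected) couplings
b = rng.normal(size=n_visible)               # visible biases
c = rng.normal(size=n_hidden)                # hidden biases

def energy(v, h):
    """E(v, h) = -v.T W h - b.T v - c.T h for binary unit vectors v, h."""
    return -(v @ W @ h + b @ v + c @ h)

# Brute-force normalizer Z (only feasible for these tiny sizes).
configs_v = [np.array(bits) for bits in product((0, 1), repeat=n_visible)]
configs_h = [np.array(bits) for bits in product((0, 1), repeat=n_hidden)]
Z = sum(np.exp(-energy(v, h)) for v in configs_v for h in configs_h)

def joint(v, h):
    return float(np.exp(-energy(v, h)) / Z)

# A generative, undirected model: we can read off the joint probability
# of any (visible, hidden) configuration.
print(joint(np.array([1, 0, 1]), np.array([1, 0])))
```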

Reposted from: https://www.cnblogs.com/yanlongpankow/p/6343221.html
