<TITLE: Signal Processing
ACADEMIC DOMAIN: technology
DISCIPLINE: information technology
EVENT TYPE: seminar presentation
FILE ID: USEMP06B
NOTES: continuation of and continued in USEMD140, presentation interspersed with questions

RECORDING DURATION: 48 min 57 sec

RECORDING DATE: 13.4.2005

NUMBER OF PARTICIPANTS: 11

NUMBER OF SPEAKERS: 3

S2: NATIVE-SPEAKER STATUS: Romanian; ACADEMIC ROLE: senior staff; GENDER: male; AGE: 31-50

S3: NATIVE-SPEAKER STATUS: Romanian; ACADEMIC ROLE: senior staff; GENDER: male; AGE: 31-50

S5: NATIVE-SPEAKER STATUS: Romanian; ACADEMIC ROLE: research student; GENDER: male; AGE: 24-30

SU: unidentified speaker>



<S5 USES POWERPOINT THROUGHOUT PRESENTATION>

<S5> <SIGH> er okay (xx) told you my name is <NAME S5> and i represent the second part of the chapter today , from the book , the second part have er has two main er parts realistic more realistic evolution evolution and models and probabilistic interpretation of various models <P:08> i will start with er first part first part models er realistic evolution in models and er er sample problem of this is models with different rates at different sites so basic strategy of maximum latitude is to pick a tree and a set of lengths and compute the likelihood over all (cycles) in the foreign formula , er in 1993 yang suggested introduction of this cyclic (xx) R , R-U that (change) all the , <SIGH> theories of (any) things that there should be er foreign . so he introduces er (dilemma R-U) <P:09> then R-U is in generally unknown so the best strategy is to do (question) (xx) (prime) for each R-U , yang proposes a so- a solution for this so he decided to use er gamma distribution as a (prime) this distribution has mean one invariance one over alpha er , for alpha large this allows er tight distribution for er er alpha small a broad one so the formula becomes , to this (including an) integral over R , then he improved this er solution because integrals are computationally slow so he replaced integral with a sum , so the , there is another (xx) there should be infinitive , so the interval (zero infinitive) is subdivided into (xx) intervals . er this is the final formula , and yang found that the this formula works well for (M three or four) <P:16> so the case presented above works well only if the data data are plentiful , for smaller amounts of data a solution would be to (xx) variable alpha from largely the set of (xx) given to this , er felsenstein and churchill in 1996 proposed an algorithm simil- similar to yang's but in the hidden markov model format <P:11> er <SIGH> so here are some o- other difference between felsenstein and churchill algorithm and the forward algorithm for HMM's so the parse of the model is a set of choices of grades rather than an alignment of sequence to to the model the probabilities are not initials probability from a state where's sum (to one) but likelihood's for the whole set of sequences that's (er) , er formally having the equation for this algorithm and these are the only changes so this term is said to be the likelihood at (xx) for rate L and this is said to be the transition probability from rate K to rate L <P:08> another problem for improving the models is er to use models that allows (depths) er er (parse) solution wi- would be to treat er this chapter as an extra chapter and er substitution matrix will be replaced with one with er size K plus one by K plus one , allison wallace and yee in 1992 proposed a better model model by introducing the (length) or (xx) states states , er this this configuration (xx) , another approach is proposed by thorne kishino and felsenstein but this was applied only for the case of two sequences </S5>
<S2> just if i may stop you here to , there's a reference , you mentioned allison wallace and yee wallace is er , as famous as <NAME> is er , maximum er er </S2>
<S3> [MMM] </S3>
<SU> [(xx) minimum message] </SU>
<S2> message yes it's very similar to MDL and it's interesting that they study s- so er biological progress phylogenies and affine types , and allison is still very active in this phylogeny , but they are basically er computer scientists not biologists </S2>
<S5> erm in the next slides i will present a mo- er a model that allows to find the affinities but handled in a computational and reasonable way , so the assumptions are sequences can be aligned to er an HMM with an R but that's just simpler than to (that) of the profile HMM proposed by krogh krogh and all in 1994 , so this er model has only match (xx) er states , erm . here er each sequence (in each) , there we have er in this (affine) state . here is . here is the good example and these are two states and the last phrase was (above) each (xx) and , some receiving , so ha- we have this tree er er sequence X and er answers (that you get per Y) er so both starts from match states (xx) position table starts from match state er in er position K plus one they will have er a different start position for the next er . i will come back here , here are some facts so in that position K it acts in its resi- (residual) X , Y and Y (xx) as the next state we have er substitution er probability on the substitution of (xx) . er i i use some (plugging) for that allow allows me to use er (xx) in (xx) so that's why i have some [(mistake)] </S5>
<S2> [so you] you have to translate this er exclamation mark as probably less than or </S2>
<S5> no this exclamation means er er the problem with (xx) from A-Y-J to X-I and here there could be er so <WRITING ON BLACKBOARD, P:32> and the probability is <WRITING ON BLACKBOARD, P:26> so this exclamation should be (xx) with the M . so in (F) position K we have er this probability then er add transition from K from K plus one er <P:12> yes er if here should be transition from K plus one to K plus , you can see . er so we have er this transition X builds from match state to the (direct) state Y builds from match state to match state , we have a probability of substitutional transitions , erm so here is er the transitional for Y there is a transitional X , and er , here we have a transition from K plus one to K plus two both er er X and Y er starts from different states X starts from the last state and Y starts from match state so we have this transition for X and this transition for Y , er in this case considered that erm , <COUGH> er X has an in- independent parse so here we use a prior for for it (in the end) <P:08> so i'm back to this , so this is a position K both starts from M , er X er will be the (last) state we have (xx) here and will be back in the next state on (xx) er Y builds from match to match and match to match , so h- how is it works so at posi- position K X (disturb) er er Q-Y-J is root prior for Y , K plus one adds to this transition probability and er K plus two adds this probability , (you see) two primes because er the sequences starts from different (origins) </S5>
<S2> so ma- maybe (xx) er still quite er difficult er to see what is happening here , that model was built (years) before it's just er an alignment model <S5> [yes (xx)] </S5> [and then (through alignment)] sequence X to the model and you align then sequence Y to the model <S5> yes </S5> and you have the states how how it happened the alignment er , but now you you start to have of course some other probabilities those of if (anyone) (xx) (time) so those are i don't see very very well how those are built so we know very well how to build HMM for alignment of sequences but the how those probabilities which use their (xx) is it any er you mentioned here so how do you switch from a M-M match match state to an M-E state </S2>
<S5> er i've seen they give only this . this matrix </S5>
<S2> so that's it's something you should have , from some sources or <S5> (well) </S5> there is no way to construct if it's nothing said here how it was done this matrix only that you use it </S2>
<S5> yes and er this is for (the real) (xx) </S5>
<S2> there should be some real training this from some data if you have some training data as usual , that is the main difference between the old HMM and this new HLM that you have an extra (xx) transition </S2>
<S3> yeah and not (xx) do not sum to one </S3>
<S5> er , don't know [no no] </S5>
<S3> [so it's] like er like a normal transition for the written matrix </S3>
<S2> yeah that is not a strictly (liability function) it's just a some <S3> yeah </S3> pseudo <S3> [yeah yeah exactly] </S3> [liability pseudo] transition matrix </S2>
<S5> well these are some priors and these are for one for example this case it's the probability of X building from match to match when Y is building from match to match it's the probability of M er M er Y is building from match to match when X is building [from match to] </S5>
<S2> [so e- each of them] sums up one in in the interior <S3> in the blocks </S3> in the blocks while for the (xx) (sum up differently) very small and (anyway) really small </S2>
<P:14>
<S5> erm so here the there's a modelling example we have a four four related trees , and er here here is the other transition and we have er here here we have the last transition for X one here last transition for X two last transition for X three and last transition for X four , and er X five six and X seven are considered some ancestors of these (xx) , and here is , er so from this tree you're obtaining practically three three trees . so here is er X four example and er X four sta- the last transition for X four is from E to M and er the last transition for each ancestor is er X six and it er goes from M to M so in this case we can er cut this er this er H and we'll have er X four as a root for another tree , so we can do the same for this tree also we have for X three , we have er er transition M to M for X for X er six (xx) transition and X seven the same , erm it's a similar way of thinking for this tree so the probability of the whole tree can be er computed using three terms the probability of er each of these three trees . this er this term is intr- <SIC> introducted </SIC> by this tree this term is <SIC> introducted </SIC> by this and this is from this <P:18> <SIGH> so this algorithm is much slower than forward algorithm for (xx) HMMs because it needs to track (xx) of the proceeding state used by a (parse) and the state is used by its (ancestor parse) , the computation procedure exponentially read the number of sequences so it then is suitable only for a small number of sequences <P:07> er another problem discussed is er how can we evaluate a model when we can say that a model is better than the other so if we consider M two a more complex model than M one the maximum latitude of M two should be larger than the maximum latitude of M one but this is true only if M one is er contains M one as M two contains M one as a special case , but in fact M two may be a poorer model er if likelihood is non-manageable only for a very narrow range of parameters , so they propose er another solution for comparing two two models using the this kind of probabilities so here is (percentage) <WRITING ON BLACKBOARD, P:10> (xx) so here is the data M is the model <P:13> this is the way with we compute er these probabilities for those two models er this kind of probability is called (evidence) of the model given the data so each er model has a number of parameters for example this has . there should be theta M and here phi M of course the number of parameters may differ for each models <P:14> so a natural way to compare two models is that you (do) into account their prior probabilities and compute their posterior probability , er this is the formula an alternative method is er to consider the maximum attribute of the data for the model and (xx) the maximum likelihood of the data for the model to er these maximum likelihoods are then (xx) then we consider theta as <SIGH> difference between logarithm from maximums <P:10> so <COUGH> data is not itself a good good er evaluating er method but if we si- simulate data sets from data sets D-I from M one using the values of the parameters of M one that give the maximum likelihood for D you can ask where the distr- distribution of M's for the simulated sets er show the original gravity to be (typical) so very much (lie) within that five per cent (xx) for example , so if delta exists almost all the delta I the model M two is better than M one so M one can be rejected , this method is called parametric parametric bootstrap and it's more (xx) than (plain) bootstrap </S5>
<S2> w- what do you (xx) parametrically (plain) bootstrap </S2>
<S5> well (xx) i think it's er <COUGH> it's a matter presented before and this is </S5>
<S2> so is it they call (plain) bootstrap the case like this you take your data <S5> [no no] </S5> [but start] er sending out a bit only some columns then you get a set of data then (several) time you can sample other columns other columns and so on that's what we call (plain) bootstrap so from the real data er which you have you take a subcollection and you may get it in many ways , while parametric bootstrap is you first er estimate the most ideal model (of) the data and then generate with that dat- that model (xx) that that would be parametric er bootstrap (and the) realisations </S2>
<S3> realisations yes </S3>
<S2> those parameters , this is (i think we should) discuss only about parameters not about er (data) </S2>
<P:08>
<S5> so then i give another example here . so we we have two models this with only one parameter and this with two parameters and er we have only two types of (xx) <P:11> <COUGH> , <SIGH> so which was for er which was er basic set of databytes sent in from M two using the (discrete) for the parameters , we choose M M equal 500 , er yes and very (xx) they are all (xx) ranked and each has equal probability , so we have (M-A-A) and (M-A-E) the number of the (xx) and these here are (xx) and M- (xx) and M- (xx) the numbers of A's and D's here er , er the values of P one and P two are quite close so normally the data should be not (evaluated) (xx) (small) so we should find M two is better , so given the dataset in the maximum likelihood of E four M one and simulate one hundred sets of data (maximum) (xx) <P:13> here they give the histogram for the data they obtained and this line is delta obtained for for the basic basic dataset , so from this histogram you can see that er majority of delta five are under this limit so M one is not a better model than M two <P:10> so here they simulate the one hundred sets from data from M two here are the formulas for , er and er <COUGH> yes here this should be (xx) this (example) <P:08> so having er these probabilities we we can obtain a- and er posterior probability for model M two </S5>
<S3> what means better here in this er equations </S3>
<P:10>
<S5> i don't know i think it's a constant </S5>
<S3> be- [because] </S3>
<S5> [(i thought)] it's a (xx) factor but i don't know or </S5>
<S3> because er the integral it's er (xx) is a better function so it's a little bit confusing </S3>
<P:07>
<S5> no they don't give [more] </S5>
<S3> [or er] <P:13> because my expectation is that er the integral we have there is from zero to one , the limit of the integration are not shown but i expect that we have to have the integral from zero to one because that er P it's <S5> [yes yes] </S5> [actually a probability] which is going from zero to one so er we have integral from zero to one of P power something multiplied by one minus P power something so that's exactly the better function , and the better function (factorising) gamma functions over gamma function of the sum of arguments and gamma function er er gives er the factorials so probably this is how they compute it at least this is but er that better in the (front) is er disturbing first of all because somebody's expected to have the better function and er er that better seems to be the just a <S5> yes </S5> constant </S3>
<S5> yeah the point should be put here <P:11> here they er represent the er graphic on this X ax- axis we have er posterior probability for model M two and on this er , probability on delta I to E smaller than that they call this cox confidence value , so here we can see that with , for P M the probable the posterior probability for M two and this is higher than , 0.5 also the cox confidence value is higher than this 95 per c- per cent in <P:13> er this is the second part of the presentation , er here is a comparison of probabilistic and non-probabilistic methods , they they try probabilistic interpretation of different methods presented in this chapter , so first probabilistic implementation for parsimony , <SIGH> first given a set of substitution probabilities in B-A in which their dependence on the length is neglected , it is possible to obs- obtain a a set of substitution costs computing maximum minus (logarithm) from this (xx) <P:10> so with with the cost are used to (xx) parsimony the minimal cost of (xx) for a (xx) regarded as the viterbi approximation for (xx) probability , in HMM's the (good) probability sums over all parts viterbi in (network) finds the most probable parts , parsimony minimises the sum of the negative probabilities er minus er er PDA and finds a set of ancestor assignments that maximise the probability </S5>
<P:14>
<S2> what are you expecting by some from comparing probabilistic with non-probabilistic , which should be more general or , (what do you think) </S2>
<P:05>
<S5> <SIGH> er so i- in probabilistic ones you need more information you need to have all these probability distributions </S5>
<S2> [(so you have two) (xx) yeah] </S2>
<S5> [and basic hypotheses also] about this (xx) . so given the probabilities are from the data (are familiar then) er data here could be should be results (xx) so we say that (X-A) is minus (log from R to M) (xx) . the real advantage of parsimony is the speed you (xx) final optimisation of the HMM <SIC> maximalise </SIC> the , if parsimony is in- interpreted as a (third) effort solution to maximum likelihood , so this probability becomes this it's it's not er , functional from (two) so the parametric (xx) this can be useful but also can have unfortunate consequences <P:13> so a simple <COUGH> is <COUGH> an example here a simple matter of te- testing the performance of rebuilding algorithms is to generate probabilistic (xx) and see how often a given algorithm (xx) them correct (xx) if (xx) are true if probability (xx) then accepting your substitution to be around (xx) and you have a probability P P-A and P , sequences of length M generated by M (xx) repetition er so sequences of length M are generated by M (xx) issue this procedure , for an unrooted tree and you can (xx) . so the same probabilistic model is used also for reconstruction re- reconstruction of the tree maximum likelihood then to do this correctly how how about other algorithm <P:09> <SIGH> still we have still the next case , for simplicity we have the same they are partly (composed) from (chapters) A and B substitution matrix is the same as in the previous level we consider P 0.3 point is one (in tree) and zero one for this (xx) and zero zero nine for (xx) <P:06> er this is our tree this is an (xx) tree (very angular) (xx) lengths here are the three possible er representation of this tree , er possible topologies and these are two cases the topology T one and T two you see that on these cases parsimony er it's not working correct <P:07> so this er , way runs the algorithm one thousand times (in here) <COUGH> for M equ- M equal two thousand we have almost all all the times er working with tree T one but in place of parsimony we see that T two is equal in more of the cases <P:06> then this is why parsimony (xx) is equal for most of the cases er because here we have er two substitution here we have one substitution and parsimony tend to choose er as a the minimal number of substitutions this is T one T two <P:07> another problem er on this the second part of the presentation is maximum likelihood distances <COUGH> so we suppose three D and (xx) D one to D-M , and sample sequences (xx) the trees , this is our sample (trait) and we try to compute the distance from X one to X (xx) now <P:10> <SIGH> you see multiplicativity , we have this variation from this node to this node <P:09> and i'm using inverse (xx) and multiplicativity we can obtain the distance from here to here , so we have first probability from here to here and then probability from here to here like presented here and then we have this final probability <P:11> er here it is er generalisation of the two case presented in the two node case presented before , and based on this probability er fel- felsenstein in 1996 er presented er er a formula for maximum likelihood distance er this can be this is not the function for D it can be (irrelevant) , and for la- for large (xx) of M this distance can be approximated to the sum of the (H's) <P:08> so here is an example , using (xx) . this is how we calculate the distance from D two <P:07> so for example if we really (xx) the distance from R to M it take this plus this it's simple , do you understand . so er what we assume is that we can use maximum likelihood distances we suppose we have a multiplicative reversible model we have plenty of data the underlined probabilistic model is correct and we expect that neighbour (joining) the construct M-G (for M) . er (i have to begin with) the previous case with parsimony , we have this present table as with parsimony and this is this for maximum likelihood case and this is for (neighbour) joining case we see the <COUGH> erm neighbour joining case (plays) er (works) correct almost like maximum likelihood <P:09> and here are some more probabilistic interpretations sankoff and cedergren presented other interpretations so if we try to simultaneously align sequences at (xx) it's phylogeny by a similar (xx) substitution model they use the <COUGH> the (xx) they use are interpreted as low probabilities the procedure is additive instead of maximising but the matter seems to be not computationally practical , hein's proposes er affine cost algorithm this (xx) and align sequences and find its phylogeny by using affined (xx) scores are in this case also interpreted as low probabilities but they tried in this case also with sum and maximisation but for last case for the parse (magnitude) and this because M two had a portion of M three and X and so on so all this (xx) language is wrong <P:06> so here are some conclusion on this probabilistic , so probabilistic interpretation mean we may get to better results but not all the time also er maybe less useful because the cost maybe increased too , high neighbour joining constructs (correct tree) but only if has the acquired assumptions so , depends on the particular problem which models or which method is better </S5>
