algorithm - Similarity measurement of time-sequenced data with different length -
consider following data.
groundtruth | dataset1 | dataset2 | dataset3 datapoints|time | datapoints|time | datapoints|time | datapoints|time |0 | |0 | |0 | |0 b |10 | b |5 | b |5 | b |13 c |15 | c |12 | c |12 | c |21 d |25 | d |22 | d |14 | d |30 e |30 | e |30 | e |17 | | | f |27 | | | g |30 |
visualized (as in number of - between each identifier):
time -> groundtruth: a|----------|b|-----|c|----------|d|-----|e dataset1: a|-----|b|-------|c|----------|d|--------|e dataset2: a|-----|b|-------|c|--|d|---|e|----------|f|---|g dataset3: a|-------------|b|--------|c|---------|d
my goal compare datasets groundtruth. want create function generates similarity measurement between 1 of datasets , groundtruth in order evaluate how segmentation algorithm is. segmentation algorithm consist of equal number of datapoints(segments) groundtruth illustrated datasets not guarantee, neither number of datapoints known ahead of time.
i've created jacard index generate basic evaluation score. looking evaluation method punish abundance/absence of datapoints limit distance correct datapoint. is, b doesn't have match b, has close correct datapoint.
i've tried dynamic programming method introduced penalty removing or adding datapoint distance penalty move closest datapoint. i'm struggling though, due to: 1. need limit each datapoint 1 correct datapoint 2. figure out datapoint delete if needed 3. general lack of understanding in how implement dp algorithms
anyone have ideas how this? if dynamic programming way go, i'd love link recommendation pointers in how go it.
basically, can modify dp levenshtein edit distance compute distances problem. levenshtein dp amounts finding shortest paths in acyclic directed graph looks this
*-*-*-*-* |\|\|\|\| *-*-*-*-* |\|\|\|\| *-*-*-*-*
where arcs oriented left-to-right , top-to-bottom. dag has rows numbered 0 m , columns numbered 0 n, m length of first sequence, , n length of second. lists of instructions changing first sequence second correspond one-to-one (cost , all) paths upper left lower right. arc (i, j) (i + 1, j) corresponds instruction of deleting ith element first sequence. arc (i, j) (i, j + 1) corresponds instruction of adding jth element second sequence. arc (i, j) corresponds modifying ith element of first sequence become jth element of second sequence.
all have quadratic-time algorithm problem define cost of (i) adding datapoint (ii) deleting datapoint (iii) modifying datapoint become datapoint , compute shortest paths on dag in 1 of ways described wikipedia.
(as aside, algorithm assumes never profitable make modifications "cross over" 1 another. under mild assumption modification costs, assumption superfluous. if you're interested in more details, see answer of mine: approximate matching of 2 lists of events (with duration) .)
Comments
Post a Comment