C# - Removing duplicates from a DataTable
I'm trying to remove duplicates in a DataTable, similar to this question. However, I need to do it on an ordered dataset: one of the criteria is the time in one of the columns, and I need the earliest time instance to remain.
I came across this question on ordered lists from a DataTable, but I'm not sure how to combine the two.
Basically, I'm reading a file into a dataset, and then I want to sort on the time and 3 other columns, and delete all duplicates, leaving the earliest time instance. The columns in question are name (int), phone number (long), time (int) and location (string). If name, phone and location are duplicated, remove everything after the first (earliest) time.
    dsholdingset.Tables["filedata"].Columns.Add("location", typeof(string));
    dsholdingset.Tables["filedata"].Columns.Add("name", typeof(int));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(string));
    dsholdingset.Tables["filedata"].Columns.Add("time", typeof(int));
    dsholdingset.Tables["filedata"].Columns.Add("phone", typeof(long));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(int));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(string));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(bool));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(string));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(bool));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(bool));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(string));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(int));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(int));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(bool));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(bool));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(bool));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(string));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(int));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(int));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(long));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(string));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(string));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(bool));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(bool));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(bool));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(bool));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(string));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(string));
That's the table definition; I then add rows as I validate the lines in the file.
What you want to do is group the rows by their distinct values. If you want to use LINQ against a DataTable, the easiest way is the built-in DataTable.AsEnumerable() extension method. It returns an IEnumerable<DataRow> for you.
Once we've got that, we need to construct a comparable object out of the composite of the 3 values. Here I used string concatenation, because strings are easy to compare. There are other ways to do this, but this one is simple:
name|phone|location
This produces a sequence of IGrouping<string, DataRow>. Each grouping is an IEnumerable<DataRow> that represents the subset of rows in that group. If we sort each grouping by time and pull the first one off, that's the earliest row.
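As one of the "other ways" to build the key, here is a minimal sketch that groups on an anonymous type instead of a concatenated string. The sample table and its values are made up for illustration; only the column names and types come from the question. On .NET Framework this needs a reference to System.Data.DataSetExtensions.

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical sample table using the column types from the question.
var table = new DataTable("filedata");
table.Columns.Add("name", typeof(int));
table.Columns.Add("phone", typeof(long));
table.Columns.Add("time", typeof(int));
table.Columns.Add("location", typeof(string));
table.Rows.Add(1, 5550100L, 930, "north");
table.Rows.Add(1, 5550100L, 815, "north"); // same key, earlier time
table.Rows.Add(2, 5550111L, 700, "south");

// Anonymous types compare by value, so they can act as a composite key
// without building a string first.
var groups = table.AsEnumerable()
    .GroupBy(row => new
    {
        Name = row.Field<int>("name"),
        Phone = row.Field<long>("phone"),
        Location = row.Field<string>("location")
    })
    .ToList();

foreach (var group in groups)
{
    // Each group is an IEnumerable<DataRow> whose rows share one key.
    Console.WriteLine($"{group.Key.Name}|{group.Key.Phone}|{group.Key.Location}: {group.Count()} row(s)");
}
```

The string key is easier to log and debug, while the anonymous type avoids accidental collisions if a value ever contains the separator character.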
Here's the complete code.
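    // Group on the composite key, then keep the earliest row of each group.
    // The Field<T> type arguments must match the declared column types;
    // adjust them if you change the types as suggested below.
    var rows = dsholdingset.Tables["filedata"].AsEnumerable()
        .GroupBy(row => string.Format("{0}|{1}|{2}",
            row.Field<int>("name"),
            row.Field<long>("phone"),
            row.Field<string>("location")))
        .Select(group => group.OrderBy(row => row.Field<int>("time")).First());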
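The query yields a sequence of rows rather than a table. If a DataTable is needed afterwards, one option is CopyToDataTable(). The following is a self-contained sketch with made-up sample rows, assuming the column types from the question (int name, long phone, int time, string location).

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical sample data; column types follow the question's table definition.
var source = new DataTable("filedata");
source.Columns.Add("location", typeof(string));
source.Columns.Add("name", typeof(int));
source.Columns.Add("time", typeof(int));
source.Columns.Add("phone", typeof(long));
source.Rows.Add("north", 1, 930, 5550100L);
source.Rows.Add("north", 1, 815, 5550100L);  // duplicate key, earlier time
source.Rows.Add("south", 1, 900, 5550100L);  // different location, so kept
source.Rows.Add("north", 1, 1200, 5550100L); // duplicate key, later time

var earliest = source.AsEnumerable()
    .GroupBy(row => string.Format("{0}|{1}|{2}",
        row.Field<int>("name"),
        row.Field<long>("phone"),
        row.Field<string>("location")))
    .Select(group => group.OrderBy(row => row.Field<int>("time")).First())
    .ToList();

// CopyToDataTable throws InvalidOperationException on an empty sequence,
// so fall back to an empty clone of the schema in that case.
DataTable deduped = earliest.Count > 0 ? earliest.CopyToDataTable() : source.Clone();

foreach (DataRow row in deduped.Rows)
{
    Console.WriteLine(string.Join(", ", row.ItemArray));
}
```

The copy leaves the original table untouched, which makes it easy to compare row counts before and after.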
Some other notes: phone should be a string, not a long; and unless time represents some other kind of measure that you haven't gone into, it should be either a TimeSpan or a DateTime. The first thing you want to do when loading a data set to manipulate is coerce the data into robust and correct data types, because that makes the actual manipulation easier. You can always convert back if you need to after it's done.
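A rough sketch of that coercion step might look like the following. It rests on an assumption the post does not confirm: that the integer time is an HHmm clock value (so 815 means 08:15). ParseHhmm and the sample row are hypothetical names and data used only for illustration.

```csharp
using System;
using System.Data;
using System.Globalization;

// ASSUMPTION: the raw int is an HHmm clock value (815 means 08:15).
// The original post does not say what the integer represents.
TimeSpan ParseHhmm(int raw) => new TimeSpan(raw / 100, raw % 100, 0);

// Raw table as loaded from the file (hypothetical sample row).
var rawTable = new DataTable("filedata");
rawTable.Columns.Add("phone", typeof(long));
rawTable.Columns.Add("time", typeof(int));
rawTable.Rows.Add(5550100L, 815);

// Typed table with the suggested column types.
var typed = new DataTable("filedata");
typed.Columns.Add("phone", typeof(string));
typed.Columns.Add("time", typeof(TimeSpan));

foreach (DataRow row in rawTable.Rows)
{
    typed.Rows.Add(
        row.Field<long>("phone").ToString(CultureInfo.InvariantCulture),
        ParseHhmm(row.Field<int>("time")));
}

Console.WriteLine(typed.Rows[0].Field<TimeSpan>("time"));
```

With the typed columns in place, the OrderBy in the query can use row.Field<TimeSpan>("time") directly.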