python - Using TotalOrderPartitioner in Hadoop streaming
I'm using Python for a Hadoop streaming project, and I need functionality similar to that provided by TotalOrderPartitioner and InputSampler in Hadoop. That is, I need to sample the data first and create a partition file, then use the partition file in the mapper to decide which reducer each k-v pair goes to. I need to do this in Hadoop 1.0.4.
I can find Hadoop streaming examples that use KeyFieldBasedPartitioner and customized partitioners; they use the -partitioner option in the command to tell Hadoop which partitioner to use. The examples I found that use TotalOrderPartitioner and InputSampler are all in Java, and they need writePartitionFile() of InputSampler and the DistributedCache class to do the job. So I am wondering whether it is possible to use TotalOrderPartitioner with Hadoop streaming. If it is possible, how can I organize my code to use it? If not, is it practical to implement the total-order partitioner in Python first and then use it?
I did not try it, but taking the KeyFieldBasedPartitioner example and replacing:
-partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner
with
-partitioner org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner
should work.
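The Python fallback raised in the question (sample the keys, derive split points, then route each key by those split points) could be sketched roughly as follows. This is only an illustration of the idea, not Hadoop's TotalOrderPartitioner or InputSampler; the function names choose_split_points and partition_for are made up for this example, and keys are assumed to be plain strings compared in Python's default order.

```python
import bisect


def choose_split_points(sample_keys, num_reducers):
    """Pick num_reducers - 1 split points from a sample of keys.

    Hypothetical helper: stands in for the "sample, then write a
    partition file" step described in the question.
    """
    ordered = sorted(sample_keys)
    if not ordered or num_reducers < 2:
        return []
    # Take evenly spaced keys from the sorted sample as boundaries.
    step = len(ordered) / float(num_reducers)
    return [ordered[int(step * i)] for i in range(1, num_reducers)]


def partition_for(key, split_points):
    """Return the index of the key range that contains `key`.

    Keys below the first split point get index 0; keys at or above
    the last split point get index len(split_points).
    """
    return bisect.bisect_right(split_points, key)


if __name__ == "__main__":
    # Tiny in-memory stand-in for sampled input keys.
    sample = ["d", "a", "h", "c", "f", "b", "g", "e"]
    splits = choose_split_points(sample, 4)
    for key in ["a", "c", "e", "z"]:
        print("%s\t%d" % (key, partition_for(key, splits)))
```

In a streaming job, the mapper script could load the split points from a file shipped with the job and compute partition_for() for each key. How that index is then tied to a specific reducer is not covered by this sketch.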