python - Using TotalOrderPartitioner in Hadoop streaming


I'm using Python for a Hadoop streaming project, and I need functionality similar to what TotalOrderPartitioner and InputSampler provide in Hadoop. That is, I need to sample the data first and create a partition file, then use that partition file to decide which reducer each key-value pair emitted by the mapper goes to. I need to do this in Hadoop 1.0.4.

I could only find Hadoop streaming examples that use KeyFieldBasedPartitioner and customized partitioners, which pass the -partitioner option on the command line to tell Hadoop to use them. The examples I found that use TotalOrderPartitioner and InputSampler are all in Java, and they rely on writePartitionFile() of InputSampler and the DistributedCache class to do the job. So I am wondering whether it is possible to use TotalOrderPartitioner with Hadoop streaming. If it is possible, how can I organize my code to use it? If not, is it practical to implement a total-order partitioner in Python first and then use that?

I did not try it, but taking an example that uses KeyFieldBasedPartitioner and simply replacing:

-partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner

with

-partitioner org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner

should work.
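
On the last part of the question (implementing the idea in Python first), here is a minimal sketch of the two pieces a total-order partitioner needs: picking split points from a sample of keys, and mapping a key to a partition index. The function names `choose_split_points` and `partition_for` are made up for illustration and are not part of Hadoop or Hadoop streaming. It assumes keys are compared as plain Python strings and that the sample fits in memory.

```python
import bisect


def choose_split_points(sample_keys, num_reducers):
    """Pick up to num_reducers - 1 boundary keys from a sample of keys.

    Hypothetical helper: plays the role that a sampled partition file
    plays for a total-order partitioner.
    """
    if num_reducers < 1:
        raise ValueError("num_reducers must be at least 1")
    ordered = sorted(sample_keys)
    if not ordered:
        return []
    # Take evenly spaced keys from the sorted sample as boundaries.
    return [ordered[(i * len(ordered)) // num_reducers]
            for i in range(1, num_reducers)]


def partition_for(key, split_points):
    """Return the index of the key range that contains key.

    Keys smaller than the first boundary map to 0; a key equal to a
    boundary maps to the range on that boundary's right.
    """
    return bisect.bisect_right(split_points, key)


if __name__ == "__main__":
    sample = ["d", "a", "f", "c", "b", "e"]
    splits = choose_split_points(sample, 3)
    for k in ["a", "c", "d", "e", "z"]:
        print(k, partition_for(k, splits))
```

Because the partition index never decreases as the key increases, concatenating the sorted outputs of partitions 0, 1, 2, ... in order gives a globally sorted result. This sketch only computes the index; getting Hadoop to route each index to a specific reducer still needs a partitioner on the Java side, which is not covered here.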

