hadoop - Error running a MapReduce streaming job using Python
I'm trying to run the following mapper and reducer code (*disclaimer: this is part of a solution for a training course).
mapper.py
import sys

for line in sys.stdin:
    data = line.strip().split("\t")
    if len(data) == 6:
        date, time, store, item, cost, payment = data
        print "{0}\t{1}".format(1, cost)
reducer.py
import sys

stotal = 0
trans = 0

for line in sys.stdin:
    data_mapped = line.strip().split("\t")
    if len(data_mapped) != 2:
        continue
    stotal += float(data_mapped[1])
    trans += 1

print transactions, "\t", salestotal

It keeps throwing this error:
undef/bin/hadoop job -Dmapred.job.tracker=0.0.0.0:8021 -kill job_201404041914_0012
14/04/04 23:13:53 INFO streaming.StreamJob: Tracking URL: http://0.0.0.0:50030/jobdetails.jsp?jobid=job_201404041914_0012
14/04/04 23:13:53 ERROR streaming.StreamJob: Job not successful. Error: NA
14/04/04 23:13:53 INFO streaming.StreamJob: killJob...
Streaming Command Failed!
I've tried both calling python explicitly and specifying the Python interpreter in the scripts (i.e. /usr/bin/env python).
Any idea what's going wrong?
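For context, I'm submitting the job with the standard Hadoop streaming jar, roughly like this (the jar and HDFS paths below are placeholders rather than the exact ones I used):

hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-*.jar \
    -mapper mapper.py \
    -reducer reducer.py \
    -file mapper.py \
    -file reducer.py \
    -input myinput \
    -output joboutput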
The job is failing because reducer.py has an error.

The problem is this line:

print transactions, "\t", salestotal

There are no variables named transactions and salestotal.

If you execute the reducer locally, you get this error:
Traceback (most recent call last):
  File "r.py", line 14, in <module>
    print transactions, "\t", salestotal
NameError: name 'transactions' is not defined
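A quick way to reproduce this outside Hadoop is to pipe a few lines of tab-separated sample data through both scripts; sample_data.txt here is just a placeholder name, not a file from the course:

cat sample_data.txt | python mapper.py | sort | python reducer.py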
The correct line should be:

print trans, "\t", stotal
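For completeness, here is reducer.py with that line fixed; everything else is unchanged from the code in the question:

import sys

stotal = 0   # running total of the cost field
trans = 0    # count of records seen

for line in sys.stdin:
    data_mapped = line.strip().split("\t")
    if len(data_mapped) != 2:
        continue
    stotal += float(data_mapped[1])
    trans += 1

# print the variables that were actually defined above
print trans, "\t", stotal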