python - Pandas groupby: compute (relative) sizes and save in original dataframe
My database is structured such that I have units that belong to several groups and have different variables (I focus on one, x, for this question). I then have year-based records. So the database looks like
   unitid  groupid  year   x
0       1        1  1990   5
1       2        1  1990   2
2       2        1  1991   3
3       3        2  1990  10
etc. Now I would like to measure some "intensity" variable, going by the number of units per group and year, and put it back into the database.
So far, I'm doing
asd = df.drop_duplicates(subset=['unitid', 'year'])
groups = asd.groupby(['year', 'groupid'])
intensity = groups.size()
and intensity then looks like
year  groupid
1961  2000       4
      2030       3
      2040       1
      2221       1
      2300       2
However, I don't know how to put them back into the old dataframe. I can access them through intensity[0], but intensity.loc() gives a "LocIndexer not callable" error.
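On the "not callable" error: .loc is an indexer rather than a method, so it takes square brackets. A minimal sketch, assuming a small dataframe built from the four sample rows in the question (the variable names count_1990_g1 and counts_1990 are only illustrative):

```python
import pandas as pd

# Sample rows copied from the table in the question.
df = pd.DataFrame({
    "unitid": [1, 2, 2, 3],
    "groupid": [1, 1, 1, 2],
    "year": [1990, 1990, 1991, 1990],
    "x": [5, 2, 3, 10],
})

asd = df.drop_duplicates(subset=["unitid", "year"])
intensity = asd.groupby(["year", "groupid"]).size()

# .loc is indexed with square brackets, not called with parentheses.
count_1990_g1 = intensity.loc[(1990, 1)]  # a single (year, groupid) cell
counts_1990 = intensity.loc[1990]         # all groups within one year
```

With a two-level index, a tuple selects one cell and a single year selects the sub-series for that year.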
Secondly, it would be very nice if I could scale intensity. Instead of "units per group-year", it would be "units per group-year, scaled by average/max units per group-year in that year". That is, if {t,g} denotes a group-year cell and my simple intensity variable (for time and group) is called intensity(t, g), I would like to create relativeintensity(t,g) = intensity(t,g)/mean(intensity(t=t,g=:)), if this fake code helps at all in making myself clear.
thanks!
update
Just putting the answer here (explicitly) for readability. The first part was solved by
intensity = intensity.reset_index()
df['intensity'] = intensity[0]
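One caveat worth checking: assigning a Series to a dataframe column aligns on row labels, so after reset_index() row i of df receives the count of the i-th group rather than the count of its own group-year cell. A hedged alternative, assuming the sample rows from the question and a column name "intensity" chosen here for illustration, is to merge on the group keys:

```python
import pandas as pd

# Sample rows copied from the table in the question.
df = pd.DataFrame({
    "unitid": [1, 2, 2, 3],
    "groupid": [1, 1, 1, 2],
    "year": [1990, 1990, 1991, 1990],
    "x": [5, 2, 3, 10],
})

asd = df.drop_duplicates(subset=["unitid", "year"])

# One row per (year, groupid) cell, with the count in a named column.
intensity = (
    asd.groupby(["year", "groupid"])
       .size()
       .rename("intensity")
       .reset_index()
)

# Left-merge on the keys so every original row gets its own cell's count.
df_with_intensity = df.merge(intensity, on=["year", "groupid"], how="left")
```

The left merge keeps the row order of df, so the new column lines up with the original records.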
It's a multi-index. You can reset the index by calling .reset_index() on your resultant dataframe. Or you can disable it when you compute the group-by operation, by specifying as_index=False to the groupby(), like:
intensity = asd.groupby(["year", "groupid"], as_index=False).size()
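To compare the two options side by side, here is a small sketch on the question's sample rows. It assumes a reasonably recent pandas, where size() with as_index=False returns a dataframe whose count column is named "size"; the names flat_a and flat_b are only illustrative.

```python
import pandas as pd

# Sample rows copied from the table in the question.
df = pd.DataFrame({
    "unitid": [1, 2, 2, 3],
    "groupid": [1, 1, 1, 2],
    "year": [1990, 1990, 1991, 1990],
    "x": [5, 2, 3, 10],
})
asd = df.drop_duplicates(subset=["unitid", "year"])

# Option 1: compute the multi-indexed sizes, then flatten with reset_index.
flat_a = asd.groupby(["year", "groupid"]).size().reset_index(name="intensity")

# Option 2: ask groupby not to build the index in the first place.
flat_b = asd.groupby(["year", "groupid"], as_index=False).size()
```

Both give one row per (year, groupid) cell with the keys as ordinary columns; only the name of the count column differs.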
As for your second question, I'm not sure what you mean by: instead of "units per group-year", it would be "units per group-year, scaled by average/max units per group-year in that year". If you want to compute "intensity" as intensity / mean(intensity), you can use the transform method, like:
asd.groupby(["year", "groupid"])["x"].transform(lambda x: x / x.mean())
Is this what you're looking for?
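To see what transform returns, here is a minimal sketch on the question's sample rows (the name scaled is only illustrative). The result has one value per input row, in the same order, so it can be assigned straight back as a column:

```python
import pandas as pd

# Sample rows copied from the table in the question.
df = pd.DataFrame({
    "unitid": [1, 2, 2, 3],
    "groupid": [1, 1, 1, 2],
    "year": [1990, 1990, 1991, 1990],
    "x": [5, 2, 3, 10],
})
asd = df.drop_duplicates(subset=["unitid", "year"])

# Divide each x by the mean of x within its own (year, groupid) cell.
scaled = asd.groupby(["year", "groupid"])["x"].transform(lambda s: s / s.mean())

# The index of the result matches asd, so this assignment aligns row by row.
asd = asd.assign(x_scaled=scaled)
```

Note that this scales x within each year/groupid cell, which is not the same as scaling the cell counts by their yearly mean.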
update
If you want to compute intensity / mean(intensity), where mean(intensity) is based only on the year and not on year/groupid subsets, then you first have to create the mean(intensity) based on the year only, like:
intensity["mean_intensity_only_by_year"] = intensity.groupby(["year"])["x"].transform("mean")
And then compute the intensity / mean(intensity) for each year/groupid subset, where the mean(intensity) is derived only from the year subset:
# each row of intensity is already one year/groupid cell, so a row-wise division is enough
intensity["relativeintensity"] = intensity["x"] / intensity["mean_intensity_only_by_year"]
Maybe this is what you're looking for, right?
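Putting the pieces together, here is a hedged end-to-end sketch on the question's sample rows. It assumes the count column is named "intensity" (a name chosen here, not taken from the posts above) and shows both the mean-scaled and the max-scaled variant:

```python
import pandas as pd

# Sample rows copied from the table in the question.
df = pd.DataFrame({
    "unitid": [1, 2, 2, 3],
    "groupid": [1, 1, 1, 2],
    "year": [1990, 1990, 1991, 1990],
    "x": [5, 2, 3, 10],
})
asd = df.drop_duplicates(subset=["unitid", "year"])

# Units per group-year, one row per (year, groupid) cell.
intensity = asd.groupby(["year", "groupid"]).size().reset_index(name="intensity")

# Yearly mean and max of the cell counts, broadcast back to every cell.
by_year = intensity.groupby("year")["intensity"]
intensity["mean_by_year"] = by_year.transform("mean")
intensity["max_by_year"] = by_year.transform("max")

# Relative intensity: each cell's count scaled by its year's mean (or max).
intensity["relativeintensity"] = intensity["intensity"] / intensity["mean_by_year"]
intensity["relativeintensity_max"] = intensity["intensity"] / intensity["max_by_year"]
```

The resulting table can then be merged back onto the original dataframe on ["year", "groupid"] if the values are needed per record.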