pandas

Stata-to-pandas utilities, used in nbstata.browse

Better stata-to-pandas

These functions improve on pystata.stata.pdataframe_from_data (and pystata.stata.pdataframe_from_frame) in two ways. First, the index of the resulting pandas DataFrame corresponds to Stata observation numbers, so the first observation has index 1. Second, the sformat option outputs numeric values as strings formatted with their Stata display formats.
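The index convention can be illustrated with plain pandas. The sketch below uses a hypothetical helper, to_stata_obs_index, which is not part of nbstata or pystata. It assumes the rows were read from 0-based observation positions, as in Stata's sfi interface, and relabels them with 1-based Stata observation numbers. It is not nbstata's implementation.

```python
import pandas as pd


def to_stata_obs_index(df, positions=None):
    """Return a copy of df indexed by Stata observation numbers.

    Hypothetical helper: `positions` are assumed to be the 0-based
    observation positions the rows were read from (all rows if None).
    Stata numbers observations from 1, so each position is shifted by one.
    """
    if positions is None:
        positions = range(len(df))
    positions = list(positions)
    if len(positions) != len(df):
        raise ValueError("need exactly one position per row")
    out = df.copy()
    out.index = pd.Index([p + 1 for p in positions])
    return out


# Rows read from Stata observations 2, 4, and 5 (0-based positions 1, 3, 4)
raw = pd.DataFrame({"year": [1901, 1903, 1904]})
print(to_stata_obs_index(raw, positions=[1, 3, 4]))
```

Keeping the observation numbers in the index, rather than in an extra column, means df.loc[n] refers to Stata observation n.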



better_dataframe_from_stata

 better_dataframe_from_stata (stfr, var, obs, selectvar, valuelabel,
                              missingval, sformat)


better_pdataframe_from_data

 better_pdataframe_from_data (var=None, obs=None, selectvar=None,
                              valuelabel=False, missingval=None,
                              sformat=False)


better_pdataframe_from_frame

 better_pdataframe_from_frame (stfr, var=None, obs=None, selectvar=None,
                               valuelabel=False, missingval=None,
                               sformat=False)

run_sfi("""\
sysuse uslifeexp2, clear
replace le = . if _n==5
replace year = 2022 if year==1900
format year %-ty
gen str_var = "test string" if _n!=5
gen date = dofy(year)
format date %td
gen double date_tc = cofd(date)
format date_tc %tc
gen double date_tC = Cofd(date)
format date_tC %tC""")
better_pdataframe_from_data().head()
(US life expectancy, 1900–1940)
(1 real change made, 1 to missing)
(1 real change made)
(1 missing value generated)
   year         le      str_var      date        date_tc        date_tC
1  2022  47.299999  test string   22646.0   1.956614e+12   1.956614e+12
2  1901  49.099998  test string  -21549.0  -1.861834e+12  -1.861834e+12
3  1902  51.500000  test string  -21184.0  -1.830298e+12  -1.830298e+12
4  1903  50.500000  test string  -20819.0  -1.798762e+12  -1.798762e+12
5  1904        NaN               -20454.0  -1.767226e+12  -1.767226e+12
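Without sformat, the date columns hold Stata's internal encodings. %td values count days since 01jan1960, and %tc values count milliseconds since 01jan1960 00:00:00, ignoring leap seconds. The following minimal standard-library sketch decodes them. The helper names are hypothetical and this is not nbstata's code. %tC values also count leap seconds, so tc_to_datetime does not decode them exactly.

```python
from datetime import datetime, timedelta

# Day 0 for Stata %td values and millisecond 0 for %tc values
STATA_EPOCH = datetime(1960, 1, 1)
# Locale-independent month abbreviations, as in Stata's default %td display
MONTHS = ("jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec")


def td_to_datetime(days):
    """Convert a Stata %td value (days since 01jan1960) to a datetime."""
    return STATA_EPOCH + timedelta(days=days)


def tc_to_datetime(ms):
    """Convert a Stata %tc value (milliseconds since 01jan1960 00:00:00,
    no leap seconds) to a datetime."""
    return STATA_EPOCH + timedelta(milliseconds=ms)


def default_td_string(dt):
    """Render a datetime like Stata's default %td format, e.g. '01jan2022'."""
    return f"{dt.day:02d}{MONTHS[dt.month - 1]}{dt.year}"


# Values from the date column above
print(default_td_string(td_to_datetime(22646.0)))   # 01jan2022
print(default_td_string(td_to_datetime(-21549.0)))  # 01jan1901
# 22646 days expressed in milliseconds, i.e. the 2022 date_tc value
print(tc_to_datetime(22646 * 86_400_000))           # 2022-01-01 00:00:00
```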
better_pdataframe_from_data(sformat=True).head()
   year    le      str_var       date             date_tc             date_tC
1  2022  47.3  test string  01jan2022  01jan2022 00:00:00  01jan2022 00:00:00
2  1901  49.1  test string  01jan1901  01jan1901 00:00:00  01jan1901 00:00:00
3  1902  51.5  test string  01jan1902  01jan1902 00:00:00  01jan1902 00:00:00
4  1903  50.5  test string  01jan1903  01jan1903 00:00:00  01jan1903 00:00:00
5  1904     .               01jan1904  01jan1904 00:00:00  01jan1904 00:00:00
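With sformat=True, the missing le value in observation 5 is shown as Stata's "." rather than NaN, and stored values such as 47.299999 appear as 47.3. The sketch below imitates that for a single numeric value. The helper is hypothetical, and Python's %g (6 significant digits) only approximates Stata's general display formats. It is not how nbstata formats values.

```python
import math


def approx_stata_g_format(value):
    """Roughly imitate a Stata general-format display of a numeric value.

    Hypothetical helper: a missing value (None or NaN) becomes ".", as in
    the sformat=True output; other values use Python's %g, which is only
    an approximation of Stata's own formatting.
    """
    if value is None or math.isnan(value):
        return "."
    return "%g" % value


# le values as stored in single precision, then a missing value
le = [47.29999923706055, 49.099998474121094, float("nan")]
print([approx_stata_g_format(v) for v in le])  # ['47.3', '49.1', '.']
```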