pandas

Stata-to-pandas utilities, used in nbstata.browse

Better Stata-to-pandas

These functions improve on pystata.stata.pdataframe_from_data and pystata.stata.pdataframe_from_frame in two ways: the pandas DataFrame index corresponds to Stata observation numbers, and an option (sformat=True) outputs numeric values as strings rendered with their Stata display formats.
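As a rough illustration of the index behavior only, the sketch below relabels a default 0-based pandas index with 1-based Stata observation numbers. It is not the nbstata implementation: the helper name reindex_to_stata_obs is hypothetical, and it assumes the incoming DataFrame carries a default positional index and that any obs selection is given as 0-based positions.

```python
import pandas as pd


def reindex_to_stata_obs(df, obs=None):
    """Return a copy of df indexed by 1-based Stata observation numbers.

    Hypothetical helper for illustration. `obs` is assumed to be an
    iterable of 0-based row positions matching the rows of df; when it
    is None, the rows are assumed to be observations 1..len(df).
    """
    out = df.copy()
    if obs is None:
        out.index = pd.RangeIndex(1, len(out) + 1)
    else:
        # Shift each 0-based position to Stata's 1-based numbering.
        out.index = pd.Index([i + 1 for i in obs])
    return out


raw = pd.DataFrame({"year": [2022, 1901, 1902], "le": [47.3, 49.1, 51.5]})
all_obs = reindex_to_stata_obs(raw)
some_obs = reindex_to_stata_obs(raw, obs=[4, 7, 9])
print(all_obs)
print(some_obs)
```

With this labeling, df.loc[5] refers to the same row that Stata addresses as _n==5.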



better_dataframe_from_stata


def better_dataframe_from_stata(
    stfr, var, obs, selectvar, valuelabel, missingval, sformat
):


better_pdataframe_from_data


def better_pdataframe_from_data(
    var:NoneType=None, obs:NoneType=None, selectvar:NoneType=None, valuelabel:bool=False, missingval:NoneType=None,
    sformat:bool=False
):


better_pdataframe_from_frame


def better_pdataframe_from_frame(
    stfr, var:NoneType=None, obs:NoneType=None, selectvar:NoneType=None, valuelabel:bool=False,
    missingval:NoneType=None, sformat:bool=False
):

run_sfi("""\
sysuse uslifeexp2, clear
replace le = . if _n==5
replace year = 2022 if year==1900
format year %-ty
gen str_var = "test string" if _n!=5
gen date = dofy(year)
format date %td
gen double date_tc = cofd(date)
format date_tc %tc
gen double date_tC = Cofd(date)
format date_tC %tC""")
better_pdataframe_from_data().head()

(US life expectancy, 1900–1940)
(1 real change made, 1 to missing)
(1 real change made)
(1 missing value generated)

   year         le      str_var      date        date_tc        date_tC
1  2022  47.299999  test string   22646.0   1.956614e+12   1.956614e+12
2  1901  49.099998  test string  -21549.0  -1.861834e+12  -1.861834e+12
3  1902  51.500000  test string  -21184.0  -1.830298e+12  -1.830298e+12
4  1903  50.500000  test string  -20819.0  -1.798762e+12  -1.798762e+12
5  1904        NaN               -20454.0  -1.767226e+12  -1.767226e+12

better_pdataframe_from_data(sformat=True).head()

   year    le      str_var       date             date_tc             date_tC
1  2022  47.3  test string  01jan2022  01jan2022 00:00:00  01jan2022 00:00:00
2  1901  49.1  test string  01jan1901  01jan1901 00:00:00  01jan1901 00:00:00
3  1902  51.5  test string  01jan1902  01jan1902 00:00:00  01jan1902 00:00:00
4  1903  50.5  test string  01jan1903  01jan1903 00:00:00  01jan1903 00:00:00
5  1904     .               01jan1904  01jan1904 00:00:00  01jan1904 00:00:00
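
The unformatted date and date_tc columns hold Stata's internal encodings: %td values count days since 01jan1960, and %tc values count milliseconds since 01jan1960 00:00:00. When real pandas datetimes are preferred over the strings produced by sformat=True, a conversion along the following lines may work. This is a sketch rather than part of nbstata: both helper names are hypothetical, and %tC is left out because it counts leap seconds, which a plain offset does not account for.

```python
import pandas as pd

# Stata's date epoch for %td and %tc values.
STATA_EPOCH = pd.Timestamp("1960-01-01")


def stata_td_to_datetime(days):
    """Convert %td values (days since 01jan1960) to pandas datetimes."""
    return STATA_EPOCH + pd.to_timedelta(days, unit="D")


def stata_tc_to_datetime(ms):
    """Convert %tc values (milliseconds since 01jan1960) to pandas datetimes."""
    return STATA_EPOCH + pd.to_timedelta(ms, unit="ms")


# Values taken from the first two rows of the example output,
# with date_tc written out exactly as days * milliseconds per day.
raw = pd.DataFrame(
    {
        "date": [22646.0, -21549.0],
        "date_tc": [22646 * 86_400_000, -21549 * 86_400_000],
    },
    index=[1, 2],
)
dates = stata_td_to_datetime(raw["date"])
datetimes = stata_tc_to_datetime(raw["date_tc"])
print(dates)
print(datetimes)
```

The converted Series keep the observation-number index of the input, so they can be assigned back to the DataFrame directly.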