Better stata-to-pandas
These functions improve on pystata.stata.pdataframe_from_data (and pystata.stata.pdataframe_from_frame) in two ways. First, the index of the returned pandas DataFrame corresponds to Stata observation numbers, so the first observation has index 1. Second, an sformat option outputs numeric values as strings formatted with their Stata display formats.
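The index convention can be illustrated without Stata. The pure-pandas sketch below relabels rows with Stata observation numbers. with_stata_obs_index is a hypothetical helper, not part of this module or of pystata, and it assumes the requested observations are given as zero-based indices. That is an assumption about the obs argument, not something this page states.

```python
import pandas as pd


def with_stata_obs_index(df, obs=None):
    """Hypothetical helper: relabel rows with 1-based Stata observation numbers.

    Assumes `obs` lists the zero-based indices of the observations that were
    read (a single int is also accepted); None means all rows, in order.
    """
    out = df.copy()
    if obs is None:
        positions = range(len(out))
    elif isinstance(obs, int):
        positions = [obs]
    else:
        positions = list(obs)
    if len(positions) != len(out):
        raise ValueError("obs must match the number of rows in df")
    # Stata numbers observations from 1, so shift each zero-based index by one.
    out.index = pd.Index([p + 1 for p in positions])
    return out


# Rows as they might come back when zero-based observations 2 and 4 are read.
raw = pd.DataFrame({"year": [1902, 1904], "le": [51.5, float("nan")]})
df = with_stata_obs_index(raw, obs=[2, 4])
print(df.loc[5, "year"])  # 1904: the row Stata calls observation 5
```

Keying rows this way means df.loc[5] and the Stata condition _n==5 refer to the same observation even when only a subset of observations is read, which a default RangeIndex starting at 0 would not guarantee.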
better_dataframe_from_stata
better_dataframe_from_stata (stfr, var, obs, selectvar, valuelabel, missingval, sformat)
better_pdataframe_from_data
better_pdataframe_from_data (var=None, obs=None, selectvar=None, valuelabel=False, missingval=None, sformat=False)
better_pdataframe_from_frame
better_pdataframe_from_frame (stfr, var=None, obs=None, selectvar=None, valuelabel=False, missingval=None, sformat=False)
run_sfi(""" \
sysuse uslifeexp2, clear
replace le = . if _n==5
replace year = 2022 if year==1900
format year %-ty
gen str_var = "test string" if _n!=5
gen date = dofy(year)
format date %td
gen double date_tc = cofd(date)
format date_tc %tc
gen double date_tC = Cofd(date)
format date_tC %tC""")
better_pdataframe_from_data().head()
(US life expectancy, 1900–1940)
(1 real change made, 1 to missing)
(1 real change made)
(1 missing value generated)
   year         le      str_var     date       date_tc       date_tC
1  2022  47.299999  test string  22646.0  1.956614e+12  1.956614e+12
2  1901  49.099998  test string -21549.0 -1.861834e+12 -1.861834e+12
3  1902  51.500000  test string -21184.0 -1.830298e+12 -1.830298e+12
4  1903  50.500000  test string -20819.0 -1.798762e+12 -1.798762e+12
5  1904        NaN              -20454.0 -1.767226e+12 -1.767226e+12
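In the output above, date holds Stata %td values (days since 01jan1960) and date_tc holds %tc values (milliseconds since 01jan1960 00:00:00, ignoring leap seconds). If pandas datetimes are wanted instead of strings, a minimal conversion sketch follows. td_to_datetime and tc_to_datetime are hypothetical helpers, and they do not handle %tC values such as date_tC, which also count leap seconds. The date_tc inputs are computed the way cofd() does (days times 86,400,000 ms); the table above shows them rounded.

```python
import pandas as pd

# Stata's elapsed dates and times are counted from 01jan1960.
STATA_ORIGIN = pd.Timestamp("1960-01-01")


def td_to_datetime(days):
    """Hypothetical helper: Stata %td values (days since 01jan1960) to datetimes."""
    return pd.to_datetime(days, unit="D", origin=STATA_ORIGIN)


def tc_to_datetime(ms):
    """Hypothetical helper: Stata %tc values (ms since 01jan1960 00:00:00,
    no leap seconds) to datetimes. Not correct for %tC values."""
    return pd.to_datetime(ms, unit="ms", origin=STATA_ORIGIN)


# Observations 1 and 2 from the example, with date_tc = date * 86_400_000.
raw = pd.DataFrame(
    {"date": [22646.0, -21549.0], "date_tc": [1956614400000.0, -1861833600000.0]},
    index=[1, 2],
)
print(td_to_datetime(raw["date"]).tolist())
print(tc_to_datetime(raw["date_tc"]).tolist())
```

Missing values (NaN) come back as NaT, so the converted columns keep the same missing observations as the originals.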
better_pdataframe_from_data(sformat=True).head()
   year    le      str_var       date             date_tc             date_tC
1  2022  47.3  test string  01jan2022  01jan2022 00:00:00  01jan2022 00:00:00
2  1901  49.1  test string  01jan1901  01jan1901 00:00:00  01jan1901 00:00:00
3  1902  51.5  test string  01jan1902  01jan1902 00:00:00  01jan1902 00:00:00
4  1903  50.5  test string  01jan1903  01jan1903 00:00:00  01jan1903 00:00:00
5  1904     .               01jan1904  01jan1904 00:00:00  01jan1904 00:00:00
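With sformat=True, each value is rendered as Stata displays it, and the missing life expectancy appears as "." rather than NaN. The pure-Python sketch below imitates only two of those displays, the default %td and %tc layouts seen in the output above, to illustrate the idea. It makes no claim about how sformat is actually implemented, the helper names are hypothetical, and it assumes an English/C locale for month abbreviations.

```python
import pandas as pd

STATA_ORIGIN = pd.Timestamp("1960-01-01")


def stata_td_string(days):
    """Hypothetical helper: render a %td value like Stata's default %td display."""
    if pd.isna(days):
        return "."  # Stata displays the system missing value as "."
    ts = pd.to_datetime(days, unit="D", origin=STATA_ORIGIN)
    # %b gives "Jan" in the C locale; Stata's display uses lowercase "jan".
    return ts.strftime("%d%b%Y").lower()


def stata_tc_string(ms):
    """Hypothetical helper: render a %tc value like Stata's default %tc display."""
    if pd.isna(ms):
        return "."
    ts = pd.to_datetime(ms, unit="ms", origin=STATA_ORIGIN)
    return ts.strftime("%d%b%Y %H:%M:%S").lower()


dates = pd.Series([22646.0, -21549.0, float("nan")], index=[1, 2, 3])
print(dates.map(stata_td_string).tolist())
print(stata_tc_string(1956614400000.0))
```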