stata_more

Helper functions that expand on pystata/sfi functionality

For a diagram of how the principal nbstata modules depend on this one, click here.

Simple Helpers

run_direct_cleaned

 run_direct_cleaned (cmds, quietly=False, echo=False, inline=True)

run_direct_cleaned is a run_direct/pystata.stata.run wrapper function that removes extraneous blank lines from the output.

When given multi-line Stata code, run_direct outputs an extra blank line at the start and two extra lines at the end:

run_direct("""\
disp 1""", echo=True) # single-line Stata code

. disp 1
1

run_direct("""\
disp 1
disp 2""", echo=True) # multi-line Stata code


. disp 1
1

. disp 2
2

.

We can clean it up like this:

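from contextlib import redirect_stdout
from io import StringIO
from textwrap import dedent

# Capture the raw output, then drop the first line and the last two lines
with redirect_stdout(StringIO()) as diverted:
    run_direct(dedent("""\
        disp 1
        disp 2"""))
    output = diverted.getvalue()
print("\n".join(output.splitlines()[1:-2]))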

. disp 1
1

. disp 2
2

run_direct_cleaned("""\
disp 1
disp 2""", echo=True)

. disp 1
1

. disp 2
2

run_direct_cleaned also removes the blank line that run_direct outputs with quietly=True:

prog_code = "program define _temp_prog \n disp 1 \n end"
run_direct(prog_code, quietly=True)

run_direct("capture program drop _temp_prog", quietly=True)
run_direct_cleaned(prog_code, quietly=True)
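
The trimming step can be tried without a Stata installation. The following is a minimal sketch, not nbstata's actual implementation: `fake_run_direct` is a hypothetical stand-in that prints the same extra lines as the multi-line run_direct output shown above, and `run_cleaned` is an illustrative name.

```python
from contextlib import redirect_stdout
from io import StringIO


def fake_run_direct(cmds):
    """Hypothetical stand-in for run_direct on multi-line `disp N` code."""
    print()                                  # extra blank line at the start
    for line in cmds.splitlines():
        print(f". {line}")                   # echoed command
        print(line.split(maxsplit=1)[1])     # pretend result of `disp N`
        print()
    print(". ")                              # trailing prompt line


def run_cleaned(runner, cmds):
    """Capture the runner's output and drop the first and last two lines."""
    with redirect_stdout(StringIO()) as diverted:
        runner(cmds)
    lines = diverted.getvalue().splitlines()
    return "\n".join(lines[1:-2])


print(run_cleaned(fake_run_direct, "disp 1\ndisp 2"))
```

The slice `[1:-2]` only suits the multi-line case; single-line output has no such extra lines to remove.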

Note however that run_direct_cleaned delays text output until the code finishes running:

code = dedent('''\
    python:
    import time
    print(1)
    time.sleep(1)
    print(2)
    end
    ''')
run_direct(code, quietly=True)
run_direct_cleaned(code, quietly=True)
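
The delay follows from capturing output in a buffer before printing it. The sketch below assumes that run_direct_cleaned diverts stdout in the same way as the cleanup example earlier on this page. `two_step_task` and `run_captured` are hypothetical names used only for illustration.

```python
from contextlib import redirect_stdout
from io import StringIO


def two_step_task(events):
    """Print twice, recording when each print call happens."""
    print(1)
    events.append("task printed 1")
    print(2)
    events.append("task printed 2")


def run_captured(task):
    """Run the task with stdout diverted to an in-memory buffer."""
    events = []
    with redirect_stdout(StringIO()) as diverted:
        task(events)                  # both prints land in the buffer
    events.append("output shown")     # the text is only available now
    return diverted.getvalue(), events


text, events = run_captured(two_step_task)
print(events)
print(text, end="")
```

Nothing reaches the screen until the whole task has finished, which is why the two numbers appear together instead of one second apart.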

run_direct_cleaned may also misalign text output relative to graph output:

run_direct_cleaned(dedent('''\
    disp "the problem arises with multi-line Stata code"
    palette symbolpalette'''), echo=True)

. disp "the problem arises with multi-line Stata code"
the problem arises with multi-line Stata code

. palette symbolpalette

run_sfi should only be used for standardized code in which each line is a command suitable for the sfi.SFIToolkit.stata function. For such code, it runs much faster and can suppress the command echo (echo=False), but it shares the limitations of run_single.

run_sfi

 run_sfi (std_code, echo=False, show_exc_warning=True)

run_sfi("""\
quietly set obs 5
quietly gen var1 = _n > 3
desc""")


Contains data
 Observations:             5                  
    Variables:             1                  
-------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
var1            float   %9.0g                 
-------------------------------------------------------------------------------
Sorted by: 
     Note: Dataset has changed since last saved.

SelectVar

 SelectVar (stata_if_code)

Class for generating Stata select_var for getAsDict

SelectVar.varname is a temporary Stata variable for use in sfi.Data.getAsDict.

with SelectVar(" if var1==0") as sel_varname:
    print(f"varname: {sel_varname}")
    run_single("list, clean")
run_single("desc, simple")

varname: __000000

       var1   __000000  
  1.      0          1  
  2.      0          1  
  3.      0          1  
  4.      1          0  
  5.      1          0  
var1
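
The lifecycle shown above (create a temporary variable on entry, drop it on exit) is the usual context-manager pattern. Below is a minimal pure-Python sketch of it. `TempColumn` is a hypothetical analogue that works on a dict of lists instead of the Stata dataset, and its naming scheme only loosely imitates Stata's temporary names.

```python
class TempColumn:
    """Hypothetical analogue of SelectVar for a dict-of-lists 'dataset'."""

    def __init__(self, data, condition):
        self.data = data
        self.condition = condition    # callable: row dict -> bool
        self.varname = None

    def __enter__(self):
        # Pick the first name of the form __000000 that is not in use.
        i = 0
        while f"__{i:06d}" in self.data:
            i += 1
        self.varname = f"__{i:06d}"
        columns = list(self.data)
        rows = zip(*(self.data[c] for c in columns))
        # 1 where the condition holds, 0 otherwise
        self.data[self.varname] = [
            int(self.condition(dict(zip(columns, row)))) for row in rows
        ]
        return self.varname

    def __exit__(self, exc_type, exc_value, traceback):
        # Always remove the temporary column, even after an error.
        self.data.pop(self.varname, None)
        return False


data = {"var1": [0, 0, 0, 1, 1]}
with TempColumn(data, lambda row: row["var1"] == 0) as sel_varname:
    print(sel_varname, data[sel_varname])
print(list(data))
```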

IndexVar

 IndexVar ()

Class for generating Stata index var for use with pandas

with Timer():
    with IndexVar() as idx_var:
        run_single("desc, simple")
    run_single("desc, simple")

var1      __000001
var1
Elapsed time: 0.0004 seconds

Run commands as a Stata program

The original motivation for adding this functionality is that run_direct/pystata.stata.run can only suppress the “echo” of single commands, not multi-line Stata code:

run_direct('disp "test 1"', echo=False)

test 1

two_lines_of_code = dedent('''\
    disp "test 1"
    disp "test 2"
    ''')
run_direct(two_lines_of_code, echo=False)


. disp "test 1"
test 1

. disp "test 2"
test 2

.

As a workaround when echo is not desired, we can run multiple commands as a Stata program:

run_direct_cleaned(f"""\
program temp_nbstata_program_name
    {two_lines_of_code}
end""", quietly=True)

run_direct("temp_nbstata_program_name", quietly=False, inline=True, echo=False)

test 1
test 2

run_single("quietly program drop temp_nbstata_program_name")

(Note: This and the following two functions assume that the input Stata code has been standardized by standardize_code. This is ensured by break_out_prog_blocks within the ultimate dispatch_run wrapper function.)

run_as_program

 run_as_program (std_non_prog_code, prog_def_option_code='')

run_as_program(two_lines_of_code)

test 1
test 2

Not all code can be run within a program without modification, however:

1. Programs cannot be defined within another program, nor can python or mata blocks be run.
2. A program definition is a different scope for locals, so:
   * the program code does not have access to locals defined previously, and
   * locals set within the program code do not persist outside of it.

(These issues are addressed by run_noecho.)

with ExceptionExpected(SystemError):
    run_as_program("""\
        program define prog1
            disp 1
        end
        """)

run_sfi('''\
local test1 = 1
disp "test1: `test1'" ''')

test1: 1

run_as_program("""\
    disp "test1: `test1'"
    local test2 = 2 """)

test1:

run_sfi('''\
disp "test1: `test1'"
disp "test2: `test2'" ''')

test1: 1
test2:

In the “finally” block, the capture ensures that an error in the program define code doesn’t trigger another error in the “program drop” code due to the program not being defined (as in Issue #25):

with ExceptionExpected(SystemError):
    run_as_program("/* disp 5")
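
The define/run/drop flow with a guarded cleanup can be sketched as follows. This is an illustration under stated assumptions, not the source of run_as_program: `run_as_temp_program`, its `runner` parameter, and the program name are hypothetical, and the runner is any callable that sends a string of Stata code for execution.

```python
def run_as_temp_program(std_code, runner, prog_name="temp_sketch_program"):
    """Wrap std_code in a Stata program, run it, and always drop it."""
    define_code = f"program define {prog_name}\n{std_code.rstrip()}\nend"
    try:
        runner(define_code)    # may fail, e.g. on an unterminated comment
        runner(prog_name)      # run the program; its commands are not echoed
    finally:
        # "capture" suppresses the error that "program drop" would raise
        # if the define step failed and the program does not exist.
        runner(f"capture program drop {prog_name}")


# Simulated runner: records the code it receives and fails on the define step.
sent = []

def failing_runner(code):
    sent.append(code)
    if code.startswith("program define"):
        raise SystemError("simulated Stata error")

try:
    run_as_temp_program("/* disp 5", failing_runner)
except SystemError:
    pass
print(sent[-1])
```

Even though the define step raises, the last command sent is still the guarded drop, so no second error is triggered by the cleanup.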

Divert Stata output to string

The goal here is to get output from some Stata commands without changing the Stata environment. Three challenges arise:

1. Preserving r() return values requires special treatment because the log on/off commands needed to ensure this output is not logged are themselves r-class.
2. The input std_code may also contain r-class commands.
3. Capturing multi-line Stata output without the commands being echoed poses additional run_as_program-related challenges with regard to local variables.

To start, we set aside the latter two issues and simply use run_direct to run the Stata code. We handle the first issue by running the log commands inside an r-class program with return add at the start.

A custom code runner may be specified. This may be useful if, for instance, the input std_code needs to access pre-existing r() results.

diverted_stata_output

 diverted_stata_output (std_code, runner=None)

from textwrap import dedent

two_lines_of_code = dedent('''\
    disp "test 1"
    disp "test 2"
    ''')
out = diverted_stata_output(two_lines_of_code)

print(out)


. disp "test 1"
test 1

. disp "test 2"
test 2

.

print(diverted_stata_output('disp "test 1"', run_as_program))

test 1

If we know the code we’re running is non-program code, we can get a speed improvement by running the log code together with the input std_non_prog_code.

diverted_stata_output_quicker

 diverted_stata_output_quicker (std_non_prog_code)

print(diverted_stata_output_quicker(two_lines_of_code))

test 1
test 2

with Timer():
    out1 = diverted_stata_output(two_lines_of_code, runner=run_sfi)
with Timer():
    out2 = diverted_stata_output_quicker(two_lines_of_code)
test_eq(out1, out2)

Elapsed time: 0.2008 seconds
Elapsed time: 0.0915 seconds

Get local macro info

See this Statalist thread on listing all locals: https://www.statalist.org/forums/forum/general-stata-discussion/general/1457792-how-to-list-all-locals-and-store-them-in-a-macro

local_names

 local_names ()

run_sfi("""\
local test1 = 1
local test2 = 2""")
test_eq(set(local_names()), {'test1', 'test2'})

get_local_dict

 get_local_dict (_local_names=None)

test_eq(get_local_dict(), {'test1': '1', 'test2': '2'})

locals_code_from_dict

 locals_code_from_dict (preexisting_local_dict)

print(locals_code_from_dict(get_local_dict()))

local test2 `"2"'
local test1 `"1"'
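
Generating such code from a dictionary takes one line per local, with each value wrapped in Stata compound double quotes. The function below is a minimal sketch that reproduces the format of the output above. `locals_code` is a hypothetical name, and the ordering (plain dict order) is an assumption that may differ from the actual function.

```python
def locals_code(local_dict):
    """Build one `local name `"value"'` line per dictionary entry."""
    return "\n".join(
        f"local {name} `\"{value}\"'" for name, value in local_dict.items()
    )


print(locals_code({"test1": "1", "test2": "2"}))
```

Compound double quotes let a value that itself contains ordinary double quotes be restored intact.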

User_expression

Given a string ‘[%fmt] exp’, replicate the output of a Stata display command (for just the one display_directive): https://www.stata.com/help.cgi?display

user_expression

 user_expression (input_str)

inputs_outputs = [
    ('2+2',                     '4'),
    ('= 2+2',                   '4'),
    ('%9.2f 123.456',           '   123.46'),
    ('%9.2f = 123.456',         '   123.46'),
    ('% 9.2f 123.456',          '   123.46'),
    ('%10s = "Hello, World!"',  'Hello, World!'),
    ('   ',                     ''),            # Empty input after stripping
]

for input_str, expected_result in inputs_outputs:
    result = user_expression(input_str)
    test_eq(result, expected_result)
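
One way to handle the '[%fmt] exp' input is to split it into an optional format and an expression before passing both to Stata's display command. The sketch below covers only that parsing step and is not the actual user_expression implementation. The regular expression and the helper names `split_format_and_expression` and `display_command` are assumptions made for illustration.

```python
import re

# Optional "%fmt" (whitespace allowed after "%"), optional "=", then the expression.
_PATTERN = re.compile(r"^\s*(?:(%\s*[^\s=]+)\s*)?=?\s*(.*?)\s*$", re.DOTALL)


def split_format_and_expression(input_str):
    """Return (fmt, exp); fmt is '' when no display format is given."""
    match = _PATTERN.match(input_str)
    fmt, exp = match.group(1), match.group(2)
    fmt = "".join(fmt.split()) if fmt else ""   # "% 9.2f" -> "%9.2f"
    return fmt, exp


def display_command(input_str):
    """Build the Stata display command, or '' for empty input."""
    fmt, exp = split_format_and_expression(input_str)
    if not exp:
        return ""
    return " ".join(part for part in ("display", fmt, exp) if part)


for example in ["2+2", "= 2+2", "% 9.2f 123.456", '%10s = "Hello, World!"', "   "]:
    print(repr(example), "->", repr(display_command(example)))
```

The resulting command string would still need to be run in Stata, with its output captured, to obtain the formatted result.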