User Guide

Instructions for using nbstata as a Jupyter kernel

Getting Started

It doesn’t take much to get nbstata up and running. Here’s how:

Prerequisites

Stata

Because nbstata uses pystata under the hood, a currently-licensed version of Stata 17+ must already be installed. (If you have an older version of Stata, consider stata_kernel instead.)

Make sure that the stata command (that is, the command-line interface to Stata, not to be confused with xstata, which launches the GUI) works; otherwise, nbstata won’t work.

In particular, if you are on Arch Linux and you get this error when running stata:

stata: error while loading shared libraries: libncurses.so.5: cannot open shared object file: No such file or directory

You need to install ncurses5-compat-libs from the AUR to fix it.

Python

In order to install the kernel, you will also need Python 3.7 or higher.

If you are new to Python, I suggest installing the Anaconda distribution. This doesn’t require administrator privileges and is the simplest way to install Python, JupyterLab and many of the most popular scientific packages.

(However, the full Anaconda installation is quite large, and it includes many libraries for Python that nbstata doesn’t use. If you don’t plan to use Python and want to use less disk space, install Miniconda, a bare-bones version of Anaconda. Then, at an Anaconda prompt, type conda install jupyterlab to install JupyterLab.)

The remainder of this guide assumes you have Anaconda installed, but all you really need is Python and JupyterLab (or some other way of making use of the Stata kernel, such as Quarto).

Install nbstata

To download and install the Python package, run the following at an Anaconda prompt:

pip install nbstata

Next, run the Stata kernel install script, which has the following syntax (with square brackets denoting options):

python -m nbstata.install [--sys-prefix] [--prefix PREFIX] [--conf-file]

That is, the most basic kernel install command is just python -m nbstata.install. The options are explained in the next section.

Kernel setup options

Include --sys-prefix to install the kernel to sys.prefix (e.g. a virtualenv or conda env), or --prefix PREFIX if you want to specify the install path yourself (typing it in place of PREFIX).

The --conf-file option creates a configuration file for you. (Note: A configuration file will always be created if the installer cannot locate your Stata installation.)

The location of the configuration file will be:

  • [sys.prefix]/etc/nbstata.conf if --sys-prefix or --prefix is specified.
  • ~/.config/nbstata/nbstata.conf otherwise.

(If a configuration file exists in both locations at kernel runtime, the home directory (~) version takes precedence. For backwards compatibility, config files saved to ~/.nbstata.conf also work. When in doubt, the %status magic indicates the location of the operative config file.)
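As a rough illustration, the lookup order described above might be expressed like this in Python. find_config is a hypothetical helper, not part of nbstata, and ranking the legacy ~/.nbstata.conf file between the two documented locations is an assumption.

```python
from pathlib import Path
from typing import Optional


def find_config(home: Path, prefix: Path) -> Optional[Path]:
    """Return the config file that would take effect, or None if there is none.

    Hypothetical sketch of the documented precedence rules; the position of
    the legacy ~/.nbstata.conf file in this list is an assumption.
    """
    candidates = [
        home / ".config" / "nbstata" / "nbstata.conf",  # home-directory config wins
        home / ".nbstata.conf",                          # legacy location (assumed second)
        prefix / "etc" / "nbstata.conf",                 # written by --sys-prefix / --prefix
    ]
    for path in candidates:
        if path.is_file():
            return path
    return None
```

Calling find_config(Path.home(), Path(sys.prefix)) would report which file applies in the current environment, though the %status magic remains the authoritative answer.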

Updating

To upgrade from a previous version of nbstata, run:

pip install nbstata --upgrade

When updating, you don’t have to run python -m nbstata.install again.

Configuration (optional)

The following settings are permitted inside the configuration file. Aside from the first three, they may also be set within your notebook using the %set magic (explained below):

  • stata_dir: Stata installation directory.
  • edition: Stata edition. Acceptable values are ‘be’, ‘se’ and ‘mp’. Default is ‘be’.
  • splash: controls display of the splash message during Stata startup. Default is ‘False’.
  • graph_format: Acceptable values are ‘png’ (the default), ‘pdf’, ‘svg’ and ‘pystata’. Specify the last option if you want to use pystata’s default setting.
  • graph_width/graph_height: By default, graphs are generated with width 5.5in and height 4in. The width or height may be specified as a number (interpreted as inches) or as a number followed by its unit (in, cm, or px). For example, 3 and 3in are equivalent; 300px and 7.2 cm are also valid. (Note: These values may also be set to default. Only then can the xsize and ysize options on Stata graph commands influence the size of the graph output; any value other than default overrides the xsize and ysize options.)
  • echo: controls the echo of commands, with the default being ‘None’:
    • ‘True’: the kernel will echo all commands.
    • ‘False’: the kernel will not echo single-line commands.
    • ‘None’: the kernel will not echo any command.
  • missing: What to display for a missing value in the output of the %browse, %head, and %tail magics. Default is ‘.’, following Stata. To defer to pandas’s format for NaN, specify ‘pandas’.
  • browse_auto_height: Whether to set ‘height: 100%’ for the %browse widget (default: ‘True’):
    • ‘True’: allows the browse widget to expand to the height of its container, such as when using ‘Create New View for Output’ in JupyterLab.
    • ‘False’: fixed height of around 22 rows, recommended for NBClassic and VSCode.
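The graph_width/graph_height rules can be summarized with a small parser. parse_graph_size is a hypothetical sketch, not nbstata’s implementation; accepting upper-case units and rejecting anything else with ValueError are assumptions.

```python
import re
from typing import Optional, Tuple

# A number, optionally followed (with or without a space) by in, cm or px.
_SIZE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?|\.\d+)\s*(in|cm|px)?\s*$", re.IGNORECASE)


def parse_graph_size(text: str) -> Optional[Tuple[float, str]]:
    """Parse a graph_width/graph_height value into (number, unit).

    Returns None for 'default', meaning Stata's xsize/ysize options decide.
    """
    if text.strip().lower() == "default":
        return None
    match = _SIZE_RE.match(text)
    if match is None:
        raise ValueError(f"invalid graph size: {text!r}")
    number, unit = match.groups()
    # A bare number is interpreted as inches.
    return float(number), (unit or "in").lower()
```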

Settings must be placed under the [nbstata] section header, and not all settings need to be included. Example:

[nbstata]
stata_dir = '/opt/stata'
splash = True
graph_format = pystata
graph_width = default
graph_height = default
echo = False
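Because the file uses INI-style syntax, Python’s standard configparser module can read it, which is handy for checking a config file outside Jupyter. This is a sketch, not nbstata’s loader: the DEFAULTS dictionary mirrors the defaults documented above, and stripping surrounding quotes (as in '/opt/stata') is an assumption about how such values are treated.

```python
import configparser
from typing import Dict, Optional

# Documented defaults for settings that are left out of the file.
DEFAULTS = {
    "edition": "be",
    "splash": "False",
    "graph_format": "png",
    "graph_width": "5.5in",
    "graph_height": "4in",
    "echo": "None",
    "missing": ".",
    "browse_auto_height": "True",
}

EXAMPLE = """\
[nbstata]
stata_dir = '/opt/stata'
splash = True
graph_format = pystata
graph_width = default
graph_height = default
echo = False
"""


def load_settings(text: str) -> Dict[str, str]:
    """Merge the [nbstata] section of a config file over the defaults."""
    parser = configparser.ConfigParser(interpolation=None)  # treat '%' literally
    parser.read_string(text)
    settings = dict(DEFAULTS)
    for key, value in parser["nbstata"].items():
        # Assumption: surrounding quotes are not part of the value.
        settings[key] = value.strip("'\"")
    return settings


def echo_value(text: str) -> Optional[bool]:
    """Map the three documented echo strings to Python values."""
    return {"True": True, "False": False, "None": None}[text]
```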

Default Graph Format

Both pystata and stata_kernel default to the SVG image format. nbstata (like pystata-kernel) defaults to the PNG image format instead for several reasons:

  • Jupyter does not show SVG images from untrusted notebooks (link 1).
  • Notebooks with empty cells are untrusted (link 2).
  • SVG images cannot be copied and pasted directly into Word or PowerPoint.

Syntax highlighting (Optional)

Stata syntax highlighting can be installed for JupyterLab:

pip install jupyterlab_stata_highlight2

(If you prefer the standard Jupyter color scheme, the original jupyterlab-stata-highlight also works.)

Starting JupyterLab

You can start JupyterLab from within Anaconda Navigator, or from an Anaconda prompt by running:

jupyter lab

Either method should open JupyterLab in a new browser tab. From there, you can create a new Stata notebook.

Note: By default, you can only open/save notebooks within the directory from which JupyterLab is run. To access a different directory, you can instead start it up by running:

jupyter lab --notebook-dir "YOUR_PATH_HERE"

Magics

‘Magics’ are commands provided by nbstata that enhance the experience of working with Stata in Jupyter. They work only when placed at the beginning of a code cell.

Jupyter magics typically start with %, but nbstata magics may alternatively be prefixed with *% so that, if you export a Stata notebook to a .do file and run it that way, the magics will not cause errors.
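To make the two prefixes concrete, here is a hypothetical sketch of how a kernel could recognize a magic on the first line of a cell. It is not nbstata’s parser, and skipping leading whitespace before the magic is an assumption. The *% form is safe in a .do file because Stata treats a line starting with * as a comment.

```python
from typing import NamedTuple, Optional


class Magic(NamedTuple):
    name: str   # e.g. "head" or "set"
    cell: bool  # True for cell magics such as %%set
    args: str   # rest of the first line
    body: str   # remaining lines of the cell


def parse_magic(code: str) -> Optional[Magic]:
    """Detect a '%' or '*%' magic at the start of a cell, else return None."""
    first, _, body = code.lstrip().partition("\n")
    if first.startswith("*%"):
        first = first[1:]  # drop the '*' so both forms are handled alike
    if not first.startswith("%"):
        return None
    cell = first.startswith("%%")
    name, _, args = first.lstrip("%").partition(" ")
    return Magic(name, cell, args.strip(), body)
```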

nbstata currently supports the following magics:

| Magic | Description | Full Syntax |
|---|---|---|
| %browse | Interactively view dataset | %browse [-h] [varlist] [if] [in] [, nolabel noformat] |
| %head | View first 5 (or N) rows | %head [-h] [N] [varlist] [if] [, nolabel noformat] |
| %tail | View last 5 (or N) rows | %tail [-h] [N] [varlist] [if] [, nolabel noformat] |
| %frbrowse | Interactively view frame | %frbrowse [-h] framename[: [varlist] [if] [in] [, nolabel noformat]] |
| %frhead | View first 5 (or N) frame rows | %frhead [-h] framename[: [N] [varlist] [if] [, nolabel noformat]] |
| %frtail | View last 5 (or N) frame rows | %frtail [-h] framename[: [N] [varlist] [if] [, nolabel noformat]] |
| %locals | List locals with their values | %locals |
| %delimit | Print the current delimiter | %delimit |
| %help | Display Stata help | %help [-h] command_or_topic_name |
| %set | Set single config option | %set [-h] key = value |
| %%set | Set multiple config options | %%set [-h] |
| %status | Display Stata/config status | %status |
| %%echo | Ensure echo from cell | %%echo |
| %%noecho | Suppress echo from cell | %%noecho |
| %%quietly | Suppress all output from cell | %%quietly |

You can run any magic with the -h option (--help) to see brief help documentation for the magic.

%browse, %head, %tail (and frame equivalents)

Quickly view your data

*%browse [-h] [varlist] [if] [in] [, nolabel noformat]
*%head [-h] [N] [varlist] [if] [, nolabel noformat]
*%tail [-h] [N] [varlist] [if] [, nolabel noformat]

These magics provide alternatives to Stata’s browse command, which is not available in a Stata notebook. They can each be called with standard Stata varlist and if syntax. %browse also supports Stata’s in syntax, whereas %head (and %tail), modeled after pandas, display the first (or last) 5 (or N) observations that meet the (optional) if criteria.

By default, the %browse, %head, and %tail magics convert numeric Stata values to strings using their Stata format and value labels. To prevent this behavior, specify the noformat and/or nolabel options.

The output of any of these, but especially that of %browse, may be expanded into a separate JupyterLab tab by right-clicking it and selecting “Create New View for Output.”
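The key point in the %head/%tail behavior described above is the order of operations: the if condition is applied first, and only then are the first (or last) N matching observations taken. A plain-Python sketch of that logic, with rows as dictionaries and a Python predicate standing in for Stata’s if (both purely illustrative):

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]
Condition = Optional[Callable[[Row], bool]]


def head(rows: List[Row], n: int = 5, cond: Condition = None) -> List[Row]:
    """First n rows that satisfy the optional condition."""
    matching = [r for r in rows if cond is None or cond(r)]
    return matching[:n]


def tail(rows: List[Row], n: int = 5, cond: Condition = None) -> List[Row]:
    """Last n rows that satisfy the optional condition."""
    matching = [r for r in rows if cond is None or cond(r)]
    return matching[-n:] if n > 0 else []  # guard: matching[-0:] would return everything
```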

%frbrowse, %frhead, and %frtail do the same for a frame specified as a prefix. Examples:

%frbrowse alt_frame
%frhead alt_frame: if var1 == 1, nolabel
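The framename[: ...] syntax amounts to splitting the arguments at the first colon. split_frame_args is a hypothetical helper for illustration only; it skips option handling such as -h, and it relies on the fact that Stata names cannot contain a colon.

```python
from typing import Tuple


def split_frame_args(args: str) -> Tuple[str, str]:
    """Split %frbrowse/%frhead/%frtail arguments into (frame, remaining args)."""
    frame, _, rest = args.partition(":")
    frame = frame.strip()
    if not frame:
        raise ValueError("a frame name is required")
    return frame, rest.strip()
```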

%locals

List local macro names and values

This takes no arguments. The output format mimics Stata’s macro list command (which only displays global macros).

%delimit

Print the current Stata command delimiter

This takes no arguments; it prints the delimiter currently set: either cr or ;. If you want to change the delimiter, use #delimit ; or #delimit cr. The delimiter will remain set until changed.

[1]: %delimit
Current Stata command delimiter: cr
[2]: #delimit ;
delimiter now ;
[3]: *%delimit
Current Stata command delimiter: ;
[4]: #delimit cr
delimiter now cr
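The effect of the delimiter on how a cell is split into commands can be sketched as follows. This is a simplification for illustration, not nbstata’s implementation: it ignores comments, string literals and /// line continuations, and it discards any text after the final ; as an unfinished command.

```python
from typing import List


def split_commands(code: str, delimiter: str = "cr") -> List[str]:
    """Split cell code into Stata commands under the given delimiter."""
    if delimiter == "cr":
        # Each non-blank line is one command.
        return [line.strip() for line in code.splitlines() if line.strip()]
    # With ';', commands end at semicolons and line breaks act as spaces.
    # Collapsing whitespace would also affect string literals (simplification).
    pieces = code.split(";")
    return [" ".join(p.split()) for p in pieces[:-1] if p.strip()]
```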

%help

Display a help file in rich text

*%help [-h] command_or_topic_name

Add the command or topic name you want help on after %help. For example:

[Image: Jupyter Notebook help example]

The underlined terms in the output are links. Click on them to open further help in a new tab.

%set, %%set

Set configuration values

Usage:

*%set [-h] key = value
*%%set
key1 = value1
[key2 = value2]
[...]
  • key: Configuration setting name: graph_format, graph_width, graph_height, echo, or missing
  • value: Value to set. See Configuration above for more information.

Examples:

*%set graph_format = svg
%%set
echo = True
missing = N/A

To prevent the cell magic %%set from causing an error if you export the notebook to a .do file and run it that way, you may surround the key-value statements with /* and */ on separate lines, like this:

*%%set
/*
echo = True
missing = N/A
*/
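A hypothetical sketch of how the body of a %%set cell could be read, showing why the comment-wrapped form gives the same result: lines consisting only of /* or */ are simply skipped. This is not nbstata’s parser, and raising ValueError on a line without = is an assumption.

```python
from typing import Dict


def parse_set_body(body: str) -> Dict[str, str]:
    """Parse the 'key = value' lines of a %%set cell."""
    settings = {}
    for raw in body.splitlines():
        line = raw.strip()
        if not line or line in ("/*", "*/"):
            continue  # blank lines and the optional comment wrappers
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected 'key = value', got {line!r}")
        settings[key.strip()] = value.strip()
    return settings
```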

%status

Display Stata status and configuration values

[Image: %status example]

%%echo, %%noecho, %%quietly

Toggle cell output type

Putting %%echo at the top of a cell sets the configuration option echo = True for just that cell. For example, suppose you have configured echo = None but you do want to see the Stata commands echoed for a particular cell:

[1]: *%%echo
     disp 1
     disp 2
. disp 1
1

. disp 2
2

. 

Similarly, %%noecho sets the configuration option echo = None for a single cell:

[2]: *%%noecho
     disp 1
     disp 2
1
2

%%quietly silences all cell output, including graphs. It is a convenience magic equivalent to placing the standard Stata code quietly { at the start and } at the end of the cell.

[3]: *%%quietly
     disp 1
     disp 2
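The equivalence with a quietly block can be shown with a tiny helper that rewrites a cell’s code. wrap_quietly is hypothetical, and it assumes the default cr delimiter; with #delimit ; in effect, the wrapper lines would also need semicolons.

```python
def wrap_quietly(code: str) -> str:
    """Return the cell's Stata code wrapped in a quietly { } block."""
    body = "\n".join("    " + line for line in code.splitlines())
    return "quietly {\n" + body + "\n}"
```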

Stata Implementation Details

#delimit behavior

A #delimit ; command in one cell will persist into other cells until #delimit cr is called. For example, see delimit tests.ipynb.

echo = None: potential for unanticipated errors

The default echo = None configuration does some complicated things under the hood to emulate functionality that pystata does not directly support: running multi-line Stata code without echoing the commands. While extensive automated tests are in place to help ensure its reliability, unanticipated issues may arise. If a particular code cell does not work as expected in this mode, try placing the %%echo magic at the top of it to see if that resolves the issue. (If so, please report that here.) You can also avoid such potential issues by setting echo = False, which still suppresses the echo of single-line Stata commands but echoes multi-line code.
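The three echo settings and the %%echo/%%noecho overrides can be summarized as a small decision function. This is a sketch of the documented behavior, not nbstata’s code; treating a "single-line" cell as one containing exactly one non-blank line is an assumption.

```python
from typing import Optional


def effective_echo(config_echo: Optional[bool], cell_magic: Optional[str] = None) -> Optional[bool]:
    """Apply a %%echo or %%noecho override to the configured echo setting."""
    if cell_magic == "echo":
        return True
    if cell_magic == "noecho":
        return None
    return config_echo


def commands_echoed(echo: Optional[bool], code: str) -> bool:
    """Would the commands in this cell be echoed under the given setting?"""
    if echo is None:
        return False  # None: never echo
    if echo:
        return True   # True: always echo
    # False: echo only multi-line cells (assumption: count non-blank lines).
    lines = [line for line in code.splitlines() if line.strip()]
    return len(lines) > 1
```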

more and pause

Stata’s more and pause commands do not work in a notebook, so these features should remain in their default ‘off’ states (i.e., set more off and pause off).

linesize

Unlike in the official Stata interface, the width of Stata output will not automatically adjust to the width of your window. Instead, you can use the set linesize Stata command to change it manually. For example:

set linesize 130

Quarto tips

nbstata can be used with Quarto, starting from either a notebook or a .qmd markdown file, to create output in a wide variety of formats. Just include jupyter: nbstata in the document-level YAML at the top and use *| as the prefix for cell options.

Inline calculations

With nbstata v0.8+, you can use the standard Quarto syntax for inline code, specifying the Stata expression as ‘[%fmt] exp’, just as you would for a Stata display command. For example:

```{stata}
*| include: False
sysuse auto, clear
regress price mpg
```
An *increase* of one mpg is associated with a *decrease* in price of \$`{stata} %5.2f abs(_b[mpg])`.

would result in output like this:

An increase of one mpg is associated with a decrease in price of $238.89.
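Conceptually, each inline expression corresponds to a Stata display command whose result is spliced into the text. The helper below is a hypothetical sketch of that mapping only; it does not run Stata, and nbstata’s actual handling of inline code may differ.

```python
def inline_to_display(expr: str) -> str:
    """Turn an inline '[%fmt] exp' expression into a Stata display command."""
    text = expr.strip()
    if not text:
        raise ValueError("empty inline expression")
    if text.startswith("%"):
        # Optional leading format, e.g. %5.2f, followed by the expression.
        fmt, _, rest = text.partition(" ")
        rest = rest.strip()
        if not rest:
            raise ValueError("a display format needs an expression after it")
        return f"display {fmt} {rest}"
    return f"display {text}"
```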

Warning

Stata locals cannot be referenced within inline code (as in `x') because the opening tick (or “left single quote,” as Stata’s manual calls it) conflicts with Quarto’s inline code syntax. Instead, use globals or scalars to pass values to inline code.

For example, this gives the same output as above (whereas defining ‘mpg_coef’ as a local would not work):

```{stata}
*| include: False
scalar mpg_coef = string(abs(_b[mpg]), "%5.2f")
```
An *increase* of one mpg is associated with a *decrease* in price of \$`{stata} mpg_coef`.