Using Buckaroo
Buckaroo is meant to be used in a JupyterLab notebook to clean and explore pandas dataframes.
Before you begin, make sure that you have followed the steps in Installing Buckaroo.
The following sections cover how to use Buckaroo.
In a JupyterLab notebook cell
from buckaroo.buckaroo_widget import BuckarooWidget
BuckarooWidget(df=df) #df being the dataframe you want to explore
Running the cell displays the Buckaroo UI.
Using Commands
At their core, Buckaroo commands operate on columns. First, click on a cell (not a header) in the top pane to select a column.
Next, click on a command such as dropcol, fillna, or groupby to create a new command.
After you create a command, it appears in the commands list, where you can edit its details. Select the command by clicking on the bottom cell.
At this point you can either delete the command by clicking the X button or change the command parameters.
Writing your own commands
Builtin commands are found in all_transforms.py. DropCol is one example:
class DropCol(Command):
    command_default = [s('dropcol'), s('df'), "col"]
    command_pattern = [None]

    @staticmethod
    def transform(df, col):
        df.drop(col, axis=1, inplace=True)
        return df

    @staticmethod
    def transform_to_py(df, col):
        return " df.drop('%s', axis=1, inplace=True)" % col
command_default is the base configuration of the command when it is first added. s('dropcol') is a special notation for the function name. s('df') is a symbol notation for the dataframe argument (see the LISP section of the FAQ for details). "col" is a placeholder for the selected column.
Since dropcol does not take any extra arguments, command_pattern is [None].
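To make the placeholder idea concrete, the sketch below shows one way a command_default list could be turned into a concrete command once a column is selected. It is an illustration only, not Buckaroo's implementation: Symbol, s, and instantiate are hypothetical stand-ins defined here so that the snippet runs without Buckaroo installed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    """Hypothetical stand-in for the symbol objects that s(...) creates."""
    name: str


def s(name):
    # Assumption: s('dropcol') marks 'dropcol' as a symbol, not a plain string.
    return Symbol(name)


def instantiate(command_default, selected_column):
    """Copy command_default, swapping the "col" placeholder for the selected column."""
    return [selected_column if part == "col" else part for part in command_default]


dropcol_default = [s("dropcol"), s("df"), "col"]
command = instantiate(dropcol_default, "price")
print(command)
```

Because the symbols are distinct objects rather than strings, only the plain "col" string is replaced; the function name and the dataframe argument pass through unchanged.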
Designing your own commands
The builtin commands and transforms are written to require no extra libraries for the Python code. Writing the transform_to_py code generation can be tricky. If you are using Buckaroo with your own analytics library, your transform function should mirror your actual library code and take the same arguments.
So if you have a library function like
def something_complex(df, column, arg1, arg2, arg3):
    # many lines of python code
    ...
Your transform function can still be simple, as can your transform_to_py function. You don’t have to regenerate the complex Python body of something_complex:
class SomethingComplex(Command):
    command_default = [s('something_complex'), s('df'), "col"]
    command_pattern = [
        [3, 'colMap', 'colEnum', ['null', 'sum', 'mean', 'median', 'count']],
        [4, 'arg2', 'enum', ['null', 'level', 'flow']],
        [5, 'arg3', 'float']
    ]

    @staticmethod
    def transform(df, col, arg1, arg2, arg3):
        return something_complex(df, col, arg1, arg2, arg3)

    @staticmethod
    def transform_to_py(df, col, arg1, arg2, arg3):
        return " something_complex(df, '%s', %r, %r, %r)" % (col, arg1, arg2, arg3)
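Keeping transform_to_py this thin means the generated code stays short and readable. The sketch below is an assumption about how such lines might be assembled into a function, not Buckaroo's actual code generator: generate_function and the function name clean are hypothetical, the four-space indent in the returned line is assumed, and something_complex is replaced by a trivial stand-in so that the snippet runs on its own.

```python
def transform_to_py(df, col, arg1, arg2, arg3):
    # Same shape as SomethingComplex.transform_to_py: one generated line per command.
    # The four-space indent is an assumption so the line fits inside a function body.
    return "    something_complex(df, '%s', %r, %r, %r)" % (col, arg1, arg2, arg3)


def generate_function(lines, name="clean"):
    """Hypothetical helper: wrap the generated lines in a function definition."""
    return "\n".join(["def %s(df):" % name] + lines + ["    return df"])


source = generate_function([transform_to_py(None, "price", "mean", "level", 0.5)])
print(source)

# Run the generated code against a stand-in for the real library function.
calls = []


def something_complex(df, column, arg1, arg2, arg3):
    # Stand-in body that only records its arguments; the real one is many lines.
    calls.append((column, arg1, arg2, arg3))


namespace = {"something_complex": something_complex}
exec(source, namespace)
result = namespace["clean"]({"price": [1, 2, 3]})
```

The generated source contains only the call to something_complex, so anyone who runs it needs the library installed but never sees its internals duplicated in the notebook.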