Roadmap

Buckaroo is maturing, and I decided to write a roadmap.

Priorities

Buckaroo is still, for the most part, pre-users. It is maturing though, and the feedback gathered so far maps out some reasonable principles.

  • Function as a reliable replacement for the default display of dataframes

    • Exceptions in the basic display of a dataframe are a P1 error.

    • Dataframes that don’t display are a P1 error.

    • Taking more than a second to display a dataframe with fewer than 1M values is a P2 error.

  • Buckaroo should do the least surprising thing.

    • Autocleaning should be turned off by default.

  • Bug/feature request priorities

    • This is the roadmap and I’ll stick with it.

    • If a user has a feature or bug request that is preventing them from using Buckaroo, it gets priority.

Release Plans

0.4 Series

  1. Documentation

    • Readme refresh

    • How to create a formatter

    • Pluggable analysis framework refresh

    • Customizing autocleaning

    • Customizing enable/instantiation

    • Order of operations Dataflow doc

  2. Promotion

  3. Devops improvements (CI, testing, end-to-end testing, packaging)

    • CI passing - Done

    • CI testing - Done

    • End-to-end testing - Done

    • CI version bump - needed

    • Ruff Python linter - needed

  4. Jupyter notebook compatibility

    • Google colab - Done

    • VSCode - Done

    • Warning message on notebook < 7 - Done

    • Notebook 6.0 compatibility - undecided

  5. Code cleanup

    • TypeScript passes linter - Done

    • snake_case/camelCase normalization

    • Better naming

    • Submodule organization

  6. Python repr bugs (see the display-serialization sketch after this list)

    • List

    • Tuple

    • Nested lists and tuples across Python types (int, float, boolean)

    • Dictionary?

  7. Formatters (see the formatter sketch after this list)

    • DateTime formatter

    • Float formatter with specificity

  8. Frontend

    • Autoclean toggle
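
A minimal sketch of the kind of conversion the repr fixes above call for, assuming the goal is to turn arbitrary nested Python containers into JSON-safe values before they reach the frontend; display_repr is a hypothetical helper, not Buckaroo’s actual API.

```python
# Hypothetical sketch (not Buckaroo's actual code): convert arbitrary nested
# Python containers into JSON-safe values so object columns display instead
# of raising during serialization.
from typing import Any


def display_repr(val: Any) -> Any:
    """Return a JSON-serializable stand-in for val."""
    if isinstance(val, (bool, int, float, str)) or val is None:
        return val
    if isinstance(val, (list, tuple)):
        # Recurse so nested lists/tuples of mixed types still display.
        return [display_repr(v) for v in val]
    if isinstance(val, dict):
        return {str(k): display_repr(v) for k, v in val.items()}
    # Fall back to repr() for anything else (sets, custom objects, ...).
    return repr(val)


print(display_repr([1, (2.5, True), {"a": [None, "x"]}]))
# -> [1, [2.5, True], {'a': [None, 'x']}]
```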

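A Python sketch of the intended formatting behavior; the function names are hypothetical and this is not Buckaroo’s formatter API.

```python
# Illustrative only: a datetime formatter and a float formatter with a
# configurable number of significant digits.
from datetime import datetime


def format_datetime(dt: datetime) -> str:
    return dt.strftime("%Y-%m-%d %H:%M:%S")


def format_float(x: float, sig_digits: int = 3) -> str:
    return f"{x:.{sig_digits}g}"


print(format_datetime(datetime(2024, 1, 15, 9, 30)))  # 2024-01-15 09:30:00
print(format_float(1234.56789, 5))                    # 1234.6
print(format_float(0.000123456))                      # 0.000123
```
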
0.5 Series

I’m a bit fuzzy on this one; it’s either going to be a backend port to Polars or filtering. I’ll write it up as filtering for now.

  1. Filtering (see the pandas sketch after this list)

    • Any-field text search

    • Should work with codegen

    • Per-column exact filtering

  2. Additional sampling techniques (see the sketch after this list)

    • Chunks (50 contiguous rows)

    • Outliers - the extreme percentiles of each column, all in a single view

    • Straight random sample

  3. UI cycling

    • Everything that is now binary (e.g. summary stats on or off) is actually a single choice among multiple possible options. Allow repeated clicks to cycle through the different options.

    • Enable cycling for summary_stats and sample method

  4. Low code UI

    • Add Commands for filtering
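
A sketch of the two filtering modes from item 1, written in plain pandas; the actual implementation would also need to emit equivalent generated code.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["apple", "banana", "cherry"],
    "color": ["red", "yellow", "red"],
    "count": [3, 5, 7],
})

# Any-field text search: keep rows where any column contains the term.
term = "red"
row_mask = df.astype(str).apply(
    lambda col: col.str.contains(term, case=False, na=False)
).any(axis=1)
any_field_matches = df[row_mask]

# Per-column exact filtering: keep rows where one column equals a value.
exact_matches = df[df["color"] == "red"]

print(any_field_matches)
print(exact_matches)
```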

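The sampling techniques from item 2, sketched in pandas; the thresholds and sizes are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.normal(size=10_000), "b": rng.normal(size=10_000)})

# Chunk sample: 50 contiguous rows starting at a random offset.
start = rng.integers(0, len(df) - 50)
chunk = df.iloc[start:start + 50]

# Outlier sample: the rows holding each column's extreme percentiles,
# combined into a single view.
lo, hi = df.quantile(0.01), df.quantile(0.99)
outliers = df[((df < lo) | (df > hi)).any(axis=1)]

# Straight random sample.
random_sample = df.sample(50, random_state=0)

print(len(chunk), len(outliers), len(random_sample))
```
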
0.6 Series

Polars backend

All of the same tests should pass.

  1. Low code UI Commands in Polars

    • Gives autocleaning and filtering at much higher performance. A nice way to get my feet wet with Polars.

    • Testing that verifies eval(_to_py) == transform(df) and pl.transform(df) == pd.transform(df)

    • Pandas/Polars equivalence is key to codegen continuing to be useful (see the equivalence-test sketch after this list)

  2. Serialization in Polars

    • 2x speed bump

    • Straightforward

  3. Pluggable analysis framework - for Polars

    • Same pluggable analysis framework, now lazy

    • Summary stats run on the whole dataframe - up to 1 GB (see the lazy summary-stats sketch after this list)
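
A sketch of the equivalence property from item 1: the pandas and Polars versions of a generated transform should agree. pd_drop_nulls and pl_drop_nulls are hypothetical stand-ins for generated code.

```python
import pandas as pd
import polars as pl


def pd_drop_nulls(df: pd.DataFrame) -> pd.DataFrame:
    # Stand-in for a pandas transform emitted by the low-code UI.
    return df.dropna()


def pl_drop_nulls(df: pl.DataFrame) -> pl.DataFrame:
    # Stand-in for the equivalent Polars transform.
    return df.drop_nulls()


def test_pandas_polars_equivalence():
    pd_df = pd.DataFrame({"a": [1.0, None, 3.0], "b": ["x", "y", None]})
    pl_df = pl.from_pandas(pd_df)
    pd_result = pd_drop_nulls(pd_df).reset_index(drop=True)
    pl_result = pl_drop_nulls(pl_df).to_pandas()
    pd.testing.assert_frame_equal(pd_result, pl_result)
```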

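A sketch of lazy summary stats in Polars for item 3, assuming a recent Polars version; nothing is computed until .collect().

```python
import polars as pl

# In practice this would be pl.scan_parquet()/pl.scan_csv() on a large file.
lf = pl.DataFrame({
    "a": [1, 2, None, 4],
    "b": [10.0, None, 30.0, 40.0],
}).lazy()

summary = lf.select(
    pl.all().min().name.suffix("_min"),
    pl.all().max().name.suffix("_max"),
    pl.all().null_count().name.suffix("_null_count"),
).collect()
print(summary)
```
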
0.7 Series

  1. Serialization speedup

    • Integrate parquet-wasm in the frontend

    • Parquet serialization on the backend (see the sketch below)

    • Maintain JSON serialization
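
A backend-side sketch, assuming the frontend decodes Parquet bytes with parquet-wasm, pyarrow is available on the backend, and JSON remains the fallback path; serialize_df is a hypothetical helper.

```python
import io

import pandas as pd


def serialize_df(df: pd.DataFrame, use_parquet: bool = True) -> bytes:
    """Serialize a dataframe for the frontend."""
    if use_parquet:
        buf = io.BytesIO()
        df.to_parquet(buf, index=False)  # requires pyarrow
        return buf.getvalue()
    # JSON fallback: larger and slower, but always available.
    return df.to_json(orient="records").encode("utf-8")


df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
print(len(serialize_df(df)), len(serialize_df(df, use_parquet=False)))
```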