Particle Physics // Machine Learning // Music // Baseball
tl;dr - see this GitHub gist
For a couple of years now, I’ve been super interested in the Julia language. One issue I had when I was doing public-facing baseball work is that there are great libraries in both Python (pybaseball) and R (baseballr) for loading baseball data, but no such library for Julia (yet!). Luckily, Julia has great interoperability support, so we can use those libraries to pull baseball data into Julia DataFrames - it just takes a little bit of massaging.
Prerequisite: a working Python installation with pybaseball, which can be installed via pip. I recommend creating a dedicated Python virtual environment to work with Julia, and when you build PyCall, set
ENV["PYTHON"] = "venv/bin/python3" (the path to that environment's Python executable). Activate that virtual environment and run
pip install pybaseball
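As a minimal sketch of the PyCall build step described above: the snippet assumes the virtual environment lives in a folder named venv under the current directory (adjust the path to wherever yours is), and that you restart Julia after the build finishes.

```julia
using Pkg

# Assumption: the virtual environment is at ./venv; change this to your own path.
ENV["PYTHON"] = abspath(joinpath("venv", "bin", "python3"))

# Install PyCall if needed, then rebuild it against that interpreter.
Pkg.add("PyCall")
Pkg.build("PyCall")
```

PyCall remembers the interpreter it was built against, so this only needs to be run once per environment.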
For interoperability with Python, Julia has PyCall.jl. Once loaded into Julia, use
pyimport to load pybaseball into your Julia session. The methods within pybaseball return pandas DataFrames. If you’re interested in using Pandas.jl, the conversion is straightforward; however, it’s not trivial to get to Julia’s DataFrames. The approach I’ve found is to immediately call the
pandas.DataFrame.to_csv method without a file path to get the dataframe as a string. Then, wrap that string in an IOBuffer, read it with CSV.jl, and sink it to a Julia DataFrame.
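Here is a rough sketch of that round trip, not necessarily identical to what is in the gist. The helper names pandas_to_dataframe and load_statcast are made up for illustration, and the example assumes pybaseball's statcast function, which takes start and end dates as "YYYY-MM-DD" strings.

```julia
using PyCall, CSV, DataFrames

# Convert a pandas DataFrame (held as a PyObject) into a Julia DataFrame.
function pandas_to_dataframe(df_pd)
    # With no path argument, pandas' to_csv returns the CSV text as a string.
    csv_text = df_pd.to_csv(index=false)
    # Wrap the string in an IOBuffer so CSV.jl can read it like a file.
    return CSV.read(IOBuffer(csv_text), DataFrame)
end

# Hypothetical helper: pull Statcast data for a date range via pybaseball.
function load_statcast(start_dt, end_dt)
    pybaseball = pyimport("pybaseball")
    return pandas_to_dataframe(pybaseball.statcast(start_dt, end_dt))
end
```

Calling load_statcast("2021-07-01", "2021-07-02") (dates chosen arbitrarily) should then give a regular Julia DataFrame, at the cost of serializing everything through text once.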
And for an example plot…
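Prerequisite: a working R installation with baseballr installed. Open R and install the package there (for example, with install.packages("baseballr") if you want the CRAN release).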
Interoperability with R is done via RCall.jl. RCall can load R libraries via the
@rlibrary macro, which can then be used to call
baseballr (provided the library is installed). Once the library is loaded, then you can call functions via an R string, and use
rcopy to migrate an R dataframe to a Julia one.
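A small sketch of the RCall route, under a few assumptions: the helper names r_to_dataframe and load_statcast_r are invented for this example, the query uses baseballr's statcast_search (the function name may differ in older baseballr versions), and the baseballr:: prefix is used inside the R code in place of @rlibrary so the helper can be defined before the package is loaded. reval evaluates R code passed as a plain Julia string, which is equivalent to the R"..." string macro.

```julia
using RCall, DataFrames

# Evaluate R code given as a string and copy the resulting data frame to Julia.
r_to_dataframe(code::AbstractString) = rcopy(reval(code))

# Hypothetical helper: query Statcast through baseballr for a date range.
# Dates are "YYYY-MM-DD" strings; statcast_search is assumed to be available.
function load_statcast_r(start_date, end_date)
    code = "baseballr::statcast_search(start_date = '$start_date', end_date = '$end_date')"
    return r_to_dataframe(code)
end
```

For example, load_statcast_r("2021-07-01", "2021-07-02") should come back as a Julia DataFrame, assuming rcopy treats the tibble that baseballr returns like an ordinary R data frame.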
Hopefully this enables some easier baseball analysis for others in Julia. Of course, all this work can be circumvented by saving dataframes from the respective packages as CSVs and reading them in via
CSV.jl, but who wants a million CSVs lying around? There are probably much more performant ways to go about this, but these approaches seem the quickest and clearest to me - if you have ideas or suggestions, feel free to reach out, or comment on the GitHub gist above.