Using your own data

Embedding visualizations from Observable into Python is not very useful if you cannot adapt them to your own needs.

This page walks you through some common ways to inject your own data into an Observable visualization.

CSV, JSON, and DataFrames

Two of the most popular data formats for quick and easy data science are CSV and JSON. However, because we are working in Python and likely doing all of our data processing there, our data may well be in a pandas DataFrame.

Regardless of how the data is structured beforehand, we must ensure that it is in the format Observable accepts when we pass it to the embed function.

Because Observable works with JavaScript, the data must be converted into a JSON-compatible structure. In Python terms, this means the data must be stored as a list of dictionaries, where each element of the list represents one row of the data.
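
As a minimal sketch of that target structure (the column names and values here are made up for illustration), each row is one dictionary and the whole table is a list of those dictionaries. Serializing the list with the standard json module shows the array of objects that the JavaScript side would see:

```python
import json

# Hypothetical two-row table: keys are column names, values are cell values.
rows = [
    {"species": "Adelie", "bill_length_mm": 39.1},
    {"species": "Gentoo", "bill_length_mm": 46.1},
]

# Each list element is one row; each dictionary key is one column.
first_row = rows[0]

# json.dumps turns the list of dictionaries into a JSON array of objects.
as_json = json.dumps(rows)
print(as_json)
```

If json.dumps succeeds on your data, the structure is at least JSON-compatible, which is a useful quick check before calling embed.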

Examples

Many Observable notebooks have an exposed variable that holds the data they manipulate and visualize. This data is commonly loaded into Observable through the file attachments feature, which automatically parses CSV and JSON data.

However, because we are overriding the data variable, we have to parse the data ourselves. This example explains how to get your data into the right format so that it can be assigned to the exposed data variable in Observable.

For the sake of clarity, we will first embed an Observable visualization with no modifications.

[57]:
from observable_jupyter import embed
[58]:
embed(
    '@rstorni/visualize-a-data-frame-with-observable-in-jupyter/2',
    cells = ['vegaPetalsWidget',
             'viewof sepalLengthLimits',
             'viewof sepalWidthLimits']
    )

Note

Notice that this example imports three cells. The first cell corresponds to the scatterplot, while the second and third correspond to the variable sliders.

CSV Data

To restructure CSV data, first import the csv module, which lets you read your CSV file row by row.

[59]:
import csv

Next, copy the following block of code and replace 'Demo_Data/Palmer_Penguins.csv' with the path to your CSV file.

[60]:
penguin_data = []
with open('Demo_Data/Palmer_Penguins.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        penguin_data.append(row)

As you can see, once parsed, the data is a list of dictionaries where each element of the list represents one row of the CSV file.

[61]:
penguin_data[0]
[61]:
{'': '0',
 'species': 'Adelie',
 'island': 'Torgersen',
 'bill_length_mm': '39.1',
 'bill_depth_mm': '18.7',
 'flipper_length_mm': '181.0',
 'body_mass_g': '3750.0',
 'sex': 'male',
 'year': '2007'}

Important

Notice that in this example we are mapping strings to strings: csv.DictReader reads every value as text. Because we are simply overriding the data variable, there is a high chance that values meant to be numbers will be treated as strings. Sometimes this works because code in the Observable notebook converts the data variable, but it is less error-prone to load the CSV file into a DataFrame and then parse it, since that lets you check for typing issues.
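
If you would rather stay with the csv module, one possible approach is to convert numeric-looking strings yourself before passing the rows to embed. This is only a sketch: the helper name coerce_numbers is made up for this illustration, and it converts every value that float() accepts, which may be broader than you want.

```python
def coerce_numbers(row):
    """Return a copy of row with numeric-looking strings converted to float."""
    converted = {}
    for key, value in row.items():
        try:
            converted[key] = float(value)
        except (TypeError, ValueError):
            # Not a number (for example 'Adelie' or an empty cell): keep as-is.
            converted[key] = value
    return converted


# Example row shaped like the csv.DictReader output shown above.
raw_row = {"species": "Adelie", "bill_length_mm": "39.1", "year": "2007"}
typed_row = coerce_numbers(raw_row)
print(typed_row)
```

To apply it to a whole file you could write something like [coerce_numbers(row) for row in penguin_data]. Note that float() also accepts strings such as "nan" and "inf", so check columns that may contain those before relying on this.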

At this point your data is formatted so that you can pass it into the embed function.

[62]:
embed(
    '@rstorni/visualize-a-data-frame-with-observable-in-jupyter/2',
    cells=['vegaPetalsWidget', 'viewof sepalLengthLimits', 'viewof sepalWidthLimits'],
    inputs = {
        'input_data' : penguin_data,
        'x_title' : "bill_length_mm",
        'y_title' : "bill_depth_mm",
        'legend' : "species",
        'specified_feature_1': "body_mass_g",
        'specified_feature_2': "flipper_length_mm"
    }
)

Notice that once the data is in the right format, we can simply assign it to the input_data variable. To get a fully functioning chart, we also need to assign other variables such as the axis titles, legend, and features. You can find out what these do by looking at the Observable notebook.

Dataframes

The process of restructuring a DataFrame is very similar to the process of restructuring a CSV file.

First we will import pandas, json, and the palmerpenguins dataset.

[63]:
import pandas as pd
import json
from palmerpenguins import load_penguins

We imported the palmerpenguins data in order to have a DataFrame to work with, but feel free to use your own data for this example.

[64]:
pengunins_data = load_penguins()
pengunins_data.head()
[64]:
   species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g     sex  year
0   Adelie  Torgersen            39.1           18.7              181.0       3750.0    male  2007
1   Adelie  Torgersen            39.5           17.4              186.0       3800.0  female  2007
2   Adelie  Torgersen            40.3           18.0              195.0       3250.0  female  2007
3   Adelie  Torgersen             NaN            NaN                NaN          NaN     NaN  2007
4   Adelie  Torgersen            36.7           19.3              193.0       3450.0  female  2007

Once we have our DataFrame, we run the following block of code, which converts it into JSON-compatible records. When you do this, replace pengunins_data with the name of your own DataFrame, and make sure to set orient="records", as this gives us the right structure.

[65]:
# Serialize the DataFrame to a JSON string with one object per row,
# then load that string back into a Python list of dictionaries.
result = pengunins_data.to_json(orient="records")
Formated_Data = json.loads(result)

It can be helpful to look at the first element of the list to ensure that everything is in order. Remember that each element of the list should correspond to one row of the DataFrame.

[66]:
Formated_Data[0]
[66]:
{'species': 'Adelie',
 'island': 'Torgersen',
 'bill_length_mm': 39.1,
 'bill_depth_mm': 18.7,
 'flipper_length_mm': 181.0,
 'body_mass_g': 3750.0,
 'sex': 'male',
 'year': 2007}
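
Rows with missing values deserve a second look. The sketch below uses a small made-up frame (not the penguins data) to show what happens to NaN when going through to_json with orient="records": pandas writes it as JSON null, which json.loads turns into None.

```python
import json

import pandas as pd

# Small made-up frame with one missing value, similar to row 3 of the head() output.
frame = pd.DataFrame(
    {"species": ["Adelie", "Adelie"], "bill_length_mm": [39.1, float("nan")]}
)

# to_json writes NaN as null, so json.loads returns None for the missing cell.
records = json.loads(frame.to_json(orient="records"))
print(records[1])
```

Whether the Observable notebook copes with null values depends on how that notebook is written, so you may prefer to drop or fill incomplete rows in pandas first.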

Once your data is in the correct format, all you need to do is assign it to the 'input_data' variable and modify the other input variables to configure your graph.

[67]:
embed(
    '@rstorni/visualize-a-data-frame-with-observable-in-jupyter/2',
    cells=['vegaPetalsWidget', 'viewof sepalLengthLimits', 'viewof sepalWidthLimits'],
    inputs = {
        'input_data' : Formated_Data,
        'x_title' : "bill_length_mm",
        'y_title' : "bill_depth_mm",
        'legend' : "species",
        'specified_feature_1': "body_mass_g",
        'specified_feature_2': "flipper_length_mm"
    }
)

JSON Data

Not ready yet
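
In the meantime, here is a minimal sketch, assuming your JSON file already holds an array of objects (one object per row). In that case json.load returns the list of dictionaries directly. The file name and contents below are made up so that the example runs on its own.

```python
import json
import os
import tempfile

# Write a tiny made-up JSON file so the example is self-contained.
sample = [{"species": "Adelie", "year": 2007}, {"species": "Gentoo", "year": 2008}]
path = os.path.join(tempfile.mkdtemp(), "sample.json")
with open(path, "w") as jsonfile:
    json.dump(sample, jsonfile)

# Load it back: a JSON array of objects becomes a list of dictionaries.
with open(path) as jsonfile:
    json_data = json.load(jsonfile)

# Quick structural check before handing the data to embed.
is_records = isinstance(json_data, list) and all(
    isinstance(row, dict) for row in json_data
)
print(is_records)
```

If your file is structured differently (for example one object keyed by column name), it would need to be reshaped into this row-per-dictionary form first.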

Next Steps

Now that you understand how to set up your own data for use in a visualization, you should be able to use any of the visualizations found in the Visualization Library.

Non-Serializable Data

One thing you might come across when working with your own data is non-serializable data: data that cannot be converted into plain text.

One common example is dates. In both Python and JavaScript, dates are objects with their own methods attached. For this reason, we run into issues when trying to pass a Python datetime into Observable.
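
The problem can be reproduced with the standard json module alone. This is a sketch of the general issue, not of what embed does internally: json.dumps rejects a datetime with a TypeError, while the same date written as an ISO 8601 string serializes without trouble.

```python
import json
from datetime import datetime

row = {"name": "a", "date": datetime(2021, 3, 14)}

try:
    json.dumps([row])
    serializable = True
except TypeError:
    # datetime objects have no JSON representation by default.
    serializable = False

# Sending the date as text works; it still has to be parsed on the other side.
text_row = {**row, "date": row["date"].isoformat()}
as_json = json.dumps([text_row])
print(serializable, as_json)
```

The date arrives as a plain string either way, which is why the typing step still has to happen somewhere, and in this guide it happens inside Observable.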

Luckily there is a workaround, but it involves going into Observable and configuring the data variables so that Observable handles all of the data parsing.

⚠️ Important ⚠️

The following instructions require you to write some JavaScript code in an Observable notebook. If you are unsure whether you want to do this, take a look at the data visualization library first, since we have done all of the work on the Observable end to give you the best experience.

Importing Data

Import your data into Python and parse it as if you were injecting it into any Observable notebook.

[68]:
date_data = []
with open('Demo_Data/toy_data_set.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        date_data.append(row)

Once you have done this, go into Observable, create a variable, and give it a descriptive name such as CSV_Data. This is the variable you will pass your data into.

The following line of code, without the leading pound symbol, is what you would write in your Observable notebook. Make sure to use your own file path rather than the example "toy_data_set/@2.csv".

[69]:
# CSV_Data = FileAttachment("toy_data_set/@2.csv").csv()

Typing Data

You will then create a variable named data, in which you parse the data you originally passed in into a new object.

The following code shows how to parse four different columns in your data. Each map call essentially adds a column to your data: the name between "...x," and ":" is the new column's name, and the expression after ":" produces its typed value. In the second line of code we convert date_column into a new column named date, where each value is now a JavaScript Date. The same logic applies to value_column in line four, which uses parseInt. The remaining lines simply create new columns without changing the type.

As you can imagine, you should not simply copy this code; instead, adapt it to your own data. Not everyone will need to use parseInt or Date().

[70]:
#data = CSV_Data.map(x =>
#          {return {...x, date:new  Date(x[date_column])}}
#         ).map(x =>
#          {return {...x, value: parseInt(x[value_column])}}
#         ).map(x =>
#          {return {...x, name: x[name_column]}}
#         ).map(x =>
#          {return {...x, category: x[category_column]}})
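
For readers more at home in Python, the same idea can be sketched as a list comprehension. This is only an analogy to help read the JavaScript above; it is not code you need to run for the workaround. The rows and the column names date_column, value_column, and name_column are hypothetical.

```python
from datetime import datetime

# Made-up rows shaped like csv.DictReader output; column names are hypothetical.
csv_data = [
    {"date_column": "2021-03-14", "value_column": "42", "name_column": "a"},
]

# Same idea as the chained .map calls: copy each row and add typed columns.
data = [
    {
        **x,  # keep the original columns, like ...x in JavaScript
        "date": datetime.strptime(x["date_column"], "%Y-%m-%d"),
        "value": int(x["value_column"]),
        "name": x["name_column"],
    }
    for x in csv_data
]
print(data[0]["value"], data[0]["date"])
```

As in the JavaScript version, the original string columns stay in each row and the typed columns are added alongside them.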

Note

If you are still unsure what the above JavaScript code is doing, read the Transforming Data section of the following article: https://observablehq.com/@observablehq/visualize-a-data-frame-with-observable-in-jupyter. It gives intuitive examples that relate to Python list comprehensions.

Publishing and Testing

Note

If you struggled to follow along or don't understand how to use Observable, take a look at the tutorial section to find videos that walk you through modifying, publishing, and embedding visualizations with both Observable and Observable-Jupyter.

Now that you have set up the functionality to parse data in Observable, you need to fork the original notebook and then publish it. This requires an Observable account, and you will also need to modify the partial URL you pass to the embed function so that it matches the URL of your newly published notebook.

Once you have published your notebook, you should be able to pass your unparsed or mistyped data into the CSV_Data variable you created, and Observable should then handle the rest. If your code does not run after you have done this, consider checking the Observable notebook again and updating any variables that should now refer to the new columns in the data variable rather than those in the CSV_Data variable.

If you succeeded in running the visualization after modifying and publishing the Observable notebook, you now have the ability to fully modify any notebook you can find on the Observable website.