LLM Interoperability
This walkthrough covers the functime.llm module, which adds namespaced polars DataFrame methods for passing functime data to Large Language Models (LLMs).
Let's use OpenAI's GPT models to analyze commodity price forecasts created by a functime forecaster. By default, the model is gpt-3.5-turbo.
Load data
%%capture
import os
import polars as pl
from functime.cross_validation import train_test_split
from functime.forecasting import knn
import functime.llm # We must import this to override the `llm` namespace for pl.DataFrame
from functime.llm.formatting import univariate_panel_to_wide
os.environ["OPENAI_API_KEY"] = ... # Your API key here
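The `...` above is a placeholder: `os.environ` only accepts strings, so assigning the literal `...` would raise a `TypeError`. A minimal sketch of a fail-fast check, assuming the key was exported in the shell before the notebook started; `require_api_key` is a hypothetical helper and not part of functime:

```python
import os


def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key stored under `name`, or raise a clear error."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the notebook.")
    return key


# Usage (hypothetical): call once, before any `.llm` method is used.
# api_key = require_api_key()
```

Reading the key from the environment keeps it out of the notebook source and its saved outputs.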
y = pl.read_parquet("../../data/commodities.parquet")
entity_col, time_col, target_col = y.columns
test_size = 30
freq = "1mo"
y_train, y_test = train_test_split(test_size)(y)
print("🎯 Target variable (y) -- train set:\n", y_train.collect())
🎯 Target variable (y) -- train set:
shape: (45_453, 3)
┌───────────────────┬─────────────────────┬───────┐
│ commodity_type    ┆ time                ┆ price │
│ ---               ┆ ---                 ┆ ---   │
│ str               ┆ datetime[ns]        ┆ f64   │
╞═══════════════════╪═════════════════════╪═══════╡
│ Coal, Australian  ┆ 1970-01-01 00:00:00 ┆ 7.8   │
│ Coal, Australian  ┆ 1970-02-01 00:00:00 ┆ 7.8   │
│ Coal, Australian  ┆ 1970-03-01 00:00:00 ┆ 7.8   │
│ Coal, Australian  ┆ 1970-04-01 00:00:00 ┆ 7.8   │
│ …                 ┆ …                   ┆ …     │
│ Natural gas index ┆ 2020-06-01 00:00:00 ┆ 33.99 │
│ Natural gas index ┆ 2020-07-01 00:00:00 ┆ 34.91 │
│ Natural gas index ┆ 2020-08-01 00:00:00 ┆ 45.85 │
│ Natural gas index ┆ 2020-09-01 00:00:00 ┆ 46.07 │
└───────────────────┴─────────────────────┴───────┘
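To make the split concrete, here is a pure-Python sketch of a per-entity holdout, where the last `test_size` observations of every series go to the test set. It illustrates the idea only and is not functime's implementation; `split_last_n` and the toy rows are assumptions made for this example.

```python
from collections import defaultdict


def split_last_n(rows, test_size):
    """Split (entity, time, value) rows so each entity's last `test_size` points form the test set."""
    by_entity = defaultdict(list)
    for entity, time, value in rows:
        by_entity[entity].append((time, value))

    train, test = [], []
    for entity, series in by_entity.items():
        series.sort()  # chronological order within each entity
        cut = max(len(series) - test_size, 0)
        train += [(entity, t, v) for t, v in series[:cut]]
        test += [(entity, t, v) for t, v in series[cut:]]
    return train, test


# Toy panel: entity "A" has 5 observations, entity "B" has 4.
rows = [("A", t, float(t)) for t in range(5)] + [("B", t, 10.0 * t) for t in range(4)]
train, test = split_last_n(rows, test_size=2)
print(len(train), len(test))
```

Splitting per entity rather than on one global cutoff keeps the test horizon the same length for every series, even when the series differ in length.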
We'll make a prediction using a knn forecaster.
# Univariate time-series fit with 24 monthly lags
forecaster = knn(
    freq="1mo",
    lags=24,
)
forecaster.fit(y=y_train)
y_pred = forecaster.predict(fh=test_size)
print("📊 Preds:\n", y_pred)
📊 Preds:
shape: (2_130, 3)
┌─────────────────────────┬─────────────────────┬─────────────┐
│ commodity_type          ┆ time                ┆ price       │
│ ---                     ┆ ---                 ┆ ---         │
│ str                     ┆ datetime[μs]        ┆ f64         │
╞═════════════════════════╪═════════════════════╪═════════════╡
│ Tobacco, US import u.v. ┆ 2020-10-01 00:00:00 ┆ 4350.390137 │
│ Tobacco, US import u.v. ┆ 2020-11-01 00:00:00 ┆ 4350.390137 │
│ Tobacco, US import u.v. ┆ 2020-12-01 00:00:00 ┆ 4350.390137 │
│ Tobacco, US import u.v. ┆ 2021-01-01 00:00:00 ┆ 4340.333984 │
│ …                       ┆ …                   ┆ …           │
│ Sawnwood, Cameroon      ┆ 2022-12-01 00:00:00 ┆ 534.277954  │
│ Sawnwood, Cameroon      ┆ 2023-01-01 00:00:00 ┆ 529.589966  │
│ Sawnwood, Cameroon      ┆ 2023-02-01 00:00:00 ┆ 523.410034  │
│ Sawnwood, Cameroon      ┆ 2023-03-01 00:00:00 ┆ 510.354004  │
└─────────────────────────┴─────────────────────┴─────────────┘
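For intuition about what a k-nearest-neighbours forecaster does with lags, here is a minimal pure-Python sketch of a single one-step prediction. It is not functime's implementation: `knn_one_step`, the Euclidean distance, and the plain mean over neighbours are all illustrative assumptions.

```python
import math


def knn_one_step(series, lags, k):
    """Predict the next value as the mean of what followed the k most similar lag windows."""
    if len(series) <= lags:
        raise ValueError("need more observations than lags")
    query = series[-lags:]  # the most recent `lags` values
    candidates = []
    # Pair every historical window with the value that came right after it.
    for i in range(len(series) - lags):
        window = series[i : i + lags]
        target = series[i + lags]
        candidates.append((math.dist(window, query), target))
    candidates.sort(key=lambda pair: pair[0])
    nearest = candidates[:k]
    return sum(target for _, target in nearest) / len(nearest)


history = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0]
print(knn_one_step(history, lags=2, k=2))  # prints 3.0
```

In the toy series the latest window `[1.0, 2.0]` has appeared twice before, each time followed by `3.0`, so the prediction is `3.0`. A multi-step forecast needs an additional strategy on top of this, for example feeding each prediction back in as the newest lag.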
We'll also provide a short description of the dataset to aid the LLM in its analysis.
dataset_context = "This dataset comprises forecasted commodity prices between 2020 and 2023."
Analyze Forecasts
Let's take a look at aluminum and European banana prices. We'll first transform the panel dataframe into a wide format to reduce redundant information (e.g. repeated time/entity values) sent to the LLM.
selection = ["Aluminum", "Banana, Europe"]
prices = y_pred.filter(pl.col(entity_col).is_in(selection)).pipe(
    univariate_panel_to_wide, shrink_dtype=True
)
print("📊 'Aluminum' and 'Banana, Europe' prices (wide):\n", prices)
📊 'Aluminum' and 'Banana, Europe' prices (wide):
shape: (30, 3)
┌─────────────────────┬─────────────┬────────────────┐
│ time                ┆ Aluminum    ┆ Banana, Europe │
│ ---                 ┆ ---         ┆ ---            │
│ datetime[μs]        ┆ f32         ┆ f32            │
╞═════════════════════╪═════════════╪════════════════╡
│ 2020-10-01 00:00:00 ┆ 1575.267944 ┆ 0.868          │
│ 2020-11-01 00:00:00 ┆ 1588.387939 ┆ 0.846          │
│ 2020-12-01 00:00:00 ┆ 1602.702026 ┆ 0.824          │
│ 2021-01-01 00:00:00 ┆ 1583.288086 ┆ 0.824          │
│ …                   ┆ …           ┆ …              │
│ 2022-12-01 00:00:00 ┆ 1343.609985 ┆ 1.186          │
│ 2023-01-01 00:00:00 ┆ 1343.609985 ┆ 1.144          │
│ 2023-02-01 00:00:00 ┆ 1396.969971 ┆ 1.126          │
│ 2023-03-01 00:00:00 ┆ 1400.67395  ┆ 1.08           │
└─────────────────────┴─────────────┴────────────────┘
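The long-to-wide reshaping can be pictured with a small pure-Python sketch. `panel_to_wide` is a hypothetical stand-in written for this example, not the library's `univariate_panel_to_wide`; the four input rows are copied from the forecast values printed above.

```python
def panel_to_wide(rows):
    """Pivot (entity, time, value) rows into one dict per timestamp, with one key per entity."""
    wide = {}
    for entity, time, value in rows:
        wide.setdefault(time, {"time": time})[entity] = value
    return [wide[time] for time in sorted(wide)]


panel = [
    ("Aluminum", "2020-10-01", 1575.267944),
    ("Aluminum", "2020-11-01", 1588.387939),
    ("Banana, Europe", "2020-10-01", 0.868),
    ("Banana, Europe", "2020-11-01", 0.846),
]
for row in panel_to_wide(panel):
    print(row)
```

In the panel layout every entity name and timestamp is repeated on each row (12 cells here), while the wide layout states each timestamp once and each entity name only in the header (6 cells plus the header), which is what shortens the prompt.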
analysis = prices.llm.analyze(context=dataset_context) # This may take a few seconds
print("📊 Analysis:\n", analysis)
📊 Analysis:
- The Aluminum price shows a decreasing trend from October 2020 (1575.27 USD) to March 2021 (1385.47 USD), followed by a slight increase until March 2023 (1400.67 USD).
- Banana prices in Europe exhibit a fluctuating trend with no clear direction. There is no significant change in prices between October 2020 (0.868 USD) and October 2021 (0.86 USD). However, from October 2021 to March 2023, there is a gradual decline in prices, reaching 1.08 USD.
- The Aluminum price experienced a significant drop in February 2021, with a decrease of 6.88% compared to the previous month.
- In contrast, Banana prices in Europe had a small drop in February 2021, with a decrease of 2.36% compared to the previous month.
- Anomalies in the Aluminum price are observed in February 2021 and May 2021, with decreases of 6.88% and 3.08% respectively, compared to the previous month.
- Banana prices in Europe show an anomaly in October 2021, with an increase of 5.58% compared to the previous month.
- Seasonality is not evident in the Aluminum price as the fluctuations do not follow a regular pattern over the months.
- Banana prices in Europe do not exhibit clear seasonality either, with irregular fluctuations throughout the dataset.
- The highest Aluminum price is recorded in February 2022 (1401.99 USD), while the lowest is observed in March 2022 (1385.47 USD).
- The highest Banana price in Europe is recorded in June 2022 (1.188 USD), while the lowest is observed in May 2021 (0.806 USD).
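Month-over-month figures like the ones quoted in the analysis can be recomputed directly from the data. The sketch below is one way to do that in plain Python; `month_over_month`, `flag_moves`, and the 1% threshold are assumptions chosen for illustration, and the input is limited to the first four Aluminum forecasts shown in the wide table.

```python
def month_over_month(values):
    """Percentage change of each value relative to the one before it."""
    return [(curr - prev) / prev * 100 for prev, curr in zip(values, values[1:])]


def flag_moves(labels, values, threshold_pct):
    """Return (label, change) pairs whose absolute change exceeds the threshold."""
    changes = month_over_month(values)
    return [
        (label, round(change, 2))
        for label, change in zip(labels[1:], changes)
        if abs(change) > threshold_pct
    ]


# First four Aluminum forecasts from the wide table.
months = ["2020-10", "2020-11", "2020-12", "2021-01"]
aluminum = [1575.267944, 1588.387939, 1602.702026, 1583.288086]
print(flag_moves(months, aluminum, threshold_pct=1.0))
```

On these four rows only January 2021 moves by more than 1% (a drop of about 1.2%). Running the same check over all 30 rows is a quick way to confirm or reject the anomalies the model reports.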
Compare Forecasts
Let's now compare the previous selection with a new one. We'll refer to these as baskets A and B.
basket_a = prices
selection_b = ["Chicken", "Cocoa"]
basket_b = y_pred.filter(pl.col(entity_col).is_in(selection_b)).pipe(
    univariate_panel_to_wide, shrink_dtype=True
)
print("📊 Basket A -- 'Aluminum' and 'Banana, Europe' (wide):\n", basket_a)
print("📊 Basket B -- 'Chicken' and 'Cocoa' (wide):\n", basket_b)
📊 Basket A -- 'Aluminum' and 'Banana, Europe' (wide):
shape: (30, 3)
┌─────────────────────┬─────────────┬────────────────┐
│ time                ┆ Aluminum    ┆ Banana, Europe │
│ ---                 ┆ ---         ┆ ---            │
│ datetime[μs]        ┆ f32         ┆ f32            │
╞═════════════════════╪═════════════╪════════════════╡
│ 2020-10-01 00:00:00 ┆ 1575.267944 ┆ 0.868          │
│ 2020-11-01 00:00:00 ┆ 1588.387939 ┆ 0.846          │
│ 2020-12-01 00:00:00 ┆ 1602.702026 ┆ 0.824          │
│ 2021-01-01 00:00:00 ┆ 1583.288086 ┆ 0.824          │
│ …                   ┆ …           ┆ …              │
│ 2022-12-01 00:00:00 ┆ 1343.609985 ┆ 1.186          │
│ 2023-01-01 00:00:00 ┆ 1343.609985 ┆ 1.144          │
│ 2023-02-01 00:00:00 ┆ 1396.969971 ┆ 1.126          │
│ 2023-03-01 00:00:00 ┆ 1400.67395  ┆ 1.08           │
└─────────────────────┴─────────────┴────────────────┘
📊 Basket B -- 'Chicken' and 'Cocoa' (wide):
shape: (30, 3)
┌─────────────────────┬─────────┬───────┐
│ time                ┆ Chicken ┆ Cocoa │
│ ---                 ┆ ---     ┆ ---   │
│ datetime[μs]        ┆ f32     ┆ f32   │
╞═════════════════════╪═════════╪═══════╡
│ 2020-10-01 00:00:00 ┆ 1.492   ┆ 2.41  │
│ 2020-11-01 00:00:00 ┆ 1.588   ┆ 2.42  │
│ 2020-12-01 00:00:00 ┆ 1.606   ┆ 2.408 │
│ 2021-01-01 00:00:00 ┆ 1.536   ┆ 2.372 │
│ …                   ┆ …       ┆ …     │
│ 2022-12-01 00:00:00 ┆ 1.428   ┆ 2.664 │
│ 2023-01-01 00:00:00 ┆ 1.42    ┆ 2.636 │
│ 2023-02-01 00:00:00 ┆ 1.42    ┆ 2.678 │
│ 2023-03-01 00:00:00 ┆ 1.376   ┆ 2.696 │
└─────────────────────┴─────────┴───────┘
Now compare!
comparison = basket_a.llm.compare(
    as_label="Basket A", others={"Basket B": basket_b}
)  # This may take a few seconds
print("📊 Comparison:\n", comparison)
📊 Comparison:
Basket A and Basket B represent two different sets of time series data. We will compare and contrast these data sets in terms of trend, seasonality, and anomalies.

**Trend Analysis:**
For Basket A, the Aluminum prices show a slight decreasing trend over time, with a decrease of 11.7% from October 2020 to March 2023. On the other hand, Banana prices in Europe show a fluctuating trend with no clear direction, but overall, there is a slight increase of 30.3% during the same period.
For Basket B, the Chicken prices exhibit a slight increasing trend, with an increase of 4.6% from October 2020 to March 2023. The Cocoa prices, on the other hand, show a relatively stable trend with some fluctuations, but no clear direction.

**Seasonality Analysis:**
Basket A does not exhibit any clear seasonality patterns in either Aluminum or Banana prices. The prices seem to fluctuate randomly without any consistent seasonal patterns.
Basket B also does not show any significant seasonality patterns in Chicken or Cocoa prices. The prices vary without following a specific seasonal trend.

**Anomaly Analysis:**
Basket A does not have any obvious anomalies in the Aluminum prices. However, in the Banana prices, there is a significant anomaly in November 2021, where the price jumps by 6.3% compared to the previous month. This anomaly could be due to factors such as supply disruptions or changes in demand.
Basket B does not show any clear anomalies in either Chicken or Cocoa prices. The prices fluctuate within a relatively stable range without any sudden or unexpected changes.

In summary, Basket A and Basket B exhibit different trends over time. Basket A shows a decreasing trend in Aluminum prices and a fluctuating trend in Banana prices. Basket B shows an increasing trend in Chicken prices and a relatively stable trend in Cocoa prices. Both baskets do not display any clear seasonality patterns. Basket A has an anomaly in November 2021 in Banana prices, while Basket B does not show any significant anomalies.
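The overall percentage changes quoted in the comparison can be checked against the first (October 2020) and last (March 2023) rows printed in the basket tables. This is a small sketch under that assumption; `pct_change` and the `endpoints` dictionary are names introduced for this example only.

```python
def pct_change(first, last):
    """Percentage change from `first` to `last`."""
    return (last - first) / first * 100


# First (2020-10) and last (2023-03) forecasts, copied from the basket tables.
endpoints = {
    "Aluminum": (1575.267944, 1400.67395),
    "Banana, Europe": (0.868, 1.08),
    "Chicken": (1.492, 1.376),
    "Cocoa": (2.41, 2.696),
}
for name, (first, last) in endpoints.items():
    print(f"{name}: {pct_change(first, last):+.1f}%")
```

Computed from the printed rows, the changes are about -11.1% for Aluminum, +24.4% for Banana, Europe, -7.8% for Chicken, and +11.9% for Cocoa. Several of these differ from the figures in the generated text (for example, the 4.6% increase quoted for Chicken), so numbers produced by the model are worth recomputing before they are reused.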