AWS Glue for Ray + Facebook Prophet: Train Large-Scale Time Series Forecast Models

Olalekan Elesin
5 min read · Dec 3, 2022

AWS Glue for Ray output training Facebook Prophet model inheriting Ray BaseTrainer class

AWS re:Invent brought many new product launches, along with new feature announcements for existing products such as Amazon SageMaker, Amazon EKS, and AWS Glue. One that stood out for me was AWS Glue for Ray. I have seen my teams build data transformation jobs on AWS Glue using the AWS SDK for pandas and the pandas API. The challenge they usually encountered, especially when building time series forecast models, is that pandas executes its functions in a single process on a single CPU core, so model training ran for hours. In one case, a data scientist had to wait more than 5 hours to train a time series forecast model.

In the rest of this blog post, we will show how to use AWS Glue for Ray to train large-scale forecasting models using Facebook Prophet. Ray is a distributed execution framework that allows users to scale their computations across multiple machines. Facebook Prophet is a popular open-source library for time series forecasting.

Solution overview

For this example, we will use a sample dataset from the Facebook Prophet GitHub project. The goal is to train a time series forecast model using the distributed capacity in Ray.

Next, we will set up the development environment. We use a Jupyter notebook to run the code.

You need to either install AWS Glue interactive sessions locally or run interactive sessions in an AWS Glue Studio notebook. Interactive sessions let you follow along and run each of the demonstration steps.

Refer to Getting started with AWS Glue interactive sessions for instructions to spin up a notebook on an AWS Glue interactive session.

Run your code using Ray in a Jupyter notebook

This section walks you through several notebook paragraphs on how to use AWS Glue for Ray. In this exercise, we read the sample Prophet dataset from Amazon S3 in CSV format, train a forecast model on it with Ray, and inspect the resulting forecasts.

  1. On the Jupyter console, under New, choose Glue Python.
  2. Indicate that you want to use Ray as the engine with the %glue_ray magic.
  3. Install Prophet with the %additional_python_modules magic, then import the Ray library along with the additional Python libraries:
%glue_ray

%additional_python_modules prophet


import ray
import pandas as pd
import pyarrow
from ray import data
import time
from ray.data import ActorPoolStrategy
from prophet import Prophet

4. Initialize a Ray Cluster with AWS Glue.

ray.init('auto')

5. Next, read the dataset, which is in CSV format, from Amazon S3:

start = time.time()
df = ray.data.read_csv('s3://datafy-data-lake/dev/demo/example_wp_log_peyton_manning.csv')
end = time.time()
print(f"Reading the data to dataframe: {end - start} seconds")

df.show(4)

Training Facebook Prophet Model with Ray

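We will keep this simple for the purpose of this blog post. Ray Train ships with trainers for a number of open source AI frameworks such as PyTorch, TensorFlow, Horovod, XGBoost, LightGBM, Hugging Face, scikit-learn, and RLlib. Prophet is not among them, so to train our simple Prophet forecast model on Ray, we define the model parameters, wrap the fit-and-predict logic in a function, and apply that function to the dataset with map_batches: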

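model_params = {
    "growth": "linear",
    "seasonality_mode": "multiplicative",
    "yearly_seasonality": True,
    "weekly_seasonality": True,
    "daily_seasonality": False,
}


def train_forecast_model(data):
    # `data` is one batch as a pandas DataFrame with Prophet's `ds` and `y` columns.
    m = Prophet(**model_params)
    m.fit(data)
    # Extend the history by 365 days and forecast over the full range.
    future = m.make_future_dataframe(periods=365)
    forecast = m.predict(future)
    return forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']]


# Ask for pandas batches explicitly, since Prophet expects a pandas DataFrame.
prediction_ds = df.map_batches(train_forecast_model, batch_format="pandas")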
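
One detail worth keeping in mind: map_batches calls the function once per batch, so each batch gets its own fitted model. The sketch below illustrates that behaviour with the standard library only. It is a simplified stand-in, not Ray's implementation, and the names `iter_batches`, `map_batches`, and `naive_forecast` are hypothetical helpers invented for this illustration (the "model" is just the batch mean).

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, float]


def iter_batches(rows: List[Row], batch_size: int) -> Iterator[List[Row]]:
    """Yield consecutive slices of `rows`, each with at most `batch_size` items."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def map_batches(rows: List[Row], fn: Callable[[List[Row]], List[Row]],
                batch_size: int) -> List[Row]:
    """Apply `fn` to every batch independently and concatenate the outputs."""
    out: List[Row] = []
    for batch in iter_batches(rows, batch_size):
        out.extend(fn(batch))
    return out


def naive_forecast(batch: List[Row]) -> List[Row]:
    """Hypothetical stand-in for train_forecast_model: 'fit' on the batch mean."""
    mean_y = sum(row["y"] for row in batch) / len(batch)
    return [{"ds": row["ds"], "yhat": mean_y} for row in batch]


# Toy series: y equals the day index, for six days.
rows = [{"ds": float(day), "y": float(day)} for day in range(6)]

# One batch: a single "model" sees the whole series.
whole = map_batches(rows, naive_forecast, batch_size=len(rows))

# Two batches: two separate "models", each fitted on half of the series.
split = map_batches(rows, naive_forecast, batch_size=3)

print([row["yhat"] for row in whole])
print([row["yhat"] for row in split])
```

In practice this means a single time series should arrive in one batch; if the dataset is split into several batches, each piece is fitted and forecast separately.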

Now, let's inspect the forecasted values, including the yhat_lower and yhat_upper bounds:

prediction_ds.to_pandas()

Finally, you can write your forecasts to an Amazon S3 bucket (data lake) and build visualizations with Amazon QuickSight directly from the file in S3 or through Amazon Athena.

What if there’s more?

As with the popular frameworks that have built-in trainers in Ray, you can build a trainer for Facebook Prophet by inheriting from the Ray BaseTrainer class. See the example below:

import itertools

import pandas as pd
from prophet import Prophet
from prophet.diagnostics import cross_validation, performance_metrics
from ray.air import session
from ray.train.trainer import BaseTrainer


class FBProphetTrainer(BaseTrainer):

    def setup(self):
        # BaseTrainer calls setup() once before training_loop().
        self.model = Prophet()
        self.rmses = []

    def training_loop(self):
        param_grid = {
            'changepoint_prior_scale': [0.001, 0.01, 0.1, 0.5],
            'seasonality_prior_scale': [0.01, 0.1, 1.0, 10.0],
        }
        # Expand the grid into one dict per hyperparameter combination.
        self.params = [dict(zip(param_grid.keys(), v)) for v in itertools.product(*param_grid.values())]
        train_dataset = self.datasets["train"]
        # Cutoff dates for Prophet's cross-validation.
        cutoffs = pd.to_datetime(['2013-02-15', '2013-08-15', '2014-02-15'])
        train_df = train_dataset.to_pandas()

        for index, params in enumerate(self.params):
            self.model = Prophet(**params).fit(train_df)
            cross_val_df = cross_validation(self.model, cutoffs=cutoffs, horizon='30 days', parallel="processes")
            # rolling_window=1 aggregates the metrics over the whole horizon into a single row.
            df_p = performance_metrics(cross_val_df, rolling_window=1)
            rmse = df_p['rmse'].values[0]
            # Report one result per parameter combination.
            session.report({"rmse": rmse, "epoch": index})
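
To make the size of this search concrete, here is a small standard-library sketch of the two ingredients of the training loop: the expansion of the parameter grid into individual combinations, and the evaluation window that each cutoff defines with a 30-day horizon. It uses `datetime.date` instead of pandas timestamps purely to stay dependency-free, and the variable names `all_params` and `windows` are chosen for this illustration.

```python
import itertools
from datetime import date, timedelta

param_grid = {
    "changepoint_prior_scale": [0.001, 0.01, 0.1, 0.5],
    "seasonality_prior_scale": [0.01, 0.1, 1.0, 10.0],
}

# One dict per combination, in the same order the training loop iterates over them.
all_params = [
    dict(zip(param_grid.keys(), values))
    for values in itertools.product(*param_grid.values())
]

cutoffs = [date(2013, 2, 15), date(2013, 8, 15), date(2014, 2, 15)]
horizon = timedelta(days=30)

# For each cutoff, the model is fitted on data up to the cutoff and
# evaluated on the `horizon` days that follow it.
windows = [(cutoff, cutoff + horizon) for cutoff in cutoffs]

# Every combination is cross-validated once per cutoff.
total_cv_fits = len(all_params) * len(windows)

print(f"{len(all_params)} parameter combinations")
print(f"first combination: {all_params[0]}")
for cutoff, end in windows:
    print(f"fit up to {cutoff}, evaluate through {end}")
print(f"{total_cv_fits} cross-validation fits in total")
```

The grid therefore yields 16 combinations, and with three cutoffs each, the loop runs 48 cross-validation fits on top of the 16 initial fits, which is why running it on a distributed engine is attractive.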

Then instantiate the trainer and train a new model:

from ray.air.config import RunConfig, ScalingConfig

my_trainer = FBProphetTrainer(
    datasets={"train": df},
    # params={"cutoffs": ['2013-02-15', '2013-08-15', '2014-02-15']},
    scaling_config=ScalingConfig(num_workers=4),
    run_config=RunConfig(local_dir='/tmp/ray_results')
)

# fit() runs training_loop() and returns the last reported result.
result = my_trainer.fit()

Ray output from training the Facebook Prophet model with a trainer that inherits the Ray BaseTrainer class

Clean up

To avoid incurring future charges, stop the AWS Glue interactive session and the Jupyter notebook:

%stop_session

Conclusion

In this post, we demonstrated how you can use AWS Glue for Ray to train a Facebook Prophet forecast model in a distributed environment. You can extend the code above using the Ray library depending on your use cases. I can't wait to see what you build with AWS Glue for Ray.

Refer to the Ray documentation for additional information and use cases.

I hope you found this a good read. If you would like to discuss your machine learning use cases, you can reach me via email, follow me on Twitter or connect with me on LinkedIn. Can’t wait to hear from you!!

Written by Olalekan Elesin

Enterprise technologist with experience across technical leadership, architecture, cloud, machine learning, big-data and other cool stuff.