Amazon EMR (Spark)

Plan support: included in all plans

IP Whitelist for Wren AI Cloud

Wren AI Cloud must be able to reach the EMR Spark primary node from its outbound IP address. Please add the IP address of the Wren AI service to the firewall / security group of your EMR cluster.

Scroll to the bottom of the data source connection page to find the IP address.

Spark Connector

Wren AI requires Spark Connect Server to be running on the EMR primary node.


Before you begin

Before connecting Wren AI to EMR Spark, make sure:


EMR & Spark version compatibility

Supported EMR versions

EMR Release | Support Status
EMR 7.3.0 | ✅ Supported
EMR 7.4.0 | ✅ Supported
EMR 7.5.0 | ✅ Supported
EMR 7.6.0 | ✅ Supported
EMR 7.7.0 | ✅ Supported
EMR 7.8.0 | ✅ Supported
EMR 7.9.0 | ✅ Supported
EMR 7.10.0 | ✅ Supported
EMR 7.11.0 | ✅ Supported
EMR 7.12.0 | ✅ Supported
Info: Currently, only EMR releases 7.3.0 through 7.12.0 are supported. EMR 6.x and EMR 7.x releases earlier than 7.3.0 are not supported.
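As a rough illustration, the support rule can be expressed as a small check against the releases listed above. This is only a sketch in Python: the names `parse_release` and `is_supported_release` are hypothetical and not part of Wren AI, and it assumes that only the exact releases shown in the table count as supported.

```python
# Releases listed as supported in the table above: 7.3.0 through 7.12.0.
SUPPORTED_RELEASES = {(7, minor, 0) for minor in range(3, 13)}


def parse_release(label: str) -> tuple:
    """Turn labels such as 'emr-7.5.0', 'EMR 7.5.0' or '7.5.0' into (7, 5, 0)."""
    text = label.strip().lower()
    if text.startswith("emr"):
        text = text[3:].lstrip(" -")
    return tuple(int(part) for part in text.split("."))


def is_supported_release(label: str) -> bool:
    """Return True if the release appears in the supported list (assumption: exact match)."""
    return parse_release(label) in SUPPORTED_RELEASES


if __name__ == "__main__":
    for label in ("emr-7.5.0", "emr-7.2.0", "emr-6.15.0"):
        print(label, is_supported_release(label))
```

Comparing parsed tuples rather than raw strings avoids the pitfall where "7.10.0" sorts before "7.3.0" alphabetically.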


To add an EMR Spark connection, click the EMR Spark option in the Connect a data source section.

Connect

Fill in the connection settings:

Display name

The display name for the database in the Wren AI interface.

Spark connect hostname

The DNS name of the EMR Spark primary node.

Port

The port used by Spark Connect. By default, Spark Connect listens on 15002.
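For reference, Spark Connect clients address a server with a URL of the form sc://host:port. The Python sketch below builds that URL from the hostname and port fields and probes whether the port accepts TCP connections. The helper names and the placeholder hostname are hypothetical, and the probe only reflects reachability from the machine where it runs, not from Wren AI Cloud.

```python
import socket

DEFAULT_SPARK_CONNECT_PORT = 15002  # default Spark Connect port


def spark_connect_url(host: str, port: int = DEFAULT_SPARK_CONNECT_PORT) -> str:
    """Build a Spark Connect URL of the form sc://host:port."""
    return f"sc://{host}:{port}"


def port_is_open(host: str, port: int = DEFAULT_SPARK_CONNECT_PORT,
                 timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical placeholder; replace with your primary node DNS name.
    host = "your-primary-node-dns"
    print(spark_connect_url(host))
```

A False result from `port_is_open` does not by itself mean the server is down; the security group may simply not allow the machine running the check.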

Troubleshooting

Start Spark Connect Server using an EMR Step

If your EMR Spark cluster does not have Spark Connect Server running, you need to add a step to the cluster to start it on the primary node.


Add a step from the EMR Console

  1. Open the Amazon EMR Console
  2. Select your EMR cluster
  3. Go to the Steps tab
  4. Click Add step
  5. Configure the step with the following values:
  • Step type: Custom JAR
  • Name: Spark Connect Server
  • JAR location: command-runner.jar
  • Arguments: spark-submit --packages org.apache.spark:spark-connect_2.12:3.5.0 --class org.apache.spark.sql.connect.service.SparkConnectServer /usr/lib/spark/jars/spark-connect_2.12-3.5.0.jar
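The same step can also be described programmatically. The sketch below, in Python, builds a step definition in the shape expected by the boto3 EMR client's `add_job_flow_steps` call, using the values listed above. The function names, the `ActionOnFailure` value of `CONTINUE`, and the cluster ID and region you would pass in are assumptions for illustration; submitting the step requires boto3 and valid AWS credentials.

```python
# Arguments copied from the console instructions above.
SPARK_CONNECT_ARGS = [
    "spark-submit",
    "--packages", "org.apache.spark:spark-connect_2.12:3.5.0",
    "--class", "org.apache.spark.sql.connect.service.SparkConnectServer",
    "/usr/lib/spark/jars/spark-connect_2.12-3.5.0.jar",
]


def build_spark_connect_step(name: str = "Spark Connect Server") -> dict:
    """Return a Custom JAR step definition that runs the command via command-runner.jar."""
    return {
        "Name": name,
        # Assumption: keep the cluster running if the step fails.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": list(SPARK_CONNECT_ARGS),
        },
    }


def submit_step(cluster_id: str, region: str) -> str:
    """Submit the step to an existing cluster and return the new step ID."""
    import boto3  # imported lazily so the builder above works without boto3

    emr = boto3.client("emr", region_name=region)
    response = emr.add_job_flow_steps(
        JobFlowId=cluster_id,
        Steps=[build_spark_connect_step()],
    )
    return response["StepIds"][0]


if __name__ == "__main__":
    print(build_spark_connect_step())
```

Keeping the step builder separate from the submit call makes it possible to inspect the definition before anything is sent to AWS.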

Cannot connect to EMR from Wren AI Cloud

Make sure the security group of the EMR primary EC2 instance allows inbound traffic from the Wren AI Cloud outbound IP (34.57.198.97).

Check the following:

  • The inbound rule includes the Spark Connect port (default: 15002)
  • The source IP matches the outbound IP (34.57.198.97) shown at the bottom of the Wren AI data source connection page

If the security group is missing this rule, Wren AI will not be able to connect to your EMR cluster.
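The two checks above can be sketched as a small Python function that inspects inbound rules. It assumes the rules are dictionaries shaped like the `IpPermissions` entries returned by the EC2 `describe_security_groups` API (`IpProtocol`, `FromPort`, `ToPort`, `IpRanges`); the function names are hypothetical, and the IP address is the one quoted in this section, so confirm it against the value shown on your connection page.

```python
import ipaddress

WREN_AI_IP = "34.57.198.97"   # outbound IP quoted in this section
SPARK_CONNECT_PORT = 15002    # default Spark Connect port


def rule_allows(rule: dict, ip: str = WREN_AI_IP, port: int = SPARK_CONNECT_PORT) -> bool:
    """Return True if one inbound rule admits TCP traffic from ip to port."""
    protocol = rule.get("IpProtocol")
    if protocol == "-1":
        pass  # "-1" means all protocols and all ports
    elif protocol == "tcp":
        if not rule.get("FromPort", 0) <= port <= rule.get("ToPort", 65535):
            return False
    else:
        return False
    address = ipaddress.ip_address(ip)
    return any(
        address in ipaddress.ip_network(ip_range["CidrIp"], strict=False)
        for ip_range in rule.get("IpRanges", [])
    )


def security_group_allows(rules: list, ip: str = WREN_AI_IP,
                          port: int = SPARK_CONNECT_PORT) -> bool:
    """Return True if any inbound rule in the list admits the traffic."""
    return any(rule_allows(rule, ip, port) for rule in rules)


if __name__ == "__main__":
    example_rules = [
        {"IpProtocol": "tcp", "FromPort": 15002, "ToPort": 15002,
         "IpRanges": [{"CidrIp": "34.57.198.97/32"}]},
    ]
    print(security_group_allows(example_rules))
```

The sketch only looks at IPv4 CIDR ranges; rules that reference other security groups or prefix lists are ignored.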

Select Tables

All tables in your connected EMR Spark data source are listed in this step. Select the tables you want to use in Wren AI. Each selected table is created as a data model. See the Modeling documentation to learn more about data models.

Define relationships

Define the relationships among the selected tables in this step. If primary keys and foreign keys are defined in your EMR Spark data source, Wren AI lists suggested relationships based on that information. Otherwise, you can add relationships by clicking the Add relationships button on the table blocks.

Define the following properties for a relationship:

  • From: Select the left side table and column of this relationship.
  • To: Select the right side table and column of this relationship.
  • Relationship Type: Select the type of relationship.

For more information about relationships, see Modeling - Working with Relationships.

You can also skip this step and finish the connection setup.