Description
Is there an existing issue for this?
- I have searched the existing issues and did not find a match.
Who can help?
No response
What are you working on?
I am trying to get spark-nlp to work on Databricks using an example from the documentation.
Current Behavior
sentence_detector_dl download started this may take some time.
Approximate size to download 514.9 KB
[ / ]
An error occurred while calling z:com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadModel.
: java.lang.NoClassDefFoundError: Could not initialize class org.tensorflow.Graph
at com.johnsnowlabs.ml.tensorflow.TensorflowWrapper$.readGraph(TensorflowWrapper.scala:415)
at com.johnsnowlabs.ml.tensorflow.TensorflowWrapper$.unpackWithoutBundle(TensorflowWrapper.scala:330)
at com.johnsnowlabs.ml.tensorflow.TensorflowWrapper$.read(TensorflowWrapper.scala:484)
at com.johnsnowlabs.ml.tensorflow.ReadTensorflowModel.readTensorflowModel(TensorflowSerializeModel.scala:154)
at com.johnsnowlabs.ml.tensorflow.ReadTensorflowModel.readTensorflowModel$(TensorflowSerializeModel.scala:123)
at com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLModel$.readTensorflowModel(SentenceDetectorDLModel.scala:648)
at com.johnsnowlabs.nlp.annotators.sentence_detector_dl.ReadsSentenceDetectorDLGraph.readSentenceDetectorDLGraph(SentenceDetectorDLModel.scala:621)
at com.johnsnowlabs.nlp.annotators.sentence_detector_dl.ReadsSentenceDetectorDLGraph.readSentenceDetectorDLGraph$(SentenceDetectorDLModel.scala:616)
at com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLModel$.readSentenceDetectorDLGraph(SentenceDetectorDLModel.scala:648)
at com.johnsnowlabs.nlp.annotators.sentence_detector_dl.ReadsSentenceDetectorDLGraph.$anonfun$$init$$1(SentenceDetectorDLModel.scala:625)
at com.johnsnowlabs.nlp.annotators.sentence_detector_dl.ReadsSentenceDetectorDLGraph.$anonfun$$init$$1$adapted(SentenceDetectorDLModel.scala:625)
at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$onRead$1(ParamsAndFeaturesReadable.scala:50)
at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$onRead$1$adapted(ParamsAndFeaturesReadable.scala:49)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.onRead(ParamsAndFeaturesReadable.scala:49)
at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$read$1(ParamsAndFeaturesReadable.scala:61)
at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$read$1$adapted(ParamsAndFeaturesReadable.scala:61)
at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:38)
at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:24)
at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadModel(ResourceDownloader.scala:518)
at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadModel(ResourceDownloader.scala:510)
at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader$.downloadModel(ResourceDownloader.scala:709)
at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadModel(ResourceDownloader.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
at py4j.Gateway.invoke(Gateway.java:306)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
at java.lang.Thread.run(Thread.java:750)
Expected Behavior
The pretrained models should download and load, and the pipeline should run without errors.
Steps To Reproduce
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, MarianTransformer
from pyspark.ml import Pipeline
document_assembler = DocumentAssembler().setInputCol("text").setOutputCol("document")
sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx").setInputCols("document").setOutputCol("sentence")
marian_transformer = MarianTransformer.pretrained().setInputCols("sentence").setOutputCol("translation")
pipeline = Pipeline().setStages([document_assembler, sentence_detector, marian_transformer])
data = spark.createDataFrame([["You can use Spark NLP to translate text. " +
                               "This example pipeline translates English to French"]]).toDF("text")
# Create a pipeline model that can be reused across multiple data frames
model = pipeline.fit(data)
# You can use the model on any data frame that has a “text” column
result = model.transform(data)
display(result.select("text", "translation.result"))
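To narrow the failure down, the sentence detector can be loaded on its own, without the rest of the pipeline. The stack trace above shows the error being raised while `SentenceDetectorDLModel.pretrained` downloads and reads `sentence_detector_dl`, so `MarianTransformer` is never reached. The helper name `load_sentence_detector` is hypothetical; the model name, language code and column names are the ones used in the reproduction code.

```python
# Hypothetical helper: isolates the one call that fails in the trace above.
def load_sentence_detector():
    # Imported inside the function so that defining it has no side effects;
    # the model download and TensorFlow graph load only happen when it is called.
    from sparknlp.annotator import SentenceDetectorDLModel

    return (
        SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
        .setInputCols(["document"])
        .setOutputCol("sentence")
    )
```

If calling `load_sentence_detector()` in a new cell on the same cluster reproduces the `NoClassDefFoundError`, the pipeline definition itself can be ruled out.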
Spark NLP version and Apache Spark
Spark NLP version: 5.1.2
Spark version: 3.4.1
Databricks Runtime Version: 13.3 LTS (includes Apache Spark 3.4.1, Scala 2.12)
Type of Spark Application
Python Application
Java Version
No response
Java Home Directory
No response
Setup and installation
I installed the following libraries directly on the cluster:
spark-nlp==5.1.2
com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.2
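One possible setup problem worth ruling out is a mismatch between the two installed artifacts. The sketch below uses only the Python standard library to compare the version in the PyPI pin with the version in the Maven coordinate, and the Scala suffix of the jar (`_2.12`) with the runtime's Scala version as listed in this report. The helper names (`parse_maven_coordinate`, `scala_suffix`, `check_install`) are hypothetical, and the check says nothing about the TensorFlow error itself.

```python
from __future__ import annotations

import re


def parse_maven_coordinate(coord: str) -> tuple[str, str, str]:
    """Split a 'group:artifact:version' Maven coordinate into its three parts."""
    group, artifact, version = coord.split(":")
    return group, artifact, version


def scala_suffix(artifact: str) -> str | None:
    """Return the Scala binary version in an artifact name such as 'spark-nlp_2.12'."""
    match = re.search(r"_(\d+\.\d+)$", artifact)
    return match.group(1) if match else None


def check_install(pip_pin: str, maven_coord: str, runtime_scala: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means no mismatch found."""
    problems = []
    _, pip_version = pip_pin.split("==")
    _, artifact, jar_version = parse_maven_coordinate(maven_coord)
    if pip_version != jar_version:
        problems.append(f"version mismatch: pip {pip_version} vs jar {jar_version}")
    jar_scala = scala_suffix(artifact)
    if jar_scala != runtime_scala:
        problems.append(f"Scala mismatch: jar built for {jar_scala}, runtime uses {runtime_scala}")
    return problems


# Values taken from this report (Databricks Runtime 13.3 LTS uses Scala 2.12).
problems = check_install("spark-nlp==5.1.2", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.2", "2.12")
print(problems or "pip package and Maven jar are consistent")
```

For the values listed above, both checks pass.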
Operating System and Version
No response
Link to your project (if available)
No response
Additional Information
No response