Search references for APACHE SPARK. Phrases containing APACHE SPARK
See searches and references containing APACHE SPARK.
Open-source data analytics cluster computing framework
Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming clusters with implicit
Apache_Spark
American computer scientist and businessman (born 1983)
artificial intelligence (AI) platform, and for his early contributions to Apache Spark. He also co-founded Perplexity, an AI-powered search engine; the early-stage
Andy_Konwinski
Swedish computer scientist
Berkeley. He coauthored several influential papers, including Apache Mesos and Apache Spark SQL. Ghodsi received his PhD from KTH Royal Institute of Technology
Ali_Ghodsi
San Francisco-based software company
in San Francisco. It was founded in 2013 by the original creators of Apache Spark at the University of California, Berkeley. It offers a cloud-based platform
Databricks
Computer scientist and engineer
and Chief Architect of Databricks. He is best known for his work on Apache Spark, a leading open-source Big Data project. He was designer and lead developer
Reynold_Xin
American-Canadian computer programmer, author, and open source evangelist
on Apache Spark, her advocacy in the open-source software movement, and her creation and maintenance of a variety of related projects including spark-testing-base
Holden_Karau
Query language for property graphs
Stefan Plantikow (who was the first lead engineer of Neo4j's Cypher for Apache Spark project) and Stephen Cannan (Technical Corrigenda editor of SQL). They
Graph_Query_Language
Romanian–American computer scientist
co-founded Conviva and Databricks with other original developers of Apache Spark and Anyscale with other original developers of Ray. In 2022, Forbes ranked
Ion_Stoica
Software framework
dynamic random-access memory. Arrow can be used with Apache Parquet, Apache Spark, NumPy, PySpark, pandas and other data processing libraries. The project
Apache_Arrow
Column-oriented data storage format
open-source software portal Apache Arrow Apache Pig Apache Hive Apache Impala Apache Drill Apache Kudu Apache Spark Apache Thrift Trino (SQL query engine)
Apache_Parquet
Romanian-Canadian computer scientist and engineer
a Romanian-Canadian computer scientist, educator and the creator of Apache Spark. In 2022, Forbes ranked him and Ion Stoica as the 3rd-richest Romanians
Matei_Zaharia
Open-source machine learning algorithms
many of the implementations use the Apache Hadoop platform; however, today it is primarily focused on Apache Spark. Mahout also provides Java/Scala libraries
Apache_Mahout
Tabular data representation in memory
relational databases (Oracle, MySQL etc.), the in-memory format of Apache Spark, and Apache Avro. Tabular data is two dimensional — data is modeled as rows
Data_orientation
Distributed data processing framework
such as Apache Pig, Apache Hive, Apache HBase, Apache Phoenix, Apache Spark, Apache ZooKeeper, Apache Impala, Apache Flume, Apache Sqoop, Apache Oozie,
Apache_Hadoop
and deep learning algorithms Apache Mahout — scalable machine learning library for big data built on Hadoop and Spark Apache SINGA — distributed deep learning
Lists of open-source artificial intelligence software
Lists_of_open-source_artificial_intelligence_software
Open-source data analytics software
called Pig Latin. Pig can execute its Hadoop jobs in MapReduce, Apache Tez, or Apache Spark. Pig Latin abstracts the programming from the Java MapReduce
Apache_Pig
List of projects maintained by the Apache Software Foundation
platforms such as Apache Spark Beam, an uber-API for big data Bigtop: a project for the development of packaging and tests of the Apache Hadoop ecosystem
List of Apache Software Foundation projects
List_of_Apache_Software_Foundation_projects
UC Berkeley research lab
Data Analytics Stack), many know it as the lab that invented Apache Mesos, Apache Spark, and Alluxio. Berkeley launched RISELab as the successor to
AMPLab
Gradient boosting machine learning library
machine, as well as the distributed processing frameworks Apache Hadoop, Apache Spark, Apache Flink, and Dask. XGBoost gained much popularity and attention
XGBoost
Column-oriented data storage format
is used by most of the data processing frameworks Apache Spark, Apache Hive, Apache Flink, and Apache Hadoop. In February 2013, the Optimized Row Columnar
Apache_ORC
Open-source machine learning system for end-to-end data science lifecycle
becomes Apache Incubator project IBM donates machine learning tech to Apache Spark open source community IBM's SystemML Moves Forward as Apache Incubator
Apache_SystemDS
System for distributed coordination
Apache Hadoop Apache Accumulo Apache HBase Apache Hive Apache Kafka (up to version 4.0.0) Apache Drill Apache Solr Apache Spark Apache NiFi Apache Druid
Apache_ZooKeeper
Open-source API to access Microsoft Office formats
modules for Big Data platforms (e.g. Apache Hive/Apache Flink/Apache Spark), which provide certain functionality of Apache POI, such as the processing of Excel
Apache_POI
Open-source remote procedure call framework
when a schema changes (unless desired for statically-typed languages). Apache Spark SQL can access Avro as a data source. An Avro Object Container File consists
Apache_Avro
Unified programming model for data processing pipelines
(distributed processing back-ends) including Apache Flink, Apache Samza, Apache Spark, and Google Cloud Dataflow. Apache Beam is one implementation of the Dataflow
Apache_Beam
Text processing programming library
and Scala programming languages. The library is built on top of Apache Spark and its Spark ML library. Its purpose is to provide an API for natural language
Spark_NLP
Software bus for high-volume data feeds
Free and open-source software portal RabbitMQ Redis NATS Apache Flink Apache Samza Apache Spark Streaming Data Distribution Service Enterprise Integration
Apache_Kafka
Algorithm for anomaly detection
Spark iForest - A distributed implementation in Scala and Python, which runs on Apache Spark. Written by Yang, Fangzhou. Isolation Forest - A Spark/Scala
Isolation_forest
Open-source distributed stream processing
including Apache Kafka. Samza provides fault tolerance, isolation and stateful processing. Unlike batch systems such as Apache Hadoop or Apache Spark, it provides
Apache_Samza
Software to manage computer clusters
2013 that it uses Mesos to run data processing systems like Apache Hadoop and Apache Spark. The Internet auction website eBay stated in April 2014 that
Apache_Mesos
Data analysis software
Apache Sedona (formerly GeoSpark) is an open-source framework designed for processing and analyzing large-scale spatial data in a distributed computing
Apache_Sedona
Big data table format
Apache Iceberg is a high-performance open-source format for large analytic tables. Iceberg enables the use of SQL tables for big data while making it possible
Apache_Iceberg
Database of data representing objects in geometric space
Sedona supports scalable geospatial processing and spatial SQL on top of Apache Spark for databases and big data analytics systems. Esri Geodatabase (Enterprise
Spatial_database
Open source software library
on how to scale and improve scheduling and performance of millions of Apache Spark tasks. Today it is a commercial company that offers an open source system
DBOS
Computer programming paradigm
XProc Apache Beam: Java/Scala SDK that unifies streaming (and batch) processing with several execution engines supported (Apache Spark, Apache Flink,
Dataflow_programming
SDK and Platform for responsive, elastic, and resilient agentic, cloud, and edge apps
web applications offers integration with Akka Up until version 1.6, Apache Spark used Akka for communication between nodes The Socko Web Server library
Akka_(toolkit)
Graph database
reporting, and ETL through integration with big data platforms (Apache Spark, Apache Giraph, Apache Hadoop). JanusGraph supports geo, numeric range, and full-text
JanusGraph
Repository of data stored in a raw format
expertise in Java, map reduce and higher-level tools like Apache Pig, Apache Spark and Apache Hive (which were also originally batch-oriented). Poorly
Data_lake
Popular frameworks running on top of Alluxio include Apache Spark, Presto, TensorFlow, Trino, Apache Hive, and PyTorch. Alluxio can
Alluxio
Software development kit for web applications
Apache Flex, formerly Adobe Flex, is a software development kit (SDK) for the development and deployment of cross-platform rich web applications based
Apache_Flex
File format and file compression program
data applications with cluster computing frameworks like Hadoop and Apache Spark, as a compressed block can be decompressed without having to process
Bzip2
Set of software subsystems or components needed to create a complete platform
Apache Spark (big data and MapReduce) Apache Mesos (node startup/shutdown) Akka (toolkit) (actor implementation) Apache Cassandra (database) Apache Kafka
Solution_stack
Software platform for data science
platform based on Apache Spark". Computerworld. Archived from the original on 2017-09-11. Retrieved 2017-09-11. "IBM Launches Apache Spark-Based Data Science
IBM_Watson_Studio
Data-processing architecture
this layer include Apache Kafka, Amazon Kinesis, Apache Storm, SQLstream, Apache Samza, Apache Spark, Azure Stream Analytics, Apache Flink. Output is typically
Lambda_architecture
processing framework Apache Mahout – scalable machine learning library Apache Spark – unified analytics engine Dask – parallel computing for analytics in
List of free and open-source software packages
List_of_free_and_open-source_software_packages
Deep learning training framework
to provide APIs for PyTorch, Keras and Apache MXNet, as well as integrations with frameworks such as Apache Spark and Ray, support for elastic training
Horovod_(machine_learning)
Software optimization technique
to see. Lazy evaluation is fundamental in big data frameworks such as Apache Spark, where computations on distributed datasets are delayed until results
Lazy_evaluation
Computer scientist
Inc. During his PhD, he also co-created the Apache Spark Streaming project and became an Apache Spark committer. Li, Haoyuan (7 May 2018). Alluxio:
Haoyuan_Li
Web server written in Java
server is used in products such as Apache ActiveMQ, Alfresco, Scalatra, Apache Geronimo, Apache Maven, Apache Spark, Google App Engine, Eclipse, FUSE,
Jetty_(web_server)
Object-oriented programming language
features, offering an implementation compatible with the standard library (Apache Harmony). The use of Java-related technology in Android led to a legal dispute
Java_(programming_language)
New Enterprise Associates, Intel, and others. Reza is a coauthor of Apache Spark, in particular its Machine Learning library, MLlib. Through open source
Reza_Zadeh
Tabular comparison of deep learning software
on 2017-02-11. Retrieved 2016-03-02. Deeplearning4j. "Deeplearning4j on Spark". Deeplearning4j. Archived from the original on 2017-07-13. Retrieved 2016-09-01
Comparison of deep learning software
Comparison_of_deep_learning_software
Open-source stream processing platform
China's most popular open source software award Apache ActiveMQ Apache Flink Apache Qpid Apache Samza Apache Spark Streaming Data Distribution Service Enterprise
Apache_RocketMQ
Topics referred to by the same term
media applications developed by Adobe Systems Apache Spark, a cluster computing framework Cisco Spark (application), a collaboration application and
Spark
Database engine
schema on read and transparently converts queries to MapReduce, Apache Tez and Spark jobs. All three execution engines can run in Hadoop's resource negotiator
Apache_Hive
Cloud-based service and infrastructure
platform for running Apache Hadoop and Apache Spark jobs Cloud Composer – Managed workflow orchestration service built on Apache Airflow Cloud Datalab
Google_Cloud_Platform
Bangladeshi-American computer scientist
leads SymbioticLab. He is the creator of coflow and the co-creator of Apache Spark. Chowdhury specializes in the fields of computer networking and large-scale
Mosharaf_Chowdhury
American software company
Platform (HDP): based on Apache Hadoop, Apache Hive, Apache Spark Hortonworks DataFlow (HDF): based on Apache NiFi, Apache Storm, Apache Kafka Hortonworks DataPlane
Hortonworks
List of programming software
OODT Apache Oozie Apache OpenNLP Apache PDFBox Apache Pig Apache POI Apache Qpid Apache River (Jini) Apache Samza Apache Solr Apache Spark Apache Xerces
List_of_JVM_languages
Java software and development tools
related projects. Apache Ant – build automation tool Apache Batik – SVG processing Apache Cayenne – object-relational mapping Apache Xerces – collection
List of Java software and tools
List_of_Java_software_and_tools
Relational model database server
original on 2019-09-10. Retrieved 2019-09-09. "Apache Spark - Unified Analytics Engine for Big Data". spark.apache.org. Archived from the original on 2020-09-02
IBM_Db2
American business software company
single computer cluster, including big data workloads such as Apache Hadoop and Apache Spark, a distributed file system, a multi-model database management
MapR
Statistical regression method
principal component analysis, including elastic net regularized regression. Apache Spark provides support for Elastic Net Regression in its MLlib machine learning
Elastic_net_regularization
digital service providers Databricks, a company founded by the creators of Apache Spark Dataiku Datatoleads, a big data aggregator focusing on buying and reselling
List_of_big_data_companies
frameworks, libraries, and computer programs used for machine learning. Apache OpenNLP — natural language processing toolkit CUDA — GPU computing platform
Comparison of machine learning software
Comparison_of_machine_learning_software
Content-based image retrieval
for category recognition, image hashes are stored in Google Bigtable; Apache Spark jobs are operated by Google Cloud Platform's Dataproc for image hash
Reverse_image_search
Statistical model used in time series analysis
Scala: spark-timeseries library contains ARIMA implementation for Scala, Java and Python. Implementation is designed to run on Apache Spark. PostgreSQL/MadLib:
Autoregressive integrated moving average
Autoregressive_integrated_moving_average
Open Source Database Project
Spark, etc. analysis ecosystems and Grafana visualization tool. The Apache 2.0 License is a permissive free software license written by the Apache Software
Apache_IoTDB
Data processing chain
the advent of data analytics engines such as Hadoop, or more recently Apache Spark, it's been possible to distribute large datasets across multiple processing
Pipeline_(computing)
Overview of and topical guide to Java
Java Edition NetBeans Apache Software Foundation – Apache Commons, Apache Maven, Apache Tomcat, Apache Kafka Eclipse Foundation – Adoptium, Eclipse IDE
Outline of the Java programming language
Outline_of_the_Java_programming_language
Open-source distributed analytics engine
Apache Kylin is built on top of Apache Hadoop, Apache Hive, Apache HBase, Apache Parquet, Apache Calcite, Apache Spark and other technologies. These technologies
Apache_Kylin
Examples of Data engineering tools. Apache Airflow Apache Flink Apache Hadoop Apache Kafka Apache NiFi Apache Spark Dask Data build tool (dbt) Examples
List_of_data_science_software
Software libraries
The Apache Commons is a project of the Apache Software Foundation, formerly under the Jakarta Project. The purpose of the Commons is to provide reusable
Apache_Commons
Concept in statistics
with high memory". "Basic Statistics – RDD-based API – Spark 3.0.1 Documentation". spark.apache.org. Retrieved 2020-11-05. "kdensity — Univariate kernel
Kernel_density_estimation
Topics referred to by the same term
processing libraries: pandas (software) § DataFrame The Dataframe API in Apache Spark DFLib for Java Data frames in the R programming language Frame (networking)
Dataframe
Open source platform
for the R and Python programming languages, and various Apache offerings (Apache Hadoop and Spark, as well as Maven). H2O Flow: a graphical web-based interactive
H2O_(software)
interfaces support parallelism in host languages. Apache Beam Apache Flink Apache Hadoop Apache Spark CUDA OpenCL OpenHMPP OpenMP for C, C++, and Fortran
List of concurrent and parallel programming languages
List_of_concurrent_and_parallel_programming_languages
Specific model for organizing a set of computers
operating system. Commonly used Big Data software stacks are Apache Hadoop and Apache Spark. A report of the Aiyara hardware which successfully processed
Aiyara_cluster
Open-source distributed stream processing
Apache Storm is a distributed stream processing computation framework written predominantly in the Clojure programming language. Originally created by
Apache_Storm
Technique for database mining
exist for various machine learning systems or modules like MLlib for Apache Spark. Jiawei Han; Hong Cheng; Dong Xin; Xifeng Yan (2007). "Frequent pattern
Frequent_pattern_discovery
BigDL is a distributed deep learning framework for Apache Spark, created by Jason Dai at Intel. BigDL has its source code hosted on GitHub. Comparison
BigDL
Software library for data analysis
Polars is Python-centric. Apache Spark has a Python API, PySpark, for distributed big data processing. Similar to Dask, Spark is focused on distributed
Polars_(software)
Programming tool blending code and documents
intelligence software. Example of projects or products of notebooks: Apache Spark Notebook – Apache License 2.0 GNU TeXmacs (a document processor which can act
Notebook_interface
Software engineering approach to designing and developing information systems
and edges represent the flow of data. Popular implementations include Apache Spark, and the deep learning specific TensorFlow. More recent implementations
Data_engineering
Database management system ranking
FileMaker SAP HANA SAP Adaptive Server Apache Spark Microsoft Azure Cosmos DB InfluxDB PostGIS ClickHouse Apache HBase OpenSearch Firebird Memcached Microsoft
DB-Engines_ranking
scikit-learn — library for machine learning. Spark MLlib — distributed machine learning library for Apache Spark. TensorFlow — software library for machine
List_of_Python_software
Deep learning framework
speech, and multimedia. Yahoo! has also integrated Caffe with Apache Spark to create CaffeOnSpark, a distributed deep learning framework. In April 2017, Facebook
Caffe_(software)
American software company
"Couchbase 4.6 Developer Preview Released, Adds Real-Time Connectors for Apache Spark 2.0 and Kafka". InfoQ. Retrieved 2025-08-21. "Couchbase Developer Preview
Couchbase,_Inc.
Cloud-based data storage and analytics service
The suggested replacement technologies are Azure Synapse Analytics and Apache Spark. Data lake "Data Lake". Microsoft Azure. Retrieved 2019-06-17. Harris
Azure_Data_Lake
"Astronomer Raises $5.7 Million in Funding to Deliver Enterprise Grade Apache Airflow". PR Newswire. "Asterisk Version 1.0 released at Astricon". VentureVoIP
List of commercial open-source applications and services
List_of_commercial_open-source_applications_and_services
General-purpose programming language
solution written in Scala is Apache Spark. Additionally, Apache Kafka, the publish–subscribe message queue popular with Spark and other stream processing
Scala_(programming_language)
Database using graph structures for queries
"ISO/IEC 39075:2024". ISO. Retrieved 2026-04-29. TinkerPop, Apache. "Apache TinkerPop". Apache TinkerPop. Retrieved 2016-11-02. Gosnell, Denise; Broecheler
Graph_database
Intelligence. Retrieved 2025-12-10. "GigaSpaces Launches the Next Generation Apache Spark Distribution – BRM". Retrieved 2025-12-10. "GigaSpaces Spins Off Cloudify
GigaSpaces
Software company
Azure Synapse Analytics, Cloudera, Databricks, Snowflake, Hadoop, Apache Kafka, Apache Spark Windows, Unix, Linux 2004 Ironstream Utility IBM i and z/OS forwarder
Precisely_(company)
In databases, cached query results
UNIQUE CLUSTERED INDEX XV ON MV_MY_VIEW (COL1); Apache Kafka (since v0.10.2), Apache Spark (since v2.0), Apache Flink, Kinetica DB, Materialize, RisingWave
Materialized_view
Python library for parallel computing
Retrieved 2022-05-12. Patel, Harshil. "Which library should I use? Apache Spark, Dask, and Pandas Performance Compared (With Benchmarks)". censius.ai
Dask_(software)
Overview of and topical guide to machine learning
Levandowski Anti-unification (computer science) Apache Flume Apache Giraph Apache Mahout Apache SINGA Apache Spark Apache SystemML Aphelion (software) Arabic Speech
Outline_of_machine_learning
(a.k.a. JCR) content repository such as Apache Jackrabbit. Apache Solr Enterprise search platform Apache Spark Fast and general engine for big data processing
List_of_Java_frameworks
SaaS software company
built on a core composed of open-source search technology Apache Solr and the Apache Spark computation framework. Three years later, in mid-2017, Lucidworks
Lucidworks
Software company
Native integration with open source big data technologies like Apache Kafka and Apache Spark. Support for standard programming interfaces, including ODBC
Vertica
APACHE SPARK
Boy/Male
American, British, English
Lives Near Water
Boy/Male
Hebrew
Ready; prepared.
Girl/Female
American, Australian
Storage Place
Boy/Male
Spanish
Free.
Female
Greek
(Αγάθη) Greek name derived from the word agathos, AGATHE means "good." It is the feminine form of Agathias.
Surname or Lastname
North German
North German: variant of Asch. English: variant spelling of Ash (asche was the regular Middle English spelling of this word).
Surname or Lastname
English or Scottish
English or Scottish: unexplained.
Girl/Female
Latin
A Lemnian woman.
Girl/Female
Native American
Little one.
Female
Greek
(Ἀράχνη) Greek myth name of a young girl who was turned into a spider by Athena, ARACHNE means "spider."
Girl/Female
Hindu, Indian
Fame; Sparkle
Boy/Male
Shakespearean
'All's Well That Ends Well.' A clown and servant to the Countess of Rousillon.
Girl/Female
British, English, Greek
Good
Girl/Female
French, German
Kind.
Female
Native American
Native American Cheyenne name AYASHE means "little one."
Male
English
English surname transferred to forename use, derived from the French personal name Pascal, PACE means "Passover; Easter."
Surname or Lastname
English
English: from a vernacular short form of the Latin personal name Paschalis (see Pascal, Italian Pasquale). Nickname for a mild-mannered and peaceable person, from Middle English pace, pece ‘peace’, ‘concord’, ‘amity’ (via Anglo-Norman French from Latin pax, genitive pacis). Italian: from the medieval personal name Pace, used for both men and women, from the word pace ‘peace’ (see 1).
Girl/Female
Greek, Latin
Changed into a spider by Athena.
Boy/Male
Armenian, Australian
Nomadic Cart
Female
French
Medieval French form of Latin Agatha, AGACE means "good."
APACHE SPARK
Girl/Female
Indian
Blessing of God, God's gift
Girl/Female
English, Norse, Chinese
Waterfall.
Female
Italian
Italian and Spanish form of Latin Elwisia, ELOISA means "hale-wide; very healthy and sound."
Boy/Male
Argentina, Bengali, Indian
Loved by Everyone
Male
Native American
Native American Shawnee name LALAWETHIKA means "he makes noise."
Girl/Female
Indian
Friend
Girl/Female
Hindu, Indian, Marathi, Sanskrit
Stable; Immovable
Male
Arthurian
(Sir), knight of the Round Table.
Boy/Male
Hindu, Indian
Beautiful
Male
Chamoru
Land.
APACHE SPARK
n.
A genus (Atriplex) of herbs or low shrubs of the Goosefoot family, most of them with a mealy surface.
n.
A special involucre formed of one leaf and inclosing a spadix, as in aroid plants and palms. See the Note under Bract, and Illust. of Spadix.
v. t.
To develop, guide, or control the pace or paces of; to teach the pace; to break in.
n.
Want of feeling; privation of passion, emotion, or excitement; dispassion; -- applied either to the body or the mind. As applied to the mind, it is a calmness, indolence, or state of indifference, incapable of being ruffled or roused to active interest or exertion by pleasure, pain, or passion.
n.
A tender to a fleet, formerly used for conveying men, orders, or treasure.
v.
To scratch.
v. t.
To measure by steps or paces; as, to pace a piece of ground.
v. t.
To arrange or adjust the spaces in or between; as, to space words, lines, or letters.
n.
Manner of stepping or moving; gait; walk; as, the walk, trot, canter, gallop, and amble are paces of the horse; a swaggering pace; a quick pace.
n.
Ache or pain in the ear.
a.
Having a spathe; resembling a spathe; spathal.
n.
Continued pain, as distinguished from sudden twinges, or spasmodic pain. "Such an ache in my bones."
n.
One attached to another person or thing, as a part of a suite or staff. Specifically: One attached to an embassy.
n.
One of the series of boilers in which the cane juice is treated in making sugar; especially, the last boiler of the series.
n.
The raccoon.
n.
A plume or bunch of feathers, esp. such a bunch worn on the helmet; any military plume, or ornamental group of feathers.
n.
A quantity or portion of extension; distance from one thing to another; an interval between any two or more objects; as, the space between two stars or two hills; the sound was heard for the space of a mile.
n. pl.
A group of nomadic North American Indians including several tribes native of Arizona, New Mexico, etc.
adv.
With a quick pace; quick; fast; speedily.
n.
See Appaume.