Search references for APACHE SPARK. Phrases containing APACHE SPARK
See searches and references containing APACHE SPARK!
Open-source data analytics cluster computing framework
Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming clusters with implicit
Apache_Spark
American computer scientist and businessman (born 1983)
artificial intelligence (AI) platform, and for his early contributions to Apache Spark. He also co-founded Perplexity, an AI-powered search engine; the early-stage
Andy_Konwinski
Computer scientist and engineer
and Chief Architect of Databricks. He is best known for his work on Apache Spark, a leading open-source Big Data project. He was designer and lead developer
Reynold_Xin
San Francisco-based software company
in San Francisco. It was founded in 2013 by the original creators of Apache Spark at the University of California, Berkeley. It offers a cloud-based platform
Databricks
Swedish computer scientist
Berkeley. He coauthored several influential papers, including Apache Mesos and Apache Spark SQL. Ghodsi received his PhD from KTH Royal Institute of Technology
Ali_Ghodsi
American-Canadian computer programmer, author, and open source evangelist
on Apache Spark, her advocacy in the open-source software movement, and her creation and maintenance of a variety of related projects including spark-testing-base
Holden_Karau
Romanian–American computer scientist
co-founded Conviva and Databricks with other original developers of Apache Spark and Anyscale with other original developers of Ray. In 2022, Forbes ranked
Ion_Stoica
Query language for property graphs
Stefan Plantikow (who was the first lead engineer of Neo4j's Cypher for Apache Spark project) and Stephen Cannan (Technical Corrigenda editor of SQL). They
Graph_Query_Language
Open-source machine learning algorithms
many of the implementations use the Apache Hadoop platform, however today it is primarily focused on Apache Spark. Mahout also provides Java/Scala libraries
Apache_Mahout
Software framework
dynamic random-access memory. Arrow can be used with Apache Parquet, Apache Spark, NumPy, PySpark, pandas and other data processing libraries. The project
Apache_Arrow
Romanian-Canadian computer scientist and engineer
a Romanian-Canadian computer scientist, educator and the creator of Apache Spark. In 2026, Forbes ranked him and Ion Stoica as the richest Romanians,
Matei_Zaharia
Column-oriented data storage format
open-source software portal Apache Arrow Apache Pig Apache Hive Apache Impala Apache Drill Apache Kudu Apache Spark Apache Thrift Trino (SQL query engine)
Apache_Parquet
Distributed data processing framework
such as Apache Pig, Apache Hive, Apache HBase, Apache Phoenix, Apache Spark, Apache ZooKeeper, Apache Impala, Apache Flume, Apache Sqoop, Apache Oozie,
Apache_Hadoop
Text processing programming library
and Scala programming languages. The library is built on top of Apache Spark and its Spark ML library. Its purpose is to provide an API for natural language
Spark_NLP
Tabular data representation in memory
relational databases (Oracle, MySQL etc.), the in-memory format of Apache Spark, and Apache Avro. Tabular data is two dimensional — data is modeled as rows
Data_orientation
Open-source machine learning system for end-to-end data science lifecycle
becomes Apache Incubator project IBM donates machine learning tech to Apache Spark open source community IBM's SystemML Moves Forward as Apache Incubator
Apache_SystemDS
Open-source API to access Microsoft Office formats
modules for Big Data platforms (e.g. Apache Hive/Apache Flink/Apache Spark), which provide certain functionality of Apache POI, such as the processing of Excel
Apache_POI
List of projects maintained by the Apache Software Foundation
platforms such as Apache Spark Beam, an uber-API for big data Bigtop: a project for the development of packaging and tests of the Apache Hadoop ecosystem
List of Apache Software Foundation projects
List_of_Apache_Software_Foundation_projects
UC Berkeley research lab
Data Analytics Stack), many know it as the lab that invented Apache Mesos, and Apache Spark, and Alluxio. Berkeley launched RISELab as the successor to
AMPLab
Open-source remote procedure call framework
when a schema changes (unless desired for statically-typed languages). Apache Spark SQL can access Avro as a data source. An Avro Object Container File consists
Apache_Avro
Gradient boosting machine learning library
machine, as well as the distributed processing frameworks Apache Hadoop, Apache Spark, Apache Flink, and Dask. XGBoost gained much popularity and attention
XGBoost
Open-source distributed stream processing
including Apache Kafka. Samza provides fault tolerance, isolation and stateful processing. Unlike batch systems such as Apache Hadoop or Apache Spark, it provides
Apache_Samza
and deep learning algorithms Apache Mahout — scalable machine learning library for big data built on Hadoop and Spark Apache SINGA — distributed deep learning
Lists of open-source artificial intelligence software
Lists_of_open-source_artificial_intelligence_software
Column-oriented data storage format
is used by most of the data processing frameworks Apache Spark, Apache Hive, Apache Flink, and Apache Hadoop. In February 2013, the Optimized Row Columnar
Apache_ORC
Unified programming model for data processing pipelines
(distributed processing back-ends) including Apache Flink, Apache Samza, Apache Spark, and Google Cloud Dataflow. Apache Beam is one implementation of the Dataflow
Apache_Beam
Big data table format
like Spark, Trino, Flink, Presto, Hive, Impala, and Pig to safely work with the same tables, at the same time. Iceberg is released under the Apache License
Apache_Iceberg
Open-source data analytics software
called Pig Latin. Pig can execute its Hadoop jobs in MapReduce, Apache Tez, or Apache Spark. Pig Latin abstracts the programming from the Java MapReduce
Apache_Pig
BigDL is a distributed deep learning framework for Apache Spark, created by Jason Dai at Intel. BigDL has its source code hosted on GitHub. Comparison
BigDL
System for distributed coordination
Apache Hadoop Apache Accumulo Apache HBase Apache Hive Apache Kafka (up to version 4.0.0) Apache Drill Apache Solr Apache Spark Apache NiFi Apache Druid
Apache_ZooKeeper
Software bus for high-volume data feeds
Free and open-source software portal RabbitMQ Redis NATS Apache Flink Apache Samza Apache Spark Streaming Data Distribution Service Enterprise Integration
Apache_Kafka
Graph database
reporting, and ETL through integration with big data platforms (Apache Spark, Apache Giraph, Apache Hadoop). JanusGraph supports geo, numeric range, and full-text
JanusGraph
Database of data representing objects in geometric space
Sedona supports scalable geospatial processing and spatial SQL on top of Apache Spark for databases and big data analytics systems. Esri Geodatabase (Enterprise
Spatial_database
Algorithm for anomaly detection
Spark iForest - A distributed implementation in Scala and Python, which runs on Apache Spark. Written by Yang, Fangzhou. Isolation Forest - A Spark/Scala
Isolation_forest
Open source software library
on how to scale and improve scheduling and performance of millions of Apache Spark tasks. Today it is a commercial company that offers an open source system
DBOS
Software to manage computer clusters
2013 that it uses Mesos to run data processing systems like Apache Hadoop and Apache Spark. The Internet auction website eBay stated in April 2014 that
Apache_Mesos
SDK and Platform for responsive, elastic, and resilient agentic, cloud, and edge apps
web applications offers integration with Akka Up until version 1.6, Apache Spark used Akka for communication between nodes The Socko Web Server library
Akka_(toolkit)
Open-source distributed analytics engine
Apache Kylin is built on top of Apache Hadoop, Apache Hive, Apache HBase, Apache Parquet, Apache Calcite, Apache Spark and other technologies. These technologies
Apache_Kylin
Repository of data stored in a raw format
expertise in Java, map reduce and higher-level tools like Apache Pig, Apache Spark and Apache Hive (which were also originally batch-oriented). Poorly
Data_lake
Computer programming paradigm
XProc Apache Beam: Java/Scala SDK that unifies streaming (and batch) processing with several execution engines supported (Apache Spark, Apache Flink,
Dataflow_programming
Data analysis software
Apache Sedona (formerly GeoSpark) is an open-source framework designed for processing and analyzing large-scale spatial data in a distributed computing
Apache_Sedona
Software development kit for web applications
Apache Flex, formerly Adobe Flex, is a software development kit (SDK) for the development and deployment of cross-platform rich web applications based
Apache_Flex
File format and file compression program
data applications with cluster computing frameworks like Hadoop and Apache Spark, as a compressed block can be decompressed without having to process
Bzip2
Popular frameworks running on top of Alluxio include Apache Spark, Presto, TensorFlow, Trino, Apache Hive, and PyTorch, etc. Alluxio can
Alluxio
Data-processing architecture
this layer include Apache Kafka, Amazon Kinesis, Apache Storm, SQLstream, Apache Samza, Apache Spark, Azure Stream Analytics, Apache Flink. Output is typically
Lambda_architecture
Set of software subsystems or components needed to create a complete platform
Apache Spark (big data and MapReduce) Apache Mesos (node startup/shutdown) Akka (toolkit) (actor implementation) Apache Cassandra (database) Apache Kafka
Solution_stack
Computer scientist
Inc. During his PhD, he also co-created the Apache Spark Streaming project and became an Apache Spark committer. Li, Haoyuan (7 May 2018). Alluxio:
Haoyuan_Li
Software platform for data science
platform based on Apache Spark". Computerworld. Archived from the original on 2017-09-11. Retrieved 2017-09-11. "IBM Launches Apache Spark-Based Data Science
IBM_Watson_Studio
Open-source stream processing platform
China's most popular open source software award Apache ActiveMQ Apache Flink Apache Qpid Apache Samza Apache Spark Streaming Data Distribution Service Enterprise
Apache_RocketMQ
Deep learning training framework
to provide APIs for PyTorch, Keras and Apache MXNet, as well as integrations with frameworks such as Apache Spark and Ray, support for elastic training
Horovod_(machine_learning)
New Enterprise Associates, Intel, and others. Reza is a coauthor of Apache Spark, in particular its Machine Learning library, MLlib. Through open source
Reza_Zadeh
Software libraries
The Apache Commons is a project of the Apache Software Foundation, formerly under the Jakarta Project. The purpose of the Commons is to provide reusable
Apache_Commons
Object-oriented programming language
features, offering an implementation compatible with the standard library (Apache Harmony). The use of Java-related technology in Android led to a legal dispute
Java_(programming_language)
Specific model for organizing a set of computers
operating system. Commonly used Big Data software stacks are Apache Hadoop and Apache Spark. A report of the Aiyara hardware which successfully processed
Aiyara_cluster
Software optimization technique
to see. Lazy evaluation is fundamental in big data frameworks such as Apache Spark, where computations on distributed datasets are delayed until results
Lazy_evaluation
processing framework Apache Mahout – scalable machine learning library Apache Spark – unified analytics engine Dask – parallel computing for analytics in
List of free and open-source software packages
List_of_free_and_open-source_software_packages
Topics referred to by the same term
processing libraries: pandas (software) § DataFrame The Dataframe API in Apache Spark DFLib for Java Data frames in the R programming language Frame (networking)
Dataframe
Web server written in Java
server is used in products such as Apache ActiveMQ, Alfresco, Scalatra, Apache Geronimo, Apache Maven, Apache Spark, Google App Engine, Eclipse, FUSE,
Jetty_(web_server)
Cloud-based service and infrastructure
platform for running Apache Hadoop and Apache Spark jobs Cloud Composer – Managed workflow orchestration service built on Apache Airflow Cloud Datalab
Google_Cloud_Platform
American software company
Platform (HDP): based on Apache Hadoop, Apache Hive, Apache Spark Hortonworks DataFlow (HDF): based on Apache NiFi, Apache Storm, Apache Kafka Hortonworks DataPlane
Hortonworks
Open-source distributed stream processing
Apache Storm is a distributed stream processing computation framework written predominantly in the Clojure programming language. Originally created by
Apache_Storm
digital service providers Databricks, a company founded by the creators of Apache Spark Dataiku DataStax Domo, Inc. Fluentd Greenplum Groundhog Technologies
List_of_big_data_companies
Statistical regression method
principal component analysis, including elastic net regularized regression. Apache Spark provides support for Elastic Net Regression in its MLlib machine learning
Elastic_net_regularization
Software company
Azure Synapse Analytics, Cloudera, Databricks, Snowflake, Hadoop, Apache Kafka, Apache Spark Windows, Unix, Linux 2004 Ironstream Utility IBM i and z/OS forwarder
Precisely_(company)
Topics referred to by the same term
media applications developed by Adobe Systems Apache Spark, a cluster computing framework Cisco Spark (application), a collaboration application and
Spark
Relational model database server
original on 2019-09-10. Retrieved 2019-09-09. "Apache Spark - Unified Analytics Engine for Big Data". spark.apache.org. Archived from the original on 2020-09-02
IBM_Db2
Tabular comparison of deep learning software
on 2017-02-11. Retrieved 2016-03-02. Deeplearning4j. "Deeplearning4j on Spark". Deeplearning4j. Archived from the original on 2017-07-13. Retrieved 2016-09-01
Comparison of deep learning software
Comparison_of_deep_learning_software
Java software and development tools
related projects. Apache Ant – build automation tool Apache Batik – SVG processing Apache Cayenne – object-relational mapping Apache Xerces – collection
List of Java software and tools
List_of_Java_software_and_tools
List of programming software
OODT Apache Oozie Apache OpenNLP Apache PDFBox Apache Pig Apache POI Apache Qpid Apache River (Jini) Apache Samza Apache Solr Apache Spark Apache Xerces
List_of_JVM_languages
Deep learning framework
speech, and multimedia. Yahoo! has also integrated Caffe with Apache Spark to create CaffeOnSpark, a distributed deep learning framework. In April 2017, Facebook
Caffe_(software)
frameworks, libraries, and computer programs used for machine learning. Apache OpenNLP — natural language processing toolkit CUDA — GPU computing platform
Comparison of machine learning software
Comparison_of_machine_learning_software
Statistical model used in time series analysis
Scala: spark-timeseries library contains ARIMA implementation for Scala, Java and Python. Implementation is designed to run on Apache Spark. PostgreSQL/MadLib:
Autoregressive integrated moving average
Autoregressive_integrated_moving_average
Content-based image retrieval
for category recognition, image hashes are stored in Google Bigtable; Apache Spark jobs are operated by Google Cloud Platform's Dataproc for image hash
Reverse_image_search
Examples of Data engineering tools. Apache Airflow Apache Flink Apache Hadoop Apache Kafka Apache NiFi Apache Spark Dask Data build tool (dbt) Examples
List_of_data_science_software
Concept in statistics
with high memory". "Basic Statistics – RDD-based API – Spark 3.0.1 Documentation". spark.apache.org. Retrieved 2020-11-05. "kdensity — Univariate kernel
Kernel_density_estimation
Overview of and topical guide to machine learning
Levandowski Anti-unification (computer science) Apache Flume Apache Giraph Apache Mahout Apache SINGA Apache Spark Apache SystemML Aphelion (software) Arabic Speech
Outline_of_machine_learning
interfaces support parallelism in host languages. Apache Beam Apache Flink Apache Hadoop Apache Spark CUDA OpenCL OpenHMPP OpenMP for C, C++, and Fortran
List of concurrent and parallel programming languages
List_of_concurrent_and_parallel_programming_languages
American business software company
single computer cluster, including big data workloads such as Apache Hadoop and Apache Spark, a distributed file system, a multi-model database management
MapR
Database engine
schema on read and transparently converts queries to MapReduce, Apache Tez and Spark jobs. All three execution engines can run in Hadoop's resource negotiator
Apache_Hive
scikit-learn — library for machine learning. Spark MLlib — distributed machine learning library for Apache Spark. TensorFlow — software library for machine
List_of_Python_software
Python library, running on Apache Spark. Yes PipelineDP Google, OpenMined 2022 Python library, running on Apache Spark, Apache Beam, or locally. Yes PSI
List of implementations of differentially private analyses
List_of_implementations_of_differentially_private_analyses
Overview of and topical guide to Java
Java Edition NetBeans Apache Software Foundation – Apache Commons, Apache Maven, Apache Tomcat, Apache Kafka Eclipse Foundation – Adoptium, Eclipse IDE
Outline of the Java programming language
Outline_of_the_Java_programming_language
Software engineering approach to designing and developing information systems
and edges represent the flow of data. Popular implementations include Apache Spark, and the deep learning specific TensorFlow. More recent implementations
Data_engineering
Data processing chain
the advent of data analytics engines such as Hadoop, or more recently Apache Spark, it's been possible to distribute large datasets across multiple processing
Pipeline_(computing)
Data science software
updates to KNIME Server and KNIME Big Data Extensions, provide support for Apache Spark 2.3, Parquet and HDFS-type storage. For the sixth year
KNIME
Technique for database mining
exist for various machine learning systems or modules like MLlib for Apache Spark. Jiawei Han; Hong Cheng; Dong Xin; Xifeng Yan (2007). "Frequent pattern
Frequent_pattern_discovery
Software library for data analysis
Polars is Python-centric. Apache Spark has a Python API, PySpark, for distributed big data processing. Similar to Dask, Spark is focused on distributed
Polars_(software)
Clustered file system
such as Apache Hadoop and Apache Spark. In addition to file-oriented access, MapR FS supports access to tables and message streams using the Apache HBase
MapR_FS
Cloud-based data storage and analytics service
The suggested replacement technologies are Azure Synapse Analytics and Apache Spark. Data lake "Data Lake". Microsoft Azure. Retrieved 2019-06-17. Harris
Azure_Data_Lake
Bangladeshi-American computer scientist
leads SymbioticLab. He is the creator of coflow and the co-creator of Apache Spark. Chowdhury specializes in the fields of computer networking and large-scale
Mosharaf_Chowdhury
Open source platform
for the R and Python programming languages, and various Apache offerings (Apache Hadoop and Spark, as well as Maven). H2O Flow: a graphical web-based interactive
H2O_(software)
(a.k.a. JCR) content repository such as Apache Jackrabbit. Apache Solr Enterprise search platform Apache Spark Fast and general engine for big data processing
List_of_Java_frameworks
Programming tool blending code and documents
intelligence software. Example of projects or products of notebooks: Apache Spark Notebook – Apache License 2.0 GNU TeXmacs (a document processor which can act
Notebook_interface
Open-source set of common libraries for Java
standard JCF does not provide sufficient functionality, and its complement Apache Commons Collections had not adopted generics in order to maintain backward
Google_Guava
In databases, cached query results
UNIQUE CLUSTERED INDEX XV ON MV_MY_VIEW (COL1); Apache Kafka (since v0.10.2), Apache Spark (since v2.0), Apache Flink, Kinetica DB, Materialize, RisingWave
Materialized_view
Python library for parallel computing
Retrieved 2022-05-12. Patel, Harshil. "Which library should I use? Apache Spark, Dask, and Pandas Performance Compared (With Benchmarks)". censius.ai
Dask_(software)
Database using graph structures for queries
"ISO/IEC 39075:2024". ISO. Retrieved 2026-04-29. TinkerPop, Apache. "Apache TinkerPop". Apache TinkerPop. Retrieved 2016-11-02. Gosnell, Denise; Broecheler
Graph_database
Database management system ranking
FileMaker SAP HANA SAP Adaptive Server Apache Spark Microsoft Azure Cosmos DB InfluxDB PostGIS ClickHouse Apache HBase OpenSearch Firebird Memcached Microsoft
DB-Engines_ranking
American software company
"Couchbase 4.6 Developer Preview Released, Adds Real-Time Connectors for Apache Spark 2.0 and Kafka". InfoQ. Retrieved 2025-08-21. "Couchbase Developer Preview
Couchbase,_Inc.
SaaS software company
built on a core composed of open-source search technology Apache Solr and the Apache Spark computation framework. Three years later, in mid-2017, Lucidworks
Lucidworks
Array of numbers
2022), "Stark: Fast and scalable Strassen's matrix multiplication using Apache Spark", IEEE Transactions on Big Data, 8 (3): 699–710, arXiv:1811.07325, doi:10
Matrix_(mathematics)
APACHE SPARK
Boy/Male
Spanish
Free.
Boy/Male
American, British, English
Lives Near Water
Girl/Female
British, English, Greek
Good
Boy/Male
Armenian, Australian
Nomadic Cart
Surname or Lastname
English
English: from a vernacular short form of the Latin personal name Paschalis (see Pascal, Italian Pasquale). Nickname for a mild-mannered and peaceable person, from Middle English pace, pece ‘peace’, ‘concord’, ‘amity’ (via Anglo-Norman French from Latin pax, genitive pacis). Italian: from the medieval personal name Pace, used for both men and women, from the word pace ‘peace’ (see 1).
Surname or Lastname
North German
North German: variant of Asch. English: variant spelling of Ash (asche was the regular Middle English spelling of this word).
Boy/Male
Hebrew
Ready; prepared.
Female
Greek
(Αγάθη) Greek name derived from the word agathos, AGATHE means "good." It is the feminine form of Agathias.
Girl/Female
American, Australian
Storage Place
Girl/Female
Latin
A Lemnian woman.
Girl/Female
Greek Latin
Changed into a spider by Athena.
Female
French
Medieval French form of Latin Agatha, AGACE means "good."
Girl/Female
French German
Kind.
Boy/Male
Shakespearean
'All's Well That Ends Well.' A clown and servant to the Countess of Rousillon.
Male
English
English surname transferred to forename use, derived from the French personal name Pascal, PACE means "Passover; Easter."
Female
Native American
Native American Cheyenne name AYASHE means "little one."
Girl/Female
Hindu, Indian
Fame; Sparkle
Female
Greek
(Ἀράχνη) Greek myth name of a young girl who was turned into a spider by Athena, ARACHNE means "spider."
Surname or Lastname
English or Scottish
English or Scottish : unexplained.
Girl/Female
Native American
Little one.
APACHE SPARK
Girl/Female
Hebrew
Pleasantness; acceptance; delightful.
Boy/Male
Russian American Slavic
Fight. Fighter. Famous bearers: Russian writer Boris Pasternak, author of Dr Zhivago; Boris...
Surname or Lastname
English and Scottish
English and Scottish: variant spelling of Elliott. Andrew Eliot, a shoemaker of East Coker, Somerset, England, who emigrated to Boston MA in 1670, was the founder of a distinguished American family which included the poet T. S. Eliot (1888–1965), who was born in St. Louis, MO.
Boy/Male
Tamil
A vow to a deity, Wish
Girl/Female
Muslim
Excelling
Girl/Female
American, Australian
Combination of Jasmine and Lene
Girl/Female
Tamil
Giving honor
Girl/Female
Latin Shakespearean
Staff bearer.
Female
Egyptian
, the granddaughter of Tetet.
Boy/Male
Indian
Good, Righteous, Safe, Whole, Flawless
APACHE SPARK
n.
A special involucre formed of one leaf and inclosing a spadix, as in aroid plants and palms. See the Note under Bract, and Illust. of Spadix.
v. t.
To develop, guide, or control the pace or paces of; to teach the pace; to break in.
n. pl.
A group of nomadic North American Indians including several tribes native of Arizona, New Mexico, etc.
n.
A genus (Atriplex) of herbs or low shrubs of the Goosefoot family, most of them with a mealy surface.
n.
See Appaume.
n.
Manner of stepping or moving; gait; walk; as, the walk, trot, canter, gallop, and amble are paces of the horse; a swaggering pace; a quick pace.
a.
Having a spathe; resembling a spathe; spathal.
v. i.
Continued pain, as distinguished from sudden twinges, or spasmodic pain. "Such an ache in my bones."
n.
The raccoon.
n.
Want of feeling; privation of passion, emotion, or excitement; dispassion; -- applied either to the body or the mind. As applied to the mind, it is a calmness, indolence, or state of indifference, incapable of being ruffled or roused to active interest or exertion by pleasure, pain, or passion.
n.
To arrange or adjust the spaces in or between; as, to space words, lines, or letters.
n.
One of the series of boilers in which the cane juice is treated in making sugar; especially, the last boiler of the series.
adv.
With a quick pace; quick; fast; speedily.
v. t.
To measure by steps or paces; as, to pace a piece of ground.
v. t.
One attached to another person or thing, as a part of a suite or staff. Specifically: One attached to an embassy.
n.
A plume or bunch of feathers, esp. such a bunch worn on the helmet; any military plume, or ornamental group of feathers.
v.
To scratch.
n.
Ache or pain in the ear.
n.
A quantity or portion of extension; distance from one thing to another; an interval between any two or more objects; as, the space between two stars or two hills; the sound was heard for the space of a mile.
n.
A tender to a fleet, formerly used for conveying men, orders, or treasure.