Commit 50dd117

[Branch-1.4] Port #9851 #9879 to fix release issue (#9895)
* [VL] Fix link issues found in release process (#9851)
* [GLUTEN-9878] Update LICENSE and NOTICE to list all licenses used for copied code. (#9879)
  * Update LICENSE and NOTICE to list all licenses used for copied code.
  * Update script from velox, gluten 2025, NOTICE-binary.

Co-authored-by: PHILO-HE <philo@apache.org>
1 parent bb28bb7 commit 50dd117

File tree

7 files changed, +105 -29 lines changed


LICENSE

Lines changed: 60 additions & 0 deletions
@@ -200,3 +200,63 @@
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
+
+This product bundles various third-party components also under the
+Apache Software License 2.0.
+
+Apache DataFusion(https://github.com/apache/datafusion)
+./.github/workflows/take.yml
+
+Apache Spark(https://github.com/apache/spark)
+./backends-clickhouse/src/main/scala/org/apache/spark/sql/execution/CHColumnarWrite.scala
+./backends-clickhouse/src/main/scala/org/apache/spark/sql/execution/SparkWriteFilesCommitProtocol.scala
+./cpp-ch/local-engine/Parser/aggregate_function_parser/BloomFilterAggParser.cpp
+./gluten-substrait/src/main/scala/org/apache/spark/sql/execution/GlutenExplainUtils.scala
+./shims/spark32/src/main/scala/org/apache/spark/sql/execution/FileSourceScanExecShim.scala
+./shims/spark32/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatDataWriter.scala
+./shims/spark32/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
+./shims/spark32/src/main/scala/org/apache/spark/sql/execution/datasources/WriteFiles.scala
+./shims/spark32/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
+./shims/spark32/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
+./shims/spark32/src/main/scala/org/apache/spark/sql/execution/datasources/v2/BatchScanExec.scala.deprecated
+./shims/spark32/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala
+./shims/spark32/src/main/scala/org/apache/spark/sql/hive/execution/HiveFileFormat.scala
+./shims/spark33/src/main/scala/org/apache/spark/sql/execution/FileSourceScanExecShim.scala
+./shims/spark33/src/main/scala/org/apache/spark/sql/execution/datasources/WriteFiles.scala
+./shims/spark33/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
+./shims/spark33/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala
+./tools/gluten-it/common/src/main/scala/org/apache/spark/sql/TestUtils.scala
+
+Delta Lake(https://github.com/delta-io/delta)
+./backends-clickhouse/src-delta-20/main/scala/org/apache/spark/sql/delta/DeltaLog.scala
+./backends-clickhouse/src-delta-20/main/scala/org/apache/spark/sql/delta/Snapshot.scala
+./backends-clickhouse/src-delta-20/main/scala/org/apache/spark/sql/delta/commands/DeleteCommand.scala
+./backends-clickhouse/src-delta-20/main/scala/org/apache/spark/sql/delta/commands/MergeIntoCommand.scala
+./backends-clickhouse/src-delta-20/main/scala/org/apache/spark/sql/delta/commands/OptimizeTableCommand.scala
+./backends-clickhouse/src-delta-20/main/scala/org/apache/spark/sql/delta/commands/UpdateCommand.scala
+./backends-clickhouse/src-delta-23/main/scala/org/apache/spark/sql/delta/DeltaLog.scala
+./backends-clickhouse/src-delta-23/main/scala/org/apache/spark/sql/delta/Snapshot.scala
+./backends-clickhouse/src-delta-23/main/scala/org/apache/spark/sql/delta/commands/DeleteCommand.scala
+./backends-clickhouse/src-delta-23/main/scala/org/apache/spark/sql/delta/commands/MergeIntoCommand.scala
+./backends-clickhouse/src-delta-23/main/scala/org/apache/spark/sql/delta/commands/OptimizeTableCommand.scala
+./backends-clickhouse/src-delta-23/main/scala/org/apache/spark/sql/delta/commands/UpdateCommand.scala
+./backends-clickhouse/src-delta-23/main/scala/org/apache/spark/sql/delta/commands/VacuumCommand.scala
+./backends-clickhouse/src-delta-23/main/scala/org/apache/spark/sql/delta/stats/PrepareDeltaScan.scala
+./backends-clickhouse/src-delta-33/main/scala/org/apache/spark/sql/delta/DeltaLog.scala
+./backends-clickhouse/src-delta-33/main/scala/org/apache/spark/sql/delta/PreprocessTableWithDVs.scala
+./backends-clickhouse/src-delta-33/main/scala/org/apache/spark/sql/delta/Snapshot.scala
+./backends-clickhouse/src-delta-33/main/scala/org/apache/spark/sql/delta/commands/DMLWithDeletionVectorsHelper.scala
+./backends-clickhouse/src-delta-33/main/scala/org/apache/spark/sql/delta/commands/VacuumCommand.scala
+
+The Velox Project(https://github.com/facebookincubator/velox)
+./cpp/velox/udf/examples/MyUDAF.cc
+./cpp/velox/utils/Common.cc
+./ep/build-velox/src/setup-centos7.sh
+./ep/build-velox/src/setup-centos8.sh
+./ep/build-velox/src/setup-openeuler24.sh
+./ep/build-velox/src/setup-rhel.sh
+
+ClickHouse(https://github.com/ClickHouse/ClickHouse)
+./cpp-ch/local-engine/AggregateFunctions/AggregateFunctionPartialMerge.h
+./cpp-ch/local-engine/Functions/SparkFunctionArrayDistinct.cpp
+

NOTICE

Lines changed: 17 additions & 1 deletion
@@ -1,7 +1,23 @@
 Apache Gluten(incubating)
-Copyright 2023-2024 The Apache Software Foundation
+Copyright 2023-2025 The Apache Software Foundation
 
 This product includes software developed at
 The Apache Software Foundation (http://www.apache.org/).
 
 The initial codebase was donated to the ASF by Intel and Kyligence, copyright 2023-2024.
+
+Apache DataFusion
+Copyright 2019-2025 The Apache Software Foundation
+
+Apache Spark
+Copyright 2014 and onwards The Apache Software Foundation
+
+Delta Lake
+Copyright (2021) The Delta Lake Project Authors.
+
+The Velox Project
+Copyright © 2024 Meta Platforms, Inc.
+
+ClickHouse
+Copyright 2016-2025 ClickHouse, Inc.
+

NOTICE-binary

Lines changed: 20 additions & 0 deletions
@@ -18,6 +18,11 @@ Copyright 2022-2024 The Apache Software Foundation.
 
 ---------------------------------------------------------
 
+Apache DataFusion
+Copyright 2019-2025 The Apache Software Foundation
+
+---------------------------------------------------------
+
 Apache Uniffle (incubating)
 Copyright 2022 and onwards The Apache Software Foundation.
 

@@ -43,6 +48,21 @@ Copyright (C) 2006 - 2019, The Apache Software Foundation.
 
 ---------------------------------------------------------
 
+Delta Lake
+Copyright (2021) The Delta Lake Project Authors.
+
+---------------------------------------------------------
+
+The Velox Project
+Copyright © 2024 Meta Platforms, Inc.
+
+---------------------------------------------------------
+
+ClickHouse
+Copyright 2016-2025 ClickHouse, Inc.
+
+---------------------------------------------------------
+
 This project includes code from Daniel Lemire's FrameOfReference project.
 
 https://github.com/lemire/FrameOfReference/blob/6ccaf9e97160f9a3b299e23a8ef739e711ef0c71/src/bpacking.cpp

tools/gluten-it/README.md

Lines changed: 5 additions & 5 deletions
@@ -2,27 +2,27 @@
 
 The project makes it easy to test Gluten build locally.
 
-## Gluten ?
+## Gluten
 
 Gluten is a native Spark SQL implementation as a standard Spark plug-in.
 
 https://github.com/apache/incubator-gluten
 
 ## Getting Started
 
-### 1. Install Gluten in your local machine
+### 1. Build Gluten
 
-See official Gluten build guidance https://github.com/apache/incubator-gluten#how-to-use-gluten
+See official Gluten build guidance https://github.com/apache/incubator-gluten#build-from-source.
 
-### 2. Install and run gluten-it with Spark version
+### 2. Build and run gluten-it
 
 ```sh
 cd gluten/tools/gluten-it
 mvn clean package -P{Spark-Version}
 sbin/gluten-it.sh
 ```
 
-> Note: *Spark-Version* support *spark-3.2* and *spark-3.3* only
+Note: **Spark-Version** can only be **spark-3.2**, **spark-3.3**, **spark-3.4** or **spark-3.5**.
 
 ## Usage
 

tools/gluten-it/sbin/gluten-it.sh

Lines changed: 2 additions & 0 deletions
@@ -30,6 +30,8 @@ SPARK_JVM_OPTIONS=$($JAVA_HOME/bin/java -cp $JAR_PATH org.apache.gluten.integrat
 
 EMBEDDED_SPARK_HOME=$BASEDIR/../spark-home
 
+mkdir $EMBEDDED_SPARK_HOME && ln -snf $BASEDIR/../package/target/lib $EMBEDDED_SPARK_HOME/jars
+
 # We temporarily disallow setting these two variables by caller.
 SPARK_HOME=""
 SPARK_SCALA_VERSION=""
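The added line builds the embedded Spark home at launch time, with `jars` as a symlink to the built `package/target/lib` directory, instead of relying on a symlink checked into the source tree (the `tools/gluten-it/spark-home/jars` file deleted in this commit, which presumably is one of the link issues found in the release process). A minimal sketch of the same pattern, using throwaway temp paths rather than the real repository layout:

```shell
#!/bin/sh
# Sketch of the pattern in the added gluten-it.sh line; all paths here are
# stand-ins created under a temp directory, not the real project layout.
set -e
work=$(mktemp -d)

# Stand-in for the lib directory produced by the Maven build.
mkdir -p "$work/package/target/lib"
echo demo > "$work/package/target/lib/a.jar"

EMBEDDED_SPARK_HOME="$work/spark-home"
# -s: symbolic link; -n: treat an existing destination link as a plain file
# instead of descending into it; -f: replace the destination if present.
mkdir "$EMBEDDED_SPARK_HOME" && ln -snf "$work/package/target/lib" "$EMBEDDED_SPARK_HOME/jars"

# The jar is now reachable through the embedded Spark home's jars/ link.
cat "$EMBEDDED_SPARK_HOME/jars/a.jar"
```

Note the `&&`: on a second invocation `mkdir` fails because the directory already exists, so the `ln` is skipped, but the symlink created by the first run is still in place.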

tools/gluten-it/spark-home/jars

Lines changed: 0 additions & 1 deletion
This file was deleted.

tools/workload/tpch/README.md

Lines changed: 1 addition & 22 deletions
@@ -1,7 +1,7 @@
 # Test on Velox backend with TPC-H workload
 
 ## Test datasets
-Parquet and DWRF(a fork of the ORC file format) format files are both supported. Here are the steps to generate the testing datasets:
+Parquet and DWRF (a fork of the ORC file format) format files are both supported. Here are the steps to generate the testing datasets:
 
 ### Generate the Parquet dataset
 Please refer to the scripts in [parquet_dataset](./gen_data/parquet_dataset/) directory to generate parquet dataset. Note this script relies on the [spark-sql-perf](https://github.com/databricks/spark-sql-perf) and [tpch-dbgen](https://github.com/databricks/tpch-dbgen) package from Databricks. Note in the tpch-dbgen kits, we need to do a slight modification to allow Spark to convert the csv based content to parquet, please make sure to use this commit: [0469309147b42abac8857fa61b4cf69a6d3128a8](https://github.com/databricks/tpch-dbgen/commit/0469309147b42abac8857fa61b4cf69a6d3128a8)

@@ -26,27 +26,6 @@ val rootDir = "/PATH/TO/TPCH_PARQUET_PATH" // root directory of location to crea
 val dbgenDir = "/PATH/TO/TPCH_DBGEN" // location of dbgen
 ```
 
-Currently, Gluten with Velox can support both Parquet and DWRF file format and three compression codec including snappy, gzip, zstd.
-Below step, to convert Parquet to DWRF, is optional if you are using Parquet format to run the testing.
-
-### Convert the Parquet dataset to DWRF dataset(OPTIONAL)
-And then please refer to the scripts in [dwrf_dataset](./gen_data/dwrf_dataset/) directory to convert the Parquet dataset to DWRF dataset.
-
-In tpch_convert_parquet_dwrf.sh, spark configures should be set according to the system.
-
-```
-export GLUTEN_HOME=/PATH/TO/gluten
-...
---executor-cores 8 \
---num-executors 14 \
-```
-
-In tpch_convert_parquet_dwrf.scala, the table path should be configured.
-```
-val parquet_file_path = "/PATH/TO/TPCH_PARQUET_PATH"
-val dwrf_file_path = "/PATH/TO/TPCH_DWRF_PATH"
-```
-
 ## Test Queries
 We provide the test queries in [TPC-H queries](../../../tools/gluten-it/common/src/main/resources/tpch-queries).
 We also provide a scala script in [Run TPC-H](./run_tpch/) directory about how to run TPC-H queries.
