== Physical Plan ==
AdaptiveSparkPlan (22)
+- == Final Plan ==
   Execute InsertIntoHadoopFsRelationCommand (12)
   +- WriteFiles (11)
      +- * Sort (10)
         +- * Project (9)
            +- * Filter (8)
               +- Window (7)
                  +- * Sort (6)
                     +- AQEShuffleRead (5)
                        +- ShuffleQueryStage (4), Statistics(sizeInBytes=10.9 MiB, rowCount=3.13E+4)
                           +- Exchange (3)
                              +- * Project (2)
                                 +- Scan csv (1)
+- == Initial Plan ==
   Execute InsertIntoHadoopFsRelationCommand (21)
   +- WriteFiles (20)
      +- Sort (19)
         +- Project (18)
            +- Filter (17)
               +- Window (16)
                  +- Sort (15)
                     +- Exchange (14)
                        +- Project (13)
                           +- Scan csv (1)
(1) Scan csv
Output [13]: [ride_id#9482, rideable_type#9483, started_at#9484, ended_at#9485, start_station_name#9486, start_station_id#9487, end_station_name#9488, end_station_id#9489, start_lat#9490, start_lng#9491, end_lat#9492, end_lng#9493, member_casual#9494]
Batched: false
Location: InMemoryFileIndex [s3a://rzvde-g8-kirsanov-dmitry/raw/citibike_data/202502/202502-citibike-tripdata-part00.csv]
ReadSchema: struct<ride_id:string,rideable_type:string,started_at:timestamp,ended_at:timestamp,start_station_name:string,start_station_id:string,end_station_name:string,end_station_id:string,start_lat:double,start_lng:double,end_lat:double,end_lng:double,member_casual:string>
(2) Project [codegen id : 1]
Output [21]: [ride_id#9482, rideable_type#9483, started_at#9484, ended_at#9485, start_station_name#9486, cast(start_station_id#9487 as double) AS start_station_id#9589, end_station_name#9488, cast(end_station_id#9489 as double) AS end_station_id#9591, start_lat#9490, start_lng#9491, end_lat#9492, end_lng#9493, member_casual#9494, isnotnull(ride_id#9482) AS valid_ride_id#9575, (ended_at#9485 > started_at#9484) AS valid_time#9576, (((isnotnull(end_station_id#9489) AND isnotnull(start_station_id#9487)) AND NOT (end_station_id#9489 = start_station_id#9487)) <=> true) AS valid_station#9577, input_file_name() AS _source_file#9578, year(cast(started_at#9484 as date)) AS year#9582, month(cast(started_at#9484 as date)) AS month#9583, start_station_id#9487, started_at#9484]
Input [13]: [ride_id#9482, rideable_type#9483, started_at#9484, ended_at#9485, start_station_name#9486, start_station_id#9487, end_station_name#9488, end_station_id#9489, start_lat#9490, start_lng#9491, end_lat#9492, end_lng#9493, member_casual#9494]
(3) Exchange
Input [21]: [ride_id#9482, rideable_type#9483, started_at#9484, ended_at#9485, start_station_name#9486, start_station_id#9589, end_station_name#9488, end_station_id#9591, start_lat#9490, start_lng#9491, end_lat#9492, end_lng#9493, member_casual#9494, valid_ride_id#9575, valid_time#9576, valid_station#9577, _source_file#9578, year#9582, month#9583, start_station_id#9487, started_at#9484]
Arguments: hashpartitioning(start_station_id#9487, 200), ENSURE_REQUIREMENTS, [plan_id=1745]
(4) ShuffleQueryStage
Output [21]: [ride_id#9482, rideable_type#9483, started_at#9484, ended_at#9485, start_station_name#9486, start_station_id#9589, end_station_name#9488, end_station_id#9591, start_lat#9490, start_lng#9491, end_lat#9492, end_lng#9493, member_casual#9494, valid_ride_id#9575, valid_time#9576, valid_station#9577, _source_file#9578, year#9582, month#9583, start_station_id#9487, started_at#9484]
Arguments: 0
(5) AQEShuffleRead
Input [21]: [ride_id#9482, rideable_type#9483, started_at#9484, ended_at#9485, start_station_name#9486, start_station_id#9589, end_station_name#9488, end_station_id#9591, start_lat#9490, start_lng#9491, end_lat#9492, end_lng#9493, member_casual#9494, valid_ride_id#9575, valid_time#9576, valid_station#9577, _source_file#9578, year#9582, month#9583, start_station_id#9487, started_at#9484]
Arguments: coalesced
(6) Sort [codegen id : 2]
Input [21]: [ride_id#9482, rideable_type#9483, started_at#9484, ended_at#9485, start_station_name#9486, start_station_id#9589, end_station_name#9488, end_station_id#9591, start_lat#9490, start_lng#9491, end_lat#9492, end_lng#9493, member_casual#9494, valid_ride_id#9575, valid_time#9576, valid_station#9577, _source_file#9578, year#9582, month#9583, start_station_id#9487, started_at#9484]
Arguments: [start_station_id#9487 ASC NULLS FIRST, started_at#9484 ASC NULLS FIRST], false, 0
(7) Window
Input [21]: [ride_id#9482, rideable_type#9483, started_at#9484, ended_at#9485, start_station_name#9486, start_station_id#9589, end_station_name#9488, end_station_id#9591, start_lat#9490, start_lng#9491, end_lat#9492, end_lng#9493, member_casual#9494, valid_ride_id#9575, valid_time#9576, valid_station#9577, _source_file#9578, year#9582, month#9583, start_station_id#9487, started_at#9484]
Arguments: [row_number() windowspecdefinition(start_station_id#9487, started_at#9484 ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS _start_station_ride_num#9581], [start_station_id#9487], [started_at#9484 ASC NULLS FIRST]
(8) Filter [codegen id : 3]
Input [22]: [ride_id#9482, rideable_type#9483, started_at#9484, ended_at#9485, start_station_name#9486, start_station_id#9589, end_station_name#9488, end_station_id#9591, start_lat#9490, start_lng#9491, end_lat#9492, end_lng#9493, member_casual#9494, valid_ride_id#9575, valid_time#9576, valid_station#9577, _source_file#9578, year#9582, month#9583, start_station_id#9487, started_at#9484, _start_station_ride_num#9581]
Condition : ((NOT valid_ride_id#9575 OR NOT valid_time#9576) OR NOT valid_station#9577)
(9) Project [codegen id : 3]
Output [22]: [ride_id#9482, rideable_type#9483, started_at#9484, ended_at#9485, start_station_name#9486, start_station_id#9589, end_station_name#9488, end_station_id#9591, start_lat#9490, start_lng#9491, end_lat#9492, end_lng#9493, member_casual#9494, valid_ride_id#9575, valid_time#9576, valid_station#9577, _source_file#9578, 2026-04-08 09:20:33.476511 AS _processed_dttm#9579, _start_station_ride_num#9581, year#9582, month#9583, empty2null(date_format(started_at#9484, yyyyMM, Some(Etc/UTC))) AS yyyymm#10391]
Input [22]: [ride_id#9482, rideable_type#9483, started_at#9484, ended_at#9485, start_station_name#9486, start_station_id#9589, end_station_name#9488, end_station_id#9591, start_lat#9490, start_lng#9491, end_lat#9492, end_lng#9493, member_casual#9494, valid_ride_id#9575, valid_time#9576, valid_station#9577, _source_file#9578, year#9582, month#9583, start_station_id#9487, started_at#9484, _start_station_ride_num#9581]
(10) Sort [codegen id : 3]
Input [22]: [ride_id#9482, rideable_type#9483, started_at#9484, ended_at#9485, start_station_name#9486, start_station_id#9589, end_station_name#9488, end_station_id#9591, start_lat#9490, start_lng#9491, end_lat#9492, end_lng#9493, member_casual#9494, valid_ride_id#9575, valid_time#9576, valid_station#9577, _source_file#9578, _processed_dttm#9579, _start_station_ride_num#9581, year#9582, month#9583, yyyymm#10391]
Arguments: [yyyymm#10391 ASC NULLS FIRST], false, 0
(11) WriteFiles
Input [22]: [ride_id#9482, rideable_type#9483, started_at#9484, ended_at#9485, start_station_name#9486, start_station_id#9589, end_station_name#9488, end_station_id#9591, start_lat#9490, start_lng#9491, end_lat#9492, end_lng#9493, member_casual#9494, valid_ride_id#9575, valid_time#9576, valid_station#9577, _source_file#9578, _processed_dttm#9579, _start_station_ride_num#9581, year#9582, month#9583, yyyymm#10391]
(12) Execute InsertIntoHadoopFsRelationCommand
Input: []
Arguments: file:/notebooks/g8.kirsanov.dmitry/rzvde-g8-kirsanov-dmitry/raw_invalid/citibike_data, false, [yyyymm#10391], CSV, [header=true, __partition_columns=["yyyymm"], path=rzvde-g8-kirsanov-dmitry/raw_invalid/citibike_data/], Overwrite, [ride_id, rideable_type, started_at, ended_at, start_station_name, start_station_id, end_station_name, end_station_id, start_lat, start_lng, end_lat, end_lng, member_casual, valid_ride_id, valid_time, valid_station, _source_file, _processed_dttm, _start_station_ride_num, year, month, yyyymm]
(13) Project
Output [21]: [ride_id#9482, rideable_type#9483, started_at#9484, ended_at#9485, start_station_name#9486, cast(start_station_id#9487 as double) AS start_station_id#9589, end_station_name#9488, cast(end_station_id#9489 as double) AS end_station_id#9591, start_lat#9490, start_lng#9491, end_lat#9492, end_lng#9493, member_casual#9494, isnotnull(ride_id#9482) AS valid_ride_id#9575, (ended_at#9485 > started_at#9484) AS valid_time#9576, (((isnotnull(end_station_id#9489) AND isnotnull(start_station_id#9487)) AND NOT (end_station_id#9489 = start_station_id#9487)) <=> true) AS valid_station#9577, input_file_name() AS _source_file#9578, year(cast(started_at#9484 as date)) AS year#9582, month(cast(started_at#9484 as date)) AS month#9583, start_station_id#9487, started_at#9484]
Input [13]: [ride_id#9482, rideable_type#9483, started_at#9484, ended_at#9485, start_station_name#9486, start_station_id#9487, end_station_name#9488, end_station_id#9489, start_lat#9490, start_lng#9491, end_lat#9492, end_lng#9493, member_casual#9494]
(14) Exchange
Input [21]: [ride_id#9482, rideable_type#9483, started_at#9484, ended_at#9485, start_station_name#9486, start_station_id#9589, end_station_name#9488, end_station_id#9591, start_lat#9490, start_lng#9491, end_lat#9492, end_lng#9493, member_casual#9494, valid_ride_id#9575, valid_time#9576, valid_station#9577, _source_file#9578, year#9582, month#9583, start_station_id#9487, started_at#9484]
Arguments: hashpartitioning(start_station_id#9487, 200), ENSURE_REQUIREMENTS, [plan_id=1729]
(15) Sort
Input [21]: [ride_id#9482, rideable_type#9483, started_at#9484, ended_at#9485, start_station_name#9486, start_station_id#9589, end_station_name#9488, end_station_id#9591, start_lat#9490, start_lng#9491, end_lat#9492, end_lng#9493, member_casual#9494, valid_ride_id#9575, valid_time#9576, valid_station#9577, _source_file#9578, year#9582, month#9583, start_station_id#9487, started_at#9484]
Arguments: [start_station_id#9487 ASC NULLS FIRST, started_at#9484 ASC NULLS FIRST], false, 0
(16) Window
Input [21]: [ride_id#9482, rideable_type#9483, started_at#9484, ended_at#9485, start_station_name#9486, start_station_id#9589, end_station_name#9488, end_station_id#9591, start_lat#9490, start_lng#9491, end_lat#9492, end_lng#9493, member_casual#9494, valid_ride_id#9575, valid_time#9576, valid_station#9577, _source_file#9578, year#9582, month#9583, start_station_id#9487, started_at#9484]
Arguments: [row_number() windowspecdefinition(start_station_id#9487, started_at#9484 ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS _start_station_ride_num#9581], [start_station_id#9487], [started_at#9484 ASC NULLS FIRST]
(17) Filter
Input [22]: [ride_id#9482, rideable_type#9483, started_at#9484, ended_at#9485, start_station_name#9486, start_station_id#9589, end_station_name#9488, end_station_id#9591, start_lat#9490, start_lng#9491, end_lat#9492, end_lng#9493, member_casual#9494, valid_ride_id#9575, valid_time#9576, valid_station#9577, _source_file#9578, year#9582, month#9583, start_station_id#9487, started_at#9484, _start_station_ride_num#9581]
Condition : ((NOT valid_ride_id#9575 OR NOT valid_time#9576) OR NOT valid_station#9577)
(18) Project
Output [22]: [ride_id#9482, rideable_type#9483, started_at#9484, ended_at#9485, start_station_name#9486, start_station_id#9589, end_station_name#9488, end_station_id#9591, start_lat#9490, start_lng#9491, end_lat#9492, end_lng#9493, member_casual#9494, valid_ride_id#9575, valid_time#9576, valid_station#9577, _source_file#9578, 2026-04-08 09:20:33.476511 AS _processed_dttm#9579, _start_station_ride_num#9581, year#9582, month#9583, empty2null(date_format(started_at#9484, yyyyMM, Some(Etc/UTC))) AS yyyymm#10391]
Input [22]: [ride_id#9482, rideable_type#9483, started_at#9484, ended_at#9485, start_station_name#9486, start_station_id#9589, end_station_name#9488, end_station_id#9591, start_lat#9490, start_lng#9491, end_lat#9492, end_lng#9493, member_casual#9494, valid_ride_id#9575, valid_time#9576, valid_station#9577, _source_file#9578, year#9582, month#9583, start_station_id#9487, started_at#9484, _start_station_ride_num#9581]
(19) Sort
Input [22]: [ride_id#9482, rideable_type#9483, started_at#9484, ended_at#9485, start_station_name#9486, start_station_id#9589, end_station_name#9488, end_station_id#9591, start_lat#9490, start_lng#9491, end_lat#9492, end_lng#9493, member_casual#9494, valid_ride_id#9575, valid_time#9576, valid_station#9577, _source_file#9578, _processed_dttm#9579, _start_station_ride_num#9581, year#9582, month#9583, yyyymm#10391]
Arguments: [yyyymm#10391 ASC NULLS FIRST], false, 0
(20) WriteFiles
Input [22]: [ride_id#9482, rideable_type#9483, started_at#9484, ended_at#9485, start_station_name#9486, start_station_id#9589, end_station_name#9488, end_station_id#9591, start_lat#9490, start_lng#9491, end_lat#9492, end_lng#9493, member_casual#9494, valid_ride_id#9575, valid_time#9576, valid_station#9577, _source_file#9578, _processed_dttm#9579, _start_station_ride_num#9581, year#9582, month#9583, yyyymm#10391]
(21) Execute InsertIntoHadoopFsRelationCommand
Input: []
Arguments: file:/notebooks/g8.kirsanov.dmitry/rzvde-g8-kirsanov-dmitry/raw_invalid/citibike_data, false, [yyyymm#10391], CSV, [header=true, __partition_columns=["yyyymm"], path=rzvde-g8-kirsanov-dmitry/raw_invalid/citibike_data/], Overwrite, [ride_id, rideable_type, started_at, ended_at, start_station_name, start_station_id, end_station_name, end_station_id, start_lat, start_lng, end_lat, end_lng, member_casual, valid_ride_id, valid_time, valid_station, _source_file, _processed_dttm, _start_station_ride_num, year, month, yyyymm]
(22) AdaptiveSparkPlan
Output: []
Arguments: isFinalPlan=true