== Physical Plan ==
AdaptiveSparkPlan (17)
+- == Final Plan ==
   CollectLimit (13)
   +- * Project (12)
      +- * Filter (11)
         +- TableCacheQueryStage (10), Statistics(sizeInBytes=8.2 MiB, rowCount=3.13E+4)
            +- InMemoryTableScan (1)
                  +- InMemoryRelation (2)
                        +- AdaptiveSparkPlan (9)
                           +- Project (8)
                              +- Window (7)
                                 +- Sort (6)
                                    +- Exchange (5)
                                       +- Project (4)
                                          +- Scan csv (3)
+- == Initial Plan ==
   CollectLimit (16)
   +- Project (15)
      +- Filter (14)
         +- InMemoryTableScan (1)
               +- InMemoryRelation (2)
                     +- AdaptiveSparkPlan (9)
                        +- Project (8)
                           +- Window (7)
                              +- Sort (6)
                                 +- Exchange (5)
                                    +- Project (4)
                                       +- Scan csv (3)

(1) InMemoryTableScan
Output [21]: [_processed_dttm#5042, _source_file#5041, _start_station_ride_num#5044, end_lat#5057, end_lng#5058, end_station_id#5054, end_station_name#5053, ended_at#5050, member_casual#5059, month#5046, ride_id#5047, rideable_type#5048, start_lat#5055, start_lng#5056, start_station_id#5052, start_station_name#5051, started_at#5049, valid_ride_id#5038, valid_station#5040, valid_time#5039, year#5045]
Arguments: [_processed_dttm#5042, _source_file#5041, _start_station_ride_num#5044, end_lat#5057, end_lng#5058, end_station_id#5054, end_station_name#5053, ended_at#5050, member_casual#5059, month#5046, ride_id#5047, rideable_type#5048, start_lat#5055, start_lng#5056, start_station_id#5052, start_station_name#5051, started_at#5049, valid_ride_id#5038, valid_station#5040, valid_time#5039, year#5045], [((NOT valid_ride_id#5038 OR NOT valid_time#5039) OR NOT coalesce(valid_station#5040, false))]

(2) InMemoryRelation
Arguments: [ride_id#5047, rideable_type#5048, started_at#5049, ended_at#5050, start_station_name#5051, start_station_id#5052, end_station_name#5053, end_station_id#5054, start_lat#5055, start_lng#5056, end_lat#5057, end_lng#5058, member_casual#5059, valid_ride_id#5038, valid_time#5039, valid_station#5040, _source_file#5041, _processed_dttm#5042, _start_station_ride_num#5044, year#5045, month#5046], CachedRDDBuilder(org.apache.spark.sql.execution.columnar.DefaultCachedBatchSerializer@43d79ed6,StorageLevel(disk, memory, deserialized, 1 replicas),AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
   *(3) Project [ride_id#252, rideable_type#253, started_at#254, ended_at#255, start_station_name#256, start_station_id#359, end_station_name#258, end_station_id#361, start_lat#260, start_lng#261, end_lat#262, end_lng#263, member_casual#264, valid_ride_id#345, valid_time#346, valid_station#347, _source_file#348, 2026-04-08 08:20:23.917168 AS _processed_dttm#349, _start_station_ride_num#351, year#352, month#353]
   +- Window [row_number() windowspecdefinition(start_station_id#257, started_at#254 ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS _start_station_ride_num#351], [start_station_id#257], [started_at#254 ASC NULLS FIRST]
      +- *(2) Sort [start_station_id#257 ASC NULLS FIRST, started_at#254 ASC NULLS FIRST], false, 0
         +- AQEShuffleRead coalesced
            +- ShuffleQueryStage 0
               +- Exchange hashpartitioning(start_station_id#257, 200), ENSURE_REQUIREMENTS, [plan_id=151]
                  +- *(1) Project [ride_id#252, rideable_type#253, started_at#254, ended_at#255, start_station_name#256, cast(start_station_id#257 as double) AS start_station_id#359, end_station_name#258, cast(end_station_id#259 as double) AS end_station_id#361, start_lat#260, start_lng#261, end_lat#262, end_lng#263, member_casual#264, isnotnull(ride_id#252) AS valid_ride_id#345, (ended_at#255 > started_at#254) AS valid_time#346, NOT (end_station_id#259 = start_station_id#257) AS valid_station#347, input_file_name() AS _source_file#348, year(cast(started_at#254 as date)) AS year#352, month(cast(started_at#254 as date)) AS month#353, start_station_id#257, started_at#254]
                     +- FileScan csv [ride_id#252,rideable_type#253,started_at#254,ended_at#255,start_station_name#256,start_station_id#257,end_station_name#258,end_station_id#259,start_lat#260,start_lng#261,end_lat#262,end_lng#263,member_casual#264] Batched: false, DataFilters: [], Format: CSV, Location: InMemoryFileIndex(1 paths)[s3a://rzvde-g8-kirsanov-dmitry/raw/citibike_data/202502/202502-citibik..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<ride_id:string,rideable_type:string,started_at:timestamp,ended_at:timestamp,start_station_...
+- == Initial Plan ==
   Project [ride_id#252, rideable_type#253, started_at#254, ended_at#255, start_station_name#256, start_station_id#359, end_station_name#258, end_station_id#361, start_lat#260, start_lng#261, end_lat#262, end_lng#263, member_casual#264, valid_ride_id#345, valid_time#346, valid_station#347, _source_file#348, 2026-04-08 08:20:23.917168 AS _processed_dttm#349, _start_station_ride_num#351, year#352, month#353]
   +- Window [row_number() windowspecdefinition(start_station_id#257, started_at#254 ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS _start_station_ride_num#351], [start_station_id#257], [started_at#254 ASC NULLS FIRST]
      +- Sort [start_station_id#257 ASC NULLS FIRST, started_at#254 ASC NULLS FIRST], false, 0
         +- Exchange hashpartitioning(start_station_id#257, 200), ENSURE_REQUIREMENTS, [plan_id=113]
            +- Project [ride_id#252, rideable_type#253, started_at#254, ended_at#255, start_station_name#256, cast(start_station_id#257 as double) AS start_station_id#359, end_station_name#258, cast(end_station_id#259 as double) AS end_station_id#361, start_lat#260, start_lng#261, end_lat#262, end_lng#263, member_casual#264, isnotnull(ride_id#252) AS valid_ride_id#345, (ended_at#255 > started_at#254) AS valid_time#346, NOT (end_station_id#259 = start_station_id#257) AS valid_station#347, input_file_name() AS _source_file#348, year(cast(started_at#254 as date)) AS year#352, month(cast(started_at#254 as date)) AS month#353, start_station_id#257, started_at#254]
               +- FileScan csv [ride_id#252,rideable_type#253,started_at#254,ended_at#255,start_station_name#256,start_station_id#257,end_station_name#258,end_station_id#259,start_lat#260,start_lng#261,end_lat#262,end_lng#263,member_casual#264] Batched: false, DataFilters: [], Format: CSV, Location: InMemoryFileIndex(1 paths)[s3a://rzvde-g8-kirsanov-dmitry/raw/citibike_data/202502/202502-citibik..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<ride_id:string,rideable_type:string,started_at:timestamp,ended_at:timestamp,start_station_...
,None)

(3) Scan csv
Output [13]: [ride_id#252, rideable_type#253, started_at#254, ended_at#255, start_station_name#256, start_station_id#257, end_station_name#258, end_station_id#259, start_lat#260, start_lng#261, end_lat#262, end_lng#263, member_casual#264]
Batched: false
Location: InMemoryFileIndex [s3a://rzvde-g8-kirsanov-dmitry/raw/citibike_data/202502/202502-citibike-tripdata-part00.csv]
ReadSchema: struct<ride_id:string,rideable_type:string,started_at:timestamp,ended_at:timestamp,start_station_name:string,start_station_id:string,end_station_name:string,end_station_id:string,start_lat:double,start_lng:double,end_lat:double,end_lng:double,member_casual:string>

(4) Project
Output [21]: [ride_id#252, rideable_type#253, started_at#254, ended_at#255, start_station_name#256, cast(start_station_id#257 as double) AS start_station_id#359, end_station_name#258, cast(end_station_id#259 as double) AS end_station_id#361, start_lat#260, start_lng#261, end_lat#262, end_lng#263, member_casual#264, isnotnull(ride_id#252) AS valid_ride_id#345, (ended_at#255 > started_at#254) AS valid_time#346, NOT (end_station_id#259 = start_station_id#257) AS valid_station#347, input_file_name() AS _source_file#348, year(cast(started_at#254 as date)) AS year#352, month(cast(started_at#254 as date)) AS month#353, start_station_id#257, started_at#254]
Input [13]: [ride_id#252, rideable_type#253, started_at#254, ended_at#255, start_station_name#256, start_station_id#257, end_station_name#258, end_station_id#259, start_lat#260, start_lng#261, end_lat#262, end_lng#263, member_casual#264]
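Node (4) is where the validation flags are derived. The sketch below is a minimal pure-Python model of those expressions, not the PySpark code that produced this plan (which is not shown). It assumes each row is a dict and `None` stands for SQL NULL; the helpers `sql_not`, `sql_eq`, `sql_gt` and `derive_flags` are hypothetical. It shows why `valid_ride_id` can never be NULL (it is `isnotnull(...)`), while `valid_time` and `valid_station` become NULL whenever one of their inputs is NULL. Note also that `valid_station` compares the original string columns (#257, #259), not the double-cast copies (#359, #361). Naive `datetime` values stand in for timestamps in the session time zone (Etc/UTC in this plan).

```python
from datetime import datetime
from typing import Any, Optional


def sql_not(x: Optional[bool]) -> Optional[bool]:
    """SQL NOT: NULL in, NULL out."""
    return None if x is None else not x


def sql_eq(a: Any, b: Any) -> Optional[bool]:
    """SQL '=': NULL if either operand is NULL."""
    if a is None or b is None:
        return None
    return a == b


def sql_gt(a: Any, b: Any) -> Optional[bool]:
    """SQL '>': NULL if either operand is NULL."""
    if a is None or b is None:
        return None
    return a > b


def derive_flags(row: dict) -> dict:
    """Hypothetical model of the flag and date expressions in Project (4) for one row."""
    started: Optional[datetime] = row["started_at"]
    return {
        # isnotnull(ride_id): always TRUE or FALSE, never NULL
        "valid_ride_id": row["ride_id"] is not None,
        # (ended_at > started_at): NULL if either timestamp is NULL
        "valid_time": sql_gt(row["ended_at"], started),
        # NOT (end_station_id = start_station_id) on the original string columns
        "valid_station": sql_not(sql_eq(row["end_station_id"], row["start_station_id"])),
        # year/month of started_at; naive datetimes stand in for the session time zone
        "year": None if started is None else started.year,
        "month": None if started is None else started.month,
    }


if __name__ == "__main__":
    # Made-up ride with a missing end station, so valid_station comes out as None (NULL)
    ride = {
        "ride_id": "R1",
        "started_at": datetime(2025, 2, 3, 8, 0),
        "ended_at": datetime(2025, 2, 3, 8, 20),
        "start_station_id": "S1",
        "end_station_id": None,
    }
    print(derive_flags(ride))
```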

(5) Exchange
Input [21]: [ride_id#252, rideable_type#253, started_at#254, ended_at#255, start_station_name#256, start_station_id#359, end_station_name#258, end_station_id#361, start_lat#260, start_lng#261, end_lat#262, end_lng#263, member_casual#264, valid_ride_id#345, valid_time#346, valid_station#347, _source_file#348, year#352, month#353, start_station_id#257, started_at#254]
Arguments: hashpartitioning(start_station_id#257, 200), ENSURE_REQUIREMENTS, [plan_id=957]
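Node (5) redistributes rows by `start_station_id#257` (the original string column) into 200 partitions, the default value of `spark.sql.shuffle.partitions`; the `AQEShuffleRead coalesced` node in the cached final plan shows that AQE later merged small post-shuffle partitions. What the window needs from this step is that every row of one station lands in the same partition. The sketch below illustrates that property only. It uses `zlib.crc32` as a stand-in hash, whereas Spark uses Murmur3, so the partition ids will not match Spark's; `partition_for`, `shuffle` and `NULL_PARTITION` are hypothetical names.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Optional

NUM_PARTITIONS = 200  # the partition count in hashpartitioning(start_station_id#257, 200)
NULL_PARTITION = 0    # assumption: an arbitrary fixed bucket; all NULL keys share one partition


def partition_for(key: Optional[str], num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition id. crc32 is a stand-in; Spark itself uses Murmur3."""
    if key is None:
        return NULL_PARTITION % num_partitions
    # crc32 returns an unsigned int in Python 3, so the modulo is never negative
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def shuffle(rows: Iterable[dict], key: str,
            num_partitions: int = NUM_PARTITIONS) -> Dict[int, List[dict]]:
    """Group rows into partitions by the hash of `key`, like a hash-partitioned Exchange."""
    parts: Dict[int, List[dict]] = defaultdict(list)
    for row in rows:
        parts[partition_for(row[key], num_partitions)].append(row)
    return dict(parts)


if __name__ == "__main__":
    rides = [{"start_station_id": s} for s in ["S1", "S2", "S1", None, "S2", None]]
    for pid, rs in sorted(shuffle(rides, "start_station_id").items()):
        print(pid, [r["start_station_id"] for r in rs])
```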

(6) Sort
Input [21]: [ride_id#252, rideable_type#253, started_at#254, ended_at#255, start_station_name#256, start_station_id#359, end_station_name#258, end_station_id#361, start_lat#260, start_lng#261, end_lat#262, end_lng#263, member_casual#264, valid_ride_id#345, valid_time#346, valid_station#347, _source_file#348, year#352, month#353, start_station_id#257, started_at#254]
Arguments: [start_station_id#257 ASC NULLS FIRST, started_at#254 ASC NULLS FIRST], false, 0

(7) Window
Input [21]: [ride_id#252, rideable_type#253, started_at#254, ended_at#255, start_station_name#256, start_station_id#359, end_station_name#258, end_station_id#361, start_lat#260, start_lng#261, end_lat#262, end_lng#263, member_casual#264, valid_ride_id#345, valid_time#346, valid_station#347, _source_file#348, year#352, month#353, start_station_id#257, started_at#254]
Arguments: [row_number() windowspecdefinition(start_station_id#257, started_at#254 ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS _start_station_ride_num#351], [start_station_id#257], [started_at#254 ASC NULLS FIRST]
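Nodes (6) and (7) together compute `row_number()` per station. The `false` in the Sort arguments means the sort is local to each partition rather than global, which is enough because the exchange has already co-located each station's rows. A minimal pure-Python sketch of the same result, assuming rows are dicts with `None` for NULL (`nulls_first` and `add_ride_num` are hypothetical names): sort by `(start_station_id, started_at)` with NULLs first, then number rows within each station starting at 1. Python's sort is stable, so ties on `started_at` keep their input order; Spark does not guarantee any order among ties, so `row_number()` over identical timestamps may differ between runs.

```python
from datetime import datetime
from itertools import groupby
from typing import Any, List, Tuple


def nulls_first(value: Any) -> Tuple[bool, Any]:
    """Sort key for ASC NULLS FIRST: (False, None) sorts before (True, <value>)."""
    return (value is not None, value)


def add_ride_num(rows: List[dict]) -> List[dict]:
    """Hypothetical model of Sort (6) + Window (7):
    row_number() OVER (PARTITION BY start_station_id ORDER BY started_at ASC NULLS FIRST).
    """
    ordered = sorted(
        rows,
        key=lambda r: (nulls_first(r["start_station_id"]), nulls_first(r["started_at"])),
    )
    out: List[dict] = []
    # groupby only merges adjacent rows, which is why the sort has to come first
    for _, station_rows in groupby(ordered, key=lambda r: r["start_station_id"]):
        for n, row in enumerate(station_rows, start=1):
            out.append({**row, "_start_station_ride_num": n})
    return out


if __name__ == "__main__":
    rides = [
        {"start_station_id": "S1", "started_at": datetime(2025, 2, 1, 9, 0)},
        {"start_station_id": "S2", "started_at": datetime(2025, 2, 1, 7, 0)},
        {"start_station_id": "S1", "started_at": datetime(2025, 2, 1, 8, 0)},
        {"start_station_id": "S1", "started_at": None},  # NULL start time numbered first
    ]
    for r in add_ride_num(rides):
        print(r)
```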

(8) Project
Output [21]: [ride_id#252, rideable_type#253, started_at#254, ended_at#255, start_station_name#256, start_station_id#359, end_station_name#258, end_station_id#361, start_lat#260, start_lng#261, end_lat#262, end_lng#263, member_casual#264, valid_ride_id#345, valid_time#346, valid_station#347, _source_file#348, 2026-04-08 08:20:23.917168 AS _processed_dttm#349, _start_station_ride_num#351, year#352, month#353]
Input [22]: [ride_id#252, rideable_type#253, started_at#254, ended_at#255, start_station_name#256, start_station_id#359, end_station_name#258, end_station_id#361, start_lat#260, start_lng#261, end_lat#262, end_lng#263, member_casual#264, valid_ride_id#345, valid_time#346, valid_station#347, _source_file#348, year#352, month#353, start_station_id#257, started_at#254, _start_station_ride_num#351]

(9) AdaptiveSparkPlan
Output [21]: [ride_id#252, rideable_type#253, started_at#254, ended_at#255, start_station_name#256, start_station_id#359, end_station_name#258, end_station_id#361, start_lat#260, start_lng#261, end_lat#262, end_lng#263, member_casual#264, valid_ride_id#345, valid_time#346, valid_station#347, _source_file#348, _processed_dttm#349, _start_station_ride_num#351, year#352, month#353]
Arguments: isFinalPlan=false

(10) TableCacheQueryStage
Output [21]: [_processed_dttm#5042, _source_file#5041, _start_station_ride_num#5044, end_lat#5057, end_lng#5058, end_station_id#5054, end_station_name#5053, ended_at#5050, member_casual#5059, month#5046, ride_id#5047, rideable_type#5048, start_lat#5055, start_lng#5056, start_station_id#5052, start_station_name#5051, started_at#5049, valid_ride_id#5038, valid_station#5040, valid_time#5039, year#5045]
Arguments: 0

(11) Filter [codegen id : 1]
Input [21]: [_processed_dttm#5042, _source_file#5041, _start_station_ride_num#5044, end_lat#5057, end_lng#5058, end_station_id#5054, end_station_name#5053, ended_at#5050, member_casual#5059, month#5046, ride_id#5047, rideable_type#5048, start_lat#5055, start_lng#5056, start_station_id#5052, start_station_name#5051, started_at#5049, valid_ride_id#5038, valid_station#5040, valid_time#5039, year#5045]
Condition : ((NOT valid_ride_id#5038 OR NOT valid_time#5039) OR NOT coalesce(valid_station#5040, false))
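A filter keeps a row only when its condition evaluates to TRUE; a NULL result does not pass. Because `valid_station` is wrapped in `coalesce(..., false)` but `valid_time` is not, a row whose only defect is a NULL `started_at` or `ended_at` evaluates to NULL here and is dropped instead of being reported. The pure-Python sketch below models SQL's three-valued (Kleene) logic with `None` as NULL to make that visible; `k_not`, `k_or`, `coalesce` and `is_flagged` are hypothetical names. Assuming the intent is to surface every row that is not fully valid, wrapping `valid_time` in `coalesce` as well would close that gap.

```python
from typing import Optional

Tri = Optional[bool]  # None models SQL NULL


def k_not(x: Tri) -> Tri:
    """SQL NOT: NULL stays NULL."""
    return None if x is None else not x


def k_or(a: Tri, b: Tri) -> Tri:
    """SQL OR (Kleene logic): TRUE wins, otherwise NULL wins, otherwise FALSE."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def coalesce(x: Tri, default: bool) -> bool:
    """coalesce(x, default) for a single nullable boolean."""
    return default if x is None else x


def is_flagged(valid_ride_id: Tri, valid_time: Tri, valid_station: Tri) -> bool:
    """Hypothetical model of Filter (11); a row survives only if the condition is exactly TRUE."""
    cond = k_or(
        k_or(k_not(valid_ride_id), k_not(valid_time)),
        k_not(coalesce(valid_station, False)),
    )
    return cond is True


if __name__ == "__main__":
    print(is_flagged(True, True, None))  # NULL station is coalesced to false, so flagged
    print(is_flagged(True, None, True))  # NULL time makes the condition NULL, so dropped
```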

(12) Project [codegen id : 1]
Output [21]: [toprettystring(ride_id#5047, Some(Etc/UTC)) AS toprettystring(ride_id)#8979, toprettystring(rideable_type#5048, Some(Etc/UTC)) AS toprettystring(rideable_type)#8980, toprettystring(started_at#5049, Some(Etc/UTC)) AS toprettystring(started_at)#8981, toprettystring(ended_at#5050, Some(Etc/UTC)) AS toprettystring(ended_at)#8982, toprettystring(start_station_name#5051, Some(Etc/UTC)) AS toprettystring(start_station_name)#8983, toprettystring(start_station_id#5052, Some(Etc/UTC)) AS toprettystring(start_station_id)#8984, toprettystring(end_station_name#5053, Some(Etc/UTC)) AS toprettystring(end_station_name)#8985, toprettystring(end_station_id#5054, Some(Etc/UTC)) AS toprettystring(end_station_id)#8986, toprettystring(start_lat#5055, Some(Etc/UTC)) AS toprettystring(start_lat)#8987, toprettystring(start_lng#5056, Some(Etc/UTC)) AS toprettystring(start_lng)#8988, toprettystring(end_lat#5057, Some(Etc/UTC)) AS toprettystring(end_lat)#8989, toprettystring(end_lng#5058, Some(Etc/UTC)) AS toprettystring(end_lng)#8990, toprettystring(member_casual#5059, Some(Etc/UTC)) AS toprettystring(member_casual)#8991, toprettystring(valid_ride_id#5038, Some(Etc/UTC)) AS toprettystring(valid_ride_id)#8992, toprettystring(valid_time#5039, Some(Etc/UTC)) AS toprettystring(valid_time)#8993, toprettystring(valid_station#5040, Some(Etc/UTC)) AS toprettystring(valid_station)#8994, toprettystring(_source_file#5041, Some(Etc/UTC)) AS toprettystring(_source_file)#8995, toprettystring(_processed_dttm#5042, Some(Etc/UTC)) AS toprettystring(_processed_dttm)#8996, toprettystring(_start_station_ride_num#5044, Some(Etc/UTC)) AS toprettystring(_start_station_ride_num)#8997, toprettystring(year#5045, Some(Etc/UTC)) AS toprettystring(year)#8998, toprettystring(month#5046, Some(Etc/UTC)) AS toprettystring(month)#8999]
Input [21]: [_processed_dttm#5042, _source_file#5041, _start_station_ride_num#5044, end_lat#5057, end_lng#5058, end_station_id#5054, end_station_name#5053, ended_at#5050, member_casual#5059, month#5046, ride_id#5047, rideable_type#5048, start_lat#5055, start_lng#5056, start_station_id#5052, start_station_name#5051, started_at#5049, valid_ride_id#5038, valid_station#5040, valid_time#5039, year#5045]

(13) CollectLimit
Input [21]: [toprettystring(ride_id)#8979, toprettystring(rideable_type)#8980, toprettystring(started_at)#8981, toprettystring(ended_at)#8982, toprettystring(start_station_name)#8983, toprettystring(start_station_id)#8984, toprettystring(end_station_name)#8985, toprettystring(end_station_id)#8986, toprettystring(start_lat)#8987, toprettystring(start_lng)#8988, toprettystring(end_lat)#8989, toprettystring(end_lng)#8990, toprettystring(member_casual)#8991, toprettystring(valid_ride_id)#8992, toprettystring(valid_time)#8993, toprettystring(valid_station)#8994, toprettystring(_source_file)#8995, toprettystring(_processed_dttm)#8996, toprettystring(_start_station_ride_num)#8997, toprettystring(year)#8998, toprettystring(month)#8999]
Arguments: 21
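The `toprettystring` projection in node (12) and the `CollectLimit` of 21 match the shape of a `DataFrame.show()` call, so this plan was most likely produced by one. `show()` formats every column as a string and fetches one row more than it prints (20 + 1 by default) so it can decide whether to append the `only showing top 20 rows` footer. A minimal pure-Python sketch of that n + 1 idea, using a hypothetical `take_for_show` helper over any iterable of rows:

```python
from itertools import islice
from typing import Any, Iterable, List, Tuple


def take_for_show(rows: Iterable[Any], num_rows: int = 20) -> Tuple[List[Any], bool]:
    """Fetch num_rows + 1 rows (the CollectLimit 21 above) and report whether more exist."""
    fetched = list(islice(rows, num_rows + 1))
    has_more = len(fetched) > num_rows
    return fetched[:num_rows], has_more


if __name__ == "__main__":
    shown, more = take_for_show(range(100))
    for row in shown:
        print(row)
    if more:
        print(f"only showing top {len(shown)} rows")
```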

(14) Filter
Input [21]: [_processed_dttm#5042, _source_file#5041, _start_station_ride_num#5044, end_lat#5057, end_lng#5058, end_station_id#5054, end_station_name#5053, ended_at#5050, member_casual#5059, month#5046, ride_id#5047, rideable_type#5048, start_lat#5055, start_lng#5056, start_station_id#5052, start_station_name#5051, started_at#5049, valid_ride_id#5038, valid_station#5040, valid_time#5039, year#5045]
Condition : ((NOT valid_ride_id#5038 OR NOT valid_time#5039) OR NOT coalesce(valid_station#5040, false))

(15) Project
Output [21]: [toprettystring(ride_id#5047, Some(Etc/UTC)) AS toprettystring(ride_id)#8979, toprettystring(rideable_type#5048, Some(Etc/UTC)) AS toprettystring(rideable_type)#8980, toprettystring(started_at#5049, Some(Etc/UTC)) AS toprettystring(started_at)#8981, toprettystring(ended_at#5050, Some(Etc/UTC)) AS toprettystring(ended_at)#8982, toprettystring(start_station_name#5051, Some(Etc/UTC)) AS toprettystring(start_station_name)#8983, toprettystring(start_station_id#5052, Some(Etc/UTC)) AS toprettystring(start_station_id)#8984, toprettystring(end_station_name#5053, Some(Etc/UTC)) AS toprettystring(end_station_name)#8985, toprettystring(end_station_id#5054, Some(Etc/UTC)) AS toprettystring(end_station_id)#8986, toprettystring(start_lat#5055, Some(Etc/UTC)) AS toprettystring(start_lat)#8987, toprettystring(start_lng#5056, Some(Etc/UTC)) AS toprettystring(start_lng)#8988, toprettystring(end_lat#5057, Some(Etc/UTC)) AS toprettystring(end_lat)#8989, toprettystring(end_lng#5058, Some(Etc/UTC)) AS toprettystring(end_lng)#8990, toprettystring(member_casual#5059, Some(Etc/UTC)) AS toprettystring(member_casual)#8991, toprettystring(valid_ride_id#5038, Some(Etc/UTC)) AS toprettystring(valid_ride_id)#8992, toprettystring(valid_time#5039, Some(Etc/UTC)) AS toprettystring(valid_time)#8993, toprettystring(valid_station#5040, Some(Etc/UTC)) AS toprettystring(valid_station)#8994, toprettystring(_source_file#5041, Some(Etc/UTC)) AS toprettystring(_source_file)#8995, toprettystring(_processed_dttm#5042, Some(Etc/UTC)) AS toprettystring(_processed_dttm)#8996, toprettystring(_start_station_ride_num#5044, Some(Etc/UTC)) AS toprettystring(_start_station_ride_num)#8997, toprettystring(year#5045, Some(Etc/UTC)) AS toprettystring(year)#8998, toprettystring(month#5046, Some(Etc/UTC)) AS toprettystring(month)#8999]
Input [21]: [_processed_dttm#5042, _source_file#5041, _start_station_ride_num#5044, end_lat#5057, end_lng#5058, end_station_id#5054, end_station_name#5053, ended_at#5050, member_casual#5059, month#5046, ride_id#5047, rideable_type#5048, start_lat#5055, start_lng#5056, start_station_id#5052, start_station_name#5051, started_at#5049, valid_ride_id#5038, valid_station#5040, valid_time#5039, year#5045]

(16) CollectLimit
Input [21]: [toprettystring(ride_id)#8979, toprettystring(rideable_type)#8980, toprettystring(started_at)#8981, toprettystring(ended_at)#8982, toprettystring(start_station_name)#8983, toprettystring(start_station_id)#8984, toprettystring(end_station_name)#8985, toprettystring(end_station_id)#8986, toprettystring(start_lat)#8987, toprettystring(start_lng)#8988, toprettystring(end_lat)#8989, toprettystring(end_lng)#8990, toprettystring(member_casual)#8991, toprettystring(valid_ride_id)#8992, toprettystring(valid_time)#8993, toprettystring(valid_station)#8994, toprettystring(_source_file)#8995, toprettystring(_processed_dttm)#8996, toprettystring(_start_station_ride_num)#8997, toprettystring(year)#8998, toprettystring(month)#8999]
Arguments: 21

(17) AdaptiveSparkPlan
Output [21]: [toprettystring(ride_id)#8979, toprettystring(rideable_type)#8980, toprettystring(started_at)#8981, toprettystring(ended_at)#8982, toprettystring(start_station_name)#8983, toprettystring(start_station_id)#8984, toprettystring(end_station_name)#8985, toprettystring(end_station_id)#8986, toprettystring(start_lat)#8987, toprettystring(start_lng)#8988, toprettystring(end_lat)#8989, toprettystring(end_lng)#8990, toprettystring(member_casual)#8991, toprettystring(valid_ride_id)#8992, toprettystring(valid_time)#8993, toprettystring(valid_station)#8994, toprettystring(_source_file)#8995, toprettystring(_processed_dttm)#8996, toprettystring(_start_station_ride_num)#8997, toprettystring(year)#8998, toprettystring(month)#8999]
Arguments: isFinalPlan=true