== Physical Plan ==
AdaptiveSparkPlan (15)
+- == Final Plan ==
   CollectLimit (12)
   +- * Project (11)
      +- TableCacheQueryStage (10), Statistics(sizeInBytes=8.2 MiB, rowCount=3.13E+4)
         +- InMemoryTableScan (1)
               +- InMemoryRelation (2)
                     +- AdaptiveSparkPlan (9)
                        +- Project (8)
                           +- Window (7)
                              +- Sort (6)
                                 +- Exchange (5)
                                    +- Project (4)
                                       +- Scan csv (3)
+- == Initial Plan ==
   CollectLimit (14)
   +- Project (13)
      +- InMemoryTableScan (1)
            +- InMemoryRelation (2)
                  +- AdaptiveSparkPlan (9)
                     +- Project (8)
                        +- Window (7)
                           +- Sort (6)
                              +- Exchange (5)
                                 +- Project (4)
                                    +- Scan csv (3)

(1) InMemoryTableScan
Output [21]: [_processed_dttm#1566, _source_file#1565, _start_station_ride_num#1568, end_lat#1581, end_lng#1582, end_station_id#1578, end_station_name#1577, ended_at#1574, member_casual#1583, month#1570, ride_id#1571, rideable_type#1572, start_lat#1579, start_lng#1580, start_station_id#1576, start_station_name#1575, started_at#1573, valid_ride_id#1562, valid_station#1564, valid_time#1563, year#1569]
Arguments: [_processed_dttm#1566, _source_file#1565, _start_station_ride_num#1568, end_lat#1581, end_lng#1582, end_station_id#1578, end_station_name#1577, ended_at#1574, member_casual#1583, month#1570, ride_id#1571, rideable_type#1572, start_lat#1579, start_lng#1580, start_station_id#1576, start_station_name#1575, started_at#1573, valid_ride_id#1562, valid_station#1564, valid_time#1563, year#1569]
(2) InMemoryRelation
Arguments: [ride_id#1571, rideable_type#1572, started_at#1573, ended_at#1574, start_station_name#1575, start_station_id#1576, end_station_name#1577, end_station_id#1578, start_lat#1579, start_lng#1580, end_lat#1581, end_lng#1582, member_casual#1583, valid_ride_id#1562, valid_time#1563, valid_station#1564, _source_file#1565, _processed_dttm#1566, _start_station_ride_num#1568, year#1569, month#1570], CachedRDDBuilder(org.apache.spark.sql.execution.columnar.DefaultCachedBatchSerializer@43d79ed6,StorageLevel(disk, memory, deserialized, 1 replicas),AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
   *(3) Project [ride_id#252, rideable_type#253, started_at#254, ended_at#255, start_station_name#256, start_station_id#359, end_station_name#258, end_station_id#361, start_lat#260, start_lng#261, end_lat#262, end_lng#263, member_casual#264, valid_ride_id#345, valid_time#346, valid_station#347, _source_file#348, 2026-04-08 08:20:23.917168 AS _processed_dttm#349, _start_station_ride_num#351, year#352, month#353]
   +- Window [row_number() windowspecdefinition(start_station_id#257, started_at#254 ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS _start_station_ride_num#351], [start_station_id#257], [started_at#254 ASC NULLS FIRST]
      +- *(2) Sort [start_station_id#257 ASC NULLS FIRST, started_at#254 ASC NULLS FIRST], false, 0
         +- AQEShuffleRead coalesced
            +- ShuffleQueryStage 0
               +- Exchange hashpartitioning(start_station_id#257, 200), ENSURE_REQUIREMENTS, [plan_id=151]
                  +- *(1) Project [ride_id#252, rideable_type#253, started_at#254, ended_at#255, start_station_name#256, cast(start_station_id#257 as double) AS start_station_id#359, end_station_name#258, cast(end_station_id#259 as double) AS end_station_id#361, start_lat#260, start_lng#261, end_lat#262, end_lng#263, member_casual#264, isnotnull(ride_id#252) AS valid_ride_id#345, (ended_at#255 > started_at#254) AS valid_time#346, NOT (end_station_id#259 = start_station_id#257) AS valid_station#347, input_file_name() AS _source_file#348, year(cast(started_at#254 as date)) AS year#352, month(cast(started_at#254 as date)) AS month#353, start_station_id#257, started_at#254]
                     +- FileScan csv [ride_id#252,rideable_type#253,started_at#254,ended_at#255,start_station_name#256,start_station_id#257,end_station_name#258,end_station_id#259,start_lat#260,start_lng#261,end_lat#262,end_lng#263,member_casual#264] Batched: false, DataFilters: [], Format: CSV, Location: InMemoryFileIndex(1 paths)[s3a://rzvde-g8-kirsanov-dmitry/raw/citibike_data/202502/202502-citibik..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<ride_id:string,rideable_type:string,started_at:timestamp,ended_at:timestamp,start_station_...
+- == Initial Plan ==
   Project [ride_id#252, rideable_type#253, started_at#254, ended_at#255, start_station_name#256, start_station_id#359, end_station_name#258, end_station_id#361, start_lat#260, start_lng#261, end_lat#262, end_lng#263, member_casual#264, valid_ride_id#345, valid_time#346, valid_station#347, _source_file#348, 2026-04-08 08:20:23.917168 AS _processed_dttm#349, _start_station_ride_num#351, year#352, month#353]
   +- Window [row_number() windowspecdefinition(start_station_id#257, started_at#254 ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS _start_station_ride_num#351], [start_station_id#257], [started_at#254 ASC NULLS FIRST]
      +- Sort [start_station_id#257 ASC NULLS FIRST, started_at#254 ASC NULLS FIRST], false, 0
         +- Exchange hashpartitioning(start_station_id#257, 200), ENSURE_REQUIREMENTS, [plan_id=113]
            +- Project [ride_id#252, rideable_type#253, started_at#254, ended_at#255, start_station_name#256, cast(start_station_id#257 as double) AS start_station_id#359, end_station_name#258, cast(end_station_id#259 as double) AS end_station_id#361, start_lat#260, start_lng#261, end_lat#262, end_lng#263, member_casual#264, isnotnull(ride_id#252) AS valid_ride_id#345, (ended_at#255 > started_at#254) AS valid_time#346, NOT (end_station_id#259 = start_station_id#257) AS valid_station#347, input_file_name() AS _source_file#348, year(cast(started_at#254 as date)) AS year#352, month(cast(started_at#254 as date)) AS month#353, start_station_id#257, started_at#254]
               +- FileScan csv [ride_id#252,rideable_type#253,started_at#254,ended_at#255,start_station_name#256,start_station_id#257,end_station_name#258,end_station_id#259,start_lat#260,start_lng#261,end_lat#262,end_lng#263,member_casual#264] Batched: false, DataFilters: [], Format: CSV, Location: InMemoryFileIndex(1 paths)[s3a://rzvde-g8-kirsanov-dmitry/raw/citibike_data/202502/202502-citibik..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<ride_id:string,rideable_type:string,started_at:timestamp,ended_at:timestamp,start_station_...
,None)
(3) Scan csv
Output [13]: [ride_id#252, rideable_type#253, started_at#254, ended_at#255, start_station_name#256, start_station_id#257, end_station_name#258, end_station_id#259, start_lat#260, start_lng#261, end_lat#262, end_lng#263, member_casual#264]
Batched: false
Location: InMemoryFileIndex [s3a://rzvde-g8-kirsanov-dmitry/raw/citibike_data/202502/202502-citibike-tripdata-part00.csv]
ReadSchema: struct<ride_id:string,rideable_type:string,started_at:timestamp,ended_at:timestamp,start_station_name:string,start_station_id:string,end_station_name:string,end_station_id:string,start_lat:double,start_lng:double,end_lat:double,end_lng:double,member_casual:string>
(4) Project
Output [21]: [ride_id#252, rideable_type#253, started_at#254, ended_at#255, start_station_name#256, cast(start_station_id#257 as double) AS start_station_id#359, end_station_name#258, cast(end_station_id#259 as double) AS end_station_id#361, start_lat#260, start_lng#261, end_lat#262, end_lng#263, member_casual#264, isnotnull(ride_id#252) AS valid_ride_id#345, (ended_at#255 > started_at#254) AS valid_time#346, NOT (end_station_id#259 = start_station_id#257) AS valid_station#347, input_file_name() AS _source_file#348, year(cast(started_at#254 as date)) AS year#352, month(cast(started_at#254 as date)) AS month#353, start_station_id#257, started_at#254]
Input [13]: [ride_id#252, rideable_type#253, started_at#254, ended_at#255, start_station_name#256, start_station_id#257, end_station_name#258, end_station_id#259, start_lat#260, start_lng#261, end_lat#262, end_lng#263, member_casual#264]
(5) Exchange
Input [21]: [ride_id#252, rideable_type#253, started_at#254, ended_at#255, start_station_name#256, start_station_id#359, end_station_name#258, end_station_id#361, start_lat#260, start_lng#261, end_lat#262, end_lng#263, member_casual#264, valid_ride_id#345, valid_time#346, valid_station#347, _source_file#348, year#352, month#353, start_station_id#257, started_at#254]
Arguments: hashpartitioning(start_station_id#257, 200), ENSURE_REQUIREMENTS, [plan_id=293]
(6) Sort
Input [21]: [ride_id#252, rideable_type#253, started_at#254, ended_at#255, start_station_name#256, start_station_id#359, end_station_name#258, end_station_id#361, start_lat#260, start_lng#261, end_lat#262, end_lng#263, member_casual#264, valid_ride_id#345, valid_time#346, valid_station#347, _source_file#348, year#352, month#353, start_station_id#257, started_at#254]
Arguments: [start_station_id#257 ASC NULLS FIRST, started_at#254 ASC NULLS FIRST], false, 0
(7) Window
Input [21]: [ride_id#252, rideable_type#253, started_at#254, ended_at#255, start_station_name#256, start_station_id#359, end_station_name#258, end_station_id#361, start_lat#260, start_lng#261, end_lat#262, end_lng#263, member_casual#264, valid_ride_id#345, valid_time#346, valid_station#347, _source_file#348, year#352, month#353, start_station_id#257, started_at#254]
Arguments: [row_number() windowspecdefinition(start_station_id#257, started_at#254 ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS _start_station_ride_num#351], [start_station_id#257], [started_at#254 ASC NULLS FIRST]
(8) Project
Output [21]: [ride_id#252, rideable_type#253, started_at#254, ended_at#255, start_station_name#256, start_station_id#359, end_station_name#258, end_station_id#361, start_lat#260, start_lng#261, end_lat#262, end_lng#263, member_casual#264, valid_ride_id#345, valid_time#346, valid_station#347, _source_file#348, 2026-04-08 08:20:23.917168 AS _processed_dttm#349, _start_station_ride_num#351, year#352, month#353]
Input [22]: [ride_id#252, rideable_type#253, started_at#254, ended_at#255, start_station_name#256, start_station_id#359, end_station_name#258, end_station_id#361, start_lat#260, start_lng#261, end_lat#262, end_lng#263, member_casual#264, valid_ride_id#345, valid_time#346, valid_station#347, _source_file#348, year#352, month#353, start_station_id#257, started_at#254, _start_station_ride_num#351]
(9) AdaptiveSparkPlan
Output [21]: [ride_id#252, rideable_type#253, started_at#254, ended_at#255, start_station_name#256, start_station_id#359, end_station_name#258, end_station_id#361, start_lat#260, start_lng#261, end_lat#262, end_lng#263, member_casual#264, valid_ride_id#345, valid_time#346, valid_station#347, _source_file#348, _processed_dttm#349, _start_station_ride_num#351, year#352, month#353]
Arguments: isFinalPlan=false
(10) TableCacheQueryStage
Output [21]: [_processed_dttm#1566, _source_file#1565, _start_station_ride_num#1568, end_lat#1581, end_lng#1582, end_station_id#1578, end_station_name#1577, ended_at#1574, member_casual#1583, month#1570, ride_id#1571, rideable_type#1572, start_lat#1579, start_lng#1580, start_station_id#1576, start_station_name#1575, started_at#1573, valid_ride_id#1562, valid_station#1564, valid_time#1563, year#1569]
Arguments: 0
(11) Project [codegen id : 1]
Output [21]: [toprettystring(ride_id#1571, Some(Etc/UTC)) AS toprettystring(ride_id)#2177, toprettystring(rideable_type#1572, Some(Etc/UTC)) AS toprettystring(rideable_type)#2178, toprettystring(started_at#1573, Some(Etc/UTC)) AS toprettystring(started_at)#2179, toprettystring(ended_at#1574, Some(Etc/UTC)) AS toprettystring(ended_at)#2180, toprettystring(start_station_name#1575, Some(Etc/UTC)) AS toprettystring(start_station_name)#2181, toprettystring(start_station_id#1576, Some(Etc/UTC)) AS toprettystring(start_station_id)#2182, toprettystring(end_station_name#1577, Some(Etc/UTC)) AS toprettystring(end_station_name)#2183, toprettystring(end_station_id#1578, Some(Etc/UTC)) AS toprettystring(end_station_id)#2184, toprettystring(start_lat#1579, Some(Etc/UTC)) AS toprettystring(start_lat)#2185, toprettystring(start_lng#1580, Some(Etc/UTC)) AS toprettystring(start_lng)#2186, toprettystring(end_lat#1581, Some(Etc/UTC)) AS toprettystring(end_lat)#2187, toprettystring(end_lng#1582, Some(Etc/UTC)) AS toprettystring(end_lng)#2188, toprettystring(member_casual#1583, Some(Etc/UTC)) AS toprettystring(member_casual)#2189, toprettystring(valid_ride_id#1562, Some(Etc/UTC)) AS toprettystring(valid_ride_id)#2190, toprettystring(valid_time#1563, Some(Etc/UTC)) AS toprettystring(valid_time)#2191, toprettystring(valid_station#1564, Some(Etc/UTC)) AS toprettystring(valid_station)#2192, toprettystring(_source_file#1565, Some(Etc/UTC)) AS toprettystring(_source_file)#2193, toprettystring(_processed_dttm#1566, Some(Etc/UTC)) AS toprettystring(_processed_dttm)#2194, toprettystring(_start_station_ride_num#1568, Some(Etc/UTC)) AS toprettystring(_start_station_ride_num)#2195, toprettystring(year#1569, Some(Etc/UTC)) AS toprettystring(year)#2196, toprettystring(month#1570, Some(Etc/UTC)) AS toprettystring(month)#2197]
Input [21]: [_processed_dttm#1566, _source_file#1565, _start_station_ride_num#1568, end_lat#1581, end_lng#1582, end_station_id#1578, end_station_name#1577, ended_at#1574, member_casual#1583, month#1570, ride_id#1571, rideable_type#1572, start_lat#1579, start_lng#1580, start_station_id#1576, start_station_name#1575, started_at#1573, valid_ride_id#1562, valid_station#1564, valid_time#1563, year#1569]
(12) CollectLimit
Input [21]: [toprettystring(ride_id)#2177, toprettystring(rideable_type)#2178, toprettystring(started_at)#2179, toprettystring(ended_at)#2180, toprettystring(start_station_name)#2181, toprettystring(start_station_id)#2182, toprettystring(end_station_name)#2183, toprettystring(end_station_id)#2184, toprettystring(start_lat)#2185, toprettystring(start_lng)#2186, toprettystring(end_lat)#2187, toprettystring(end_lng)#2188, toprettystring(member_casual)#2189, toprettystring(valid_ride_id)#2190, toprettystring(valid_time)#2191, toprettystring(valid_station)#2192, toprettystring(_source_file)#2193, toprettystring(_processed_dttm)#2194, toprettystring(_start_station_ride_num)#2195, toprettystring(year)#2196, toprettystring(month)#2197]
Arguments: 3
(13) Project
Output [21]: [toprettystring(ride_id#1571, Some(Etc/UTC)) AS toprettystring(ride_id)#2177, toprettystring(rideable_type#1572, Some(Etc/UTC)) AS toprettystring(rideable_type)#2178, toprettystring(started_at#1573, Some(Etc/UTC)) AS toprettystring(started_at)#2179, toprettystring(ended_at#1574, Some(Etc/UTC)) AS toprettystring(ended_at)#2180, toprettystring(start_station_name#1575, Some(Etc/UTC)) AS toprettystring(start_station_name)#2181, toprettystring(start_station_id#1576, Some(Etc/UTC)) AS toprettystring(start_station_id)#2182, toprettystring(end_station_name#1577, Some(Etc/UTC)) AS toprettystring(end_station_name)#2183, toprettystring(end_station_id#1578, Some(Etc/UTC)) AS toprettystring(end_station_id)#2184, toprettystring(start_lat#1579, Some(Etc/UTC)) AS toprettystring(start_lat)#2185, toprettystring(start_lng#1580, Some(Etc/UTC)) AS toprettystring(start_lng)#2186, toprettystring(end_lat#1581, Some(Etc/UTC)) AS toprettystring(end_lat)#2187, toprettystring(end_lng#1582, Some(Etc/UTC)) AS toprettystring(end_lng)#2188, toprettystring(member_casual#1583, Some(Etc/UTC)) AS toprettystring(member_casual)#2189, toprettystring(valid_ride_id#1562, Some(Etc/UTC)) AS toprettystring(valid_ride_id)#2190, toprettystring(valid_time#1563, Some(Etc/UTC)) AS toprettystring(valid_time)#2191, toprettystring(valid_station#1564, Some(Etc/UTC)) AS toprettystring(valid_station)#2192, toprettystring(_source_file#1565, Some(Etc/UTC)) AS toprettystring(_source_file)#2193, toprettystring(_processed_dttm#1566, Some(Etc/UTC)) AS toprettystring(_processed_dttm)#2194, toprettystring(_start_station_ride_num#1568, Some(Etc/UTC)) AS toprettystring(_start_station_ride_num)#2195, toprettystring(year#1569, Some(Etc/UTC)) AS toprettystring(year)#2196, toprettystring(month#1570, Some(Etc/UTC)) AS toprettystring(month)#2197]
Input [21]: [_processed_dttm#1566, _source_file#1565, _start_station_ride_num#1568, end_lat#1581, end_lng#1582, end_station_id#1578, end_station_name#1577, ended_at#1574, member_casual#1583, month#1570, ride_id#1571, rideable_type#1572, start_lat#1579, start_lng#1580, start_station_id#1576, start_station_name#1575, started_at#1573, valid_ride_id#1562, valid_station#1564, valid_time#1563, year#1569]
(14) CollectLimit
Input [21]: [toprettystring(ride_id)#2177, toprettystring(rideable_type)#2178, toprettystring(started_at)#2179, toprettystring(ended_at)#2180, toprettystring(start_station_name)#2181, toprettystring(start_station_id)#2182, toprettystring(end_station_name)#2183, toprettystring(end_station_id)#2184, toprettystring(start_lat)#2185, toprettystring(start_lng)#2186, toprettystring(end_lat)#2187, toprettystring(end_lng)#2188, toprettystring(member_casual)#2189, toprettystring(valid_ride_id)#2190, toprettystring(valid_time)#2191, toprettystring(valid_station)#2192, toprettystring(_source_file)#2193, toprettystring(_processed_dttm)#2194, toprettystring(_start_station_ride_num)#2195, toprettystring(year)#2196, toprettystring(month)#2197]
Arguments: 3
(15) AdaptiveSparkPlan
Output [21]: [toprettystring(ride_id)#2177, toprettystring(rideable_type)#2178, toprettystring(started_at)#2179, toprettystring(ended_at)#2180, toprettystring(start_station_name)#2181, toprettystring(start_station_id)#2182, toprettystring(end_station_name)#2183, toprettystring(end_station_id)#2184, toprettystring(start_lat)#2185, toprettystring(start_lng)#2186, toprettystring(end_lat)#2187, toprettystring(end_lng)#2188, toprettystring(member_casual)#2189, toprettystring(valid_ride_id)#2190, toprettystring(valid_time)#2191, toprettystring(valid_station)#2192, toprettystring(_source_file)#2193, toprettystring(_processed_dttm)#2194, toprettystring(_start_station_ride_num)#2195, toprettystring(year)#2196, toprettystring(month)#2197]
Arguments: isFinalPlan=true