== Physical Plan ==
AdaptiveSparkPlan (15)
+- == Final Plan ==
   CollectLimit (12)
   +- * Project (11)
      +- TableCacheQueryStage (10), Statistics(sizeInBytes=12.5 GiB, rowCount=4.84E+7)
         +- InMemoryTableScan (1)
               +- InMemoryRelation (2)
                     +- AdaptiveSparkPlan (9)
                        +- Project (8)
                           +- Window (7)
                              +- Sort (6)
                                 +- Exchange (5)
                                    +- Project (4)
                                       +- Scan csv (3)
+- == Initial Plan ==
   CollectLimit (14)
   +- Project (13)
      +- InMemoryTableScan (1)
            +- InMemoryRelation (2)
                  +- AdaptiveSparkPlan (9)
                     +- Project (8)
                        +- Window (7)
                           +- Sort (6)
                              +- Exchange (5)
                                 +- Project (4)
                                    +- Scan csv (3)
(1) InMemoryTableScan
Output [21]: [_processed_dttm#10645, _source_file#10644, _start_station_ride_num#10647, end_lat#10660, end_lng#10661, end_station_id#10657, end_station_name#10656, ended_at#10653, member_casual#10662, month#10649, ride_id#10650, rideable_type#10651, start_lat#10658, start_lng#10659, start_station_id#10655, start_station_name#10654, started_at#10652, valid_ride_id#10641, valid_station#10643, valid_time#10642, year#10648]
Arguments: [_processed_dttm#10645, _source_file#10644, _start_station_ride_num#10647, end_lat#10660, end_lng#10661, end_station_id#10657, end_station_name#10656, ended_at#10653, member_casual#10662, month#10649, ride_id#10650, rideable_type#10651, start_lat#10658, start_lng#10659, start_station_id#10655, start_station_name#10654, started_at#10652, valid_ride_id#10641, valid_station#10643, valid_time#10642, year#10648]
(2) InMemoryRelation
Arguments: [ride_id#10650, rideable_type#10651, started_at#10652, ended_at#10653, start_station_name#10654, start_station_id#10655, end_station_name#10656, end_station_id#10657, start_lat#10658, start_lng#10659, end_lat#10660, end_lng#10661, member_casual#10662, valid_ride_id#10641, valid_time#10642, valid_station#10643, _source_file#10644, _processed_dttm#10645, _start_station_ride_num#10647, year#10648, month#10649], CachedRDDBuilder(org.apache.spark.sql.execution.columnar.DefaultCachedBatchSerializer@43d79ed6,StorageLevel(disk, memory, deserialized, 1 replicas),AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
*(3) Project [ride_id#10530, rideable_type#10531, started_at#10532, ended_at#10533, start_station_name#10534, start_station_id#10655, end_station_name#10536, end_station_id#10657, start_lat#10538, start_lng#10539, end_lat#10540, end_lng#10541, member_casual#10542, valid_ride_id#10641, valid_time#10642, valid_station#10643, _source_file#10644, 2026-04-08 09:56:30.312891 AS _processed_dttm#10645, _start_station_ride_num#10647, year#10648, month#10649]
+- Window [row_number() windowspecdefinition(start_station_id#10535, started_at#10532 ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS _start_station_ride_num#10647], [start_station_id#10535], [started_at#10532 ASC NULLS FIRST]
+- *(2) Sort [start_station_id#10535 ASC NULLS FIRST, started_at#10532 ASC NULLS FIRST], false, 0
+- AQEShuffleRead coalesced
+- ShuffleQueryStage 0
+- Exchange hashpartitioning(start_station_id#10535, 200), ENSURE_REQUIREMENTS, [plan_id=2133]
+- *(1) Project [ride_id#10530, rideable_type#10531, started_at#10532, ended_at#10533, start_station_name#10534, cast(start_station_id#10535 as double) AS start_station_id#10655, end_station_name#10536, cast(end_station_id#10537 as double) AS end_station_id#10657, start_lat#10538, start_lng#10539, end_lat#10540, end_lng#10541, member_casual#10542, isnotnull(ride_id#10530) AS valid_ride_id#10641, (ended_at#10533 > started_at#10532) AS valid_time#10642, (((isnotnull(end_station_id#10537) AND isnotnull(start_station_id#10535)) AND NOT (end_station_id#10537 = start_station_id#10535)) <=> true) AS valid_station#10643, input_file_name() AS _source_file#10644, year(cast(started_at#10532 as date)) AS year#10648, month(cast(started_at#10532 as date)) AS month#10649, start_station_id#10535, started_at#10532]
+- FileScan csv [ride_id#10530,rideable_type#10531,started_at#10532,ended_at#10533,start_station_name#10534,start_station_id#10535,end_station_name#10536,end_station_id#10537,start_lat#10538,start_lng#10539,end_lat#10540,end_lng#10541,member_casual#10542] Batched: false, DataFilters: [], Format: CSV, Location: InMemoryFileIndex(55 paths)[s3a://rzvde-g8-kirsanov-dmitry/raw/citibike_data/202502/202502-citibi..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<ride_id:string,rideable_type:string,started_at:timestamp,ended_at:timestamp,start_station_...
+- == Initial Plan ==
Project [ride_id#10530, rideable_type#10531, started_at#10532, ended_at#10533, start_station_name#10534, start_station_id#10655, end_station_name#10536, end_station_id#10657, start_lat#10538, start_lng#10539, end_lat#10540, end_lng#10541, member_casual#10542, valid_ride_id#10641, valid_time#10642, valid_station#10643, _source_file#10644, 2026-04-08 09:56:30.312891 AS _processed_dttm#10645, _start_station_ride_num#10647, year#10648, month#10649]
+- Window [row_number() windowspecdefinition(start_station_id#10535, started_at#10532 ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS _start_station_ride_num#10647], [start_station_id#10535], [started_at#10532 ASC NULLS FIRST]
+- Sort [start_station_id#10535 ASC NULLS FIRST, started_at#10532 ASC NULLS FIRST], false, 0
+- Exchange hashpartitioning(start_station_id#10535, 200), ENSURE_REQUIREMENTS, [plan_id=2095]
+- Project [ride_id#10530, rideable_type#10531, started_at#10532, ended_at#10533, start_station_name#10534, cast(start_station_id#10535 as double) AS start_station_id#10655, end_station_name#10536, cast(end_station_id#10537 as double) AS end_station_id#10657, start_lat#10538, start_lng#10539, end_lat#10540, end_lng#10541, member_casual#10542, isnotnull(ride_id#10530) AS valid_ride_id#10641, (ended_at#10533 > started_at#10532) AS valid_time#10642, (((isnotnull(end_station_id#10537) AND isnotnull(start_station_id#10535)) AND NOT (end_station_id#10537 = start_station_id#10535)) <=> true) AS valid_station#10643, input_file_name() AS _source_file#10644, year(cast(started_at#10532 as date)) AS year#10648, month(cast(started_at#10532 as date)) AS month#10649, start_station_id#10535, started_at#10532]
+- FileScan csv [ride_id#10530,rideable_type#10531,started_at#10532,ended_at#10533,start_station_name#10534,start_station_id#10535,end_station_name#10536,end_station_id#10537,start_lat#10538,start_lng#10539,end_lat#10540,end_lng#10541,member_casual#10542] Batched: false, DataFilters: [], Format: CSV, Location: InMemoryFileIndex(55 paths)[s3a://rzvde-g8-kirsanov-dmitry/raw/citibike_data/202502/202502-citibi..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<ride_id:string,rideable_type:string,started_at:timestamp,ended_at:timestamp,start_station_...
,None)
(3) Scan csv
Output [13]: [ride_id#10530, rideable_type#10531, started_at#10532, ended_at#10533, start_station_name#10534, start_station_id#10535, end_station_name#10536, end_station_id#10537, start_lat#10538, start_lng#10539, end_lat#10540, end_lng#10541, member_casual#10542]
Batched: false
Location: InMemoryFileIndex [s3a://rzvde-g8-kirsanov-dmitry/raw/citibike_data/202502/202502-citibike-tripdata-part00.csv, ... 54 entries]
ReadSchema: struct<ride_id:string,rideable_type:string,started_at:timestamp,ended_at:timestamp,start_station_name:string,start_station_id:string,end_station_name:string,end_station_id:string,start_lat:double,start_lng:double,end_lat:double,end_lng:double,member_casual:string>
(4) Project
Output [21]: [ride_id#10530, rideable_type#10531, started_at#10532, ended_at#10533, start_station_name#10534, cast(start_station_id#10535 as double) AS start_station_id#10655, end_station_name#10536, cast(end_station_id#10537 as double) AS end_station_id#10657, start_lat#10538, start_lng#10539, end_lat#10540, end_lng#10541, member_casual#10542, isnotnull(ride_id#10530) AS valid_ride_id#10641, (ended_at#10533 > started_at#10532) AS valid_time#10642, (((isnotnull(end_station_id#10537) AND isnotnull(start_station_id#10535)) AND NOT (end_station_id#10537 = start_station_id#10535)) <=> true) AS valid_station#10643, input_file_name() AS _source_file#10644, year(cast(started_at#10532 as date)) AS year#10648, month(cast(started_at#10532 as date)) AS month#10649, start_station_id#10535, started_at#10532]
Input [13]: [ride_id#10530, rideable_type#10531, started_at#10532, ended_at#10533, start_station_name#10534, start_station_id#10535, end_station_name#10536, end_station_id#10537, start_lat#10538, start_lng#10539, end_lat#10540, end_lng#10541, member_casual#10542]
(5) Exchange
Input [21]: [ride_id#10530, rideable_type#10531, started_at#10532, ended_at#10533, start_station_name#10534, start_station_id#10655, end_station_name#10536, end_station_id#10657, start_lat#10538, start_lng#10539, end_lat#10540, end_lng#10541, member_casual#10542, valid_ride_id#10641, valid_time#10642, valid_station#10643, _source_file#10644, year#10648, month#10649, start_station_id#10535, started_at#10532]
Arguments: hashpartitioning(start_station_id#10535, 200), ENSURE_REQUIREMENTS, [plan_id=2200]
(6) Sort
Input [21]: [ride_id#10530, rideable_type#10531, started_at#10532, ended_at#10533, start_station_name#10534, start_station_id#10655, end_station_name#10536, end_station_id#10657, start_lat#10538, start_lng#10539, end_lat#10540, end_lng#10541, member_casual#10542, valid_ride_id#10641, valid_time#10642, valid_station#10643, _source_file#10644, year#10648, month#10649, start_station_id#10535, started_at#10532]
Arguments: [start_station_id#10535 ASC NULLS FIRST, started_at#10532 ASC NULLS FIRST], false, 0
(7) Window
Input [21]: [ride_id#10530, rideable_type#10531, started_at#10532, ended_at#10533, start_station_name#10534, start_station_id#10655, end_station_name#10536, end_station_id#10657, start_lat#10538, start_lng#10539, end_lat#10540, end_lng#10541, member_casual#10542, valid_ride_id#10641, valid_time#10642, valid_station#10643, _source_file#10644, year#10648, month#10649, start_station_id#10535, started_at#10532]
Arguments: [row_number() windowspecdefinition(start_station_id#10535, started_at#10532 ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS _start_station_ride_num#10647], [start_station_id#10535], [started_at#10532 ASC NULLS FIRST]
(8) Project
Output [21]: [ride_id#10530, rideable_type#10531, started_at#10532, ended_at#10533, start_station_name#10534, start_station_id#10655, end_station_name#10536, end_station_id#10657, start_lat#10538, start_lng#10539, end_lat#10540, end_lng#10541, member_casual#10542, valid_ride_id#10641, valid_time#10642, valid_station#10643, _source_file#10644, 2026-04-08 09:56:30.312891 AS _processed_dttm#10645, _start_station_ride_num#10647, year#10648, month#10649]
Input [22]: [ride_id#10530, rideable_type#10531, started_at#10532, ended_at#10533, start_station_name#10534, start_station_id#10655, end_station_name#10536, end_station_id#10657, start_lat#10538, start_lng#10539, end_lat#10540, end_lng#10541, member_casual#10542, valid_ride_id#10641, valid_time#10642, valid_station#10643, _source_file#10644, year#10648, month#10649, start_station_id#10535, started_at#10532, _start_station_ride_num#10647]
(9) AdaptiveSparkPlan
Output [21]: [ride_id#10530, rideable_type#10531, started_at#10532, ended_at#10533, start_station_name#10534, start_station_id#10655, end_station_name#10536, end_station_id#10657, start_lat#10538, start_lng#10539, end_lat#10540, end_lng#10541, member_casual#10542, valid_ride_id#10641, valid_time#10642, valid_station#10643, _source_file#10644, _processed_dttm#10645, _start_station_ride_num#10647, year#10648, month#10649]
Arguments: isFinalPlan=false
(10) TableCacheQueryStage
Output [21]: [_processed_dttm#10645, _source_file#10644, _start_station_ride_num#10647, end_lat#10660, end_lng#10661, end_station_id#10657, end_station_name#10656, ended_at#10653, member_casual#10662, month#10649, ride_id#10650, rideable_type#10651, start_lat#10658, start_lng#10659, start_station_id#10655, start_station_name#10654, started_at#10652, valid_ride_id#10641, valid_station#10643, valid_time#10642, year#10648]
Arguments: 0
(11) Project [codegen id : 1]
Output [21]: [toprettystring(ride_id#10650, Some(Etc/UTC)) AS toprettystring(ride_id)#11374, toprettystring(rideable_type#10651, Some(Etc/UTC)) AS toprettystring(rideable_type)#11375, toprettystring(started_at#10652, Some(Etc/UTC)) AS toprettystring(started_at)#11376, toprettystring(ended_at#10653, Some(Etc/UTC)) AS toprettystring(ended_at)#11377, toprettystring(start_station_name#10654, Some(Etc/UTC)) AS toprettystring(start_station_name)#11378, toprettystring(start_station_id#10655, Some(Etc/UTC)) AS toprettystring(start_station_id)#11379, toprettystring(end_station_name#10656, Some(Etc/UTC)) AS toprettystring(end_station_name)#11380, toprettystring(end_station_id#10657, Some(Etc/UTC)) AS toprettystring(end_station_id)#11381, toprettystring(start_lat#10658, Some(Etc/UTC)) AS toprettystring(start_lat)#11382, toprettystring(start_lng#10659, Some(Etc/UTC)) AS toprettystring(start_lng)#11383, toprettystring(end_lat#10660, Some(Etc/UTC)) AS toprettystring(end_lat)#11384, toprettystring(end_lng#10661, Some(Etc/UTC)) AS toprettystring(end_lng)#11385, toprettystring(member_casual#10662, Some(Etc/UTC)) AS toprettystring(member_casual)#11386, toprettystring(valid_ride_id#10641, Some(Etc/UTC)) AS toprettystring(valid_ride_id)#11387, toprettystring(valid_time#10642, Some(Etc/UTC)) AS toprettystring(valid_time)#11388, toprettystring(valid_station#10643, Some(Etc/UTC)) AS toprettystring(valid_station)#11389, toprettystring(_source_file#10644, Some(Etc/UTC)) AS toprettystring(_source_file)#11390, toprettystring(_processed_dttm#10645, Some(Etc/UTC)) AS toprettystring(_processed_dttm)#11391, toprettystring(_start_station_ride_num#10647, Some(Etc/UTC)) AS toprettystring(_start_station_ride_num)#11392, toprettystring(year#10648, Some(Etc/UTC)) AS toprettystring(year)#11393, toprettystring(month#10649, Some(Etc/UTC)) AS toprettystring(month)#11394]
Input [21]: [_processed_dttm#10645, _source_file#10644, _start_station_ride_num#10647, end_lat#10660, end_lng#10661, end_station_id#10657, end_station_name#10656, ended_at#10653, member_casual#10662, month#10649, ride_id#10650, rideable_type#10651, start_lat#10658, start_lng#10659, start_station_id#10655, start_station_name#10654, started_at#10652, valid_ride_id#10641, valid_station#10643, valid_time#10642, year#10648]
(12) CollectLimit
Input [21]: [toprettystring(ride_id)#11374, toprettystring(rideable_type)#11375, toprettystring(started_at)#11376, toprettystring(ended_at)#11377, toprettystring(start_station_name)#11378, toprettystring(start_station_id)#11379, toprettystring(end_station_name)#11380, toprettystring(end_station_id)#11381, toprettystring(start_lat)#11382, toprettystring(start_lng)#11383, toprettystring(end_lat)#11384, toprettystring(end_lng)#11385, toprettystring(member_casual)#11386, toprettystring(valid_ride_id)#11387, toprettystring(valid_time)#11388, toprettystring(valid_station)#11389, toprettystring(_source_file)#11390, toprettystring(_processed_dttm)#11391, toprettystring(_start_station_ride_num)#11392, toprettystring(year)#11393, toprettystring(month)#11394]
Arguments: 3
(13) Project
Output [21]: [toprettystring(ride_id#10650, Some(Etc/UTC)) AS toprettystring(ride_id)#11374, toprettystring(rideable_type#10651, Some(Etc/UTC)) AS toprettystring(rideable_type)#11375, toprettystring(started_at#10652, Some(Etc/UTC)) AS toprettystring(started_at)#11376, toprettystring(ended_at#10653, Some(Etc/UTC)) AS toprettystring(ended_at)#11377, toprettystring(start_station_name#10654, Some(Etc/UTC)) AS toprettystring(start_station_name)#11378, toprettystring(start_station_id#10655, Some(Etc/UTC)) AS toprettystring(start_station_id)#11379, toprettystring(end_station_name#10656, Some(Etc/UTC)) AS toprettystring(end_station_name)#11380, toprettystring(end_station_id#10657, Some(Etc/UTC)) AS toprettystring(end_station_id)#11381, toprettystring(start_lat#10658, Some(Etc/UTC)) AS toprettystring(start_lat)#11382, toprettystring(start_lng#10659, Some(Etc/UTC)) AS toprettystring(start_lng)#11383, toprettystring(end_lat#10660, Some(Etc/UTC)) AS toprettystring(end_lat)#11384, toprettystring(end_lng#10661, Some(Etc/UTC)) AS toprettystring(end_lng)#11385, toprettystring(member_casual#10662, Some(Etc/UTC)) AS toprettystring(member_casual)#11386, toprettystring(valid_ride_id#10641, Some(Etc/UTC)) AS toprettystring(valid_ride_id)#11387, toprettystring(valid_time#10642, Some(Etc/UTC)) AS toprettystring(valid_time)#11388, toprettystring(valid_station#10643, Some(Etc/UTC)) AS toprettystring(valid_station)#11389, toprettystring(_source_file#10644, Some(Etc/UTC)) AS toprettystring(_source_file)#11390, toprettystring(_processed_dttm#10645, Some(Etc/UTC)) AS toprettystring(_processed_dttm)#11391, toprettystring(_start_station_ride_num#10647, Some(Etc/UTC)) AS toprettystring(_start_station_ride_num)#11392, toprettystring(year#10648, Some(Etc/UTC)) AS toprettystring(year)#11393, toprettystring(month#10649, Some(Etc/UTC)) AS toprettystring(month)#11394]
Input [21]: [_processed_dttm#10645, _source_file#10644, _start_station_ride_num#10647, end_lat#10660, end_lng#10661, end_station_id#10657, end_station_name#10656, ended_at#10653, member_casual#10662, month#10649, ride_id#10650, rideable_type#10651, start_lat#10658, start_lng#10659, start_station_id#10655, start_station_name#10654, started_at#10652, valid_ride_id#10641, valid_station#10643, valid_time#10642, year#10648]
(14) CollectLimit
Input [21]: [toprettystring(ride_id)#11374, toprettystring(rideable_type)#11375, toprettystring(started_at)#11376, toprettystring(ended_at)#11377, toprettystring(start_station_name)#11378, toprettystring(start_station_id)#11379, toprettystring(end_station_name)#11380, toprettystring(end_station_id)#11381, toprettystring(start_lat)#11382, toprettystring(start_lng)#11383, toprettystring(end_lat)#11384, toprettystring(end_lng)#11385, toprettystring(member_casual)#11386, toprettystring(valid_ride_id)#11387, toprettystring(valid_time)#11388, toprettystring(valid_station)#11389, toprettystring(_source_file)#11390, toprettystring(_processed_dttm)#11391, toprettystring(_start_station_ride_num)#11392, toprettystring(year)#11393, toprettystring(month)#11394]
Arguments: 3
(15) AdaptiveSparkPlan
Output [21]: [toprettystring(ride_id)#11374, toprettystring(rideable_type)#11375, toprettystring(started_at)#11376, toprettystring(ended_at)#11377, toprettystring(start_station_name)#11378, toprettystring(start_station_id)#11379, toprettystring(end_station_name)#11380, toprettystring(end_station_id)#11381, toprettystring(start_lat)#11382, toprettystring(start_lng)#11383, toprettystring(end_lat)#11384, toprettystring(end_lng)#11385, toprettystring(member_casual)#11386, toprettystring(valid_ride_id)#11387, toprettystring(valid_time)#11388, toprettystring(valid_station)#11389, toprettystring(_source_file)#11390, toprettystring(_processed_dttm)#11391, toprettystring(_start_station_ride_num)#11392, toprettystring(year)#11393, toprettystring(month)#11394]
Arguments: isFinalPlan=true