Bigquery to get Speed based on time/location data (rows above/below current row) Announcing the arrival of Valued Associate #679: Cesar Manara Planned maintenance scheduled April 23, 2019 at 23:30 UTC (7:30pm US/Eastern) Data science time! April 2019 and salary with experience The Ask Question Wizard is Live!How to get the current time in PythonGet current time in milliseconds in Python?BigQuery Data Location Settingget list of pandas dataframe columns based on data typeCondition based on nth row above or below current row - PythonIs there a way to resample time series data in BigQuery?BigQuery speed - time from __SHUFFLE1_REPARTITION0BigQuery: SELECT in WHERE-clause with filter based on a value in the current rowbigquery for time series dataPandas time operations
Why does my GNOME settings mention "Moto C Plus"?
Lights are flickering on and off after accidentally bumping into light switch
Output the slug and name of a CPT single post taxonomy term
Is Vivien of the Wilds + Wilderness Reclamation a competitive combo?
Compiling and throwing simple dynamic exceptions at runtime for JVM
Help Recreating a Table
Can gravitational waves pass through a black hole?
Proving inequality for positive definite matrix
Does the Pact of the Blade warlock feature allow me to customize the properties of the pact weapon I create?
How to break 信じようとしていただけかも知れない into separate parts?
Coin Game with infinite paradox
Will I be more secure with my own router behind my ISP's router?
A German immigrant ancestor has a "Registration Affidavit of Alien Enemy" on file. What does that mean exactly?
Assertions In A Mock Callout Test
How to produce a PS1 prompt in bash or ksh93 similar to tcsh
How can I introduce the names of fantasy creatures to the reader?
Has a Nobel Peace laureate ever been accused of war crimes?
Recursive calls to a function - why is the address of the parameter passed to it lowering with each call?
Suing a Police Officer Instead of the Police Department
Why these surprising proportionalities of integrals involving odd zeta values?
What *exactly* is electrical current, voltage, and resistance?
How to ask rejected full-time candidates to apply to teach individual courses?
How to keep bees out of canned beverages?
Can I ask an author to send me his ebook?
Bigquery to get Speed based on time/location data (rows above/below current row)
Announcing the arrival of Valued Associate #679: Cesar Manara
Planned maintenance scheduled April 23, 2019 at 23:30 UTC (7:30pm US/Eastern)
Data science time! April 2019 and salary with experience
The Ask Question Wizard is Live!How to get the current time in PythonGet current time in milliseconds in Python?BigQuery Data Location Settingget list of pandas dataframe columns based on data typeCondition based on nth row above or below current row - PythonIs there a way to resample time series data in BigQuery?BigQuery speed - time from __SHUFFLE1_REPARTITION0BigQuery: SELECT in WHERE-clause with filter based on a value in the current rowbigquery for time series dataPandas time operations
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty height:90px;width:728px;box-sizing:border-box;
I have a table in Bigquery with tracking data for Nascar drivers (dummy data for a project I am working on). The x and y coordinates are taken 10 times a second. The capture_frame
signifies the current frame, and each sequential capture_frame
should be 100 milliseconds apart because the data is taken every 100 ms.
I want to calculate each driver's speed per lap. I know how to do this in pandas but I think this is possible in bigquery. To calculate speed, I am looking at 2 rows before capture_frame
and 2 rows after and then dividing by the difference in epoch time, which should be 400 milliseconds.
Here is an example of a few capture frames for 1 race for one driver for the first lap. There are a few hundred capture frames per lap and then 20 drivers mixed in as well, but it is easier to understand if we look at just one driver/race/lap.
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| Race | Capture | Lap | Driver | … | X | Y | Epoch_time | Delta_dist | Curr_speed |
| | _frame | | | | | | | | |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 1 | 1 | Logano | …. | 2.1 | 1 | 1552089720 | NULL | Null |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 2 | 1 | Logano | … | 2.2 | 1.1 | 1552089820 | NULL | Null |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 3 | 1 | Logano | … | 2.22 | 1.2 | 1552089920 | 2.265921446 | 0.005664804 |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 4 | 1 | Logano | .. | 3.22 | 1.5 | 1552090020 | 3.124163888 | 0.00781041 |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 5 | 1 | Logano | .. | 4.22 | 1.8 | 1552090120 | NULL | null |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 6 | 1 | Logano | .. | 5.22 | 1.9 | 1552090220 | NULL | null |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
The delta_dist
for frame 3 is calculated by sqrt((4.22-2.1)^2 + (1.8-1)^2)/1
and the the curr_speed
is that number divided by 400. The first/last 2 distances and speeds of the race will be null because there are no prior x or y coordinates which is okay as there isn't really any speed when you are .1 second from starting or stopping.
In pandas I would do (this is not great code as I am just bringing each driver and race in on its own):
#laps_per_race dictionary with num laps per race
for driver in driver_list:
for race in race_list:
driver_race_query = “SELECT * from nascar_xyz where driver=driver and Race=race”.format(driver=driver, race=race)
df_entire_race = client.query(driver_race_query).to_dataframe()
num_laps = laps_per_race[race]
for lap in num_laps:
#get subset of dataframe just for this lap
df = df_entire_race.loc[df_entire_race['Lap'] == lap]
df.sort_values(‘Epoch_time’, inplace=True)
df[‘prev_x’] = df[‘X’].shift(2)
df[‘next_x’] = df[‘X’].shift(-2)
df[‘prev_y’] = df[‘Y’].shift(2)
df[‘next_y’] = df[‘Y’].shift(-2)
#this is just distance function sqrt((x2-x1)^2 + (y2-y1)^2)
df['delta_dist'] = np.sqrt((df[‘X’].shift(-2) - df[‘X’].shift(2))**2 + (df[‘Y’].shift(-2) - df[‘Y’].shift(2))**2))
#400.0 is the time actual difference
df['Curr_speed'] = df['delta_dist']/400.0
I think in my sql query I either have to do a group by or partition by to because I want to look in each race by driver_id
, then lap (if that level of abstraction makes sense). Maybe for the speed and looking capture_frames ahead I can do something with windowing (https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts) or something called lag which seems like the equivalent of .shift()
in pandas.
python sql pandas google-bigquery geospatial
add a comment |
I have a table in Bigquery with tracking data for Nascar drivers (dummy data for a project I am working on). The x and y coordinates are taken 10 times a second. The capture_frame
signifies the current frame, and each sequential capture_frame
should be 100 milliseconds apart because the data is taken every 100 ms.
I want to calculate each driver's speed per lap. I know how to do this in pandas but I think this is possible in bigquery. To calculate speed, I am looking at 2 rows before capture_frame
and 2 rows after and then dividing by the difference in epoch time, which should be 400 milliseconds.
Here is an example of a few capture frames for 1 race for one driver for the first lap. There are a few hundred capture frames per lap and then 20 drivers mixed in as well, but it is easier to understand if we look at just one driver/race/lap.
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| Race | Capture | Lap | Driver | … | X | Y | Epoch_time | Delta_dist | Curr_speed |
| | _frame | | | | | | | | |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 1 | 1 | Logano | …. | 2.1 | 1 | 1552089720 | NULL | Null |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 2 | 1 | Logano | … | 2.2 | 1.1 | 1552089820 | NULL | Null |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 3 | 1 | Logano | … | 2.22 | 1.2 | 1552089920 | 2.265921446 | 0.005664804 |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 4 | 1 | Logano | .. | 3.22 | 1.5 | 1552090020 | 3.124163888 | 0.00781041 |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 5 | 1 | Logano | .. | 4.22 | 1.8 | 1552090120 | NULL | null |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 6 | 1 | Logano | .. | 5.22 | 1.9 | 1552090220 | NULL | null |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
The delta_dist
for frame 3 is calculated by sqrt((4.22-2.1)^2 + (1.8-1)^2)/1
and the the curr_speed
is that number divided by 400. The first/last 2 distances and speeds of the race will be null because there are no prior x or y coordinates which is okay as there isn't really any speed when you are .1 second from starting or stopping.
In pandas I would do (this is not great code as I am just bringing each driver and race in on its own):
#laps_per_race dictionary with num laps per race
for driver in driver_list:
for race in race_list:
driver_race_query = “SELECT * from nascar_xyz where driver=driver and Race=race”.format(driver=driver, race=race)
df_entire_race = client.query(driver_race_query).to_dataframe()
num_laps = laps_per_race[race]
for lap in num_laps:
#get subset of dataframe just for this lap
df = df_entire_race.loc[df_entire_race['Lap'] == lap]
df.sort_values(‘Epoch_time’, inplace=True)
df[‘prev_x’] = df[‘X’].shift(2)
df[‘next_x’] = df[‘X’].shift(-2)
df[‘prev_y’] = df[‘Y’].shift(2)
df[‘next_y’] = df[‘Y’].shift(-2)
#this is just distance function sqrt((x2-x1)^2 + (y2-y1)^2)
df['delta_dist'] = np.sqrt((df[‘X’].shift(-2) - df[‘X’].shift(2))**2 + (df[‘Y’].shift(-2) - df[‘Y’].shift(2))**2))
#400.0 is the time actual difference
df['Curr_speed'] = df['delta_dist']/400.0
I think in my sql query I either have to do a group by or partition by to because I want to look in each race by driver_id
, then lap (if that level of abstraction makes sense). Maybe for the speed and looking capture_frames ahead I can do something with windowing (https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts) or something called lag which seems like the equivalent of .shift()
in pandas.
python sql pandas google-bigquery geospatial
2
Tangential comment: I'm surprised this question got 3 upvotes in less than 5 minutes.
– Felipe Hoffa
Mar 9 at 1:11
it is not clear - what output you expect to have. can you provide example please, so we can help without speculating too much
– Mikhail Berlyant
Mar 9 at 12:17
add a comment |
I have a table in Bigquery with tracking data for Nascar drivers (dummy data for a project I am working on). The x and y coordinates are taken 10 times a second. The capture_frame
signifies the current frame, and each sequential capture_frame
should be 100 milliseconds apart because the data is taken every 100 ms.
I want to calculate each driver's speed per lap. I know how to do this in pandas but I think this is possible in bigquery. To calculate speed, I am looking at 2 rows before capture_frame
and 2 rows after and then dividing by the difference in epoch time, which should be 400 milliseconds.
Here is an example of a few capture frames for 1 race for one driver for the first lap. There are a few hundred capture frames per lap and then 20 drivers mixed in as well, but it is easier to understand if we look at just one driver/race/lap.
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| Race | Capture | Lap | Driver | … | X | Y | Epoch_time | Delta_dist | Curr_speed |
| | _frame | | | | | | | | |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 1 | 1 | Logano | …. | 2.1 | 1 | 1552089720 | NULL | Null |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 2 | 1 | Logano | … | 2.2 | 1.1 | 1552089820 | NULL | Null |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 3 | 1 | Logano | … | 2.22 | 1.2 | 1552089920 | 2.265921446 | 0.005664804 |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 4 | 1 | Logano | .. | 3.22 | 1.5 | 1552090020 | 3.124163888 | 0.00781041 |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 5 | 1 | Logano | .. | 4.22 | 1.8 | 1552090120 | NULL | null |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 6 | 1 | Logano | .. | 5.22 | 1.9 | 1552090220 | NULL | null |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
The delta_dist
for frame 3 is calculated by sqrt((4.22-2.1)^2 + (1.8-1)^2)/1
and the the curr_speed
is that number divided by 400. The first/last 2 distances and speeds of the race will be null because there are no prior x or y coordinates which is okay as there isn't really any speed when you are .1 second from starting or stopping.
In pandas I would do (this is not great code as I am just bringing each driver and race in on its own):
#laps_per_race dictionary with num laps per race
for driver in driver_list:
for race in race_list:
driver_race_query = “SELECT * from nascar_xyz where driver=driver and Race=race”.format(driver=driver, race=race)
df_entire_race = client.query(driver_race_query).to_dataframe()
num_laps = laps_per_race[race]
for lap in num_laps:
#get subset of dataframe just for this lap
df = df_entire_race.loc[df_entire_race['Lap'] == lap]
df.sort_values(‘Epoch_time’, inplace=True)
df[‘prev_x’] = df[‘X’].shift(2)
df[‘next_x’] = df[‘X’].shift(-2)
df[‘prev_y’] = df[‘Y’].shift(2)
df[‘next_y’] = df[‘Y’].shift(-2)
#this is just distance function sqrt((x2-x1)^2 + (y2-y1)^2)
df['delta_dist'] = np.sqrt((df[‘X’].shift(-2) - df[‘X’].shift(2))**2 + (df[‘Y’].shift(-2) - df[‘Y’].shift(2))**2))
#400.0 is the time actual difference
df['Curr_speed'] = df['delta_dist']/400.0
I think in my sql query I either have to do a group by or partition by to because I want to look in each race by driver_id
, then lap (if that level of abstraction makes sense). Maybe for the speed and looking capture_frames ahead I can do something with windowing (https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts) or something called lag which seems like the equivalent of .shift()
in pandas.
python sql pandas google-bigquery geospatial
I have a table in Bigquery with tracking data for Nascar drivers (dummy data for a project I am working on). The x and y coordinates are taken 10 times a second. The capture_frame
signifies the current frame, and each sequential capture_frame
should be 100 milliseconds apart because the data is taken every 100 ms.
I want to calculate each driver's speed per lap. I know how to do this in pandas but I think this is possible in bigquery. To calculate speed, I am looking at 2 rows before capture_frame
and 2 rows after and then dividing by the difference in epoch time, which should be 400 milliseconds.
Here is an example of a few capture frames for 1 race for one driver for the first lap. There are a few hundred capture frames per lap and then 20 drivers mixed in as well, but it is easier to understand if we look at just one driver/race/lap.
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| Race | Capture | Lap | Driver | … | X | Y | Epoch_time | Delta_dist | Curr_speed |
| | _frame | | | | | | | | |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 1 | 1 | Logano | …. | 2.1 | 1 | 1552089720 | NULL | Null |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 2 | 1 | Logano | … | 2.2 | 1.1 | 1552089820 | NULL | Null |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 3 | 1 | Logano | … | 2.22 | 1.2 | 1552089920 | 2.265921446 | 0.005664804 |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 4 | 1 | Logano | .. | 3.22 | 1.5 | 1552090020 | 3.124163888 | 0.00781041 |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 5 | 1 | Logano | .. | 4.22 | 1.8 | 1552090120 | NULL | null |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 6 | 1 | Logano | .. | 5.22 | 1.9 | 1552090220 | NULL | null |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
The delta_dist
for frame 3 is calculated by sqrt((4.22-2.1)^2 + (1.8-1)^2)/1
and the the curr_speed
is that number divided by 400. The first/last 2 distances and speeds of the race will be null because there are no prior x or y coordinates which is okay as there isn't really any speed when you are .1 second from starting or stopping.
In pandas I would do (this is not great code as I am just bringing each driver and race in on its own):
#laps_per_race dictionary with num laps per race
for driver in driver_list:
for race in race_list:
driver_race_query = “SELECT * from nascar_xyz where driver=driver and Race=race”.format(driver=driver, race=race)
df_entire_race = client.query(driver_race_query).to_dataframe()
num_laps = laps_per_race[race]
for lap in num_laps:
#get subset of dataframe just for this lap
df = df_entire_race.loc[df_entire_race['Lap'] == lap]
df.sort_values(‘Epoch_time’, inplace=True)
df[‘prev_x’] = df[‘X’].shift(2)
df[‘next_x’] = df[‘X’].shift(-2)
df[‘prev_y’] = df[‘Y’].shift(2)
df[‘next_y’] = df[‘Y’].shift(-2)
#this is just distance function sqrt((x2-x1)^2 + (y2-y1)^2)
df['delta_dist'] = np.sqrt((df[‘X’].shift(-2) - df[‘X’].shift(2))**2 + (df[‘Y’].shift(-2) - df[‘Y’].shift(2))**2))
#400.0 is the time actual difference
df['Curr_speed'] = df['delta_dist']/400.0
I think in my sql query I either have to do a group by or partition by to because I want to look in each race by driver_id
, then lap (if that level of abstraction makes sense). Maybe for the speed and looking capture_frames ahead I can do something with windowing (https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts) or something called lag which seems like the equivalent of .shift()
in pandas.
python sql pandas google-bigquery geospatial
python sql pandas google-bigquery geospatial
edited Mar 9 at 2:58
Pikachu the Purple Wizard
2,12461529
2,12461529
asked Mar 9 at 0:51
H. RoseH. Rose
191
191
2
Tangential comment: I'm surprised this question got 3 upvotes in less than 5 minutes.
– Felipe Hoffa
Mar 9 at 1:11
it is not clear - what output you expect to have. can you provide example please, so we can help without speculating too much
– Mikhail Berlyant
Mar 9 at 12:17
add a comment |
2
Tangential comment: I'm surprised this question got 3 upvotes in less than 5 minutes.
– Felipe Hoffa
Mar 9 at 1:11
it is not clear - what output you expect to have. can you provide example please, so we can help without speculating too much
– Mikhail Berlyant
Mar 9 at 12:17
2
2
Tangential comment: I'm surprised this question got 3 upvotes in less than 5 minutes.
– Felipe Hoffa
Mar 9 at 1:11
Tangential comment: I'm surprised this question got 3 upvotes in less than 5 minutes.
– Felipe Hoffa
Mar 9 at 1:11
it is not clear - what output you expect to have. can you provide example please, so we can help without speculating too much
– Mikhail Berlyant
Mar 9 at 12:17
it is not clear - what output you expect to have. can you provide example please, so we can help without speculating too much
– Mikhail Berlyant
Mar 9 at 12:17
add a comment |
1 Answer
1
active
oldest
votes
You are in the right path. I'll take a public dataset of buses moving around Staten Island - and I'll use the geographical distance by looking at their lat,lon:
WITH data AS (
SELECT bus, ST_GeogPoint(longitude, latitude) point
, PARSE_TIMESTAMP('%Y%m%d %H%M%S',FORMAT('%i %06d', day, time)) ts
FROM `fh-bigquery.mta_nyc_si.201410_bustime`
WHERE day=20141014
AND bus IN (7043, 7086, 7076, 2421, 7052, 7071)
)
SELECT *
FROM (
SELECT bus, ts, distance/time speed
FROM (
SELECT bus, ts
, ST_DISTANCE(point, LAG(point, 3) OVER(PARTITION BY bus ORDER BY ts)) distance
, TIMESTAMP_DIFF(ts, LAG(ts, 3) OVER(PARTITION BY bus ORDER BY ts), SECOND) time
FROM data
)
WHERE time IS NOT null
)
WHERE speed < 500
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55072921%2fbigquery-to-get-speed-based-on-time-location-data-rows-above-below-current-row%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
You are in the right path. I'll take a public dataset of buses moving around Staten Island - and I'll use the geographical distance by looking at their lat,lon:
WITH data AS (
SELECT bus, ST_GeogPoint(longitude, latitude) point
, PARSE_TIMESTAMP('%Y%m%d %H%M%S',FORMAT('%i %06d', day, time)) ts
FROM `fh-bigquery.mta_nyc_si.201410_bustime`
WHERE day=20141014
AND bus IN (7043, 7086, 7076, 2421, 7052, 7071)
)
SELECT *
FROM (
SELECT bus, ts, distance/time speed
FROM (
SELECT bus, ts
, ST_DISTANCE(point, LAG(point, 3) OVER(PARTITION BY bus ORDER BY ts)) distance
, TIMESTAMP_DIFF(ts, LAG(ts, 3) OVER(PARTITION BY bus ORDER BY ts), SECOND) time
FROM data
)
WHERE time IS NOT null
)
WHERE speed < 500
add a comment |
You are in the right path. I'll take a public dataset of buses moving around Staten Island - and I'll use the geographical distance by looking at their lat,lon:
WITH data AS (
SELECT bus, ST_GeogPoint(longitude, latitude) point
, PARSE_TIMESTAMP('%Y%m%d %H%M%S',FORMAT('%i %06d', day, time)) ts
FROM `fh-bigquery.mta_nyc_si.201410_bustime`
WHERE day=20141014
AND bus IN (7043, 7086, 7076, 2421, 7052, 7071)
)
SELECT *
FROM (
SELECT bus, ts, distance/time speed
FROM (
SELECT bus, ts
, ST_DISTANCE(point, LAG(point, 3) OVER(PARTITION BY bus ORDER BY ts)) distance
, TIMESTAMP_DIFF(ts, LAG(ts, 3) OVER(PARTITION BY bus ORDER BY ts), SECOND) time
FROM data
)
WHERE time IS NOT null
)
WHERE speed < 500
add a comment |
You are in the right path. I'll take a public dataset of buses moving around Staten Island - and I'll use the geographical distance by looking at their lat,lon:
WITH data AS (
SELECT bus, ST_GeogPoint(longitude, latitude) point
, PARSE_TIMESTAMP('%Y%m%d %H%M%S',FORMAT('%i %06d', day, time)) ts
FROM `fh-bigquery.mta_nyc_si.201410_bustime`
WHERE day=20141014
AND bus IN (7043, 7086, 7076, 2421, 7052, 7071)
)
SELECT *
FROM (
SELECT bus, ts, distance/time speed
FROM (
SELECT bus, ts
, ST_DISTANCE(point, LAG(point, 3) OVER(PARTITION BY bus ORDER BY ts)) distance
, TIMESTAMP_DIFF(ts, LAG(ts, 3) OVER(PARTITION BY bus ORDER BY ts), SECOND) time
FROM data
)
WHERE time IS NOT null
)
WHERE speed < 500
You are in the right path. I'll take a public dataset of buses moving around Staten Island - and I'll use the geographical distance by looking at their lat,lon:
WITH data AS (
SELECT bus, ST_GeogPoint(longitude, latitude) point
, PARSE_TIMESTAMP('%Y%m%d %H%M%S',FORMAT('%i %06d', day, time)) ts
FROM `fh-bigquery.mta_nyc_si.201410_bustime`
WHERE day=20141014
AND bus IN (7043, 7086, 7076, 2421, 7052, 7071)
)
SELECT *
FROM (
SELECT bus, ts, distance/time speed
FROM (
SELECT bus, ts
, ST_DISTANCE(point, LAG(point, 3) OVER(PARTITION BY bus ORDER BY ts)) distance
, TIMESTAMP_DIFF(ts, LAG(ts, 3) OVER(PARTITION BY bus ORDER BY ts), SECOND) time
FROM data
)
WHERE time IS NOT null
)
WHERE speed < 500
answered Mar 9 at 1:44
Felipe HoffaFelipe Hoffa
23k253125
23k253125
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55072921%2fbigquery-to-get-speed-based-on-time-location-data-rows-above-below-current-row%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
2
Tangential comment: I'm surprised this question got 3 upvotes in less than 5 minutes.
– Felipe Hoffa
Mar 9 at 1:11
it is not clear - what output you expect to have. can you provide example please, so we can help without speculating too much
– Mikhail Berlyant
Mar 9 at 12:17