
Temporal embedding #22

Closed
lillythomas opened this issue Nov 3, 2023 · 8 comments

@lillythomas
Contributor

Temporal

For v0, we've decided to structure the inputs as mono-temporal (i.e., one time step per data cube). To combine inputs, we'll seek to match S1 and S2 captures within +/- 3 days, though this may take some experimentation. To capture temporal semantics, we'll embed the timestamp.
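For reference, a minimal sketch of that +/- 3 day matching logic, assuming each capture object carries a `timestamp` attribute of type `datetime` (the function and attribute names here are hypothetical, not from the actual pipeline):

```python
from datetime import timedelta

MAX_OFFSET = timedelta(days=3)

def match_s1_s2(s1_captures, s2_captures):
    """Pair each Sentinel-2 capture with the closest-in-time
    Sentinel-1 capture, keeping only pairs within +/- 3 days."""
    pairs = []
    for s2 in s2_captures:
        closest = min(s1_captures, key=lambda s1: abs(s1.timestamp - s2.timestamp))
        if abs(closest.timestamp - s2.timestamp) <= MAX_OFFSET:
            pairs.append((closest, s2))
    return pairs
```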

@weiji14
Contributor

weiji14 commented Nov 8, 2023

The Sentinel-1 and Sentinel-2 image pair will have different timestamps, so I suppose we'll encode the median timestamp for the temporal embedding? Or do we encode two timestamps (one for each satellite sensor)?

@yellowcap
Member

We were also discussing whether to use absolute time or relative time within a year. Absolute time would give the model prior knowledge of the weather in each year or season. A few considerations:

  • It is not clear what will happen to this embedding moving forward in time, when the model gets timestamps from a future that it has not been trained with.
  • We can test a cyclic embedding, similar to the spherical harmonics in the spatial dimension (see the sketch after this list).
  • Compare a single-number timestamp vs. a year/month/day pattern.
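A minimal sketch of what such a cyclic encoding could look like, mapping day-of-year onto a sine/cosine pair so that late December lands next to early January. This is purely illustrative, not the chosen implementation:

```python
import math
from datetime import datetime

def cyclic_time_encoding(ts: datetime) -> tuple[float, float]:
    """Encode day-of-year on the unit circle so Dec 31 and Jan 1
    end up close together, unlike a raw day-of-year scalar."""
    day_of_year = ts.timetuple().tm_yday
    angle = 2 * math.pi * day_of_year / 365.25
    return math.sin(angle), math.cos(angle)

# Dec 31 and Jan 1 produce nearly identical encodings:
print(cyclic_time_encoding(datetime(2023, 12, 31)))
print(cyclic_time_encoding(datetime(2024, 1, 1)))
```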

@yellowcap mentioned this issue Nov 13, 2023
@weiji14 added this to the v0 Release milestone Nov 14, 2023
@weiji14
Contributor

weiji14 commented Nov 21, 2023

@srmsoumya is starting some work on this at #47. We discussed a little bit yesterday about fixed vs learnable embeddings, but I think we may have confused the terminology a bit. According to https://stats.stackexchange.com/questions/470804/what-is-the-difference-between-position-embedding-vs-positional-encoding-in-bert:

  • Positional encodings are fixed. E.g., we could have a static function that encodes a datetime such that January is close to December.
  • Positional embeddings are learnable. These are representations that have been learned from the input data.

So, just to be clear, do we want to use a temporal positional encoding that is fixed, and/or a temporal embedding that is learned?
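To make that terminology concrete, here is a hedged PyTorch sketch of the two options side by side; the embedding width and month-level granularity are arbitrary choices for illustration, not the model's actual design:

```python
import math
import torch
import torch.nn as nn

DIM = 768  # embedding width, arbitrary for illustration

def fixed_month_encoding(month: int, dim: int = DIM) -> torch.Tensor:
    """Fixed (non-learnable) sinusoidal encoding of the month.
    The cyclic phase puts January next to December."""
    angle = 2 * math.pi * (month - 1) / 12
    freqs = torch.arange(dim // 2, dtype=torch.float32) + 1
    return torch.cat([torch.sin(freqs * angle), torch.cos(freqs * angle)])

# Learnable alternative: one trainable vector per month,
# updated by backprop like any other model weight.
learned_month_embedding = nn.Embedding(num_embeddings=12, embedding_dim=DIM)
january_vec = learned_month_embedding(torch.tensor([0]))
```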

@danhammer
Collaborator

danhammer commented Dec 1, 2023

How is time handled in the first release of the embeddings, mentioned in Clay-foundation/office#51 (Scope 1 for webapp)? Are the embeddings generated for a mosaic for a specific time range?

If yes, I'd be interested in exploring the easiest integration of time at first -- just generating embeddings for the same area at two different time periods (say, 2021 and 2022). We'd just append the two and run the vector search over the appended embedding, length 1,536 = 768*2 for now. I know this sounds overly simple, but it's a pretty decent way to find recent, illegal mining activity, for example. And we wouldn't have to encode time yet within the model. That is, we can start to build out the UI/UX interactions based on time in parallel with a more robust examination of time.
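A rough numpy sketch of that appended-embedding search; the array shapes and the cosine-similarity ranking are assumptions for illustration, not the production pipeline:

```python
import numpy as np

def append_and_search(emb_2021, emb_2022, query, top_k=5):
    """Concatenate per-year embeddings (N, 768) -> (N, 1536),
    then rank locations by cosine similarity to a query vector."""
    stacked = np.concatenate([emb_2021, emb_2022], axis=1)  # (N, 1536)
    stacked /= np.linalg.norm(stacked, axis=1, keepdims=True)
    query = query / np.linalg.norm(query)
    scores = stacked @ query  # cosine similarity per location
    return np.argsort(scores)[::-1][:top_k]
```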

@brunosan
Member

brunosan commented Dec 1, 2023

This issue was unassigned, so kicking it to @yellowcap to delegate if needed.

If I understand correctly, the embeddings of the given file are very self-similar, so the image appears semantically flat, as I would expect (minimum cosine similarity is 0.999).

[image: cosine similarity result]
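For context, a minimal sketch of that self-similarity check over a set of patch embeddings (the array name and layout are assumed):

```python
import numpy as np

def min_pairwise_cosine(embeddings: np.ndarray) -> float:
    """Smallest pairwise cosine similarity among patch embeddings;
    a value near 1.0 means the scene is semantically flat."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    return float(sims.min())
```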

@yellowcap
Member

> generating embeddings for the same area at two different time periods (say, 2021 and 2022)

That is definitely feasible. The current run of embeddings is on the training data, in which we only have one date per location. But as discussed previously, if we agree on an AOI and the dates, we can generate the imagery for those and run inference to produce the embeddings.

@brunosan
Member

brunosan commented Dec 7, 2023

Is there a low-effort lift where we can add ~3 timestamps per location?
That way the model learns that each location can change semantics to some degree (crop stage, floods, small clouds, ...).

@yellowcap
Member

The current architecture receives the date; we will increase date diversity (multiple dates for the same location) in the next pipeline run.
