-
Notifications
You must be signed in to change notification settings - Fork 5
/
Copy pathDataScienceOnBlockchainWithR-PartII.Rmd
465 lines (344 loc) · 31.2 KB
/
DataScienceOnBlockchainWithR-PartII.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
---
title: "Data Science on Blockchain with R. Part II: Tracking the NFTs"
author: "By Thomas de Marchin and Milana Filatenkova"
date: "OCTOBER 2021"
output:
html_document:
number_sections: true
---
```{r setup, include = F}
rm(list=ls())
knitr::opts_chunk$set(echo = T, warning = F, message = F, cache = F, out.width = '100%')
```
**Thomas is Senior Data Scientist at Pharmalex. He is passionate about the incredible possibility that blockchain technology offers to make the world a better place. You can contact him on [Linkedin](https://www.linkedin.com/in/tdemarchin/) or [Twitter](https://twitter.com/tdemarchin).**
**Milana is Data Scientist at Pharmalex. She is passionate about the power of analytical tools to discover the truth about the world around us and guide decision making. You can contact her on [Linkedin](https://www.linkedin.com/in/mfilatenkova/).**
**Update (December 2022): All sales price after March 2022 felt to 0, this could be due to a change in the way Opensea works. Note that OpenSea was hacked in March 2022 and they updated their smart contracts.**
![Examples of Weird Whale NFTs. These NFTs (token ids 525, 564, 618, 645, 816, 1109, 1523 and 2968) belong to the creator of the collection Benyamin Ahmed (Benoni) who gave us the permission to show them in this article.](figures/patchwork.png)
A HTML version of this article with high resolution figures and tables is available [here](https://tdemarchin.github.io/DataScienceOnBlockchainWithR-PartII//DataScienceOnBlockchainWithR-PartII.html). The code used to generate it is available on my [Github](https://github.com/tdemarchin/DataScienceOnBlockchainWithR-PartII).
# Introduction
***What is the Blockchain:*** A blockchain is a growing list of records, called blocks, that are linked together using cryptography. It is used for recording transactions, tracking assets, and building trust between participating parties. Primarily known for Bitcoin and cryptocurrencies application, Blockchain is now used in almost all domains, including supply chain, healthcare, logistic, identity management... Some blockchains are public and can be accessed from everyone while some are private. Hundreds of blockchains exist with their own specifications and applications: Bitcoin, Ethereum, Tezos...
***What are NFTs:*** Non-Fungible Tokens are used to represent ownership of unique items. They let us tokenize things like art, collectibles, patents, even real estate. They can only have one official owner at a time and the record of their ownership is secured on the blockchain, no one can modify or copy/paste a new NFT into existence.
***What is R:*** R language is widely used among statisticians and data miners for developing data analysis software.
***What is a smart contract:*** Smart contracts are simply programs stored on a blockchain that run when predetermined conditions are met. They typically are used to automate the execution of an agreement so that all participants can be immediately certain of the outcome, without any intermediary’s involvement or time loss. They can also automate a workflow, triggering the next action when conditions are met. An example of a smart contract use case is the lottery: people buy tickets within a predefined time window and once it expires, a winner is automatically selected and money is transferred to his account, all this without involvement of a third party.
This is the second article on a series of articles on interaction with blockchains using R. Part I focuses on some basic concepts related to blockchain, including how to read the blockchain data. If you haven't read it, I strongly encourage you to do so to get familiar with the tools and terminology we use in this second article: [Part I](https://towardsdatascience.com/data-science-on-blockchain-with-r-afaf09f7578c).
It is not uncommon to hear that cryptocurrencies are popular with Mafia as it is anonymous and confidential. This is only partially true. While we don't know exactly who is behind an address, the transactions made by this address are visible from everyone. Unless you are very careful, it is practically possible to determine who is behind the address by crossing databases. There are now companies specialized in doing only that. This is a breakthrough path towards a more transparent and fair world. Blockchain has the potential to resolve major problems linked to lack of traceability and accountability problems the world is facing. Take for example the cacao culture in Ivory Coast. The Country has lost more than 60\% of its "protected" forests during the last 25 years despite the official engagement of the agribusiness to fight against deforestation. The main reason for the inneficiency of current strategies to protect wild forests is that it is extremely difficult to trace down the origin of the cocoa beans, the existing traceability solutions being easily manipulated. Blockchain could help solve this issue. In the pharmaceutical world, blockchain has also the potential to improve certain aspects of drug production industry. For example, this technology would enhance the transparency into manufacturing process and supply chain, as well as management of clinical data.
The topic covered in this article is how to extract and visualize the blockchain transactions. We will explore some of the tools available on R to do this. To avoid messing up with the mafia, let's focus on a more benign but currently very popular application of blockchain technology: the art NFTs. We will look at more specifically the Weird Whales NFTs. The approach described here is of course extensible to anything stored on the blockchain.
The Weird Whales project is a collection of 3350 whales which have been programmatically generated from an ocean of combinations, each with their unique characteristics and traits: https://weirdwhalesnft.com/. This project was created by a 12-year-old programmer named Benyamin Ahmed who puts on sale on the famous NFT marketplace OpenSea. The 3,350 computer-generated Weird Whales almost instantly sold out based on the heartwarming story and Benyamin made more than 400,000\$ in two months. Whales were initially sold at approximately 60\$ but since then, their price has been multiplied by 100... Read [this](https://www.cnbc.com/2021/08/25/12-year-old-coder-made-6-figures-selling-weird-whales-nfts.html) to learn more about this incredible story.
**A HTML version of this article as well as the code used to generate it is available on my [Github](https://github.com/tdemarchin/DataScienceOnBlockchainWithR-PartII).**
# Data
This section is dedicated to downloading the sales data consisting in which tokens were transferred to which address as well as the sale prices. **This is a very interesting topic but also a bit technical so if you are only interested in the data analysis, you can skip this section and go directly to section 3.**
## Transfers
All smart contracts are different and it is important to understand them to decode the information. Reverse engineering by reading the source code and using block explorers like EtherScan is usually a good start. The Weird Whales are managed by a specific smart contract on the Ethereum blockchain. This contract is stored on a specific address and you can read its code [here](https://etherscan.io/address/0x96ed81c7f4406eff359e27bff6325dc3c9e042bd#code).
To make it easier to extract information from the blockchain, which can be fairly complicated due to how it is stored on the ledger, we can read the events. In Solidity, the programming language used to code smart contracts on Ehtereum, events are dispatched signals the smart contracts can fire. Any app connected to Ethereum network can listen to these events and act accordingly. Here is a list of recent Weird Whales events: https://etherscan.io/address/0x96ed81c7f4406eff359e27bff6325dc3c9e042bd#events
We are mostly interested by a specific type of event: transfer. Every time a transfer of a token takes place, an event gets written on the blockchain with structure: Transfer (index_topic_1 *address from*, index_topic_2 *address to*, index_topic_3 uint256 *tokenId*). As indicated its names, this event records the address from which the token is transferred, the address to which it is transferred and the token ID, which goes from 1 to 3350 (as there were 3350 Weird Whales generated).
We will therefore extract all transfer events related to Weird Whales. For this, we filter on the hash signature of this event (also called Topic 0). By doing a bit of reverse engineering on EtherScan (https://etherscan.io/tx/0xa677cfc3b4084f7a5f2e5db5344720bb2ca2c0fe8f29c26b2324ad8c8d6c2ba3#eventlog), we see that topic 0 for this event is "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef".
Below, we outline a process to create a databases containing trade data of the Weird Whales. We download the data using the EtherScan API, see [Part I](https://towardsdatascience.com/data-science-on-blockchain-with-r-afaf09f7578c) for more details. EtherScan's API is limited to 1000 result per call. That's not enough to analyze the Weird Whale transactions as only the minting (process of creation of the token on the blockchain) generates 3350 transaction (1 transaction per NFT minted). And that's without all the subsequent transfers! That's why we have to use a dirty while loop. Note that if you are ready to pay a bit, there are other blockchain database available without restriction. For example, the Ethereum database is available on Google BigQuery.
```{r}
# First, let's load a few useful packages
library(knitr)
library(tidyverse)
library(httr)
library(jsonlite)
library(plotly)
library(patchwork)
library(cowplot)
library(network)
library(ggraph)
library(networkDynamic)
library(ndtv)
library(tsna)
```
```{r eval = F}
# EtherScan requires a token, have a look at their website. This is my token but please use your own!
EtherScanAPIToken <- "UJP16VCE9D29XFAA86RWADATJ5K4PBSYD9"
dataEventTransferList <- list()
continue <- 1
i <- 0
while(continue == 1){ # we will run trough the earliest blocks mentioning Weird whales to the most recent.
i <- i + 1
print(i)
if(i == 1){fromBlock = 12856383} #first block mentioning Weird Whale contract
# load the transfer events from the Weird Whale contract
resEventTransfer <- GET("https://api.etherscan.io/api",
query = list(module = "logs",
action = "getLogs",
fromBlock = fromBlock,
toBlock = "latest",
address = "0x96ed81c7f4406eff359e27bff6325dc3c9e042bd", # address of the Weird Whale contract
topic0 = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef", # hash of the transfer event
apikey = EtherScanAPIToken))
dataEventTransferList[[i]] <- fromJSON(rawToChar(resEventTransfer$content), flatten = T)$result %>%
select(-gasPrice, -gasUsed, -logIndex) # reformat the data in a dataframe
if(i > 1){
if(all_equal(dataEventTransferList[[i]], dataEventTransferList[[i-1]]) == T){continue <- 0}
} # at some point, we reached the latest transactions and we can stop
fromBlock <- max(as.numeric(dataEventTransferList[[i]]$blockNumber)) # increase the block to start looking at for the next iteration
}
dataEventTransfer <- bind_rows(dataEventTransferList) %>% # coerce the list to dataframe
distinct() # eliminate potential duplicated rows
# data needs to be reshaped
dataEventTransfer <- dataEventTransfer %>%
rename(contractAddress = address) %>%
mutate(dateTime = as.POSIXct(as.numeric(timeStamp), origin = "1970-01-01")) %>% # convert the date in a human readable format
mutate(topics = purrr::map(topics, setNames, c("eventHash","fromAddress","toAddress","tokenId"))) %>% # it is important to set the names otherwise unnest_wider will print many warning messages.
unnest_wider(topics) %>% # reshape the topic column (list) to get a column for each topic.
mutate(tokenId = as.character(as.numeric(tokenId)), # convert Hexadecimal to numeric
blockNumber = as.numeric(blockNumber),
fromAddress = paste0("0x", str_sub(fromAddress,-40,-1)), # reshape the address format
toAddress = paste0("0x", str_sub(toAddress,-40,-1))) %>%
mutate(tokenId = factor(tokenId, levels = as.character(sort(unique(as.numeric(tokenId)))))) %>%
select(-data, -timeStamp, -transactionIndex)
saveRDS(dataEventTransfer, "data/dataEventTransfer.rds")
```
This is how the Transfer dataset looks like:
```{r}
dataEventTransfer <- readRDS("data/dataEventTransfer.rds")
glimpse(dataEventTransfer)
```
## Sales price
While both the transfer and the sales can be managed by the same contract, it is done differently On OpenSea. The sale is managed by the main OpenSea contract and if it is approved (asked price reached), that first contract calls a second NFT contract, here Weird Whales, which then triggers the transfer. If we want to know the price at which the NFTs were sold (in addition to the transfer discussed above), we need to extract data from the main contract (https://etherscan.io/address/0x7be8076f4ea4a4ad08075c2508e481d6c946d12b). The sales are recorded by an event called *OrderMatch*.
Note that this loop can take a while to run as we download all the sales prices for all the NFT sales on OpenSea, not only the Weird Whales. Given that at the time of writing, there were on average 30 transactions per minute on OpenSea, it can take a while to download... This code block is very similar to the one above so if you are unsure about what a line does exactly, read the code comments above.
```{r eval = F}
# load the OrderMatch events from the Weird Whale contract
dataEventOrderMatchList <- list()
continue <- 1
i <- 0
while(continue == 1){
i <- i + 1
print(i)
if(i == 1){fromBlock = 12856383}
resEventOrderMatch <- GET("https://api.etherscan.io/api",
query = list(module = "logs",
action = "getLogs",
fromBlock = fromBlock,
toBlock = "latest",
address = "0x7be8076f4ea4a4ad08075c2508e481d6c946d12b", # address of the Open Sea contract
topic0 = "0xc4109843e0b7d514e4c093114b863f8e7d8d9a458c372cd51bfe526b588006c9", # hash of the OrderMatch event
apikey = EtherScanAPIToken))
dataEventOrderMatchList[[i]] <- fromJSON(rawToChar(resEventOrderMatch$content), flatten = T)$result %>%
select(-gasPrice, -gasUsed, -logIndex)
if(i > 1){
if(all_equal(dataEventOrderMatchList[[i]], dataEventOrderMatchList[[i-1]]) == T){continue <- 0}
}
fromBlock <- max(as.numeric(dataEventOrderMatchList[[i]]$blockNumber))
}
dataEventOrderMatch <- bind_rows(dataEventOrderMatchList) %>%
distinct()
dataEventOrderMatch <- dataEventOrderMatch %>%
mutate(topics = purrr::map(topics, setNames, c("eventHash","fromAddress","toAddress","metadata"))) %>%
unnest_wider(topics)
```
If we look at the orderMatch event structure, we see that the price is encoded in uint256 type in the data field. It is preceded by two others fields, buyHash and sellHash, both in bytes32 types. The uint256 and bytes32 types are both 32 bytes long, which makes 64 Hexadecimal characters. We are not interested by the buyHash and sellHash data but only by the price sale. We thus have to retrieve the last 64 characters and convert them into decimal to get the sale price.
![Figure 1: Structure of an orderMatch event](figures/orderMatchExample.png)
```{r eval = F}
dataEventOrderMatch <- dataEventOrderMatch %>%
mutate(priceETH = str_sub(data, start = -64),
priceETH = as.numeric(paste0("0x", priceETH)),
priceETH = priceETH / 10^18) %>% # this is expressed in Wei, the smallest denomination of ether. 1 ether = 1,000,000,000,000,000,000 Wei (10\^18).
select(priceETH, transactionHash)
```
## Combine the two events
Let's now merge the two dataset by the transactionHash of the Weird Whales transfers.
```{r, eval = F}
# Merge the transfer and orderMatch events
dataWeirdWhales <- left_join(dataEventTransfer, dataEventOrderMatch, by = "transactionHash")
# Minting does not involve any real "sale" but still cost money. As there is no orderMatch event for minting, the priceETH is NA and we will manually update it to the minting cost. Weird Whales can also be transfered for free (or the cryptocurrency transaction is made outside openSea) and in this case, there is no sale price either. Similarly, we will manually set the price of these transfers to 0 ETH.
dataWeirdWhales <- dataWeirdWhales %>%
mutate(priceETH = case_when(
fromAddress == "0x0000000000000000000000000000000000000000" ~ 0.025,
is.na(priceETH) ~ 0,
TRUE ~ priceETH
)
)
saveRDS(dataWeirdWhales, "data/dataWeirdWhales.rds")
```
## Convert the ETH price in USD
As we are working on the Ethereum blockchain, the transaction price are given in ETH. Ethereum / USD rate is highly volatile. If we try to convert ETH to USD, we cannot just apply a multiplicative factor. We thus have to download the historical ETH USD price. This time around, we will download the data from the CoinGecko instead of EtherScan (you need a pro account for that). CoinGecko exchange provides free access to ETH-USD convertion.
We will use a spline function approximation to smooth out and interpolate the conversion rate. This is due to the timestamp of the transaction event, which is in seconds, while the resolution of the historical price dataset is much lower, they are recorded on average every 30 minute. We thus have to interpolate the historical prices to cover the transaction events.
```{r fig.cap = "Figure 2: Historical ETH to USD rate. Red line: spline."}
dataWeirdWhales <- readRDS("data/dataWeirdWhales.rds")
# download historical price, see https://www.coingecko.com/en/api/documentation for more information
resHistoricalPrice <- GET("https://api.coingecko.com/api/v3/coins/ethereum/market_chart",
query = list(vs_currency = "usd",
days = "max",
interval = "daily"))
dataHistoricalPrice <- fromJSON(rawToChar(resHistoricalPrice$content), flatten = T)$prices %>%
as_tibble() %>%
select(-V1) %>%
rename(ETHtoUSDRate = V2) %>% # we need only the price per date
mutate(date = seq(as.POSIXct(Sys.Date()) - (n()-1)*86400, as.POSIXct(Sys.Date()), "days")) %>% # last row is today's row and we have one row per day
filter(date >= (min(dataWeirdWhales$dateTime)-86000)) # we don't need data prior to Weird Whales
# apply the interpolation spline on the historical conversion rates
historicalInterpolationETHUSD <- approx(x = dataHistoricalPrice$date,
y = dataHistoricalPrice$ETHtoUSDRate,
xout = seq(min(dataHistoricalPrice$date),
max(dataHistoricalPrice$date),
length.out = 1000)) %>%
bind_rows()
# plot the historical conversion rates together with the spline
pETHUSDConversionRate <- ggplot(aes(x = date, y = ETHtoUSDRate), data = dataHistoricalPrice) +
geom_point() +
scale_x_datetime(date_breaks = "3 months", date_labels = "%F") +
geom_line(aes(x = x, y = y), col = "red", data = historicalInterpolationETHUSD)
ggplotly(pETHUSDConversionRate)
```
Let us now convert the ETH in USD for our Weird Whales transactions using the interpolated value of the price at the time of the transaction.
```{r, fig.cap = "Figure 2: Historical ETH to USD rate. Red line: spline."}
# use the interpolated conversion rate to convert ETH to USD
dataWeirdWhales <- dataWeirdWhales %>%
mutate(ETHtoUSDRate = bind_rows(approx(x = dataHistoricalPrice$date,
y = dataHistoricalPrice$ETHtoUSDRate,
xout = dateTime))$y,
priceUSD = round(priceETH * ETHtoUSDRate,3)
)
saveRDS(dataWeirdWhales, "data/dataWeirdWhalesFinal.rds")
```
## Final dataset
Note that if you do not manage to download all the data from EtherScan, you can always load the dataset available on the github.
This is how the final dataset looks like:
```{r}
dataWeirdWhales <- readRDS("data/dataWeirdWhalesFinal.rds")
glimpse(dataWeirdWhales)
```
# Analysis
## Descriptive statistics
Below we make a summary of the dataset by calculating a few descriptive statistics :
```{r}
# summary statistics
dataWeirdWhales %>%
summarise(`Number of transactions` = n(),
`Unique tokens` = length(unique(tokenId)),
`Unique senders` = length(unique(fromAddress)),
`Unique receivers` = length(unique(toAddress)),
`Date range` = paste(min(dateTime), max(dateTime), sep = " - "),
`Duration` = max(dateTime) - min(dateTime),
`Total sales volume (USD)` = sum(priceUSD, na.rm = TRUE),
`Average sale price (USD)` = mean(priceUSD, na.rm = TRUE)
) %>%
t() %>%
kable(caption = "Table 1: Summary statistics on the content of the dataset.")
```
Another interesting exercise - We can determine the number of transactions per address. The first address (0x00..) is not a real address - it refers to the minting of the NFTs. The number of transactions in it is simply the total number of NFTs registered on the blockchain. Besides we see that some addresses are quite active in trading Weird Whales as they have been involved in hundreds of transactions!
```{r}
# number of transactions by address
tibble(address = c(dataWeirdWhales$fromAddress, dataWeirdWhales$toAddress)) %>%
group_by(address) %>%
summarise(`Number of transactions` = n()) %>%
rename(`Address` = address) %>%
arrange(desc(`Number of transactions`))
```
Let’s now visualize the price of transactions as a function of the time, irrespective of the token ID. In the first few days, we see high variability in prices. This is followed by a quieter period and we see the beginning of an upward trend in the last days of August. This price increase coincides with an intensive activity on the social media as the story of the creator of Weird Whales becomes viral. One transaction sci-rockets up to 25000 USD, this corresponds to an increase of about 55555\% compared to the initial price of 45 USD!
```{r fig.cap = "Figure 3: Sales price (USD) evolution for the Weird Whales NFTs transactions."}
pSalesPrice <- ggplot(aes(x = dateTime, y = priceUSD), data = dataWeirdWhales) +
geom_point(size = 0.5) +
scale_x_datetime(date_breaks = "3 months", date_labels = "%F") +
labs(y = "Price (USD)", x = "Date") +
theme_bw()
print(pSalesPrice)
```
## Visualizing the network
So far, we have looked at summary of the transactions prices. Now, how to visualize the transactions, knowing that each NFT is unique and needs to be differentiated from the others? The tool we need is a network. The dataset we assembled is perfectly suited to be plotted as a network. Networks are described by vertices (or nodes) and edges (or links). Here we are going to use a node representation for a wallet address. The network we are going to construct will display all the wallet addresses that have ever traded Weird Whales. The connections between the vertices, so called edges, will represent the transactions.
There are several class and corresponding packages available in R to plot networks, the most famous being *network* and *igraph*. Have a look at the References section for a few amazing tutorials on this topic. My personal preference is the *network* package as it gives the possibility to create interactive plots via the *networkDynamic* and *ndtv* packages. Besides, other packages have been developed to facilitate manipulation and plotting of such objects, such as for example, the *ggraph* package which brings the familiar *ggplot2* framework to the network class.
Let's give it a try! We will first create a simple static (i.e. without the temporal dimension) network and plot it using the *ggraph* package. There are too many data to display in one plot so we will subset our data by plotting only the NFTs involved in at least 10 transactions.
```{r, fig.height = 15, fig.width = 15}
# subset the dataset
tokenIdFilter <- dataWeirdWhales %>%
group_by(tokenId) %>%
summarise(n = n()) %>%
filter(n>9)
dataWeirdWhalesFiltered <- dataWeirdWhales %>%
filter(tokenId %in% tokenIdFilter$tokenId) %>%
droplevels()
# restructure the timing to create a temporal network (below)
dataWeirdWhalesFiltered <- dataWeirdWhalesFiltered %>%
mutate(dateHour = round.POSIXt(dateTime, "hour")) %>% # The time resolution is in seconds. That's nice but it leads to a lot of computation (frames) for our network. Let's round it to hours.
mutate(dateHourNumeric = as.numeric(dateHour)/3600) %>%
mutate(dateHourNumeric = dateHourNumeric - min(dateHourNumeric))
# the set of vertices lists all the addresses involved in the transactions
vertices <- tibble(label = unique(c(dataWeirdWhalesFiltered$fromAddress,
dataWeirdWhalesFiltered$toAddress))) %>%
rowid_to_column("id") %>% # instead of using the addresses to visually identify the vertices, we will use shorter ID numbers
mutate(onset = 0,
terminus = max(dataWeirdWhalesFiltered$dateHourNumeric))
# the set of edges lists all the transactions
edges <- dataWeirdWhalesFiltered %>%
left_join(vertices, by = c("fromAddress" = "label")) %>%
rename(from = id) %>%
left_join(vertices, by = c("toAddress" = "label")) %>%
rename(to = id)
# This will be useful to create a temporal dynamic network (below). Edges will appear at `onset` and disappear at `terminus.`
edges <- edges %>%
rename(onset = dateHourNumeric) %>%
mutate(terminus = max(onset), #
tokenId = as.character(tokenId)) %>%
select(from, to, onset, terminus, tokenId, priceUSD)
# create the network using network function
network <- network(edges,
vertex.attr = vertices,
matrix.type = "edgelist",
loops = T,
multiple = T,
ignore.eval = F)
```
We can now plot our network. We see that all transactions of all the tokens originate from the single minting address (1). Some addresses are involved in multiple transactions and that's why we see several (curved) edges for those ones. We also see that some tokens were transferred to an address only to be sent back to the sender.
```{r fig.height = 12, fig.width = 12, fig.caption = "Figure 4: Static network of the WheirdWhales NFTs transactions. Each address is represented by a node (circle) and the transations are represented by the edges (lines). The edge color refers to the token ID."}
pNetwork <- ggraph(network) +
geom_edge_fan(aes(color = tokenId), arrow = arrow(length = unit(6, "pt"), type = "open")) +
geom_node_point(color = "black", size = 8) +
theme_void() +
geom_node_text(aes(label = id), color = "gold", size = 3, fontface = "bold") +
theme(legend.position = "bottom")
pNetwork
```
Let's now use the timestamp of the transaction to add a temporal dimension to our network. This would allow us to visualise the network evolution over time. For this, we will use the amazing *networkDynamic* package.
```{r}
# create a dynamic temporal network
dNetwork <- networkDynamic(edge.spells = as.data.frame(edges[,c("onset", "terminus", "from", "to", "tokenId")]),
vertex.spells = as.data.frame(vertices[,c("onset", "terminus", "id", "label")]),
create.TEAs = T,
verbose = FALSE)
```
We can create a timeline plot showing the frequency of the transaction activity. We see that most (about 2/3) of the transactions happened very shortly after the NFT's creation. This is followed by a relatively calm period and then a spike of activity around 1000.
```{r, fig.cap = "Figure 5: Timeline plot showing the frequency of the transactions."}
plot(tEdgeFormation(dNetwork, time.interval = 5), ylab = "Transaction frequency")
```
Below, we create an animation which can be launched directly from the browser. Edges and vertices can be clicked on to show more information about the associated addresses and tokens traded.
```{r fig.cap = "Figure 6: Animated network of the Wheird Whales NFTs transactions. Each address is represented by a node (circle) and the transactions are represented by the edges (lines). The edge color refers to the token ID. Edges and vertices can be clicked on to show more information about the associated addresses and tokens traded."}
# compute a sequence of layouts to be compiled into an animation (this can take some time)
compute.animation(dNetwork,
animation.mode = 'MDSJ',
slice.par = list(interval = 50,
start = 1,
end = max(edges$terminus),
aggregate.dur = 50,
rule = 'any'),
verbose = F)
# compile layouts into an animation
render.d3movie(dNetwork,
output.mode = 'htmlWidget',
vertex.tooltip = paste("<b>Address:</b>", (network %v% 'label')),
edge.tooltip = paste("<b>TokenId:</b>", (network %e% 'tokenId')),
launchBrowser = T,
main = 'Transactions of Weird Whales NFTs',
edge.col = function(slice){slice %e% 'tokenId'},
usearrows = F,
verbose = F)
```
# Conclusion
Hopefully, you enjoyed reading this article and have now a better understanding on how to visualize the blockchain transactions. Here, we have shown an example of how to download and plot a network of transactions associated to NFTs. We saw that analysing and plotting transactions is easy once you have the data. On the other hand, obtaining data in an appropriate format requires a deep understanding of the blockchain, especially when working with smart contracts.
In the next posts, we can explore the Tezos blockchain, the new place to be for NFTs trading. We could also dig into the Helium blockchain, a physical decentralized wireless blockchain-powered network for Internet of Things (IoT) devices. If you wish to learn more about this, please follow me on [Medium](https://tdemarchin.medium.com/), [Linkedin](https://www.linkedin.com/in/tdemarchin/) and/or [Twitter](https://twitter.com/tdemarchin) so you get alerted of a new article release. Thank you for reading and feel free to reach us if you have questions or comments.
A HTML version of this article with high resolution figures and tables is available [here](https://tdemarchin.github.io/DataScienceOnBlockchainWithR-PartII//DataScienceOnBlockchainWithR-PartII.html). The code used to generate it is available on my [Github](https://github.com/tdemarchin/DataScienceOnBlockchainWithR-PartII).
If you wish to help us continue researching and writing about data science on blockchain, don't hesitate to make a donation to our Ethereum address (0xf5fC137E7428519969a52c710d64406038319169) or Tezos address (tz1ffZLHbu9adcobxmd411ufBDcVgrW14mBd).
Stay tuned!
# References
Powered by Etherscan and CoinGecko APIs:
<https://etherscan.io/>
<https://www.coingecko.com/>
General:
<https://ethereum.org/en>
<https://www.r-bloggers.com/>
Network:
<https://kateto.net/network-visualization>
<https://www.jessesadler.com/post/network-analysis-with-r/>
<https://programminghistorian.org/en/lessons/temporal-network-analysis-with-r>
<https://ggraph.data-imaginist.com/index.html>