๐ŸŒฝ How to Get Bitcoin Transfers ๐ŸŒฝ

    It's not that hard I promise

    Basics:

    BTC is UTXO - this means that transactions work like this:

    • Address 1 inputs (sends) 10.06 BTC (everything in the address).
    • Address 2 outputs (receives) 4 BTC
    • Address 3 outputs (receives) 6.06 BTC.

    ๐Ÿคซ Secretly, Address 1 also owns address 3, so they effectively sent 4 BTC to Address 2. BUT we'll never know that Address 1 and Address 3 are owned by the same person (aka the two public keys have the same private key) UNTIL Address 1 and Address 3 appear as inputs together in the same transaction.

    Address clustering means finding the networks of addresses that have appeared together as inputs and assigning them an identifier, which confers ownership.

    Finally, a transfers table reduces the example transaction above to: FROM: Entity 1, TO: Entity 2, AMOUNT: 4 BTC

    QED

    Using our new Entities Table, here is some code to make one day of Bitcoin Transfers. Scroll to see the whole query or click the name of the query in the output table to see the query in the editor.

    -- first get the input entity for each transaction
    -- the input amount doesn't matter, we only need to know the input entity
    -- we're grouping by every var at the end so that we don't have input 
    dupes
    WITH inputs AS (
      SELECT
        tx_id,
        COALESCE(project_name, CAST(address_group AS varchar)) AS from_entity
      FROM
        BITCOIN.CORE.FACT_INPUTS fi
        LEFT JOIN BITCOIN.CORE.DIM_ENTITY_CLUSTERS ec ON fi.pubkey_script_address = ec.address
      WHERE
        block_timestamp >= '2023-10-24'
        AND block_timestamp < '2023-10-25'
      group by
        tx_id,
        from_entity
    ) 
    -- next get the outputs and merge in the inputs
    -- sum up the output value by entity as the transfer amount
    -- if there is no project name, or entity, use the pubkey
    -- (presumably this is a new address - it could be the change address or just a new address,
    -- we'll find out in time)
    SELECT
      fo.tx_id,
      block_timestamp,
      from_entity,
      COALESCE(
        project_name,
        CAST(address_group AS varchar),
        PUBKEY_SCRIPT_ADDRESS
      ) AS to_entity,
      sum(value) AS transfer_amount
    FROM
      BITCOIN.CORE.FACT_OUTPUTS fo
      LEFT JOIN BITCOIN.CORE.DIM_ENTITY_CLUSTERS ec ON fo.pubkey_script_address = ec.address
      LEFT JOIN inputs ON inputs.tx_id = fo.tx_id
    WHERE
      -- don't forget to subset time or this query will be loooooong
      block_timestamp >= '2023-10-24'
      AND block_timestamp < '2023-10-25'
      AND from_entity != to_entity
    GROUP BY
      fo.tx_id,
      block_timestamp,
      from_entity,
      to_entity
    LIMIT
      10000```
    
    Loading...
    WOW! Millions of transfers demystified just like that!
    ๐ŸฅตFor the love of corn please do not try to create a large transfers table, we'll be releasing one soon!๐Ÿฅต
    Loading...
    Loading...
    ๐Ÿ”ฎ Some discussions on ACCURACY ๐Ÿ”ฎ

    โฌ†๏ธ โฌ†๏ธ As you can see above, with the November 2023 address groups (clusters), the ratio of to_entities to from_entities is about 3:1, but using the same entities (aka having future knowledge), we can calculate the transfers from 3 years ago and get a ratio of 2:1.
    >> Over time the transfers table becomes more accurate ๐Ÿ“ˆ
    >> Flipside has still some tricks up its sleeve to improve transfers accuracy, so buckle in, bc more goodies are on the way ๐Ÿ’บ

    โžก๏ธ โžก๏ธ To the right I've painstakingly copied and pasted the same 1-day transfers query 6 times, so you can see how accuracy improves over time. If the to/from ratio is a good metric for accuracy (aka cluster address completeness), using just address clusters, full accuracy is not achieved until between 6m to 1 year!

    ๐Ÿ‘€ Coming soon to a dashboard near you, we'll be publishing a more sophisticated study on clustering accuracy

    Loading...

    thank you for reading. Argue Chat with me on twitter @angelaminster