Thursday, July 22, 2021

How to transfer data from one database to another using Azure Data Factory (ADF)

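Steps:

1) Create an Azure Data Factory.

2) Create a linked service for the source database.

3) Add a dataset to the linked service.

4) Repeat steps 2 and 3 for the sink.

5) Create a pipeline from the source to the sink and trigger a run.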
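
Step 1 has no command in this post, so here is a minimal sketch of it. It assumes the resource group already exists and that the datafactory extension for the Azure CLI is installed (the az datafactory commands are provided by that extension). The function name create_factory is only an illustration.

```shell
# Create the data factory (step 1).
# Arguments: resource group, factory name, Azure region (for example eastus).
create_factory() {
  az datafactory create --resource-group "$1" --factory-name "$2" --location "$3"
}

# Usage (all three values are placeholders):
# create_factory <resource-group> <factory-name> <location>
```

The same resource group and factory name are then passed to every later az datafactory command.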


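Step 2:
Creating the linked service
// get the connection string of the source Cosmos DB account
// (newer CLI versions deprecate this in favor of: az cosmosdb keys list --type connection-strings)
az cosmosdb list-connection-strings --name <> --resource-group <>
// create the source linked service from the JSON file below
az datafactory linked-service create --resource-group <> --factory-name <> --properties "@MongoDBAPILinkedService.json" --name TigerLinkedSource
MongoDBAPILinkedService.json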
{
  "type": "CosmosDbMongoDbApi",
  "typeProperties": {
    "connectionString": "mongodb://**********:primarypassword@*****************.documents.azure.com:10255/?ssl=true&replicaSet=globaldb",
    "database": "tiger-db"
  }
}
Step 3:
Adding the dataset
// source dataset; it is named InputDataset because Pipeline.json references that name under "inputs"
az datafactory dataset create --resource-group <> --factory-name <> --dataset-name InputDataset --properties "@MongoDBAPIDatasetSource.json"
MongoDBAPIDatasetSource.json
{
  "linkedServiceName": {
    "referenceName": "TigerLinkedSource",
    "type": "LinkedServiceReference"
  },
  "annotations": [],
  "type": "CosmosDbMongoDbApiCollection",
  "schema": [],
  "typeProperties": {
    "collection": "tiger"
  }
}
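Repeat steps 2 and 3 for the sink: create a second linked service that points at the target database, and a dataset named OutputDataset, which is the name Pipeline.json references under "outputs".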
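
The sink side can be sketched the same way. This is only an illustration under stated assumptions, not the exact files used here: the linked service name TigerLinkedSink, the two file names, and the function names are hypothetical, and every <...> value is a placeholder for the target account's details. The dataset name OutputDataset is taken from the "outputs" reference in Pipeline.json.

```shell
# Write the sink linked-service and dataset definitions into directory $1.
# File names and TigerLinkedSink are hypothetical; <...> values are placeholders.
write_sink_definitions() {
  dir="$1"
  cat > "$dir/MongoDBAPILinkedServiceSink.json" <<'EOF'
{
  "type": "CosmosDbMongoDbApi",
  "typeProperties": {
    "connectionString": "<sink-connection-string>",
    "database": "<sink-database>"
  }
}
EOF
  cat > "$dir/MongoDBAPIDatasetSink.json" <<'EOF'
{
  "linkedServiceName": {
    "referenceName": "TigerLinkedSink",
    "type": "LinkedServiceReference"
  },
  "type": "CosmosDbMongoDbApiCollection",
  "typeProperties": {
    "collection": "<sink-collection>"
  }
}
EOF
}

# Register both definitions in the factory; the dataset is only created
# if the linked service was created successfully.
# Arguments: resource group, factory name, directory holding the JSON files.
create_sink() {
  az datafactory linked-service create --resource-group "$1" --factory-name "$2" \
    --name TigerLinkedSink --properties "@$3/MongoDBAPILinkedServiceSink.json" &&
  az datafactory dataset create --resource-group "$1" --factory-name "$2" \
    --dataset-name OutputDataset --properties "@$3/MongoDBAPIDatasetSink.json"
}

# Usage:
# write_sink_definitions .
# (edit the placeholders in the two files, then)
# create_sink <resource-group> <factory-name> .
```

Keeping the file writing separate from the az calls leaves room to fill in the placeholders before anything is sent to Azure.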
Create a pipeline that copies data from the source dataset to the sink dataset, then trigger a run of it with the Azure CLI.
az datafactory pipeline create --resource-group <> --factory-name <> --name Pipeline --pipeline "@Pipeline.json"
Pipeline.json
{
  "activities": [
    {
      "name": "CopyFromSourceToSink",
      "type": "Copy",
      "dependsOn": [],
      "policy": {
        "timeout": "7.00:00:00",
        "retry": 0,
        "retryIntervalInSeconds": 30,
        "secureOutput": false,
        "secureInput": false
      },
      "userProperties": [],
      "typeProperties": {
        "source": {
          "type": "CosmosDbMongoDbApiSource",
          "batchSize": 100
        },
        "sink": {
          "type": "CosmosDbMongoDbApiSink",
          "writeBatchTimeout": "00:30:00",
          "writeBehavior": "insert"
        },
        "enableStaging": false
      },
      "inputs": [
        {
          "referenceName": "InputDataset",
          "type": "DatasetReference"
        }
      ],
      "outputs": [
        {
          "referenceName": "OutputDataset",
          "type": "DatasetReference"
        }
      ]
    }
  ]
}
Trigger a run of the pipeline with this Azure CLI command:
az datafactory pipeline create-run --resource-group <> --factory-name <> --name Pipeline
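
The create-run command only starts the copy; it does not wait for it to finish. A minimal sketch of capturing the run ID and checking on the run afterwards is shown here. The function names start_run and run_status are hypothetical, and the sketch assumes the create-run output contains a runId field and the pipeline-run output contains a status field (with values such as InProgress, Succeeded, or Failed).

```shell
# Start a run and print only its run ID.
# Arguments: resource group, factory name, pipeline name.
start_run() {
  az datafactory pipeline create-run --resource-group "$1" --factory-name "$2" \
    --name "$3" --query runId --output tsv
}

# Print the status of one run.
# Arguments: resource group, factory name, run ID.
run_status() {
  az datafactory pipeline-run show --resource-group "$1" --factory-name "$2" \
    --run-id "$3" --query status --output tsv
}

# Usage:
# run_id=$(start_run <resource-group> <factory-name> Pipeline)
# run_status <resource-group> <factory-name> "$run_id"
```

Calling run_status again after a short wait shows whether the copy has completed.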