Recently I wanted to use AWS Glue heavily for some development work at the office. AWS Glue is a managed service from AWS which comes in handy when processing or computing large amounts of data. My use case was to implement an ETL (Extract - Transform - Load) workflow, so there were multiple Glue jobs doing different tasks, varying from decompression to data processing, validation, loading, etc. Those Glue jobs were managed by a single AWS Step Function in each workflow. Everything was straightforward until there was a requirement to read some values from an AWS Glue job and include them in an SNS notification. As everybody does, I called Google for help. I started going through documentation and Stack Overflow posts. Then it finally broke my heart when I read this post.
It was evident that AWS Glue Jobs are designed not to return values.
"By definition, AWS glue is expected to work on huge amount of data and hence it is expected that output will also be huge amount of data."
So the expectation was to store the data at the end of the processing. But my use case was to read back a small set of values. The following are the approaches I figured out.
1. Passing the values to a Lambda function
You can create a Lambda function that accepts the values you want to store, and it can either write them to a file in S3 or put them into a DB for future reference. That Lambda function can then be invoked from within the Glue job with the required values passed in the payload.
import json
import boto3

def lambda_handler(event, context):
    # Appends the received values to a file in S3 (creates the file on first call)
    s3_client = boto3.client('s3')
    bucket_name = 'myBucket'
    s3_path = 'path/to/dir/values.txt'
    values_str = json.dumps(event)
    print("Received values: %s" % values_str)
    try:
        # If the file already exists, append the new values to its current content
        response = s3_client.get_object(Bucket=bucket_name, Key=s3_path)
        current_content = response['Body'].read().decode('utf-8')
        print("Reading content from file at s3:%s key:%s" % (bucket_name, s3_path))
        content = '{}, {}'.format(current_content, values_str)
    except s3_client.exceptions.NoSuchKey:
        # First invocation: no file yet, so start with just the new values
        print("Creating a new file at s3:%s key:%s" % (bucket_name, s3_path))
        content = values_str
    encoded_string = content.encode("utf-8")
    response = s3_client.put_object(Bucket=bucket_name, Key=s3_path, Body=encoded_string)
    return {
        'statusCode': 200,
        'body': json.dumps(response)
    }
import boto3
import json

# GLUE JOB CODE
lambda_client = boto3.client('lambda')
response = lambda_client.invoke(FunctionName='myLambdaFnName', Payload=json.dumps({
    "key1": 'val',
    "key2": a_value_from_glue
}))
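Once the values are in S3, any downstream step (for example, a Lambda that composes the SNS notification) can read them back. Here is a rough sketch, assuming the same bucket and key as above; note that the file holds comma-joined JSON snippets rather than one valid JSON document, so I read it as plain text:

import boto3

# Read the accumulated values back from S3 (same bucket/key the Lambda writes to)
s3_client = boto3.client('s3')
response = s3_client.get_object(Bucket='myBucket', Key='path/to/dir/values.txt')
stored_values = response['Body'].read().decode('utf-8')
print("Stored values: %s" % stored_values)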
2. Logging the values with a special pattern and reading them from CloudWatch
# GLUE JOB CODE
print("[MY_SERVICE] key1: val1")
print("[MY_SERVICE] key2: %s" % a_value_from_glue)
import time
from datetime import datetime

import boto3

logs_client = boto3.client('logs')
# 'timestamp' should hold the epoch time (in seconds) captured just before the Glue job started
response = logs_client.start_query(
    logGroupName='/aws-glue/jobs/output',
    startTime=timestamp,
    endTime=int(datetime.now().timestamp()),
    queryString='fields @timestamp, @message | filter @message like /MY_SERVICE/'
)
query_id = response['queryId']
query_response = None
# Poll until the query finishes; results are not available while it is scheduled or running
while query_response is None or query_response['status'] in ('Scheduled', 'Running'):
    time.sleep(1)
    query_response = logs_client.get_query_results(
        queryId=query_id
    )
print('Received results: {}'.format(query_response['results']))
results = []
for result in query_response['results']:
    timestamp = next(ele for ele in result if ele['field'] == '@timestamp')['value']
    message = next(ele for ele in result if ele['field'] == '@message')['value'].replace('\n', '')
    results.append('{}: {}'.format(timestamp, message))
print(results)
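And since the whole point was to get these values into an SNS notification, the final publish step could look something like this sketch (the topic ARN below is a placeholder, not a real one):

import boto3

# Publish the collected lines to SNS; the topic ARN is a placeholder
sns_client = boto3.client('sns')
sns_client.publish(
    TopicArn='arn:aws:sns:us-east-1:123456789012:myTopic',
    Subject='Values from the Glue job',
    Message='\n'.join(results)  # 'results' is built by the CloudWatch query snippet above
)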