Tuesday, May 7, 2024

Control network routes for specific processes on Linux

Intro

I wanted to control the flow of packets for specific processes on my Linux machine (Ubuntu 22.04) such that they won't go through a tunnel device which is set up for VPN connectivity. The VPN software is provided as is and I do not administer its configuration.

Please note that this post assumes you have an understanding of network routing and are equipped to troubleshoot problems. If not, you should first learn how networks work. Each network setup has its own specifics. Appliance vendors come up with defaults and automation for the most common scenarios. Custom configuration can conflict with that and might isolate you from network access. All responsibility remains with the reader.

What worked

Summary of the solution

Use the network classifier cgroup (src: docs.kernel.org) to tag network packets with a class identifier (classid) for a process and its subprocesses (src: unix.stackexchange.com). Subsequently use iptables to mark packets carrying that classid, and route the marked packets through a custom routing table using policy-based routing (serverfault.com).

Step by step for an example scenario

Imagine that for certain tasks you need to use a VPN that is provided and configured by another team. The VPN software sets up a tunnel interface on your computer and configures a default route for this tunnel device. You still want to use a public service that provides a dedicated client but for which you don't want traffic to go over the VPN (e.g. an audio streaming service like Spotify).

Step 1: Create a novpn network class

sudo mkdir /sys/fs/cgroup/net_cls
sudo mount -t cgroup -onet_cls net_cls /sys/fs/cgroup/net_cls
sudo mkdir /sys/fs/cgroup/net_cls/novpn
echo 0x100001 | sudo tee /sys/fs/cgroup/net_cls/novpn/net_cls.classid

The upstream documentation is at https://docs.kernel.org/admin-guide/cgroup-v1/net_cls.html . What matters is the structure of the classid: the kernel reads it as 0xAAAABBBB (major and minor handle), so 0x100001 means major 0x10 and minor 0x0001. You can freely choose it as long as it is unique on your system, but if you choose another id make sure to check its decimal value by doing a cat on the file, as it will be needed later:

cat /sys/fs/cgroup/net_cls/novpn/net_cls.classid
1048577

Step 2: Create a custom routing table

The easiest way to create the custom routing table is to just 'clone' the default route from your setup when the VPN is not up. You can get this route using ip:
ip route | grep default
default via 192.168.0.1 dev wlp0s20f3 proto dhcp metric 600
Next, pick a route table number that is not in use so that we get a dedicated routing table. As per serverfault.com the numbers already in use can be listed with:
ip route show table all | grep -Po 'table \K[^\s]+' | sort -u
local

The example output above just means there are no tables besides the reserved routing tables. In our case we want a dedicated route table for traffic that should not go through the VPN, so let's register it officially with a number that isn't taken. I chose 3 and registered it as follows:

echo "3\tnovpn"  | sudo tee -a /etc/iproute2/rt_tables
3    novpn

For me the file now looks like this:

cat /etc/iproute2/rt_tables
#
# reserved values
#
255    local
254    main
253    default
0    unspec
#
# local
#
#1    inr.ruhep
3    novpn


So it shows the reserved values as well as the table id we chose and its logical name 'novpn'. We also need to pick a firewall mark for marking the packets. We can check all the iptables tables for marks already in use:

for table in raw security mangle nat filter; do echo "===$table===";sudo /sbin/iptables -L -n --line-numbers --table $table -v; done | grep -o "MARK set [0-9x]*"

On my setup there were no firewall marks yet. I will therefore use mark 2 in this example. Now we have all the inputs to set up our routing table:

sudo ip rule add fwmark 2 table 3
sudo ip route add default via 192.168.0.1 dev wlp0s20f3 proto dhcp metric 1 table 3
sudo ip route flush cache


With the first command we make sure routing table 3 is consulted for packets carrying firewall mark 2. With the second command we add the default route to our routing table. Finally we flush the route cache to make sure the new configuration is used.
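
To double-check the result you can list the rule and the new table (novpn is the name we registered in rt_tables; 0x2 is our chosen mark):

ip rule show                # should contain: from all fwmark 0x2 lookup novpn
ip route show table novpn   # should show the cloned default route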

Step 3: Configure the marking using iptables

Since we already decided upon the mark, and we have set up the cgroup and know its classid, we have everything needed to configure iptables to mark packets as desired:
sudo iptables -t mangle -A OUTPUT -m cgroup --cgroup 1048577 -j MARK --set-mark 2
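
You can verify the rule is in place and, once traffic flows, watch its packet counters increase:

sudo iptables -t mangle -L OUTPUT -v -n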

Step 4: Start our process such that it has the cgroup class id

It is important that our processes get the cgroup classid. This can be done by registering their process ID in /sys/fs/cgroup/net_cls/novpn/tasks . Doing this per process could become quite tedious since a lot of programs spawn a lot of children. Luckily Linux keeps track of the parent-child hierarchy: children forked after a process joins the cgroup are placed in it as well. So if you spawn a terminal window, register its pid, and subsequently launch the program from it, then the program and all of its children will be associated with this novpn cgroup.

echo $$ | sudo tee /sys/fs/cgroup/net_cls/novpn/tasks
spotify
Now this Spotify process will have all its packets routed via routing table 3, which has no route over the VPN but routes traffic as if no VPN were set up.
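
From that same terminal you can sanity-check the setup; the checks below are a sketch (1.1.1.1 is just an arbitrary public address for the route lookup):

grep net_cls /proc/self/cgroup   # should show net_cls:/novpn
ip route get 1.1.1.1 mark 2      # should resolve via the default route of table novpn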


Final notes

  • I do not represent Spotify; I just use their services and it felt like a useful example (their client application spawns multiple processes and needs to be able to communicate with other OS entities, which makes other solutions like using netns more tricky).
  • When using the terminal approach, one should take care not to use the terminal for actions that require VPN connectivity as per-design those wouldn't work in that terminal.
  • This is a rather static solution: if you move your computer and end up on a new network then you will likely need a different network configuration (another default route in the custom table).
  • Verify that the marking of packets of a cgroup is working. An easy trick is to change the iptables rule to drop the packets rather than mark them (see the sketch below). Then you can easily verify that Spotify works when launched outside the "novpn" terminal but that it cannot connect to any network when launched inside the "novpn" terminal.
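
A minimal sketch of that drop-based check, reusing the classid from earlier (remember to remove the rule again afterwards):

sudo iptables -t mangle -A OUTPUT -m cgroup --cgroup 1048577 -j DROP
# ... launch the application inside and outside the novpn terminal ...
sudo iptables -t mangle -D OUTPUT -m cgroup --cgroup 1048577 -j DROP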

Saturday, November 25, 2023

Podman pitfalls

Fedora docs are your friend

https://docs.fedoraproject.org/en-US/fedora-coreos/


SELinux might be on

If you are having permission denied errors, watch out for SELinux. Check /etc/selinux/config in your podman VM. You can consider switching to permissive mode and rebooting.
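
For example, inside the podman VM (a sketch: setenforce only lasts until the next reboot, the config edit makes it persistent):

sudo setenforce 0                                                    # permissive until reboot
sudo sed -i 's/^SELINUX=.*/SELINUX=permissive/' /etc/selinux/config  # persistent after reboot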

Certificate errors

MITM

Some companies like to, or must, inspect their users' traffic. Generally this is done by having a transparent proxy which terminates SSL/TLS and uses a self-signed certificate that is owned by the company and can be considered trusted. The default podman VM won't trust this certificate. You can try the following:

Copy the PEM file to /etc/pki/ca-trust/source/anchors/ and then update the trust:
update-ca-trust force-enable && update-ca-trust extract
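
A sketch of doing this via podman machine ssh (corp-ca.pem is a placeholder for your company's CA file):

podman machine ssh "sudo tee /etc/pki/ca-trust/source/anchors/corp-ca.pem > /dev/null" < corp-ca.pem
podman machine ssh "sudo update-ca-trust"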

Time drift

If the podman VM has time drift this can also break SSL/TLS certificate verification. Just update the time of your VM.
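
A sketch of a one-off resync, assuming the VM runs chronyd as its NTP client (Fedora CoreOS does by default):

podman machine ssh "sudo chronyc makestep"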

Allow docker in podman

sudo rpm-ostree install podman-docker

Kubernetes cheat sheet

Just some commands that had value at some time or another.

Debugging

Sometimes debugging is hard because you are using an optimized image without troubleshooting tools or even a shell. Ephemeral containers come to the rescue there with some useful kubectl debug commands, but if you need to see the attached volumes these tools fall short, and while it is possible to do it manually it is tedious. Make sure you are aware of kubectl-superdebug.
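
For the basic case, a kubectl debug invocation looks roughly like this (pod and container names are placeholders):

# attach an ephemeral busybox container sharing the process namespace of <container>
kubectl debug -it <pod> --image=busybox --target=<container>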

Resources

All resources in a namespace

Just iterate over the resource type and look for them:
 
for i in `kubectl api-resources --verbs list --namespaced -o name`; do kubectl get --show-kind --ignore-not-found $i; done

Which pods still have persistent volume claim

kubectl get pods --all-namespaces -o json | jq -c '.items[] | {name: .metadata.name, namespace: .metadata.namespace, claimName: .spec.volumes[] | select(has("persistentVolumeClaim")).persistentVolumeClaim.claimName}'

Networking

Jump portals

In order to use a pod as a jump host you need to be able to exec into it, and socat must be available on the pod. When that is the case you can tunnel via the pod towards a target.

On the pod, set up a listener that relays to the remote endpoint, then forward a local port to that listener:

# inside the pod (e.g. via kubectl exec): relay <pod-port> to the target
socat tcp-l:<pod-port>,fork,reuseaddr tcp:<target-host>:<target-port>
# on your machine: forward a local port to the pod's listener
kubectl port-forward pod/<jump-pod> <local-port>:<pod-port>

resources:
- socat command list: https://exploit-notes.hdks.org/exploit/network/port-forwarding/port-forwarding-with-socat/
- k8s port-forward docs: https://kubernetes.io/docs/tasks/access-application-cluster/port-forward-access-application-cluster/


Thursday, September 1, 2022

OAuth 2.0 notes

This post is by no means meant to be original; it is just some notes to persist info acquired while digesting OAuth 2.0 / OpenID Connect articles. Use at your own risk. An attempt was made at keeping pointers to the sources.


rfc6749



The authorization code grant type is used to obtain both access
   tokens and refresh tokens and is optimized for confidential clients.
   Since this is a redirection-based flow, the client must be capable of
   interacting with the resource owner's user-agent (typically a web
   browser) and capable of receiving incoming requests (via redirection)
   from the authorization server.

     +----------+
     | Resource |
     |   Owner  |
     |          |
     +----------+
          ^
          |
         (B)
     +----|-----+          Client Identifier      +---------------+
     |         -+----(A)-- & Redirection URI ---->|               |
     |  User-   |                                 | Authorization |
     |  Agent  -+----(B)-- User authenticates --->|     Server    |
     |          |                                 |               |
     |         -+----(C)-- Authorization Code ---<|               |
     +-|----|---+                                 +---------------+
       |    |                                         ^      v
      (A)  (C)                                        |      |
       |    |                                         |      |
       ^    v                                         |      |
     +---------+                                      |      |
     |         |>---(D)-- Authorization Code ---------'      |
     |  Client |          & Redirection URI                  |
     |         |                                             |
     |         |<---(E)----- Access Token -------------------'
     +---------+       (w/ Optional Refresh Token) 
https://www.rfc-editor.org/rfc/rfc6749#section-4.1

AWS Cognito's authorization code grant:

https://aws.amazon.com/blogs/mobile/understanding-amazon-cognito-user-pool-oauth-2-0-grants/
Cognito comes by default with an auth app which gets hosted on a URI with a chosen domain name:
https://<domain-name>.auth.<region>.amazoncognito.com
In there you have the different endpoints for your authn/authz flows which are documented on 
https://docs.aws.amazon.com/cognito/latest/developerguide/cognito-userpools-server-contract-reference.html

Verification of JWT tokens from Cognito

The key information for verification depends on the user pool and can be retrieved from:
https://cognito-idp.Region.amazonaws.com/your_user_pool_ID/.well-known/jwks.json . For details
see the knowledge-center article https://aws.amazon.com/premiumsupport/knowledge-center/decode-verify-cognito-json-token/ 
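
To verify a token you match the kid from its header against one of the pool's published keys. A quick way to inspect those keys (using the placeholders from the URL above):

curl -s https://cognito-idp.Region.amazonaws.com/your_user_pool_ID/.well-known/jwks.json | jq '.keys[] | {kid, alg, use}'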

ALB authn/authz

https://docs.aws.amazon.com/elasticloadbalancing/latest/application/listener-authenticate-users.html
The loadbalancer sets the following headers:
  • x-amzn-oidc-accesstoken

The access token from the token endpoint, in plain text.

  • x-amzn-oidc-identity

The subject field (sub) from the user info endpoint, in plain text.

  • x-amzn-oidc-data

The user claims, in JSON web tokens (JWT) format.


Miscellaneous notes

    • The redirect_uri passed to the token endpoint has to match the one used in the authorization request.
    • state is also used to avoid cross-site request forgery (CSRF) attacks.

Sunday, January 23, 2022

Authentication via Belgian eID card with Mozilla Firefox on Linux

Since 2003 the Belgian government has issued electronic identity cards called "eID" cards. These cards have cryptographic keys on them that allow digital signatures. This enables you to authenticate to government e-services using a smart card reader, your eID, and a PIN (which you have to set when you receive your eID card). While setting up the software on Windows is straightforward, I found the process for Linux less trivial, hence I'd like to document it.

So if you want to use your eID card to log in to Belgian governmental services on Linux you can use this as a guide. My setup is Linux Mint 20.3 "Una", which was not considered a supported OS at the time of writing (20.2 "Uma" was), so these instructions should work beyond the officially supported OSes.

The well documented part

In order to be able to authenticate using an eID you need some middleware on your computer. This middleware provides the interface between your browser plugin and your eID in the smartcard reader. This section describes the installation of this middleware.


This piece of software is available via https://eid.belgium.be and at the time of writing the Linux version can be found at https://eid.belgium.be/en/linux-eid-software-installatie . I do appreciate that they have tried to support a variety of popular Linux distributions, but what I respect most is their decision to make it available as open source. In my case I used a newer release of Mint and therefore I downloaded the tarball from the link under "Downloads for unsupported distributions". Surprisingly they don't mention their github repo, which has a better README.md. So I'd advise to start there: https://github.com/Fedict/eid-mw . Either check out the code directly or follow its instructions to install the code packaged in the tarball.

The github page details the pre-requisites. For my distribution I was able to install them using:

 sudo apt-get install libtool autoconf automake libassuan-dev autoconf-archive libpcsclite-dev libgtk-3-dev libcurl4-openssl-dev libproxy-dev libxml2 libp11-kit-dev openssl
 

Once you have the pre-requisites installed you can just use the commands from the github README.md to install the middleware:

 autoreconf -i
 ./configure  
 make 
 sudo make install

After this the middleware is installed. It comes with a binary application which you can run via the command `eid-viewer` and use to verify it is working correctly.

No card reader found

At this stage the application opened, but the status bar at the bottom stated "No cardreader found" even when my card reader was connected. Generally when a card reader is connected but no eID is inserted it should read "Ready to read identity card", but when testing make sure to insert an eID card because the card reader might require this. In my case however I was stuck at "No cardreader found".

The most likely explanation was that Linux didn't know how to communicate with my card reader and that a device driver was required. Since the card reader is many years old I no longer have the documentation that came with it, and the device itself only has a label "Digipass by Vasco", which is too little to go on to find a specific driver.

Fortunately Linux can help here. Since my smart card reader is a USB device I can list all the USB devices and get their VendorId and ProductId using lsusb:

 lsusb
 ...
 Bus 003 Device 006: ID 1a44:0001 VASCO Data Security International Digipass
 ...
 

When you Google this you arrive at the useful linux-hardware.org website (specifically https://linux-hardware.org/?id=usb:1a44-0001). There they link to https://salsa.debian.org/rousseau/CCID which is a driver that works with quite a lot of models. This time I found the instructions on the project website (https://ccid.apdu.fr/) more useful. It also seems that you can do a `sudo apt-get install libccid` but I only saw this package afterwards; in my case I compiled from source.

If at this stage you still aren't able to read your card via `eid-viewer`, make sure to install `pcscd`, a daemon that provides access to smart cards. You'll need it anyway at a later point. After installing you can also start the corresponding service:

 sudo apt-get install pcscd
 sudo service pcscd start

Hopefully at this stage you can open `eid-viewer` and see your card details.
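
If you want to check the reader independently of the eID middleware, pcsc_scan from the pcsc-tools package (assuming your distribution ships it under that name) lists detected readers and inserted cards:

 sudo apt-get install pcsc-tools
 pcsc_scan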

Now make it work in Firefox

There is an official Firefox extension to make the eID work with the browser, available at https://addons.mozilla.org/nl/firefox/addon/belgium-eid/ .

However, at every startup it would notify me with "A recent enough version of the Belgian electronic identity card software could not be found. Is the eID software installed and up-to-date?".

I know the software is up-to-date since I installed the latest version. This part required a bit more research, so I'll split it up into two sections: the explanation and the solution. You can skip to the solution if you are only interested in getting yours working. If the solution doesn't work for you, the explanation can give insights into how it works to aid troubleshooting on your end.

The explanation

Whenever you get stuck you need to troubleshoot. One tool for troubleshooting is debugging. Recent versions of Firefox allow you to debug extensions out of the box. If you browse to "about:debugging" you get a debugging screen for Firefox; when you select "This Firefox" you can see all the extensions that are currently installed. When you click the "Inspect" button for the "eID Belgium" plugin you'll get a screen that looks very familiar if you've used the developer tools of Firefox or Chrome. There is a debugger tab in which you can find the main thread, which runs "background-script.js". This allows you to see what the plugin actually does. In this particular case it tries to install a PKCS#11 security module. When that fails it shows the notification we saw earlier.

Since the automatic installation doesn't work I thought: let's do it manually. I followed the instructions from https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS/PKCS11/Module_Installation and after I added the module I could see under security devices that Firefox even sees the attached card reader. Full of optimism I tried a test login via https://iamapps.belgium.be/tma/?lang=en , leading to a disappointing failed login.

The Firefox window to manage security devices gives very few configuration options, but the issue at hand here is that the Firefox extension "eID Belgium" is not allowed to use the PKCS#11 module that I imported manually. If you've tried creating this security module manually you'll want to delete it again: although it shows the card reader it won't be usable, and it will give rise to a name conflict as you can only have one module named "beidpkcs11".

The proper way to go about it is via a native manifest: https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/Native_manifests#manifest_location . This manifest was actually built, but it was put at /usr/local/lib/mozilla/pkcs11-modules/beidpkcs11.json, which does not seem to be checked by Firefox (perhaps it was in the past). There is also an alternate version that acts as a fallback, /usr/local/lib/mozilla/pkcs11-modules/beidpkcs11_alt.json . This fallback uses p11-kit, but in my case the first version works perfectly.

The solution

Put the generated manifest on the location where Firefox expects it:

 sudo mkdir -p /usr/lib/mozilla/pkcs11-modules/
 sudo ln -s /usr/local/lib/mozilla/pkcs11-modules/beidpkcs11.json /usr/lib/mozilla/pkcs11-modules/beidpkcs11.json

Restart Firefox and you should no longer see the notification pop-up; ideally the lights of your card reader are even blinking.

Note that according to the docs the following 3 locations should be valid:

  1. /usr/lib/mozilla/pkcs11-modules/beidpkcs11.json
  2. /usr/lib64/mozilla/pkcs11-modules/beidpkcs11.json
  3.  ~/.mozilla/pkcs11-modules/beidpkcs11.json 

For me option 2 didn't work even though I'm on Firefox 96.0.1 (64-bit). Options 1 & 3 both worked. So use option 1 if you want it to be available system-wide or option 3 if you only want it available for a specific user.

The final test

Now go to https://iamapps.belgium.be/tma/?lang=en and you should be able to authenticate successfully!


Finally you can start the fun part: the filing of your taxes! Or whatever you wanted to do.


Edit: I wanted to create a PR to add the native manifest but it seems somebody has beaten me to it (with this change). So likely the next version will be easier to install.

Friday, December 3, 2021

Serialization Anomaly when using MVCC (The good, the bad & the ugly)

Intro

When working in-depth with databases sooner or later you will encounter transaction isolation levels. If you are new to transaction isolation check out the PostgreSQL transaction isolation documentation, because that documentation team did an amazing job! In this post I'd like to go into more depth on one issue mentioned on that page: the serialization anomaly.

Real life example

A picture says more than a thousand words, so let's try to create a mental picture by talking about a real-life scenario. For this, assume we are creating an application that has to manage financial accounts.

Let's take as a starting point 3 accounts with the following balances:

  • Account A: €20
  • Account B: €30
  • Account C: €40

Now consider 2 transactions:

  • tx1: transfers money (€10) from account A to account B
  • tx2: transfers money (€10) from account A to account C

To perform these transactions the database engine has to read the balance of the sender from the balances table, subtract the amount sent, and write the new balance. Subsequently it reads the balance of the receiver, adds the amount, and writes the receiver's new balance.
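
In SQL, tx1 might look roughly like this (a sketch assuming a balances table with columns account and balance; the literals follow the starting balances above):

BEGIN;
SELECT balance FROM balances WHERE account = 'A';      -- RA: reads 20
UPDATE balances SET balance = 10 WHERE account = 'A';  -- WA: 20 - 10
SELECT balance FROM balances WHERE account = 'B';      -- RB: reads 30
UPDATE balances SET balance = 40 WHERE account = 'B';  -- WB: 30 + 10
COMMIT;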

When these transactions happen in a non-overlapping manner, performing these changes to your data is almost trivial. We can detail the steps as follows, marking each statement with the transaction it belongs to (1 for the first transaction, 2 for the second). Then the statements look like:

begin1 - RA1 - WA1 - RB1 - WB1 - end1 - begin2 - RA2 - WA2 - RC2 - WC2 - end2

So in detail:

  • begin1: begins the first transaction
  • RA1: reads the balance of account A: 20 euro
  • WA1: writes the new balance of 10 euro for account A
  • RB1: reads the balance of account B: 30 euro
  • WB1: writes the new balance of 40 euro for account B
  • end1: ends the first transaction
  • begin2: begins the second transaction
  • RA2: reads the balance of account A: 10 euro (since the previous change is committed)
  • WA2: writes the new balance of 0 euro for account A
  • RC2: reads the balance of account C: 40 euro
  • WC2: writes the new balance of 50 euro for account C
  • end2: ends the second transaction

These steps should be intuitive and feel familiar as it is the behavior we expect when performing financial transactions in real life.

The reason this is easy is that concurrency in this example is always 1: a single transaction is in the system, so no overlap is taking place.


The interesting world of concurrency

Just like a puzzle becomes more interesting once there is more than one piece, concurrency becomes interesting once there are multiple concurrent actors.

A modern database has a few tricks up its sleeve to manage concurrency. Locking is probably the best-known mechanism. It can for example avoid data corruption due to multiple processes writing the same record at the same time. Locking is very powerful but is generally not the first choice to deal with concurrency because it causes contention, which lowers the throughput of your database system. Which brings us to another database trick: Multiversion Concurrency Control, MVCC for short.

MVCC is an implementation where each transaction gets a snapshot of the database state at transaction start. This allows the DB engine to read data without risking a dirty read: it avoids reading uncommitted data even when that data is being changed by another concurrent process. It also enables repeatable reads. It can however cause serialization anomalies, which make scenarios possible that don't make sense in real life. In order to understand why this is a problem let's revisit our 2 example transactions. Let's start again from the starting point detailed earlier, but this time let's have the transactions execute concurrently such that the different statements within them are interleaved.

Assume the order:

begin1 - RA1 - begin2 - RA2 - WA1 - RB1 - WB1 - end1 - WA2 - RC2 - WC2 - end2

So in detail:

  • begin1: begins the first transaction
  • RA1: reads the balance of account A: 20 euro
  • begin2: begins the second transaction
  • RA2: reads the balance of account A: 20 euro (the previous change is not committed, and when this transaction started the snapshot of the database still had the original balance of 20 euro for account A)
  • WA1: writes the new balance of 10 euro for account A
  • RB1: reads the balance of account B: 30 euro
  • WB1: writes the new balance of 40 euro for account B
  • end1: ends the first transaction
  • WA2: writes the new balance of 10 euro for account A (this transaction also read the 20 euro starting balance due to MVCC and subtracted 10 from it)
  • RC2: reads the balance of account C: 40 euro
  • WC2: writes the new balance of 50 euro for account C
  • end2: ends the second transaction


This is a serialization anomaly, and this example hopefully illustrates why it is a problem. If it is unclear, imagine you are a bank managing these accounts: €20 left account A in total, but its balance only went down by €10, so your system has just generated money that your customer could withdraw or spend! The above scenario is entirely possible if you picture an account with multiple users. Two different users could easily wire money to different accounts concurrently around the same time. So for these types of workloads it is important to avoid serialization anomalies.

Serializable isolation to the rescue

It is possible to avoid serialization anomalies in database engines that allow running with the strictest transaction isolation level: serializable isolation.

Serializable isolation states that in order for a concurrent execution of transactions to be valid a serial ordering of transactions has to exist such that each statement execution has the same results as in the concurrent execution.

So let's start from the concurrent execution in our example. Since there are 2 transactions we have 2 possible serial orderings:

  • Tx1 followed by Tx2
  • Tx2 followed by Tx1

Or in our per-statement illustration:

  • begin1 - RA1 - WA1 - RB1 - WB1 - end1 - begin2 - RA2 - WA2 - RC2 - WC2 - end2
  • begin2 - RA2 - WA2 - RC2 - WC2 - end2 - begin1 - RA1 - WA1 - RB1 - WB1 - end1

I leave it to the reader to detail the outcomes of the second ordering. You should arrive at the conclusion that account A will always have a balance of 0 after both transactions finish, no matter how you order these 2 transactions serially. Therefore, if your database engine runs at the serializable transaction isolation level and you attempt the concurrent execution above, the engine will raise an error. This does not mean there is an issue in the database engine but rather that there is a problem with your workload. If it occurs rarely it's possible to just have your application catch these exceptions and retry, because when this error is thrown the database engine will generally roll back one of the transactions, often the transaction that submitted the statement that would give rise to the anomaly.
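
As a concrete sketch, this is roughly how the interleaving plays out in PostgreSQL with two psql sessions on the hypothetical balances table from earlier (the exact error text can differ per version):

-- session 1 (tx1)
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT balance FROM balances WHERE account = 'A';      -- 20

-- session 2 (tx2), before tx1 commits
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT balance FROM balances WHERE account = 'A';      -- also 20

-- session 1 (tx1)
UPDATE balances SET balance = 10 WHERE account = 'A';
UPDATE balances SET balance = 40 WHERE account = 'B';
COMMIT;

-- session 2 (tx2)
UPDATE balances SET balance = 10 WHERE account = 'A';
-- ERROR: could not serialize access due to concurrent update
-- SQLSTATE 40001: catch this in the application and retry tx2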

How can the DB know?

There is no such thing as magic (not even in the database world), so the engine must have some clever way of detecting that a statement would cause a serialization anomaly. Working backwards from the behavior of serially ordered transactions you could come up with 2 rules:

  1. If transaction TX reads data that was written by another transaction TC which committed before the start of TX, then in a sequential ordering TX must follow TC (order: TC -> TX).
  2. If transaction TY reads data that is written by an overlapping transaction TU that wasn't committed when TY started, then TY must precede TU (order: TY -> TU).

Note that:

  • Data read and written within the same transaction doesn't impose an ordering
  • 2 reads from different transactions also don't impose an ordering
  • Since more than 2 transactions can overlap, even a read-only transaction can give rise to a serialization anomaly!

In concurrent executions that form a serialization anomaly you will get ordering rules that are in conflict. For our example:

begin1 - RA1 - begin2 - RA2 - WA1 - RB1 - WB1 - end1 - WA2 - RC2 - WC2 - end2

  • begin1: begins the first transaction
  • RA1: reads the balance of account A
  • begin2: begins the second transaction
  • RA2: reads the balance of account A
    • 2 reads don't impose an ordering between each other
  • WA1: writes the new balance of account A
    • At this point the DB knows that T2 read data written by an overlapping transaction T1 which is uncommitted, so T2 must precede T1 (T2 -> T1)
  • RB1: reads the balance of account B
  • WB1: writes the new balance of account B
  • end1: ends the first transaction
  • WA2: writes the new balance of account A
    • Now the DB knows that T1 read data written by T2 which was not committed at the start time of T1, so T1 has to precede T2 (T1 -> T2)

At this stage the DB engine can throw an exception, since no matter what happens further with T2 (except for a transaction rollback) it will yield a serialization anomaly. Therefore it is best to roll back the transaction immediately and not allow further statements, as their results shouldn't be relied upon anyway.

The advantage of the arrow notation is that you can chain the orderings together, and as soon as you encounter a transaction for a second time you know there is a violation. So if we chain them in the order we found them:

(T2 -> T1) || (T1 -> T2) => (T2 -> T1 -> T2)

A directed graph is a useful way of tracking the orderings discovered by applying the rules. A cycle in that graph indicates a serialization anomaly, so the engine must keep the graph acyclic (a DAG).

In summary

Transactions should see the database as if they were running alone on the system, not being impacted by other running transactions. This is important because we don't know what will happen to those other transactions: they could be aborted, and in that case relying on data written by them would cause an anomaly in itself. Ironically, it is the measures we put in place to protect against such dirty reads (MVCC snapshots) that can give rise to a serialization anomaly, where the results of statements from overlapping transactions could not have been produced by any serial execution of those transactions. This post showcases that for MVCC and aims at providing intuition into why these serialization anomalies are problematic, and basic insights into what a database can do to protect you from them. That is, if you choose to run with the serializable transaction isolation level.


Final Notes:

I haven't gone into much detail about locking. Locking can be used to avoid serialization anomalies, but it requires aggressive locks (e.g. exclusive locks) that would enforce serial access to the underlying resource. The example was chosen in such a way that the concurrent execution would be possible even with table-level write locking, where writes block other writes during normal database execution. This is because locks follow the lifetime of a transaction, and T1 ended before any write in T2 could give rise to contention on a lock.

I have tried to give a simple explanation of the tracking mechanism above. Note that my example covered only rule 2. Rule 2 is the easiest to track as you only need to take open transactions into account. Rule 1 on the other hand involves committed transactions, which are troublesome to track since their number grows with the uptime of your database. You can stop keeping track of a committed transaction from the moment it no longer has (direct or indirect) overlap with open transactions. This overlap can disappear because transactions close (by commit or rollback), but a lot of database instances have a continuous workload which can block this type of cleanup since there always remains overlap with open transactions. The good news is that research exists which investigates the use of heuristics to efficiently avoid serialization anomalies. A nice and freely available paper on such a heuristic is "Efficiently making (almost) any concurrency control mechanism serializable".

Sunday, December 23, 2018

Setting up Python 3.7 with SSL support

Intro

If you are lucky enough to have Python 3.7 in your OS repositories you can skip this post; if not, you might have a hard time setting up Python 3.7 with SSL support. Python 3.7 requires a recent version of openssl, but even after installing one, in my case the build would not find it, so I'm documenting the steps of how I got it to work.

1) Get a recent openssl installation

In my case I went for 1.1.1a as that one should work with Python 3.7.
cd /usr/src/
sudo wget https://github.com/openssl/openssl/archive/OpenSSL_1_1_1a.tar.gz
sudo tar -xvzf OpenSSL_1_1_1a.tar.gz
cd openssl-OpenSSL_1_1_1a/
export CFLAGS=-fPIC  # Make sure we build shared libraries
./config shared --prefix=/opt/openssl --openssldir=/usr/local/ssl  # the prefix must match the /opt/openssl paths used in step 2
sudo make
sudo make install
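
To verify the install landed where step 2 expects it (paths assume the /opt/openssl prefix chosen above):

/opt/openssl/bin/openssl version   # should print OpenSSL 1.1.1a
ls /opt/openssl/lib/libssl.so.1.1 /opt/openssl/lib/libcrypto.so.1.1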


2) Get and install Python

It is possible to follow the instructions of https://tecadmin.net/install-python-3-7-on-ubuntu-linuxmint/ with the following changes:

  • Before doing any configure, make symbolic links to the openssl libraries. I tried adding the openssl lib folder to LD_LIBRARY_PATH but for some reason it would still always error out with

    *** WARNING: renaming "_ssl" since importing it failed: libssl.so.1.1: cannot open shared object file: No such file or directory


    Creating symbolic links in a default library path did strangely enough work:
    cd /usr/lib
    sudo ln -s /opt/openssl/lib/libcrypto.so.1.1
    sudo ln -s /opt/openssl/lib/libssl.so.1.1
    
    
  • When doing the configure specify additional details

    sudo LDFLAGS="-L/opt/openssl/lib" ./configure --enable-optimizations --with-openssl=/opt/openssl > /tmp/configure_output


  • When doing the make, if you redirect stdout you will only see the warnings. If the following don't show up:

    • *** WARNING: renaming "_ssl" since importing it failed: libssl.so.1.1: cannot open shared object file: No such file or directory
    • *** WARNING: renaming "_hashlib" since importing it failed: libssl.so.1.1: cannot open shared object file: No such file or directory


      then you are good to go.


  • You can validate by importing the python ssl lib:
    $ /usr/local/bin/python3.7
    Python 3.7.0 (default, Dec 23 2018, 10:35:52) 
    [GCC 6.4.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import ssl
    >>>
    
  • No error means you should be good to go.
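
  • As a final sanity check you can print which OpenSSL the interpreter linked against (the version string should mention 1.1.1a):

    /usr/local/bin/python3.7 -c "import ssl; print(ssl.OPENSSL_VERSION)"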