PE1AQP's Notes about vCard and .vcf file handling.

Not really a ham-radio topic, but this is (for me) a convenient web-site to present these ideas.

Introduction

Recently I had reason to copy contact information from some older smartphones to a newer one.

I don't like the idea of "saving" my contact information in a cloud operated by some big-tech company. So I exported this info into a contacts.vcf file and copied that file onto my main PC. From there it could be copied onto my new phone.

But it was quickly clear that this .vcf file needed some cleaning. It contained duplicate entries with only minor spelling variations of the names, telephone numbers without any further info and even entries for people long since deceased.

I searched the internet for tools to ease the handling of such .vcf files and came across this web-page by 'Vermaden'. Exactly the thing I like, a UNIXy, command-line way of handling .vcf files.


Short description

For a fuller description, see Vermaden's web-page. But the short of it is:

  • A script converts the multi-line vCard entries in a .vcf file into sorted single line entries in (what I call) an archive file.
  • These single line entries can be handled by the usual UNIX cli-tools, such as grep to select certain entries or column to present a nicely formatted table.
  • A second script can search this archive file for duplicate telephone numbers. The file can then be cleaned with any standard text-editor like emacs or vi.
  • In my view, it makes sense to store these archive files in a version control repository like Git or SVN, or even in a simple local versioning system like RCS. (But, of course, a system with a repository under your own control!)
  • A third script re-assembles the selected and/or corrected archive files back into multi-line vCards in a .vcf file.
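
As an illustration of the second and third points, here is a minimal sketch of how single-line entries lend themselves to standard tools. The layout "name|telephone|email" is a hypothetical stand-in for the real archive format (see Vermaden's web-page for that), and the function names select_entries and duplicate_numbers are made up for this example.

```sh
# Assumed (hypothetical) archive layout: one contact per line, "name|telephone|email".

# Print all entries matching a pattern, ignoring case.
select_entries() {      # usage: select_entries PATTERN ARCHIVE
    grep -i -- "$1" "$2"
}

# Print each telephone number that occurs on more than one line.
duplicate_numbers() {   # usage: duplicate_numbers ARCHIVE
    cut -d'|' -f2 "$1" | grep -v '^$' | sort | uniq -d
}
```

With such a layout, something like `select_entries bob contacts.archive | column -t -s'|'` would present the selected entries as a table.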

Required improvement

I found that I had a few vCard entries from people with German "umlauts" and other non-ASCII characters in their names. These names appeared as quoted-printable UTF-8 in the .vcf files generated by the phones. For example:

BEGIN:VCARD
VERSION:2.1
N;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=47=72=C3=BC=6E=65=62=61=75=6D;=42=61=72=74=68=6F=6C=6F=6D=C3=A4=75=73;;;
FN;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=42=61=72=74=68=6F=6C=6F=6D=C3=A4=75=73=20=47=72=C3=BC=6E=65=62=61=75=
=6D
TEL;HOME:+493023125010
END:VCARD

No need to worry about privacy: this is a made-up name with a so-called drama telephone number.

My latest phone has the option to export the contacts in vCard version 4. This version specifies that all payload strings are in UTF-8, which is of course also the encoding I use on all my computers.

But older phones only export to vCard version 2.1, which uses this quoted-printable variant of UTF-8.

I therefore required an adaptation of the "from vcf to archive" script that translates quoted-printable texts to UTF-8, but leaves V4 strings unmodified.

I found that the program qprint fitted the bill. It is available in the repositories for most UNIX-like/*BSD/Linux distributions, even when not installed by default. This program needs to process the .vcf file as a whole, before any rearrangement of source-lines happens, because the quoted-printable texts can be split over several lines (as happens in the example above).
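
To show what this decoding step involves, here is a minimal sketch in POSIX shell and awk. It is not the actual script, which relies on qprint. The function name decode_qp_vcf is hypothetical, and the choice to decode only the values of properties marked ENCODING=QUOTED-PRINTABLE is an assumption of this sketch. Continuation lines are joined first, because a line ending in "=" is a soft line break.

```sh
# decode_qp_vcf [FILE...]: hypothetical helper, decodes quoted-printable
# property values in a vCard 2.1 file to raw UTF-8 bytes on stdout.
decode_qp_vcf() {
    # LC_ALL=C makes awk handle the data as bytes, not characters.
    LC_ALL=C awk '
    function hexval(c) { return index("0123456789ABCDEF", toupper(c)) - 1 }
    function decode(s,    out, p, h) {
        out = ""
        while ((p = index(s, "=")) > 0) {
            h = substr(s, p + 1, 2)
            if (h ~ /^[0-9A-Fa-f][0-9A-Fa-f]$/) {
                # "=XX" becomes the byte with hexadecimal value XX.
                out = out substr(s, 1, p - 1) sprintf("%c", 16 * hexval(substr(h, 1, 1)) + hexval(substr(h, 2, 1)))
                s = substr(s, p + 3)
            } else {
                # Not an escape sequence: keep the "=" as it is.
                out = out substr(s, 1, p)
                s = substr(s, p + 1)
            }
        }
        return out s
    }
    {
        sub(/\r$/, "")                      # tolerate CRLF line ends
        if (pending != "") { $0 = pending $0; pending = "" }
        if ($0 ~ /^[^:]*ENCODING=QUOTED-PRINTABLE[^:]*:/) {
            # A trailing "=" is a soft line break: join with the next line.
            if ($0 ~ /=$/) { pending = substr($0, 1, length($0) - 1); next }
            p = index($0, ":")
            print substr($0, 1, p) decode(substr($0, p + 1))
        } else {
            print
        }
    }
    END { if (pending != "") print pending }' "$@"
}
```

Applied to the example above, the FN line comes out as a single line ending in "Bartholomäus Grünebaum". The now outdated CHARSET and ENCODING parameters are left in place by this sketch. With qprint itself, decoding a whole file should, as far as I know, amount to `qprint -d old.vcf new.vcf`.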

With the archive file in UTF-8, the script "from archive to vcf" should now generate V4 vCards, and not V2.1 as the original script did. V4 is also a bit stricter about telephone number layout.
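
For comparison with the V2.1 example above, here is a minimal sketch of what emitting the same contact as a V4 vCard could look like. The function name emit_vcard4 and its three arguments are hypothetical, and the real script works from the archive file instead. The sketch assumes the values contain no commas, semicolons or backslashes, which would need escaping.

```sh
# emit_vcard4 FAMILY GIVEN TEL: hypothetical helper, writes one vCard 4.0
# entry to stdout with plain UTF-8 values and CRLF line ends.
emit_vcard4() {
    printf 'BEGIN:VCARD\r\n'
    printf 'VERSION:4.0\r\n'
    printf 'N:%s;%s;;;\r\n' "$1" "$2"
    printf 'FN:%s %s\r\n' "$2" "$1"
    printf 'TEL;TYPE=home:%s\r\n' "$3"
    printf 'END:VCARD\r\n'
}
```

Calling `emit_vcard4 'Grünebaum' 'Bartholomäus' '+493023125010'` gives the made-up contact from above without any quoted-printable encoding.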


Some minor adjustments

I also introduced a few minor adjustments to the scripts, such as normalisation of the + symbol in international telephone numbers and restoring spaces in comment strings.


My variants of the scripts

Most of the issues mentioned above have been accepted and included, in some form, by the main author.

But of course, shell scripts can be, often are, and these certainly are, forever being adapted to their user's needs, which causes local versions to diverge from their originals.

To reduce confusion, I gave my local, modified copies of Vermaden's three scripts slightly different names. All three of them can be found in the vcfhandling-<date>.tgz compressed tar file(s), downloadable below.

The main differences (at time of writing, 2024-08-27) between my local scripts and those in the original repository are:

  • Error/warning/info lines are sent to stderr.
  • Where relevant, the error/info/etc. output uses the gnu-cc/gnu-emacs compatible layout. Most editors have a "find next error" function that uses this message layout to jump to the correct source file location.
  • The duplicate and consistency checker has been extended to find names without any contact info and to find text in the telephone number field.
  • My test data and a test driving makefile are now (2024-08-27) also published. Unpack that test file in the same directory where the scripts were unpacked and run make. (This test data also uses "drama numbers".)
  • The duplicate checker now also checks for duplicate email addresses (2024-08-28).
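
The first three points can be sketched as follows. This is an illustration only, not the actual checker: the archive layout "name|telephone|email" and the function name check_archive are hypothetical. The messages follow the "file:line: message" layout and go to stderr, so that stdout stays clean.

```sh
# Assumed (hypothetical) archive layout: one contact per line, "name|telephone|email".
check_archive() {       # usage: check_archive ARCHIVE
    awk -F'|' -v file="$1" '
        # Neither a telephone number nor an email address.
        $2 == "" && $3 == "" {
            printf "%s:%d: warning: no contact info for \"%s\"\n", file, NR, $1
        }
        # Anything other than digits and "+" in the telephone field.
        $2 ~ /[^0-9+]/ {
            printf "%s:%d: error: text in telephone field: \"%s\"\n", file, NR, $2
        }
    ' "$1" >&2
}
```

Run from within an editor's compile or make command, each such message can be used to jump directly to the offending line of the archive file.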

(Clickable) Listing of the relevant files:

$ ls -lAtr
-rw-rw-r-- 1 pe1aqp pe1aqp 3006 Aug 24 22:26 vcfhandling-20240824.tgz
-rw-rw-r-- 1 pe1aqp pe1aqp 3103 Aug 27 13:39 vcfhandling-20240827.tgz
-rw-rw-r-- 1 pe1aqp pe1aqp 3253 Aug 28 12:47 vcfhandling-20240828.tgz
-rw-rw-r-- 1 pe1aqp pe1aqp 1921 Aug 28 12:47 vcfhandlingtest-20240828.tgz


The Fine Print

I'm a fan of the GPL, and in particular the GPLv3, license. But I've only added small bits to these scripts. Therefore:

Author: Jon Krom : See Colophon

Created: 2024-09-04 Wed 11:56
