What I did to protect my privacy – a more conscious approach to data online

The Cambridge Analytica story is still unfolding, but it has undeniably sparked some interesting and serious conversations about the amount of information about us accessible online, our use of the internet and the need for legislation to protect our privacy (information is indeed power).

Facebook is in the eye of the storm for its terrible negligence and malpractice in managing the data of millions of people. A few days ago, another story broke about its clumsy use of a deprecated Android API to collect a history of the recipients of all SMS messages and phone calls on Android devices that had the Facebook or Messenger app installed, in order to feed its friend-suggestion algorithms.

The reactions

Tim Cook weighed in, calling for clear rules, processes and structures to safeguard personal information and limit the reach of advertising companies, political parties and the many other agencies that have the tools to “connect the dots” and use data to their advantage.
Despite Tim Cook’s numerous remarks and pride in Apple’s commitment to privacy and its laudable on-device approach, Apple recently transferred all cloud data belonging to its Chinese users to servers based in China in order to retain its ability to do business there, allowing for much easier government access.

Whilst I am sure that this conversation will be beneficial in creating a more structured system to protect our information (and the GDPR is a good stepping stone), I believe it is necessary for each of us to be conscious of the mechanisms underlying the internet, the concept of audience, the potential eternity of data shared online, the way our devices interact and the technologies involved.

This requires a combined effort of companies, policy makers, agencies and governments, but hopefully it will also be something that education systems will be called to cover in the near future.

However, there are some small steps that everyone can take to limit the amount of information online companies can use to profile people. It has been estimated that Google (through the Analytics script, AdSense, in-browser predictive systems, auto-suggest, etc.) can track around 80% of an average browsing session. Facebook, too (through Pixel, the Graph API and other forms of presence such as comment-box scripts, the Like button, embedded posts, etc.), can track a vast portion of your online behaviour outside its own domains.

As you can imagine, mixing almost all of your browsing history with the information you spontaneously submit on your wall or with your likes gives these online companies (and others, as we saw with Cambridge Analytica) a scary amount of information about you. This specific case, concerning the US elections and Brexit, has various implications for the role of democracy, the power of information and mechanisms of governmentality – but it is too big a topic to squeeze into this article.

How can we defend ourselves?

What can you do to limit this? Reportedly, many people are taking the drastic decision of deleting their Facebook profile (the #deletefacebook hashtag was trending a few days ago; Elon Musk also chipped in). Whilst removing your account will certainly help, there are other ways in which companies like Google and Facebook can track your online activities (take a look here and here).

In some small way – without having to buy a new device every day, fake your MAC address or stop using GPS altogether – there are things we can do to make life harder for people who want to track our behaviour across devices and sessions, and to be more conservative with our data.

I have compiled a list of things that I have done and that hopefully will be useful to you:

  • Removed all the contacts I have uploaded to Facebook since 2006 (you can do it following these steps)
  • Set DuckDuckGo as my default search engine in Chrome and Safari Mobile
  • Use the European Advertising Standards Alliance (EASA)’s site YourOnlineChoices to control the cookies currently stored by ad companies
  • Set Chrome to delete all of my cookies upon exit (except a few trusted domains that I have whitelisted for convenience) and use 1Password (with the standard browser extension or the new 1Password X for Chrome) to fill login boxes
  • Regularly reset my Advertising Identifier on iOS
  • Regularly reduce my device/browser fingerprint
  • Use a trusted VPN whenever possible (my personal one if I am doing something very sensitive, a trusted commercial one any other time, since for energy-saving purposes iOS heavily limits on-demand IPsec VPNs and automatically disconnects them after a few seconds of inactivity)
  • Review the apps to which I granted permissions on Facebook and removed as many as possible. Also reviewed my Facebook profile so that it shows nothing more than my profile picture and cover photo to anyone who is not my friend (and hid my friends list from everyone).
  • Review my Google account activity controls (you might find some scary things being recorded by Google, for example your voice and location!)
  • Gradually change the email addresses I am registered with on online services, so that on each site I use a unique address. Comparing and matching emails is a very convenient way for companies to track users across sites. With Gmail you can easily create a unique email for each service with the + trick: for example, I would use [email protected] for Facebook, [email protected] for Airbnb, etc. If you don’t know what I am talking about, you can learn more here.
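The “+ trick” from the last point can be shown in one line: providers that support plus-addressing deliver anything after the + in the local part to the same mailbox, so every site can be given a unique address. The address below is a made-up example; the sed call shows the normalisation a determined tracker would need to apply to link such addresses back together, which is also the limit of the technique:

```shell
# "name+facebook@example.com" and "name@example.com" reach the same mailbox;
# stripping the "+tag" part recovers the base address.
echo "name+facebook@example.com" | sed 's/+[^@]*@/@/'
# -> name@example.com
```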

Do you have any more ideas on ways to limit our traceability on the internet? Post them in a comment! I am also interested in your point of view on the whole Cambridge Analytica story and its implications for your use of social media. Have you deleted your Facebook account? Are you thinking about it?

EDIT: Just found this video from the Wall Street Journal about some of the ways Facebook tracks you across different sites/devices and offline. Very clear and easily digestible by non-tech people.

PHP: extracting JSON data from the soldipubblici.gov.it site

A few days before Christmas, the government launched the soldipubblici.gov.it website. This first release provides access to the payment data of regions, regional health agencies, provinces and municipalities, on a monthly basis and updated to the previous month. The data comes from the SIOPE system, the result of a collaboration between Banca d’Italia and the Ragioneria Generale dello Stato, which aggregates the daily payments of the various public administrations through a set of about 250 management codes. In truth, the only real novelty is that access to SIOPE data used to go through Banca d’Italia and was therefore not open to everyone.

From the outside, the site looks like a search engine that returns unstructured, non-aggregatable data. “Under the hood”, however, the site can be queried directly and returns data in JSON. The “discovery” is due to openfuffa.

From bash, it is enough to send a request like:
curl -i -X POST http://soldipubblici.gov.it/it/ricerca \
-H "Content-Type: application/x-www-form-urlencoded; charset=UTF-8" \
-H "Accept: Application/json" \
-H "X-Requested-With: XMLHttpRequest" --data "codicecomparto=REG&codiceente=000705604"

Here codicecomparto and codiceente are variables corresponding to the SIOPE codes. A complete list of all the codes, with the corresponding entity name, region and sector, can be found here. Here is an example of one element of the array:

To extract the individual elements we need to decode the JSON. Here is an example that extracts the codes and names of the entities (so they can then be used in requests to the portal):
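As a quick illustration of that extraction, here is a shell sketch run against a saved response. The field names codiceente and nomeente are illustrative assumptions, not the portal’s documented schema; inspect the real JSON before relying on them:

```shell
# Save a (hypothetical) response element and pull out code/name pairs.
cat > enti.json <<'EOF'
{"data":[{"codiceente":"000705604","nomeente":"REGIONE LOMBARDIA"}]}
EOF
# -o prints each match on its own line; cut keeps only the value.
grep -oE '"(codiceente|nomeente)":"[^"]*"' enti.json | cut -d'"' -f4
# prints:
#   000705604
#   REGIONE LOMBARDIA
```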

The portal maintainers have already said they are thinking about building official APIs:

In the meantime, many people have already started using this “dirty” method to extract data via JSON.
A quick way to start “playing” with this huge database is PHP. Here is a basic snippet I wrote that lets you request data for any entity via JSON by specifying the two variables “codicecomparto” and “codiceente” (extracted as shown above through json_decode):
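A minimal shell version of the same parameterised request can also be handy; the build_payload helper below is my own name, not something the site provides:

```shell
# Build the POST body from the two SIOPE identifiers.
build_payload() {
  printf 'codicecomparto=%s&codiceente=%s' "$1" "$2"
}

build_payload REG 000705604; echo
# -> codicecomparto=REG&codiceente=000705604

# To actually query the portal (requires network access):
# curl -s -X POST http://soldipubblici.gov.it/it/ricerca \
#   -H "Content-Type: application/x-www-form-urlencoded; charset=UTF-8" \
#   -H "Accept: Application/json" \
#   -H "X-Requested-With: XMLHttpRequest" \
#   --data "$(build_payload REG 000705604)"
```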

WordPress: role-based restriction of posting (only a specific category allowed)

I was looking for a way to allow a user to post only in a specific category and hide all the others. I found nothing on the internet, so here is what I finally managed to achieve: it hides all the categories except the specified one, and applies that category automatically on autosave or when saving a draft (to avoid having the post autosaved in the default category). (more…)

Setup Postfix and Dovecot storing virtual users in MySQL database

Install Postfix, Dovecot, postfix-mysql and dovecot-mysql from Ubuntu repositories.

apt-get install postfix postfix-mysql dovecot-imapd dovecot-mysql

Let’s start with Postfix config. (more…)

How to create a chrooted user

useradd -m -d /var/www/domain.tld -s /usr/sbin/nologin -c "Comment on user role" username
passwd username
mkdir /var/www/domain.tld/htdocs
chmod 775 /var/www/domain.tld/htdocs
chown username:root /var/www/domain.tld/htdocs

htdocs is the directory the chrooted user can write into. The user can still read everything up to /var/www/domain.tld, though. (more…)

Import large server log files in Piwik and set a cron job to do it automatically

You want to get rid of Google Analytics, don’t you? Piwik is a great open source alternative, and today we’re going to see how to import your old webserver access logs and how to set an automatic script to do it programmatically.

I assume you have Piwik and Python installed. If you don’t, go do it. Easy as pie.

Here’s the line to fetch the access log file and import it into your Piwik site (be sure to set the correct --idsite):

python /path-to-piwik/misc/log-analytics/import_logs.py --url=http://your-piwik-public-url/ /var/www/logs/access.log.gz --idsite=X --enable-http-redirects --enable-http-errors --enable-bots --enable-static --recorder-max-payload-size=300
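To make the import run automatically, a crontab entry along these lines should work; the schedule, user, paths and --idsite below are placeholders to adapt to your setup:

```shell
# /etc/crontab sketch: import the rotated access log every night at 02:30.
30 2 * * * root python /path-to-piwik/misc/log-analytics/import_logs.py --url=http://your-piwik-public-url/ /var/www/logs/access.log.gz --idsite=X
```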


MySQL: transferring a database from one server to another

The steps to transfer a MySQL database from one environment (a VPS, for example) to another.

Dump the database to be exported:
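Using the same placeholders as the surrounding text (and the file and database names from the import command later on), the dump boils down to something like:

```shell
# Dump the database to a file; mysqldump will prompt for the password.
mysqldump -u NOME_UTENTE_ROOT -p databasename > PERCORSO_ASSOLUTO/backupfile.sql
```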

Here PERCORSO_ASSOLUTO is the output location and NOME_UTENTE_ROOT the name of a MySQL user with adequate permissions to export the database. At this point, copy the dump to the new environment (for example using Secure CoPy).

In the new environment, log in to the MySQL shell with
mysql -u NOME_UTENTE_ROOT -p

Enter the password and create the database into which we will import the tables of the old database:

Create the user that will interact with the database:

… and grant it the permissions needed to work on the database created above:
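The three MySQL statements described above could look roughly like this; the database, user and password names are placeholders, chosen to match the import command that follows:

```sql
-- Create the destination database.
CREATE DATABASE databasename;
-- Create the user that will work on it...
CREATE USER 'username'@'localhost' IDENTIFIED BY 'a-strong-password';
-- ...and give it privileges on that database only.
GRANT ALL PRIVILEGES ON databasename.* TO 'username'@'localhost';
FLUSH PRIVILEGES;
```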

Exit the MySQL shell and finally import the tables of the old database into the new one:
mysql -u root -p databasename < backupfile.sql

Copying files from local to a remote server (shell)

To quickly copy files from the local machine to a remote server, you can use the scp (Secure CoPy) tool, included in many Linux distributions.

For example, to copy an entire folder while keeping the last-modified and last-access metadata intact, we would use:
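A sketch of that command (host and paths are placeholders; -r recurses into the folder and -p preserves modification and access times):

```shell
scp -rp /var/www/myfolder user@remote.host:/var/www/
```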


It will be much faster and safer (no risk of losing chunks or running into problems with file systems that do not support certain characters) than downloading everything to your PC and re-uploading it via FTP or FTPS.

Scarcity of resources?

The complexity that characterises our times, with particular reference to globalisation and technological development, undermines at its root the principle of resource scarcity that underpinned the operations of firms at the dawn of modern industrial capitalism.

L’approccio sistemico vitale al governo dell’impresa, p. 512. Golinelli.

Running simultaneous git pulls for multiple repositories in one location

An interesting script that I use to search a directory (in this case /var/www/) for all git repositories and run a pull on each of them.

source ~/.keychain/$HOSTNAME-sh
find /var/www/ -type d -name .git \
  | xargs -n 1 dirname \
  | sort \
  | while read line; do echo $line && cd $line && git pull; done

The first line calls keychain, a special bash script that saves you from entering the passphrase of the (previously generated) private key every time it is requested (otherwise the script would not run). The script then searches for repos (via their .git directory) and, if any are found, runs a git pull for each match (as many pulls as there are repos).
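If keychain has not been set up yet, the one-off setup is roughly the following (a sketch, assuming the key lives at ~/.ssh/id_rsa):

```shell
# Start (or reuse) an ssh-agent and add the key; keychain then writes
# ~/.keychain/$HOSTNAME-sh, which the script above sources.
keychain ~/.ssh/id_rsa
source ~/.keychain/$HOSTNAME-sh
```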

If we want to run the check every 10 minutes, for example, we just add it to /etc/crontab:

*/10 *   * * *   user    bash ~/scriptname.sh > ~/git.log

Obviously, replace scriptname.sh with the name you actually gave the script (and check the location: the tilde refers to the home directory of the user you put in place of “user”). The output will be written to ~/git.log.