Cloud Files offers public content through Limelight’s CDN network. On public containers, one can opt in to saving the logs for all content requested from the CDN. These logs record raw usage in an Apache log format and are stored compressed in a container named “.CDN_ACCESS_LOGS”. One can then parse these logs with any commercial analytics tool or use a custom solution. Being a developer, I wrote a small Python script that loads these log files and aggregates the data.
The code can be found in my GitHub repository.
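To give a rough idea of what the script does before it aggregates anything, fetching and decompressing the log files might look something like the sketch below. This is a minimal illustration only, assuming the legacy python-cloudfiles bindings; the placeholder credentials and the loop body are not taken from the actual script.

import gzip
import io

import cloudfiles  # legacy python-cloudfiles bindings (assumed here)

# Placeholder credentials -- the real script authenticates through its cf_auth module.
conn = cloudfiles.get_connection('username', 'api_key')
logs = conn.get_container('.CDN_ACCESS_LOGS')

for obj in logs.get_objects():
    # Each object is a gzip-compressed Apache-style access log.
    raw = gzip.GzipFile(fileobj=io.BytesIO(obj.read())).read()
    for line in raw.decode('utf-8', 'replace').splitlines():
        pass  # hand each line off to the parser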
After updating the code with your own Cloud Files credentials (or using your own cf_auth module), usage is similar to the following:
$ ./cf_stats.py obj_name
“obj_name” is one of the keys the stats can be grouped on. Others include “date”, “container_name”, and “user_agent”. The default is “obj_name” and any incorrect parameter will generate a usage message.
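Those fields map directly onto pieces of each Apache-format log line. As a rough illustration (not the script’s actual parsing code), and assuming the standard Apache “combined” log layout, a single line could be broken apart like this:

import re

# host ident authuser [date] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<date>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<response>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def parse_line(line):
    # Return a dict of fields for one log line, or None if the line
    # does not look like an Apache "combined" format entry.
    match = LOG_LINE.match(line)
    if not match:
        return None
    entry = match.groupdict()
    # "GET /my_file.pdf HTTP/1.1" -> "my_file.pdf"
    request_parts = entry.pop('request').split()
    entry['obj_name'] = request_parts[1].lstrip('/') if len(request_parts) > 1 else '-'
    # "24/Jan/2010:13:55:36 +0000" -> "24/Jan/2010"
    entry['date'] = entry['date'].split(':', 1)[0]
    return entry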
Sample output:
Object Name: my_file.pdf
Count: 11
User Agents: "Yandex/1.01.001 (compatible; Win16; I)"
Response: 200 304
Referrers: -
IPs: 1.2.3.4 1.2.3.5 1.2.3.6
Dates: 24/Jan/2010 25/Jan/2010 31/Jan/2010 01/Jan/2010 30/Dec/2009
Container Name: some_container
Any of the given fields can be used as a group. Even if the output format is not to your liking, the script’s parsing and grouping functions may be a good starting point for writing your own log parser.
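As a sketch of the grouping idea (illustrative only, not the script’s actual code), entries shaped like the parser’s output above can be aggregated under any field and printed in roughly the form of the sample output:

from collections import defaultdict

def group_stats(entries, group_key='obj_name'):
    # Count hits per group and collect the distinct values of every other field.
    counts = defaultdict(int)
    values = defaultdict(lambda: defaultdict(set))
    for entry in entries:
        key = entry[group_key]
        counts[key] += 1
        for field, value in entry.items():
            if field != group_key:
                values[key][field].add(value)
    return counts, values

# A stand-in entry shaped like the output of parse_line() above.
entries = [
    {'obj_name': 'my_file.pdf', 'ip': '1.2.3.4', 'date': '24/Jan/2010',
     'response': '200', 'referrer': '-',
     'user_agent': 'Yandex/1.01.001 (compatible; Win16; I)'},
]

counts, values = group_stats(entries, 'obj_name')
for key in counts:
    print('Object Name: %s' % key)
    print('Count: %d' % counts[key])
    for field, vals in sorted(values[key].items()):
        print('%s: %s' % (field, ' '.join(sorted(vals))))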
This work is licensed under a Creative Commons Attribution 3.0 Unported License.
The thoughts expressed here are my own and do not necessarily represent those of my employer.