Starting up the ELK Platform to Analyze OCDS-Format Contracting Data

At the end of this section, you will have everything you need to explore and visualize contracting data in OCDS format.

To follow this section more easily, it is recommended that you be familiar with the command-line terminal.

Goals

  1. Starting an ElasticSearch server with Kibana.

  2. Uploading the published contracting data in a format that is easy to explore.

  3. Exploring the data.

This example project has been developed so that any of these three services can be started with minimal effort.

Prerequisites

  1. Open the operating system’s command terminal

  2. Install Docker CE

  3. Install Docker Compose

  4. Download the file containing the tools

  5. Unzip the downloaded file and access the folder that has just been created

  • Command: cd ManualKibanaOCDS-master

Start the Server Container with ElasticSearch and Kibana

Now we can start the server by executing the following command in the terminal:

docker-compose -f elastic-kibana.yaml up

This command tells Docker to create the containers defined in the file elastic-kibana.yaml, where we specify that both programs should start.

After a few minutes, depending on the available resources, we should be able to open http://localhost:5601/app/kibana and Kibana should appear as available.

From this point on, ElasticSearch and Kibana are ready to use, even though no data is available yet.
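Before continuing, it can be useful to confirm from a script that both services are answering. A minimal sketch in Python, assuming the default ports used above (9200 for ElasticSearch, 5601 for Kibana):

```python
import urllib.error
import urllib.request

def service_ready(url, timeout=3.0):
    """Return True if the service at `url` answers an HTTP request."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except (urllib.error.URLError, OSError):
        return False

# Assumed default endpoints for the containers started above.
print("ElasticSearch ready:", service_ready("http://localhost:9200"))
print("Kibana ready:", service_ready("http://localhost:5601/app/kibana"))
```

If either line prints False, give the containers a few more minutes before retrying.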

Upload OCDS data to ElasticSearch

For this process, the terminal should be positioned in the elk-gobmx-csv-master/pipeline folder:

cd pipeline

Downloading data packages

If we want to work with all the contracts published under the OCDS standard, without taking the most recently published data into account, we should first make sure we have downloaded the contracts in OCDS format as a JSON package, published on the datos.gob.mx website.

As of September 2, 2018, this file is named contratacionesabiertas_bulk_paquetes.json.zip and its size is about 310.5 MB.

It is important to mention that this information follows the OCDS standard’s recordPackage (Record Package) format; all tools and code included in this manual use this format. To use another one, such as releases or releasePackages, or a structure not defined by the OCDS standard, the code would have to be modified.
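To make the recordPackage structure concrete, here is a minimal sketch in Python. The sample document below is an invented miniature, not real data from datos.gob.mx; real packages contain many records and far more fields:

```python
# Invented miniature of a record package, for illustration only.
sample_package = {
    "uri": "https://example.org/recordPackage/1",
    "publishedDate": "2018-09-02T00:00:00Z",
    "records": [
        {
            "ocid": "ocds-sample-000001",
            "compiledRelease": {
                "ocid": "ocds-sample-000001",
                "tender": {"title": "Sample tender"},
            },
        }
    ],
}

# The tools in this manual read the `records` array of each package
# and work with the `compiledRelease` inside every record.
for record in sample_package["records"]:
    print(record["ocid"], "->", record["compiledRelease"]["tender"]["title"])
```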

Since these files can be large, it is recommended to have at least 2 GB of free disk space before continuing.

Now, we need to unzip contratacionesabiertas_bulk_paquetes.json.zip file, which will create multiple .json files inside a folder:

folder/contratacionesabiertas_bulk_paquete1.json
folder/contratacionesabiertas_bulk_paquete2.json
folder/contratacionesabiertas_bulk_paquete3.json
...

IMPORTANT: We must know the full path to the folder containing the .json files, as it will be needed in the upload stage.

Let’s suppose the files were downloaded and unzipped inside the operating system’s Download folder. Its path would be:

  • Linux/Ubuntu/Mac: /home/{username}/Download, which we can shorten as $HOME/Download

  • Windows: C:\Users\{username}\Download, which we can shorten as %HOMEPATH%\Download

Once we have confirmed or obtained the path to the files folder, we can continue.
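A quick way to confirm the path is correct is to list the package files it contains. A sketch in Python, assuming the file names follow the pattern shown above:

```python
import glob
import os

def list_packages(folder):
    """Return the sorted .json package files found in `folder`."""
    pattern = os.path.join(os.path.expanduser(folder),
                           "contratacionesabiertas_bulk_paquete*.json")
    return sorted(glob.glob(pattern))

# "~/Download" follows the Linux/Mac example above; adjust for Windows.
packages = list_packages("~/Download")
print(len(packages), "package files found")
```

If the count printed is 0, double-check the folder path before moving on to the upload stage.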

Processing and uploading data

IMPORTANT

This process specifically uploads the compiledRelease part of each OCDS document, in order to analyze the latest version available among the OCDS releases. We recommend reading the previous chapters before continuing.

In this same folder, we have another tool designed for uploading the data. It also uses a Docker container. We will only use two commands: the first builds the container image and the second executes it.

docker build . -t logstash-sfp-compranet-ocds:latest

With this command, Docker will build the container image with everything necessary to process and upload the data.

Once finished, we can execute the upload process according to the available operating system.

Linux/Ubuntu or Mac

docker run --net="host" -v $HOME/Download:/input logstash-sfp-compranet-ocds

Windows

docker run --net="host" -v %HOMEPATH%\Download:/input logstash-sfp-compranet-ocds

This command runs the container we just built, which executes the data processing and upload. For more information, please check the technical documentation of this process.

The screen should now show the process output. This could take several minutes.

Once everything finishes successfully, the following message should appear: Upload successfully complete and Kibana is ready to use.

Now we can visit the Kibana web page (http://localhost:5601/app/kibana) and see all the uploaded data.
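We can also double-check the upload without opening Kibana by asking ElasticSearch how many documents it has indexed, via its standard `_count` endpoint. A sketch in Python, assuming the default endpoint http://localhost:9200 (the query counts documents across all indices, since this manual does not fix an index name):

```python
import json
import urllib.error
import urllib.request

def indexed_documents(es_url="http://localhost:9200"):
    """Return ElasticSearch's total document count, or None if unreachable."""
    try:
        with urllib.request.urlopen(es_url + "/_count", timeout=5) as resp:
            return json.load(resp)["count"]
    except (urllib.error.URLError, OSError, KeyError, ValueError):
        return None

print("documents indexed:", indexed_documents())
```

A count greater than zero indicates the upload reached the server.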


For more technical details on how the data is uploaded, we will explain next how LogStash is used for this process.

Extra: Download OCDS data directly from the datos.gob.mx API

Previously, we explained how to download the OCDS dataset as a single file. In this section, we introduce an alternative for downloading only the contracts we are looking for, or the most recent contracts that have not yet been included in the bulk file. For this, we will use the datos.gob.mx API, provided by the Mexican government.

To see the full API documentation, please check the Basic guide to using the API, where the specific filter options are described in detail.

In order to download and manipulate data, the following tools will be used: cURL and jq.

The curl command will enable us to download information automatically, and the jq command will help us give JSON data a user-friendly format. At a later stage, this manual includes a brief introduction to jq.


Both programs can be installed locally. For example, on Ubuntu Linux it can be done with an instruction such as:

sudo apt-get install -y curl jq

For Windows or Mac, we would have to download the executable files separately, but we can also use the Docker container included in the current code. To do so, we only need to execute the following Docker command:

docker run --rm -it -v $HOME/Download:/input --entrypoint=bash logstash-sfp-compranet-ocds

Remember that $HOME/ is a shortcut only for Linux and Mac; on Windows we should use %HOMEPATH%\ instead.

When this command is executed, we will get a new command line, which should look like this:

bash-4.2$

In this command line, we can execute the following commands.


To download the last 100 contracting processes and save them in a .json file:

curl https://api.datos.gob.mx/v2/contratacionesabiertas | jq -crM ".results" > opencontracting_last_100.json

Any files created inside the container will be deleted when the container is “shut down”, unless they are moved to or created in a folder shared by the computer and the container.

To download the contracting processes that involve a certain business unit (unidad compradora) (currently limited to 1,000 results, but this can be changed):

curl "https://api.datos.gob.mx/v2/contratacionesabiertas?records.compiledRelease.parties.name=Servicio%20de%20Administraci%C3%B3n%20Tributaria&pageSize=1000&page=1" | jq -crM ".results" > opencontracting_SAT_1000.json

Note that the quotation marks around the URL are necessary: without them, the shell would interpret each & as an instruction to run the command in the background.

In order to understand this last command, we will detail each section:

  • Firstly, the curl command is used.

  • Secondly, the base API URL is included: https://api.datos.gob.mx/v2/contratacionesabiertas.

  • Next, we have the filter parameters:

    • records.compiledRelease.parties.name: filters by the value of that field, in this case the name of one of the parties involved in the contracting process.

    • pageSize: specifies how many results are returned per request.

    • page: allows browsing through pages, in case there is more than one.

  • Afterwards, we will use the jq command to extract only the results.

  • Lastly, we specify the name of the file in which the results will be stored. It is important that the filename reflects the search query, to avoid mix-ups.
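The same request can be reproduced without curl and jq, for example from Python. The sketch below mirrors the command above; the query field, page size, and party name come from that example, and the paging behavior should be checked against the API guide:

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.datos.gob.mx/v2/contratacionesabiertas"

def build_url(page, page_size=1000,
              party_name="Servicio de Administración Tributaria"):
    """Build one paginated query URL, percent-encoding the filter value."""
    params = {
        "records.compiledRelease.parties.name": party_name,
        "pageSize": page_size,
        "page": page,
    }
    # quote_via=quote encodes spaces as %20, matching the curl example.
    return BASE_URL + "?" + urllib.parse.urlencode(
        params, quote_via=urllib.parse.quote)

def download_results(page):
    """Download one page and return only the `results` array (as jq did)."""
    with urllib.request.urlopen(build_url(page), timeout=30) as resp:
        return json.load(resp)["results"]

print(build_url(1))
```

Increasing `page` in a loop and calling `download_results` for each value reproduces the browsing described for the page parameter above.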

These files must be stored and treated just as in the previous section, placing them in the Download folder so we can continue with our next step.