Starting up the ELK Platform to Analyze OCDS Contracting Data#
At the end of this section, you will have all you need to check and visualize OCDS format Contracting data.
To get the most out of this section, it is recommended that you be familiar with the command-line terminal.
Goals#
Starting an ElasticSearch server with Kibana.
Uploading the published contracting data in a format that is easy to explore.
Checking data.
This example project has been developed to effortlessly start any of the 3 services.
Prerequisites#
Open the operating system's command terminal
Unzip the downloaded file and access the folder that has just been created
From the command line:
cd ManualKibanaOCDS-master
Start Server Container with ElasticSearch and Kibana#
Now we can start the server by executing the following command in the terminal:
docker-compose -f elastic-kibana.yaml up
This command tells Docker to create the containers defined in the file elastic-kibana.yaml, which specifies that both programs should start.
After a few minutes, depending on the available resources, we should be able to open http://localhost:5601/app/kibana and see Kibana reported as available.
From this point on, Elasticsearch and Kibana are ready to use, although no data is available yet.
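A quick way to confirm from the terminal that the server is answering is to query Elasticsearch's root endpoint. This is a sketch that assumes the default port 9200 from elastic-kibana.yaml:

```shell
# Ask Elasticsearch for its HTTP status; "200" means the server is ready,
# "000" means it is not reachable yet. (Assumes the default port 9200.)
ES_STATUS=$(curl -s -o /dev/null -w '%{http_code}' http://localhost:9200)
echo "Elasticsearch HTTP status: $ES_STATUS"
```

If the status is not 200 yet, wait a minute and try again; the container can take a while to start on machines with limited resources.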
Upload OCDS data to ElasticSearch#
For this process, we should start the terminal in the folder elk-gobmx-csv-master/pipeline:
cd pipeline
Downloading data packages#
To work with all the contracts published under the OCDS standard (leaving aside the most recently published data), we should first make sure we have downloaded the contracts in OCDS format as a JSON package, published on the website datos.gob.mx.
As of September 2, 2018, this file is named contratacionesabiertas_bulk_paquetes.json.zip and its size is about 310.5 MB.
It is important to mention that this data follows the OCDS record package format (recordPackages); all the tools and code included in this manual assume this format. To use another one, such as releases or releasePackages, or a structure not defined by the OCDS standard, the code would have to be modified.
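As a sketch of what this format looks like, here is a minimal, invented record package and a jq expression that pulls out each record's compiledRelease. The field names follow the OCDS record package schema; the values are made up for illustration:

```shell
# A minimal, invented record package (the real files are much larger):
cat > /tmp/sample_package.json <<'EOF'
{
  "uri": "https://example.org/record-package/1",
  "records": [
    { "ocid": "ocds-0001-01",
      "compiledRelease": { "ocid": "ocds-0001-01", "tag": ["compiled"] } }
  ]
}
EOF
# Extract the compiledRelease of each record, which is the part
# this manual's upload process works with:
jq -crM '.records[].compiledRelease' /tmp/sample_package.json
```

Tools built around this shape would not find the records field in a release package, which is why other formats require code changes.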
As these files can be quite large, it is recommended to have at least 2 GB of free space before continuing.
Now, we need to unzip the contratacionesabiertas_bulk_paquetes.json.zip file, which will create multiple .json files inside a folder:
folder/contratacionesabiertas_bulk_paquete1.json
folder/contratacionesabiertas_bulk_paquete2.json
folder/contratacionesabiertas_bulk_paquete3.json
...
IMPORTANT: We must know the full path to the folder containing the .json files, as it will be needed during the upload stage.
Let’s suppose the files were downloaded and unzipped inside the operating system’s Download folder. Its path would then be:
Linux/Ubuntu/Mac:
/home/{username}/Download
We can shorten it as $HOME/Download
Windows:
C:\Users\{username}\Download
We can shorten it as %HOMEPATH%\Download
Once we have confirmed or obtained the path to the folder with the files, we can continue.
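A simple way to confirm the path is to count the package files from the terminal. This is a sketch for Linux/Mac; OCDS_DIR is a placeholder, so adjust it if you unzipped somewhere else:

```shell
# Hypothetical location; change it to wherever you unzipped the files.
OCDS_DIR="$HOME/Download"
# Count the unzipped package files (0 means the path is wrong or empty):
JSON_COUNT=$(ls "$OCDS_DIR"/contratacionesabiertas_bulk_paquete*.json 2>/dev/null | wc -l)
echo "Found $JSON_COUNT package files in $OCDS_DIR"
```

If the count is 0, double-check where the zip file was extracted before moving on to the upload stage.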
Processing and uploading data#
IMPORTANT#
This process specifically uploads the compiledRelease part of each OCDS document, in order to analyze the latest version available among the OCDS releases. We recommend reading the previous chapters before continuing.
In this same folder, we have another tool designed for data upload. It also uses a Docker container. We will just use two commands: the first one is to get the container ready and the second one to execute it.
docker build . -t logstash-sfp-compranet-ocds:latest
With this command, Docker builds the image with everything necessary to process and upload the data.
Once finished, we can execute the upload process according to the available operating system.
Linux/Ubuntu or Mac
docker run --net="host" -v $HOME/Download:/input logstash-sfp-compranet-ocds
Windows
docker run --net="host" -v %HOMEPATH%\Download:/input logstash-sfp-compranet-ocds
This command runs the container we just built, which executes the data processing and upload. For more information, please check the technical documentation of this process.
The screen should now show the process output. This may take several minutes.
Once everything finishes successfully, this message should appear: Upload successfully complete and Kibana is ready to use.
Now, we can visit the Kibana web page (http://localhost:5601/app/kibana) and see all the uploaded data.
To learn more about the technical details of uploading data, we explain next how LogStash is used in this process.
Extra: Download OCDS data directly from datos.gob.mx API#
Previously, we explained how to download the OCDS dataset through a single file. In this section, we will introduce an alternative to download only the contracts we are looking for or the most updated contracts that have not been published yet in the final file. For this, we will use the API of datos.gob.mx, provided by the Mexican government.
In order to see the full API documentation, please check Basic guide to use API, where you can see in detail the specific filter options.
In order to download and manipulate data, the following tools will be used: cURL and jq.
The curl command will enable us to download information automatically, and the jq command will help us give a user-friendly format to the JSON data. At a later stage, this manual includes a brief introduction to jq.
Both programs can be installed locally. For example, for Linux Ubuntu, it can be done with an instruction such as:
sudo apt-get install -y curl jq
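After installing, we can confirm both tools are available by printing their versions:

```shell
# Print the installed versions of both tools:
curl --version | head -n 1
jq --version
```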
For Windows or Mac, we should download the executables separately, but we can also use the Docker container included in this project. To do so, we only need to execute the following Docker command:
docker run --rm -it -v $HOME/Download:/input --entrypoint=bash logstash-sfp-compranet-ocds
Remember that $HOME/ is a shortcut only for Linux and Mac; on Windows we should use %HOMEPATH%\ instead.
When this command is executed, we will get a new command line, which must look like this:
bash-4.2$
In this command line, we can execute the following commands.
To download the last 100 contracting processes and save them in a .json file:
curl https://api.datos.gob.mx/v2/contratacionesabiertas | jq -crM ".results" > opencontracting_last_100.json
Any files created inside the container will be deleted when the container is “shut down”, unless they are moved to or created in a folder shared by the computer and the container.
To download the contracting processes that involve a certain purchasing unit (unidad compradora), currently limited to 1,000 results, though this can be changed:
curl "https://api.datos.gob.mx/v2/contratacionesabiertas?records.compiledRelease.parties.name=Servicio%20de%20Administraci%C3%B3n%20Tributaria&pageSize=1000&page=1" | jq -crM ".results" > opencontracting_SAT_1000.json
In order to understand this last command, let's detail each part:
Firstly, the curl command is used. The URL is quoted so that the shell does not misinterpret the & characters that separate the parameters.
Secondly, the base API URL is included: https://api.datos.gob.mx/v2/contratacionesabiertas.
Next, we have the filter parameters:
records.compiledRelease.parties.name: filters by the value of that field, that is, the name of one of the parties involved in the contract.
pageSize: specifies how many results to return per request.
page: allows browsing through pages, in case there is more than one.
Afterwards, we use the jq command to extract only the results.
Lastly, we specify the name of the file in which the results will be stored. It is important that the filename reflects the search query, to avoid mix-ups.
These files must be stored and handled just as in the previous section: place them in the Download folder so we can continue with the next step.