
Sunday, June 2, 2024

Find Special/Escape Characters in Sqlite

Find TAB

SELECT * FROM table WHERE column LIKE '%' || X'09' || '%';
SELECT * FROM table WHERE column LIKE '%' || CHAR(9) || '%';

Find CR LF

SELECT * FROM table WHERE column LIKE '%' || CHAR(13) || CHAR(10) || '%';
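A quick way to sanity-check these patterns is from Python's built-in sqlite3 module (the table and data below are made up for illustration):

```python
import sqlite3

# Throwaway in-memory table for illustration; table/column names are made up
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (col TEXT)")
con.executemany("INSERT INTO t VALUES (?)", [("a\tb",), ("plain",), ("x\r\ny",)])

# Rows containing a TAB
tabs = con.execute("SELECT col FROM t WHERE col LIKE '%' || CHAR(9) || '%'").fetchall()
# Rows containing CR LF
crlf = con.execute("SELECT col FROM t WHERE col LIKE '%' || CHAR(13) || CHAR(10) || '%'").fetchall()
print(tabs)  # [('a\tb',)]
print(crlf)  # [('x\r\ny',)]
```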
 

Wednesday, August 30, 2023

Python aiosqlite - async SQLITE

Install the aiosqlite package: aiosqlite is an async wrapper around sqlite3 that works with Python's asyncio library. You can install it with pip:

pip install aiosqlite

Import the necessary libraries: You need to import aiosqlite and asyncio for this task.

Create an async function to execute your query: This function will connect to a SQLite database, execute a query, and return the results. It uses the aiosqlite.connect() method to establish a connection to the database and the execute() method to run the SQL query.

Create tasks for each database: For each database you want to search, create a task using asyncio.create_task(). Each of these tasks will run the query_db function concurrently.

Execute the tasks: Use asyncio.gather() to run all the tasks concurrently and wait for all of them to finish. The results will be returned as a list of lists, with each sublist containing the results from one database.

Iterate through the results: You can now iterate through the results to process the data from each database.

import asyncio
import aiosqlite

async def query_db(database, query):
    async with aiosqlite.connect(database) as db:
        async with db.execute(query) as cursor:
            return await cursor.fetchall()

async def main():
    databases = ["database1.db", "database2.db", "database3.db"]
    query = "SELECT * FROM some_table"
    tasks = [asyncio.create_task(query_db(database, query)) for database in databases]

    results = await asyncio.gather(*tasks)

    for db_results in results:
        for row in db_results:
            # Process row
            pass

asyncio.run(main())

 

This code initiates connections to each of the specified databases concurrently and executes the same query on each. The results from each database are collected into a list of lists, which can then be processed as needed. Note: although asyncio allows concurrent execution of tasks, SQLite databases are stored on disk and the queries are therefore I/O bound; the actual speedup from using asyncio will depend on the speed of your disk and the size of the databases.

Wednesday, August 2, 2023

Dexonline Database in Stardict Format

This is a Romanian dictionary based on the dexonline database, in stardict format. The stardict format is recognized by KoReader (the only open-source e-reader software I have found).

The database is downloaded from dexonline and converted into stardict format.

TLDR

To obtain a stardict file, you first need to produce a gls file from the dexonline database, out of the data returned by this select:

select replace(l.form, "'", '') form, replace(d.htmlRep, "\n", "<br/>") html from Lexem l inner join EntryLexem el on l.id = el.lexemId
inner join Entry e on el.entryId = e.id
inner join EntryDefinition ed on e.id = ed.entryId
inner join Definition d on ed.definitionId = d.id where d.sourceId in (27, 21);

The gls format is quite simple. The first line holds the looked-up word, the next line holds its definition, followed by an empty line. This header (empty lines included) must be added above the data:


#bookname=dexonline.ro
#version=3.0.0
#stripmethod=keep
#sametypesequence=h
#date=2018-01-21
#description=Acesta este o adaptare a bazei de date DEX online sub licența GPL.

and after that the file must be run through stardict-editor to obtain the 3 stardict files. I used particular software and procedures to obtain the gls file, but obviously, since everything involved is free software and plain-text formats, you can use whatever software you have at hand.

How to use it

The 3 dex.* files must be copied into the dictionary folder of KoReader, or of any other reader that understands stardict. If you want, you can repeat the process to regenerate the database and the dictionary.

Software used

Steps

Install the required software.

Watch out for stardict-tools. The package is configured to depend on libmysqlclient, which no longer exists in Arch or Artix Linux. You have to edit the PKGBUILD file and remove that dependency from it. libmysqlclient is probably used by other tools in the package; stardict-editor (the one we need) will still work.

Download the dexonline database from here (installation instructions).

Load it into mysql.

An example can be seen here.

Using SQL Workbench/J, export the data in CSV format

The SQL Workbench/J documentation explains its product-specific commands, such as wbexport; you can find them there. I ran these commands:

wbexport -header=false -file=/tmp/rows.txt -delimiter=, -quoteChar="'" -quoteCharEscaping=none -quoteAlways=false -encoding=UTF-8;

select replace(l.form, "'", '') form, d.htmlRep html from Lexem l inner join EntryLexem el on l.id = el.lexemId
inner join Entry e on el.entryId = e.id
inner join EntryDefinition ed on e.id = ed.entryId
inner join Definition d on ed.definitionId = d.id

After this operation, the data ended up in text form in the file /tmp/rows.txt.

Prepare the text file

The resulting text file must be converted to the .gls format. You can find information about the format here.

I edited the file with vim and made the following changes:

  • :%s/\v'$//g (removed the trailing ' from the end of each line where present)
  • :%s/\v^([^,]+),'/\1, (removed the ' preceding the definitions)
  • :%s/\v^([^,]+),(.*)$/\1^M\2^M (separated the lexemes from the definitions and added a new line after each definition)

Watch out for the special characters: they were entered by pressing Ctrl and V and then Enter. Essentially, the matched text was replaced with new lines.

The data is now in .gls format.
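The full conversion, including the header, can equally be sketched in Python. The parsing convention (one word,'definition' pair per line, as produced by the wbexport settings earlier) is an assumption, and the helper name to_gls is my own:

```python
# Sketch of the vim transformation as a Python function.
# Assumes each exported line looks like  word,'definition'  (the quoting
# produced by the wbexport settings above); adjust if your export differs.
HEADER = """
#bookname=dexonline.ro
#version=3.0.0
#stripmethod=keep
#sametypesequence=h
#date=2018-01-21
#description=Acesta este o adaptare a bazei de date DEX online sub licența GPL.

"""

def to_gls(lines):
    entries = []
    for line in lines:
        word, _, definition = line.partition(",")
        definition = definition.strip()
        # drop the surrounding ' quote characters, if present
        if definition.startswith("'") and definition.endswith("'"):
            definition = definition[1:-1]
        # gls entry: word on one line, definition on the next, then a blank line
        entries.append(f"{word}\n{definition}\n")
    return HEADER + "\n".join(entries)
```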

Create the file dex.gls with the following content:


#bookname=dexonline.ro
#version=3.0.0
#stripmethod=keep
#sametypesequence=h
#date=2018-01-21
#description=Acesta este o adaptare a bazei de date DEX online sub licența GPL.

-modified-rows.txt-content-

Watch out for the empty lines. The first line of the file must be an empty line, and the modified content of rows.txt must be inserted after the file header (the lines starting with #).

Save the file and quit vim.

Create the dictionary

Open stardict-editor. In the interface, select the Babylon file format at the bottom left (not Tab file, which is selected by default). Then pick the dex.gls file you created, using the Browse option at the top left, and press Compile at the bottom right. If everything went well, you should get a message like this:

Building...
Over
wordcount: <n>
Done!

where n is the number of words in the dictionary.

Next to the dex.gls file you should now have the 3 stardict files (dex.dict.dz, dex.idx, dex.ifo). These 3 files are the dictionary in stardict format. Depending on the e-reader you use, copy them into the folder indicated by the KoReader documentation.

Source: https://github.com/cosminadrianpopescu/dexonline-stardict

Wednesday, June 14, 2023

Database Hosting

 Free backend hosting

 https://free-for.dev/#/?id=dbaas

Service Type Storage Limitations
Amazon DynamoDB ⚠️ Proprietary NoSQL 25 GB ⚠️ Payment method required
Amazon RDS ⚠️ Proprietary RDBMS ⚠️ Only free for 1 year
Azure SQL Database MS SQL Server ⚠️ Only free for 1 year
👉 Clever Cloud PostgreSQL, MySQL, MongoDB, Redis 256 MB (PostgreSQL) Max 5 connections (PostgreSQL)
ElephantSQL PostgreSQL 20 MB 5 concurrent connections
Fly.io PostgreSQL 3 GB ⚠️ Credit card required, limited outbound traffic, no way to avoid accidental overage fees
Google Cloud Firestore ⚠️ Proprietary NoSQL 1 GB ⚠️ After the first year there's no way to avoid accidental overage fees
Heroku Postgres (discontinued) 😭 PostgreSQL 10K max rows
IBM Cloud Cloudant ⚠️ Proprietary NoSQL 1 GB Deleted after 30 days of no activity
IBM Cloud Db2 Db2 200 MB ⚠️ "users are asked to re-extend their free account every 90 days by email. If you do not re-extend, your free account is cleaned out a further 90 days later"
MongoDB Atlas MongoDB 512 MB
OpenShift Developer Sandbox MariaDB, MongoDB, MySQL, PostgreSQL 15 GB ⚠️ Expires after 30 days (can resubscribe for free but not extend subscription), pods are automatically deleted after 12 consecutive hours of runtime
Oracle Cloud Oracle Database 20 GB each per two databases ⚠️ Payment method required
Redis Enterprise Redis 30 MB
Scalingo PostgreSQL 128 MB Max 10 connections, ⚠️ Payment method required after 30 day trial
👉 Supabase PostgreSQL 500 MB Paused after 1 week inactivity, 2 GB transfer limit
  
 
https://railway.app/ provides free MySQL and PostgreSQL hosting.
Clever Cloud also allows free 500MB MongoDB -> https://www.clever-cloud.com/pricing/#MongoDB 
bit.io offers three free PostgreSQL instances with a 3 GB data cap and moderate usage as well. https://bit.io/pricing
https://remotemysql.com/ - Free 100MB MySQL
https://www.infinityfree.net/ - 400 free MySQL databases (with PHP hosting, not remote)
https://freedb.tech/ - Free 50MB MySQL

 

Service Type RAM Storage Limitations
👉 Adaptable PaaS 256 MB Non-persistent? (1 GB database storage available)
AWS EC2 IaaS 1 GB ⚠️ Only free for 1 year
Azure App Service PaaS 1 GB 1 GB ⚠️ 60 CPU minutes/day
Azure VM IaaS 1 GB 2 GB (non-persistent) ⚠️ Only free for 1 year
👉 Fly.io PaaS 256 MB (2 instances) 1 GB (persistent)
  • 160 GB/month outbound traffic (broken down by region)
  • ❓ Trial plan (2 instances and 1 GB storage) doesn't require card, but does it expire?
  • ⚠️ Hobby plan (3 instances and 3 GB storage) requires credit card, no way to avoid accidental overage fees
Google App Engine PaaS 128 MB 1 GB
Google Compute Engine IaaS 0.6 GB 30 GB
  • 1 GB/month outbound traffic (not including China or Australia)
  • ⚠️ After the first year there's no way to avoid accidental overage fees
Heroku (discontinued) 😭 PaaS 512 MB Non-persistent Idling after 30 minutes inactivity, limited to 550 hours/month
IBM Cloud Foundry PaaS 256 MB ⚠️ Sleep after 10 days, deleted after 30 days of no "development activity"
👉 Northflank PaaS ??? (2 instances) Non-persistent ⚠️ Credit card required for signup, won't be billed on the free tier
OpenShift Developer Sandbox PaaS 7 GB 15 GB ⚠️ Expires after 30 days (can resubscribe for free but not extend subscription), pods are automatically deleted after 12 consecutive hours of runtime
Oracle Cloud IaaS
  • 1 GB each for 2 VMs (AMD CPU)
  • 24 GB across up to 4 VMs (ARM CPU)
200 GB across 2 volumes
  • AMD CPUs limited to 1/8 of 2 CPUs per VM
  • ⚠️ Payment method required
👉 Railway PaaS 512 MB 1 GB
  • Limited to 500 hours/month
  • 100 GB/month outbound traffic
👉 Render PaaS 512 MB Non-persistent
  • No persistent storage
  • Idling after 15 minutes of inactivity
  • 750 hours runtime/month