
I'm using NestJS. I want to get all the data from a paginated API (I don't know the total number of pages). Right now I'm using a do...while loop that keeps fetching until the API returns 204 No Content. This is my code so far:

async getProduct() {
  let productFinal: ProductItem[] = [];
  let products: ProductItem[] = [];
  let offset = 1;
  let state = COLLECTING_STATE.InProgress;
  let retryCount = 1;
  
  do {
    const path = `product?limit=50&offset=${offset}`;

    products = await this.httpService
      .get(path, { headers, validateStatus: null })
      .pipe(
        concatMap((response) => {
          // if the statusCode is "204", the loop is complete
          if (response.status === 204) {
            state = COLLECTING_STATE.Finish;
          }

          // check if the response is error
          if (response.status < 200 || response.status >= 300) {
            // log error
            Logger.error(
              `[ERROR] Error collecting product on offset: ${offset}. StatusCode: ${
                response.status
              }. Error: ${JSON.stringify(response.data)}. Retrying... (${retryCount})`,
              undefined,
              'Collect Product'
            );

            // increment the retryCount
            retryCount++;

            // return throwError to trigger retry event
            return throwError(`[ERROR] Received status ${response.status} from HTTP call`);
          }

          // return the data if OK
          return of(response.data.item);
        }),
        catchError((err) => {
          if (err?.code) {
            // log error
            Logger.error(
              `Connection error: ${err?.code}. Retrying... (${retryCount})`,
              undefined,
              'Collect Product'
            );

            // increment the retryCount
            retryCount++;
          }
          return throwError(err);
        }),
        // retry three times
        retry(3),
        // if still error, then stop the loop
        catchError((err) => {
          Logger.error(
            `[ERROR] End retrying. Error: ${err?.code ?? err}`,
            undefined,
            'Collect Product'
          );
          state = COLLECTING_STATE.Finish;
          return of(err);
        })
      )
      .toPromise();

    // set retryCount to 1 again
    retryCount = 1;

    // check if products is defined
    if (products?.length > 0) {
      // if so, push the product to final variable
      productFinal = union(products, productFinal);
    }

    // increment the offset
    offset++;

    // and loop while the state is not finish
  } while ((state as COLLECTING_STATE) !== COLLECTING_STATE.Finish);

  return productFinal;
}

The endpoint product?limit=50&offset=${offset} belongs to a third-party service. It has no single endpoint that returns all the data, so this is the only way. It has a maximum limit of 50 items per offset, and the response contains no nextPage or totalPage information, so I have to keep an offset variable and increment it after the previous request completes.

How do I replace the while loop with an RxJS operator? And can it be optimized to make more than one request at a time (maybe four or five), so that it takes less time to get all the data?

dennbagas
  • Why are you using pagination when you don't need it? Just grab everything at once from the endpoint. – akop Sep 26 '20 at 10:38
  • The pagination comes from the API, it's not mine. The API doesn't provide a single endpoint to grab all the data. – dennbagas Sep 26 '20 at 10:42
  • Then give it a high limit value (like the max int value). – akop Sep 26 '20 at 10:43
  • I think you didn't get my point. My code above is working; I'm asking whether I can replace the while loop with an RxJS operator, and I wonder if I could optimize it to make more than one request at a time. – dennbagas Sep 26 '20 at 10:52
  • Making only one call instead of 100, 1k or 10k calls is the optimization. Your loop is wasting your resources, and it doesn't matter whether the loop runs in parallel. – akop Sep 26 '20 at 11:02
  • That is the only way I can get the data. The endpoint I call is not mine and I can't change it. It's a third-party endpoint that can handle a lot of traffic, so let's ignore the resource problem if the resources you are referring to are on the endpoint side. Can you elaborate based on my question, please? – dennbagas Sep 26 '20 at 14:47
  • But you can pass a very high limit, like `product?limit=MAX_SAFE_INTEGER`. If you want to optimize the code, then this is the wrong place. Have a look at https://codereview.stackexchange.com/ – akop Sep 26 '20 at 15:35
  • @dennbagas I guess this answer does what you need? https://stackoverflow.com/a/35494766/3772379 – Oles Savluk Sep 26 '20 at 16:04
  • @akop No, I can't. The maximum `limit` is 50 on the third-party endpoint; sorry I didn't mention it in the question. – dennbagas Sep 26 '20 at 16:16
  • @OlesSavluk Something like that, but the endpoint doesn't have a nextPage value, so I have to keep an `offset` variable and increment it in the loop. – dennbagas Sep 26 '20 at 16:17
  • @dennbagas That's correct. Use this example, but create a local variable for `offset` and increase it with every successful request. Will the proposed answer work for you then? – Oles Savluk Sep 26 '20 at 16:42
  • I think you need to use the `expand` operator here, along the lines of this answer: https://stackoverflow.com/a/58322826/7612287 – Ivan Sep 26 '20 at 22:34
  • @OlesSavluk Can you show me where to create the local offset variable and where to increment it? – dennbagas Sep 28 '20 at 10:18

1 Answer

Based on the answer to "RxJS Observable Pagination", but incrementing the offset every time a request is made:

const { of, timer, defer, EMPTY, from, concat } = rxjs; // = require("rxjs")
const { map, tap, mapTo, mergeMap, take } = rxjs.operators; // = require("rxjs/operators")

// simulate network request
function fetchPage({ limit, offset }) {
  // 204 response: no more data
  if (offset > 20) {
    return of({ status: 204, items: null });
  }

  // regular data response
  return timer(100).pipe(
    tap(() =>
      console.log(`-> fetched elements from ${offset} to ${offset+limit}`)
    ),
    mapTo({
      status: 200,
      items: Array.from({ length: limit }).map((_, i) => offset + i)
    })
  );
}

const limit = 10;
function getItems(offset = 0) {
  return defer(() => fetchPage({ limit, offset })).pipe(
    mergeMap(({ status, items }) => {
      if (status === 204) {
        return EMPTY;
      }
      const items$ = from(items);
      const next$ = getItems(offset + limit);
      return concat(items$, next$);
    })
  );
}

// process only first 100 items, without fetching all of the data
getItems()
  .pipe(take(100))
  .subscribe({
    next: console.log,
    error: console.error,
    complete: () => console.log("complete")
  });
<script src="https://unpkg.com/rxjs@6.6.2/bundles/rxjs.umd.min.js"></script>

Regarding the possible optimization of making parallel requests: I don't think it will work well. Instead, you could show the data progressively, as soon as items are loaded. Or change the API, as was suggested in the comments.
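
If several requests at a time are still wanted, one possible approach is to request pages in fixed-size waves and stop at the first wave that contains a 204. The sketch below is only an illustration under assumptions: it uses plain promises instead of RxJS operators, it treats `offset` as a page index that grows by one (as in the question), and `Page`, `FetchPage`, `fetchAllInWaves`, `fakeFetch` and `requested` are hypothetical names, with `fakeFetch` standing in for the real HTTP call.

```typescript
// Hypothetical page shape: status 204 means "no more data".
interface Page<T> {
  status: number;
  items: T[] | null;
}

// Hypothetical signature for whatever performs the real HTTP request.
type FetchPage<T> = (offset: number) => Promise<Page<T>>;

// Fetch pages in waves of `concurrency` offsets until a wave contains a 204.
async function fetchAllInWaves<T>(
  fetchPage: FetchPage<T>,
  concurrency = 4,
  startOffset = 1
): Promise<T[]> {
  const all: T[] = [];
  let offset = startOffset;

  while (true) {
    // Offsets for this wave, e.g. [1, 2, 3, 4] when concurrency is 4.
    const offsets = Array.from({ length: concurrency }, (_, i) => offset + i);

    // Promise.all keeps the results in the same order as `offsets`.
    const pages = await Promise.all(offsets.map((o) => fetchPage(o)));

    for (const page of pages) {
      if (page.status === 204) {
        // First empty page: everything before it has been collected.
        return all;
      }
      all.push(...(page.items ?? []));
    }
    offset += concurrency;
  }
}

// Simulated API (an assumption for the demo): offsets 1..6 have data.
const requested: number[] = [];
const fakeFetch: FetchPage<string> = async (offset) => {
  requested.push(offset); // record which offsets were actually requested
  return offset > 6
    ? { status: 204, items: null }
    : { status: 200, items: [`item-${offset}a`, `item-${offset}b`] };
};
```

Calling `fetchAllInWaves(fakeFetch, 4)` resolves with the items of pages 1 to 6 in order. The trade-off is that the last page is unknown in advance, so the final wave can issue up to `concurrency - 1` requests that only return 204 (offset 8 in this simulation). Retries and error handling are left out of the sketch.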

Oles Savluk